Wiki: Collected feedback on interop testing, methods, living docs and specs

aschrijver · September 1, 2023, 3:56pm

This is a wiki post and anyone is encouraged to edit it and add additional sources.

ActivityPub Test Suite

Needs

“Status of a robust ActivityPub test suite?” on SocialHub by Tim Chambers (@tchambers).
- Tim Chambers is keeping track of notes in this Google Doc.
- @stevebate: Works on fully automated test suite (Python), gives details [MUST READ].
- @helge: To @stevebate: Bovine has many tests, maybe we can combine those.
- @stevebate: Modularity (common + server-specific). Countless scenarios, how to test?
- @by_caballero: Considers suite impl. How to avoid manual work? SocialCG meeting?
- @helge: AP is awful to test, interop tests most useful. Use Gherkin, see Diaspora.
- @stevebate: Will experiment with Gherkin, but hard to express behavior neatly.
- @helge: Agree, right Gherkin format is hard. Here’s Bovine http_signatures feature.
- @stevebate: Ported @cwebber’s test suite to Python rocks-testsuite (demo site).
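Bovine's http_signatures feature touches on checking message integrity. As a minimal sketch, assuming the common fediverse convention of sending a `Digest: SHA-256=<base64>` header alongside HTTP Signatures, a test could compute and verify the body digest as below. It covers only the body digest, not the signature over the headers, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def digest_header(body: bytes) -> str:
    """Build the value of a 'Digest' header (SHA-256, base64-encoded) for a request body."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def verify_digest(body: bytes, header_value: str) -> bool:
    """Return True if the header carries a SHA-256 digest that matches the body."""
    expected = digest_header(body).split("=", 1)[1]
    # A Digest header may list several algorithms, comma-separated; only SHA-256 is checked.
    for part in header_value.split(","):
        algorithm, _, value = part.strip().partition("=")
        if algorithm.upper() == "SHA-256":
            # Constant-time comparison of the base64 strings.
            return hmac.compare_digest(value, expected)
    return False


body = b'{"type": "Follow"}'
header = digest_header(body)
print(verify_digest(body, header))         # True
print(verify_digest(b"tampered", header))  # False
```

`partition("=")` splits only at the first `=`, so base64 padding in the value survives intact.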
“Status of a robust ActivityPub test suite?” related Fediverse thread by Tim Chambers.
- @OpinionatedGeek: Message validation during app-related message exchange.
- @pfefferle: Time saving. Basic AP node interop testing is most time-consuming of all.
- @OpinionatedGeek: App-specific Example messages (fedidevs.org started collecting)
- @tchambers: Avoid fraud. AP standards compliance testing (refers to SWICG mail).
- @robz: Need to test secure digest signatures as well as well-formedness of msgs.
- @robz: Atomic tests + scorecard, test Webfinger, access to public timelines.
- @robz: Challenge is: test suite becomes both formal and ad-hoc reference impl.
- @OpinionatedGeek: Ease of use (existing suites weren’t easy to use).
- @nodebb: We build a test suite, but for our app and in our language (JS).
- @sfunk1x: How can Webfinger work for hosting multiple AP services on one domain?
- @j3j5: @dansup will resurrect and open source old FediDB testing tools again.
- @tchambers: What is salvageable for reuse & extension in old Go-Fed Test Suite?
- @robz: What programming language are we gonna write our test suite in?
- @devnull: Fedi groups are unintuitive for collab. I like ad-hoc working groups.
- @Helge: AP compliance test is easy + of little use, interop testing is great + hard.
- @Helge: AP compliance testing only covers 2 MUSTs, related to inbox + outbox.
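Two of the checks mentioned in this thread are small enough to automate without a full instance: resolving a handle through WebFinger (as @robz suggests) and checking the actor properties that the ActivityPub spec marks as MUST (actor objects must have `inbox` and `outbox`, which is one plausible reading of @Helge's remark). The sketch works on already-fetched JSON so it runs offline; HTTP fetching is left out and the function names are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Media types that identify an ActivityPub actor document in a WebFinger 'self' link.
AP_MEDIA_TYPES = {
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
}


def webfinger_url(handle: str) -> str:
    """Turn '@user@example.social' or 'user@example.social' into a WebFinger query URL."""
    user, _, domain = handle.lstrip("@").partition("@")
    if not user or not domain:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_url_from_jrd(jrd: dict) -> Optional[str]:
    """Pick the ActivityPub actor URL out of a WebFinger (JRD) response."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") in AP_MEDIA_TYPES:
            return link.get("href")
    return None


def missing_actor_musts(actor: dict) -> List[str]:
    """Report which of the MUST properties 'inbox' and 'outbox' an actor lacks."""
    return [prop for prop in ("inbox", "outbox") if prop not in actor]
```

WebFinger is resolved per domain, so hosting several AP services on one domain comes down to what the single `/.well-known/webfinger` endpoint returns for each `acct:` resource.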
@ActivityPubTestSuite@venera.social Friendica group
- @rmdes: I wonder how much AP activity my micro.blog instance uses on daily basis.
- @django: @dansup, will you update this FediDB Community Edition (PHP) on Github?
- @robz: @cwebber once asked on fedi “Who will maintain my Guile test suite?”, Nov 2022.
- @benpate: Testing that doesn’t need spinning up full instances is tremendously valuable.
- @devnull: I assume a remote server to test against is the end goal. Tim Chambers confirms.
- @helge: True value of test suite is when showing implementation reports.
- @jamiexml: “NICE. Inclusive tool use”, related to @ActivityPubTestSuite initiative.
- @tchambers: Test suite is crucial to withstand EEE threat of Meta et al. See my blog post.
- @emc2: Test suite important to deal with private sector, fend off hegemony.
- @wizzwizz4: Need a test suite that runs locally. Unsure how to test S2S that way.
- @OpinionatedGeek: Wrote blog post about trying to run Pubstrate in Docker, but failing.
- @lmas: Dev-friendly instances for testing? Spinning up own instances difficult (SSL, domain).
- @OpinionatedGeek: Same issue. Needs to revert to ‘known’ state. Shares Docker solution.
- @helge: Started docker-compose for Bovine for HTTP-only tests. HTTPS works now. Test cases come from fediverse-features.
- @darius: Official test suite should be funded for long-term sustainability, to stay up-to-date.
- @steve: Ported test suite from Guile to Python at aptestsuite.stevebate.dev.
- @blake: Test suite is more like a guide for writing tests. Doesn’t test, generates reports.
- @bobwyman: Ideal test-suite gives reference impls for client + server for whole AP spec.
- @bobwyman: Suite needs system-specific call-outs for auth/authz (or test-mode to skip).
- @cwebber: Test suite was a questionnaire. Very hard to automate given the problem domain.
- @cwebber: If I had to rewrite the test suite, I’d make it a script that runs locally.
- @steve: Experiment: CLI compliance testing + impl-specific adapters for server handling.
- @evan: Similar project onepage.pub, example AP server + test practices. Let’s collab.
- @OpinionatedGeek: My dev project passes the C2S tests of the AP test suite. Shares screenshot.
ActivityPub Test Suite by Bob Wyman on SWICG Mailing list, March 2023
- → @dansup on this initiative: Prefers to work alone and create an MVP (to hand over?).
- What to test, needs & requirements, should SWICG maintain test suite?
- Darius Kazemi: Old @cwebber suite didn’t have great coverage, but is still very useful to revive.
- Bob Wyman: Don’t hand-code. Automate docs + codegen (e.g. OpenAPI/Swagger).
- Marcus Rohrmoser: Many complaints on inadequate test suite in W3C AP repository.
- Sebastian Lasse: More W3C complaints. Points to Yuforium activity-streams.
- Benjamin Goering: Test security and fraud prevention.
- Marcus Rohrmoser: Affirms Benjamin’s post. Decentralize + agency + aggregation.
Meta unspools Threads by Johannes Ernst, asking Ben (Meta) to fund a Test Suite.
- Tim Chambers: +1’s and points out effort to rebuild Python web suite (by Steve Bate).
@hrefna: A Mastodon API test suite is needed, if only to be able to move away from Mastodon.
The ActivityPub Test Suite on SocialHub relating to @cwebber’s Pubstrate, Nov 2019.
- [TODO]
Exhaustive list of Fediverse app compatibilities? on SocialHub, by @KaKi87.
- @KaKi87: Seeks help with opening Fedi links in one’s own client UI (via browser extension).
- @danielhz: Points out example of incompatibilities between Mastodon and Kbin.
@aschrijver: Advocates for Gherkin/BDD for Fediverse. Helge is motivated to do a deep-dive.
- Pointed Helge to Diaspora tests as an example social platform that uses BDD.
- Helge is test-driving BDD in the fediverse-features repository on Codeberg.
- @pzingg: Ported BDD tests for Elixir (also wrote Elixir JSON canonicalization lib).
- @helge: Plan to build implementation list for each feature with automated checking.
- @helge: Posts BDD 101, part of big BDD knowledge base, listing e.g. these 12 benefits:

Inclusion Anyone can write BDD scenarios, because they are written in plain English. Think of The Three Amigos.

Clarity Scenarios focus specifically on the expected behavior of the product under development, resulting in less ambiguity for what to develop.

Streamlining Requirements = acceptance criteria = test cases. Modular syntax expedites automation as well.

Shift-Left Test case definition inherently becomes part of grooming.

Artifacts Scenarios form a collection of test cases. Any tests not automated can be added to a known automation backlog.

Automation BDD frameworks make it easy to turn scenarios into automated tests.

Test-Driven Most BDD frameworks can run scenarios to fail until the feature is implemented.

Code Reuse “Given-When-Then” steps can be reused between scenarios.

Parameterization Steps can be parameterized. For example, a step to click a button can take in its ID.

Variation Using parameters, example tables make it easy to run the same scenario with different combinations of inputs.

Momentum Scenarios become easier and faster to write and automate as more step definitions are added.

Adaptability Scenarios are easy to rewrite as the products and features change.
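The Code Reuse, Parameterization and Variation points can be shown without any framework. The sketch below is a hand-rolled step registry in Python (a real project would use a BDD framework; nothing here mirrors a specific framework’s API): regex capture groups parameterize steps, the same step definitions serve several scenario lines, and an example table runs one scenario outline with different inputs. The follow steps are hypothetical and operate on an in-memory dict, not a real server.

```python
import re
from typing import Dict, List, Optional

# Registry of (compiled regex, step function) pairs.
STEPS = []


def step(pattern: str):
    """Register a step function; regex capture groups become its parameters."""
    def register(fn):
        STEPS.append((re.compile(pattern + r"$"), fn))
        return fn
    return register


@step(r"(?:Given|And) an actor named (\w+)")
def an_actor(ctx, name):
    ctx.setdefault("followers", {})[name] = set()


@step(r"When (\w+) follows (\w+)")
def follows(ctx, follower, followee):
    ctx["followers"][followee].add(follower)


@step(r"Then (\w+) has (\d+) followers?")
def follower_count(ctx, name, count):
    assert len(ctx["followers"][name]) == int(count)


def run_scenario(lines: List[str], example: Optional[Dict[str, str]] = None) -> dict:
    """Substitute <placeholders> from an example row, then dispatch each line to its step."""
    ctx: dict = {}
    for line in lines:
        text = line.strip()
        for key, value in (example or {}).items():
            text = text.replace(f"<{key}>", value)
        for pattern, fn in STEPS:
            match = pattern.match(text)
            if match:
                fn(ctx, *match.groups())
                break
        else:
            raise LookupError(f"undefined step: {text}")
    return ctx


# A scenario outline plus an example table: same steps, different inputs.
OUTLINE = [
    "Given an actor named alice",
    "And an actor named <who>",
    "When <who> follows alice",
    "Then alice has 1 follower",
]
EXAMPLES = [{"who": "bob"}, {"who": "carol"}]

for row in EXAMPLES:
    run_scenario(OUTLINE, row)
```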

Notes from @jfinkhaeuser:
- There are a few traps here in the BDD summary that are worth picking apart in a project:
  1. Writing BDD scenarios is more of a challenge than the “Inclusion” point suggests. They may be plain English, but it’s easy to get them wrong.
  2. “The BDD community” is less focused on testing than on collaboration between different stakeholders. This has subtle results on tool usage.
  3. Step re-use is relatively low when scenarios are optimally written for inclusion & maintenance.
  4. Step parametrization can lead to less inclusion, as each step becomes more like a function.
  5. Typically, the solution to the above is a three layer architecture (BDD, steps, functions representing interactions hiding implementation details) rather than the suggested two layers, BUT this implies the bottom layer is deeply technical and tightly coupled to the product.
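The three-layer structure in point 5 can be sketched in a few lines. This is a minimal illustration, assuming an in-memory `FakeServer` as the bottom layer; in a real suite that layer would make HTTP calls to the implementation under test, which is exactly where the tight coupling to the product ends up. All names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


# Layer 3: interaction functions that hide implementation details.
# Swapping FakeServer for a real client is where product-specific coupling lives.
@dataclass
class FakeServer:
    inboxes: Dict[str, List[dict]] = field(default_factory=dict)

    def deliver(self, recipient: str, activity: dict) -> None:
        self.inboxes.setdefault(recipient, []).append(activity)


def send_follow(server: FakeServer, actor: str, target: str) -> None:
    server.deliver(target, {"type": "Follow", "actor": actor, "object": target})


def inbox_types(server: FakeServer, actor: str) -> List[str]:
    return [activity["type"] for activity in server.inboxes.get(actor, [])]


# Layer 2: step definitions, thin phrase-shaped wrappers around layer 3.
def when_follows(ctx: dict, follower: str, followee: str) -> None:
    send_follow(ctx["server"], follower, followee)


def then_inbox_contains(ctx: dict, actor: str, activity_type: str) -> None:
    assert activity_type in inbox_types(ctx["server"], actor)


STEP_TABLE = [
    (re.compile(r"When (\w+) follows (\w+)$"), when_follows),
    (re.compile(r"Then (\w+)'s inbox contains a (\w+)$"), then_inbox_contains),
]

# Layer 1: the scenario itself, in plain language.
SCENARIO = """
When alice follows bob
Then bob's inbox contains a Follow
"""


def run(scenario: str) -> dict:
    ctx = {"server": FakeServer()}
    for line in scenario.strip().splitlines():
        text = line.strip()
        for pattern, step_fn in STEP_TABLE:
            match = pattern.match(text)
            if match:
                step_fn(ctx, *match.groups())
                break
        else:
            raise LookupError(f"undefined step: {text}")
    return ctx


ctx = run(SCENARIO)
```

Keeping layer 2 thin is what lets layer 1 stay readable while only layer 3 changes per implementation.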
Tests for an ActivityPub implementation checklist (project ports Go-Fed to Elixir)
- Adapted from the Guide for new ActivityPub implementers SocialHub checklist.
Sytest: Black-box integration testing for Matrix homeservers (thanks @realaravinth)
- Example of how Matrix does testing of Synapse server. Supports plugins for extension.
As part of the W3C AP standardization process, a large collection of User Stories was gathered.
- Can be used as input / inspiration to templates, best-practice for writing Gherkin scripts.
- More user stories collected after official gathering process was closed.
@hazel toots about a C# Test Suite that is part of ActivityPubSharp library project.
@fr33domlover provided behavior pseudocode related to Vervis.

Vervis behavior scripts (text by fr33domlover):

Just to put this in the shared field, here’s an example of a quite-detailed behavior description I have for an Activity Handler that I just finished implementing (it’s very unusually long and complex, because the handler code itself is complex :P)

-- Meaning: An actor accepted something
-- Behavior:
--     * Check if I know the activity that's being Accepted:
--         * Is it an Invite to be a collaborator in me?
--             * Verify the Accept is by the Invite target
--         * Is it a Join to be a collaborator in me?
--             * Verify the Accept is authorized
--         * Is it an Invite to be a component of me?
--             * Nothing to check at this point
--         * Is it an Add to be a component of me?
--             * If the sender is the component:
--                 * Verify I haven't seen a component-Accept on this Add
--             * Otherwise, i.e. sender isn't the component:
--                 * Verify I've seen the component-Accept for this Add
--                 * Verify the new Accept is authorized
--         * If it's none of these, respond with error
--
--     * In collab mode, verify the Collab isn't enabled yet
--     * In component mode, verify the Component isn't enabled yet
--
--     * Insert the Accept to my inbox
--
--     * In collab mode, record the Accept and enable the Collab in DB
--     * In Invite-component mode,
--         * If sender is component, record the Accept and enable the Component
--           in DB
--         * Otherwise, nothing at this point
--     * In Add-component mode,
--         * If the sender is the component, record the Accept into the
--           Component record in DB
--         * Otherwise, i.e. sender isn't the component, record the Accept and
--           enable the Component in DB
--
--     * Forward the Accept to my followers
--
--     * Possibly send a Grant:
--         * For Invite-collab mode:
--             * Regular collaborator-Grant
--             * To: Accepter (i.e. Invite target)
--             * CC: Invite sender, Accepter's followers, my followers
--         * For Join-as-collab mode:
--             * Regular collaborator-Grant
--             * To: Join sender
--             * CC: Accept sender, Join sender's followers, my followers
--         * For Invite-component mode:
--             * Only if sender is the component
--             * delegator-Grant
--             * To: Component
--             * CC:
--                 - Component's followers
--                 - My followers
--         * For Add-component mode:
--             * Only if sender isn't the component
--             * delegator-Grant
--             * To: Component
--             * CC:
--                 - Component's followers
--                 - My followers
--                 - The Accept's sender

(It’s the Project actor handling an Accept activity)

Example of a simpler description, for Project handling an Add activity:

-- Meaning: An actor is adding some object to some target
-- Behavior:
--     * Verify my components list is the target
--     * Verify the object is a component, find in DB/HTTP
--     * Verify it's not already an active component of mine
--     * Verify it's not already in an Add-Accept process waiting for project
--       collab to accept too
--     * Verify it's not already in an Invite-Accept process waiting for
--       component (or its collaborator) to accept too
--     * Insert the Add to my inbox
--     * Create a Component record in DB
--     * Forward the Add to my followers
projectAdd
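To show how such a behavior description maps onto testable code, here is a minimal Python sketch of the simpler Add handler above. It is not Vervis code (Vervis is written in Haskell): the `Project` class, its fields and the `Rejected` exception are hypothetical, “find in DB/HTTP” is reduced to a set lookup, and component states are simplified to three string labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class Rejected(Exception):
    """Raised when an incoming activity fails one of the verification steps."""


@dataclass
class Project:
    components_uri: str  # "my components list"
    # Stands in for "find in DB/HTTP": URIs that resolve to components.
    resolvable_components: Set[str] = field(default_factory=set)
    # Component URI -> "active" | "add-pending" | "invite-pending"
    components: Dict[str, str] = field(default_factory=dict)
    inbox: List[dict] = field(default_factory=list)
    forwarded_to_followers: List[dict] = field(default_factory=list)


def handle_add(project: Project, add: dict) -> None:
    component = add["object"]
    if add.get("target") != project.components_uri:
        raise Rejected("target is not my components list")
    if component not in project.resolvable_components:
        raise Rejected("object is not a known component")
    state = project.components.get(component)
    if state == "active":
        raise Rejected("already an active component")
    if state == "add-pending":
        raise Rejected("already in an Add-Accept process")
    if state == "invite-pending":
        raise Rejected("already in an Invite-Accept process")
    project.inbox.append(add)                      # insert the Add to my inbox
    project.components[component] = "add-pending"  # create a Component record
    project.forwarded_to_followers.append(add)     # forward the Add to my followers
```

Each “Verify” line becomes one guard that a test can trigger in isolation, which is what makes such descriptions a useful bridge to automated tests.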

@stevebate on fedidevs chat: What does spec compliance even mean?
- (@aschrijver: Imho it is only meaningful in context of a Compliance Profile)
- @stevebate: How to test compliance for a “compliant, but unsupported Activity” (e.g. Travel)?
- @trwnh: Test suite should test normative + “unstated logically implied” requirements.
- @j12r: Made notes on test requirements. A “make test-fediverse” is needed.
- @helge: Test suite must make bugs visible. Finds replies collection hardly implemented.
- @stevebate: Useful tool probes a server to see what it supports via AP endpoint API.
- @helge: Built something like that on proof.mymath.rocks. “Most stuff is optional”.
- @trwnh: Explains how the underspecified replies collection works.
- @j12r: Asks preferred techstacks, to align test suite technology (e.g. Docker-based).
- @snarfed: Local + live testing ideal, but live preferred. See WebMention.rocks example.
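In the spirit of @stevebate’s probing idea and @helge’s observation that most features are optional, a probe can report which optional ActivityPub actor properties are present and in which form an object exposes its `replies` collection. A minimal sketch working on already-fetched documents (a real probe would dereference them over HTTP); the property list follows the actor section of the ActivityPub spec, and the function names are hypothetical.

```python
from typing import Dict

# Actor properties the ActivityPub spec lists as SHOULD or MAY (not MUST).
OPTIONAL_ACTOR_PROPS = ("followers", "following", "liked", "streams", "preferredUsername", "endpoints")


def probe_actor(actor: dict) -> Dict[str, bool]:
    """Report which optional actor properties are present."""
    report = {prop: prop in actor for prop in OPTIONAL_ACTOR_PROPS}
    endpoints = actor.get("endpoints")
    # If endpoints is given as a link rather than an object, this sketch does not dereference it.
    report["sharedInbox"] = isinstance(endpoints, dict) and "sharedInbox" in endpoints
    return report


def probe_replies(obj: dict) -> str:
    """Classify how an object exposes its replies collection."""
    replies = obj.get("replies")
    if replies is None:
        return "absent"
    if isinstance(replies, str):
        return "link"  # a URI that has to be dereferenced separately
    if isinstance(replies, dict) and replies.get("type") in ("Collection", "OrderedCollection"):
        return "embedded"
    return "unrecognized"
```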
@aschrijver: Found out (via HN) about the TestContainers project.

Fediverse Testing Practices

Needs

@johnny in FediDevs General chat: Best-practices for local testing of AP services?
- Proper testing requires domain + SSL certificate for TLS. Usually uses ngrok.
- Both johnny and Helge see this as ‘spam federation’ (“mastodon.social has 514 ngrok peers”)
- @j12r: Has a series of Let’s Encrypt certs in an “empty” site for systemd-nspawn containers.
- @RyunoKi mentions auto-encrypt-localhost small-tech project by Aral Balkan.
- @vladimyr: Names 2 projects, localtls and nip.io.
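nip.io is a wildcard DNS service: a name like `mastodon.192.168.1.20.nip.io` resolves to the IP address embedded in it, so each local test instance can get its own hostname without registering a domain. A minimal Python sketch for generating such hostnames (IPv4 only; the helper name and instance labels are hypothetical). It only solves naming: TLS certificates still have to come from elsewhere, and a loopback address is only reachable from the same machine, so instances federating across containers need a routable address.

```python
import ipaddress


def nip_io_host(ip: str, label: str = "") -> str:
    """Build a nip.io hostname that resolves to `ip`, optionally prefixed with a label."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed addresses
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    host = f"{addr}.nip.io"
    return f"{label}.{host}" if label else host


# One hostname per local instance in a test federation.
hosts = {name: nip_io_host("192.168.1.20", name) for name in ("alpha", "beta")}
```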
Pleroma project has a large collection of fixtures with JSON msg formats for many apps.
@helge http_activitypub_test: URL’s for specific features return a Bovine JSON response.
@crepels: Launches ActivityPub Academy, a Mastodon instance giving insight into message exchange.
Revive activitystreams-validator on SocialHub: Built by @evan years ago.
WebArena: Realistic Web Environment for Building Autonomous Agents. Nice concept.
Mobilizon have a good set of test fixtures for their type of app to take example from.
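Fixture collections like Pleroma’s and Mobilizon’s become more useful when a suite can sanity-check them before use. A minimal sketch, assuming fixtures are JSON files holding single activities; the checks are common-sense heuristics (ActivityStreams context referenced, `id`, `type` and `actor` set), not a full ActivityStreams validator, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def references_as_context(context) -> bool:
    """True if @context is, or (as a list) contains, the ActivityStreams context URI."""
    if isinstance(context, str):
        return context == AS_CONTEXT
    if isinstance(context, list):
        return any(isinstance(c, str) and c == AS_CONTEXT for c in context)
    return False


def check_activity(doc: dict) -> List[str]:
    """Return a list of problems found in one activity document."""
    problems = []
    if not references_as_context(doc.get("@context")):
        problems.append("@context does not reference the ActivityStreams context")
    for prop in ("id", "type", "actor"):
        if prop not in doc:
            problems.append(f"missing {prop}")
    return problems


def check_fixture_dir(directory: str) -> Dict[str, List[str]]:
    """Map each *.json fixture file to its list of problems (empty list = looks fine)."""
    return {
        path.name: check_activity(json.loads(path.read_text(encoding="utf-8")))
        for path in sorted(Path(directory).glob("*.json"))
    }
```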
SocialCG Special Topic Call: AP Interop Test Suites (2023-08-11):
- @j12r: Presents test suite project, has written intricate plan, seeks funding.
- @dariusk: Collects ActivityPub messages “in the wild”.
- Also points to project for Debugging AP on Glitch, proposes view layer for AP test suite.
- @bengo: Collects ActivityPub Protocol Behaviors at Codeberg repo.
@stevebate announces availability of his activitypub-testsuite in Python.
- Similar to Johannes’ test suite plan, but needs no containers: servers run in a subprocess.
@aschrijver: I found this Clojure project Maelstrom, workbench for writing toy implementations of distributed systems.
- “[Maelstrom] uses the Jepsen testing library to test toy implementations of distributed systems. [..] Maelstrom’s tooling lets users experiment with simulated latency and message loss. Every test includes timeline visualizations of concurrency structure, statistics on messages exchanged, timeseries graphs to understand how latency, availability, and throughput respond to changing conditions, and Lamport diagrams so you can understand exactly how messages flow through your system.”
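Maelstrom’s core idea, injecting latency and message loss into a simulated network, can be illustrated in a few lines. This is not Maelstrom’s API, only a minimal deterministic sketch: a seeded random generator decides which messages are dropped and how long the others take, so a failing run can be replayed exactly. The class name, handler signature and parameters are hypothetical.

```python
import heapq
import random
from typing import Callable, Dict


class SimNetwork:
    """Toy network: each message is dropped with probability `loss`,
    otherwise delivered after a random delay of 1..max_delay ticks."""

    def __init__(self, seed: int = 0, loss: float = 0.1, max_delay: int = 5):
        self.rng = random.Random(seed)  # seeded, so runs are reproducible
        self.loss = loss
        self.max_delay = max_delay
        self.now = 0
        self._queue = []  # heap of (deliver_at, sequence_no, destination, message)
        self._seq = 0     # tie-breaker so messages themselves are never compared
        self.stats = {"sent": 0, "dropped": 0, "delivered": 0}

    def send(self, dst: str, msg: dict) -> None:
        self.stats["sent"] += 1
        if self.rng.random() < self.loss:
            self.stats["dropped"] += 1
            return
        deliver_at = self.now + self.rng.randint(1, self.max_delay)
        heapq.heappush(self._queue, (deliver_at, self._seq, dst, msg))
        self._seq += 1

    def run(self, handlers: Dict[str, Callable[["SimNetwork", dict], None]]) -> None:
        """Deliver queued messages in time order; handlers may send further messages."""
        while self._queue:
            self.now, _, dst, msg = heapq.heappop(self._queue)
            self.stats["delivered"] += 1
            handlers[dst](self, msg)


def simulate(seed: int, loss: float) -> dict:
    """Send 100 numbered messages to 'bob' and record what arrives, in arrival order."""
    inbox = []
    net = SimNetwork(seed=seed, loss=loss)
    for i in range(100):
        net.send("bob", {"type": "Create", "n": i})
    net.run({"bob": lambda _net, msg: inbox.append(msg["n"])})
    return {"stats": net.stats, "received": inbox}
```

Because delays are random, `received` is generally not in send order, which is the kind of reordering an AP implementation has to tolerate.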

@aschrijver: Just sharing this for fun:

TigerBeetle uses a game to simulate distributed network communication

TigerBeetle is a distributed DB made in Zig specifically for financial transactions and uses WebAssembly in interesting ways. The folks at the project have developed ‘a game’ that simulates actual network interactions, and you can do all kinds of things to test the network. Here’s the demo. Who knows, some day the Friendliverse in Mimic may provide a similarly fun way of conducting tests.

Living Documentation

Needs

Document federation behavior in a semi-standard way? on SocialHub by Darius Kazemi
- @darius: FEDERATION.md convention for philosophy, inbox behavior, activities triggered.
- Observation: Template, structure, vocabulary need more work.
- Observation: Hardly used/known practice (see project list), inconsistent, too informal.
- @silverpill: Thinks of submitting a FEP to standardize, provides FEP template.
- @snarfed: Looks like decentralized fedidocs.org. May aggregate: central clearinghouse?
- @aschrijver: Fedidocs is unresponsive to request for more cohesion.
Improvement to FEDERATION.md convention: Murmurations on SocialHub by @aschrijver
- Proposal: Federate the contents of FEDERATION.md to be used/aggregated anywhere.
Known FEDERATION.md docs listed in Guide for new ActivityPub implementers on SocialHub:
Complementary to NodeInfo, some implementations have introduced a federation.md file to describe instance capabilities in a human readable way. See this discussion for more information. The federation.md files are a good way to get a sense of how different projects are using and adapting ActivityPub. Examples include:
- gathio
- WriteFreely
- Zap
- Tavern
- Smithereen
- gancio
- Lemmy
- Streams
- Mastodon
- Mitra
- FediMovies
- Vervis
- Emissary
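FEDERATION.md is the human-readable side; NodeInfo is the machine-readable one. A client discovers it by fetching `/.well-known/nodeinfo`, which returns a list of links whose `rel` names a schema version. A minimal sketch for picking the newest advertised version from that discovery document (fetching is left out; the function name is hypothetical).

```python
import re
from typing import Optional

# rel values look like http://nodeinfo.diaspora.software/ns/schema/2.0
NODEINFO_REL = re.compile(r"^http://nodeinfo\.diaspora\.software/ns/schema/(\d+)\.(\d+)$")


def pick_nodeinfo_href(discovery: dict) -> Optional[str]:
    """Return the href of the highest NodeInfo schema version advertised, if any."""
    best = None  # (version tuple, href)
    for link in discovery.get("links", []):
        match = NODEINFO_REL.match(link.get("rel", ""))
        if match and "href" in link:
            version = (int(match.group(1)), int(match.group(2)))
            if best is None or version > best[0]:
                best = (version, link["href"])
    return best[1] if best else None
```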

Living Specifications

Needs

Key realization: AS/AP is a framework with which you can model any msg exchange.
- Hence tackling major fedi challenge of substrate formation is a requirement for interop.
- Dev community is slowly gaining this insight, recently in Next step for ActivityPub.
- Compliance Profiles are a major feature that can be the basis for substrate formation.
Automation of manual procedures related to Fediverse Enhancement Proposals.
- @helge PR’ed Python scripts and docs to help publishing FEP’s as web pages.
Statement was made (by @helge?) that W3C Verifiable Credentials Data Model contains best-practices as input for writing AP Extensions.
Various FEP’s are being outfitted with Behaviour Test scripts in Gherkin format.
- E.g. FEP-c390: Identity Proofs has a Gherkin attachment of fep-c390.feature.
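To feed FEP feature files into a test runner, the first step is splitting them into scenarios and steps. A deliberately minimal sketch: it ignores Background, Examples tables, doc strings and tags, which a real Gherkin parser handles. The sample feature text is illustrative and not taken from any FEP.

```python
from typing import Dict, Optional

STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def parse_feature(text: str) -> Dict:
    """Extract the feature name and, per scenario, its name and step lines."""
    feature: Dict = {"name": None, "scenarios": []}
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("Feature:"):
            feature["name"] = line[len("Feature:"):].strip()
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            current = {"name": line.split(":", 1)[1].strip(), "steps": []}
            feature["scenarios"].append(current)
        elif current is not None and line.split(" ", 1)[0] in STEP_KEYWORDS:
            current["steps"].append(line)
    return feature


SAMPLE = """
Feature: Example identity proof
  Scenario: A valid proof is accepted
    Given an actor with an identity proof
    When the proof is verified
    Then verification succeeds
"""
```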
FEP Process discussion in Usage of PR’s for FEP’s on SocialHub:
- Feedback is dispersed between Issue / PR / Forum topic discussions. Not good.
- These tools not only don’t integrate well, they are non-inclusive, made for technical audience.
- @by_caballero: Summarizes the problem statement neatly (read their post).
@trwnh in Fediverse Devs chatroom: Evan asked to rewrite Mastodon Move as a FEP, but most masto docs I wrote should be moved to FEP’s. They need additional “caveats and limitations” sections.
- Example for Move see https://docs.joinmastodon.org/spec/activitypub/#Move, things missing are cooldown periods on sending/receiving and use of as:movedTo property (not in AS2 vocab).
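Writing the Move behavior down as checks makes the missing caveats concrete. A minimal sketch of what a receiving server might verify, assuming Mastodon’s conventions (`actor` and `object` are the old account, `target` is the new one, and the new account’s `alsoKnownAs` must list the old one). The 30-day cooldown is a placeholder assumption, not a documented value, and `movedTo` on the old account is not checked here.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Placeholder assumption: this cooldown length is not taken from any spec or docs.
DEFAULT_COOLDOWN = timedelta(days=30)


def check_move(move: dict, target_actor: dict,
               last_move_at: Optional[datetime] = None,
               now: Optional[datetime] = None,
               cooldown: timedelta = DEFAULT_COOLDOWN) -> List[str]:
    """Return a list of problems; an empty list means the Move looks acceptable."""
    problems = []
    if move.get("type") != "Move":
        problems.append("not a Move activity")
    if move.get("object") != move.get("actor"):
        problems.append("object should be the moving (old) account itself")
    if move.get("target") != target_actor.get("id"):
        problems.append("target does not match the fetched new account")
    also_known_as = target_actor.get("alsoKnownAs", [])
    if isinstance(also_known_as, str):
        also_known_as = [also_known_as]
    if move.get("actor") not in also_known_as:
        problems.append("new account's alsoKnownAs does not list the old account")
    if last_move_at is not None:
        now = now or datetime.now(timezone.utc)
        if now - last_move_at < cooldown:
            problems.append("cooldown period since the previous Move has not elapsed")
    return problems
```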
@runevision: Created great AP client comparison Google Sheet. Lists Fediverse features.
@aschrijver: GAIA-X in Federated Catalogues (unrelated to fedi) uses BPMN diagrams to model behavior.
@j12r: How do we keep fedi architecture consistent? Do we need an architecture board?
- @hrefna: Current fedi encourages lack of interop, integration requires Θ(n^2) testing.
- @hrefna: See this. Need a metamodel, describe extensions, core features, handshake.
- @jenniferplusplus: We should publish suites of compatibility tests. No central authority.
- @j12r: Used to run metamodel.com, see e.g. this article on terminology.
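The Θ(n^2) figure follows from counting pairs: if each of n implementations has to be checked against every other one, the number of pairings grows quadratically, whereas testing each implementation once against a shared compatibility suite grows linearly. The arithmetic, assuming symmetric pairwise tests (counting sending and receiving directions separately doubles the pairwise number):

```latex
\begin{aligned}
\text{pairwise tests} &= \binom{n}{2} = \frac{n(n-1)}{2} \in \Theta(n^2) \\
\text{shared-suite runs} &= n \in \Theta(n) \\
n = 20:\quad \frac{20 \cdot 19}{2} &= 190 \text{ pairs vs. } 20 \text{ runs}
\end{aligned}
```

The saving only holds to the extent that passing the shared suite predicts pairwise interoperability.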
Ghost uses Gherkin feature descriptions to define expected ActivityPub federation behavior.
