The ActivityPub test suite

marnanel · October 2, 2020, 4:05pm

thank you! I’ve c&p’d my response there.

WClayFerguson · October 2, 2020, 4:51pm

I won’t be able to participate in the hackathon, but once I have reasonably detailed examples of the JSON inputs/outputs for an ActivityPub service, I’ll definitely add AP capabilities to Quanta.wiki (link above). I’m also looking for other developers (and/or funding) who can recognize the value of that platform, especially after AP is added!

how · October 2, 2020, 5:04pm

For the funding part see Scaling Up Cooperation BoF…

how · October 3, 2020, 9:16am

4 posts were split to a new topic: Appealing to Users With Examples in Software Demos

WClayFerguson · October 2, 2020, 6:35pm

Thanks for the link to the funding related stuff. I’m going to submit the Quanta project for that. The project is still pretty much a secret (unknown), and hasn’t been posted to HackerNews or other developer sites yet, although I’ve been developing it for years and it’s fairly robust/stable at this point.

About the jet, that X-15 is the “lorem ipsum” of images! haha. Everyone should use the X-15 for demo content images.

mro · May 30, 2022, 1:09pm

what is the current state?

astrojuanlu · November 1, 2022, 8:37am

Came here after bouncing from many open issues and threads elsewhere (w3c/activitypub#351, go-fed/testsuite#16).

What is the current best course of action for developers looking to validate current implementations of ActivityPub, or write their own?

nutomic · November 2, 2022, 11:18am

What worked for us was to first develop Lemmy-Lemmy federation, as this allows for easy testing and debugging. Later we started federating with other projects, which obviously brought some breaking changes where we did things wrong, but it wasn’t a big deal.

mro · November 23, 2022, 10:07pm

there’s a call for action concerning the test suite of ActivityPub. Contact Christine Lemmer-Webber: "Does someone want to take over the old activitypu…" - the Octodon

nilesh · January 5, 2023, 1:39pm

I have put together a collection of API definitions for Insomnia, which has the ability to write and run testcases as well. It is by no means comprehensive, but if you need something that’s quick-and-dirty and usable right away, you may find this useful: GitHub - nileshtrivedi/activitypub-testsuite

FYI, Insomnia has features like environment variables, request chaining (extracting data from responses and using them in subsequent requests) and Javascript-based testcases which actually make it very effective for this purpose.

stevebate · March 6, 2023, 8:16am

Has anyone attempted to extract the test case definitions, independent of the programming language of the test runner? I’m thinking of something along the lines of Linked Data Platform 1.0 Test Cases (w3.org) (or possibly something more informal).

helge · March 6, 2023, 9:15am

Just to mention it: resolving “The tests in /tests should do actual http” (Issue #5 · HelgeKrueger/bovine · GitHub) would solve the Test Suite problem. The tests currently have enough coverage that once they pass, I can deploy to https://mymath.rocks/ and everything works. When issues arise, I add tests.

Of course, if one looks, for example, at bovine/test_create_note.py at df9878875fade515a2252bf0377f2ab383852b37 · HelgeKrueger/bovine · GitHub, one sees that there are lots of issues with trying to extend this to other platforms. All the code relying on mocks needs to be abstracted to use real servers.

The labels link to bovine/specification.md at df9878875fade515a2252bf0377f2ab383852b37 · HelgeKrueger/bovine · GitHub. That label should probably be Retrieving Objects. As I said, there are lots of small issues.

Finally, my current understanding is that there never was a test suite for Server To Server, which is the case most people care about. I’ll be very happy to be proven wrong.

stevebate · March 6, 2023, 3:54pm

This looks interesting, but I wonder how many test case descriptions could be declarative like the LDP test cases? If I understood correctly, your test suite also requires the bovine variant of the C2S protocol.

The first message in this topic mentions an S2S “test suite” with a few automated tests and a questionnaire.

I’m primarily interested in an S2S suite. It would probably need to be an AP S2S Core suite and one or more application or application domain-specific suites (Mastodon/microblogging, …).

helge · March 6, 2023, 6:37pm

I do not understand the statement. You cannot test S2S without being able to look into the actor’s inbox. For this you need a C2S type implementation. I can imagine some S2S tests that work without looking in an inbox, but they all involve double side effect behavior, e.g.

Alice on server Abel sends a message to her followers: Bob on server Banach and Carol on server Cantor
Bob on server Banach replies with Alice’s followers collection in the recipients (and of course Alice).
Abel forwards the message to Carol according to 7.1.2 Forwarding from Inbox
Cantor gets the message from Banach to verify its integrity
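
The decision Abel has to make in the third step can be sketched roughly as follows. This is only a minimal sketch of the three conditions in 7.1.2 Forwarding from Inbox (activity not seen before, a locally owned collection among the recipients, a locally owned object among the references). The function names, the `seen_ids` set, and the `*.example` hosts are assumptions made for illustration, and treating "same hostname" as "owned by the server" is a simplification.

```python
from urllib.parse import urlparse

ADDRESSING = ("to", "cc", "audience")
REFERENCES = ("inReplyTo", "object", "target", "tag")


def as_ids(value):
    """Flatten a JSON-LD value (None, str, dict, or list of these) to a list of ids."""
    if value is None:
        return []
    if isinstance(value, list):
        return [i for item in value for i in as_ids(item)]
    if isinstance(value, dict):
        # Embedded objects carry "id"; Link-like objects (e.g. Mention) carry "href".
        return as_ids(value.get("id") or value.get("href"))
    return [value]


def is_local(iri, local_host):
    # Assumption: an object is "owned by the server" if its hostname matches.
    return urlparse(iri).hostname == local_host


def should_forward(activity, local_host, local_collections, seen_ids):
    """Return True if all three forwarding conditions hold for this activity."""
    # 1. This is the first time the server has seen the activity.
    if activity.get("id") in seen_ids:
        return False
    # 2. to, cc, or audience contain a collection owned by this server.
    addressed = {i for key in ADDRESSING for i in as_ids(activity.get(key))}
    if not addressed & set(local_collections):
        return False
    # 3. inReplyTo, object, target, or tag reference an object owned by this server.
    #    The embedded object is inspected as well, since a reply's inReplyTo lives there.
    obj = activity.get("object")
    sources = [activity] + ([obj] if isinstance(obj, dict) else [])
    referenced = [
        i for src in sources for key in REFERENCES for i in as_ids(src.get(key))
    ]
    return any(is_local(i, local_host) for i in referenced)
```

A test runner would still need some way to observe that Carol actually received the forwarded activity, which is the point being made here.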

In order to pay attention to these corner cases, one should have a fully functional ActivityPub Server.

I also do not understand what you mean by domain-specific ActivityPub. ActivityPub itself is domain neutral. The domain only enters once one renders the messages in a frontend, or when deciding what types of messages the user is allowed to compose.

Addendum to the first point: From the ActivityPub Spec:

sharedInbox endpoints SHOULD also be publicly readable OrderedCollection objects containing objects addressed to the Public special collection.

If one implements this, one can probably run all kinds of tests on just AP S2S. However, I’m unaware of a project implementing this. Also I’m wondering if a publicOutbox wouldn’t make more sense… That sentence belongs to the ones most needing work in the AP Spec.

stevebate · March 7, 2023, 7:04am

I agree that there must be some way to verify the side effects of a POST to the inbox endpoint. I don’t see any reason to require that this is a C2S-like implementation, given that part of the AP specification is optional and most server implementations don’t support it. One possibility is a defined test fixture API that allows some introspection for test assertion purposes. Another possibility (that I’ve seen used with financial protocol compliance testing) is to have interactive steps in the test sequence that require user confirmation (e.g., that a message was received by the target actors).
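
To make the first possibility a bit more concrete, here is a minimal sketch of what such a test fixture API could look like. Nothing like this is defined anywhere as far as this thread shows: `InboxFixture`, `received_activity_ids`, and `assert_delivered` are hypothetical names, and the in-memory class only stands in for a real server so the sketch can run.

```python
from typing import Protocol


class InboxFixture(Protocol):
    """Hypothetical introspection hook a server under test would expose."""

    def received_activity_ids(self, actor_id: str) -> list[str]:
        """Ids of the activities the server accepted for this actor's inbox."""
        ...


class InMemoryFixture:
    """Stand-in implementation; a real one would query the server's own storage."""

    def __init__(self):
        self._delivered = {}

    def record(self, actor_id, activity):
        # Called by the (fake) server whenever it accepts an inbox POST.
        self._delivered.setdefault(actor_id, []).append(activity["id"])

    def received_activity_ids(self, actor_id):
        return list(self._delivered.get(actor_id, []))


def assert_delivered(fixture, actor_id, activity_id):
    """Test assertion: fail unless the activity reached the given actor."""
    received = fixture.received_activity_ids(actor_id)
    if activity_id not in received:
        raise AssertionError(f"{activity_id} was not delivered to {actor_id}")
```

The test suite would only depend on the `InboxFixture` shape, so a server that stores activities as internal data (rather than as a queryable inbox) could still implement it.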

Note that Mastodon, for example, has an inbox endpoint but there is no inbox to query. The inbox activities are converted to Mastodon-specific internal data (Status, Timelines, etc.) and stored in a database. You cannot retrieve the ordered collection of activities that led to the current state of the data (a Mastodon timeline is not the same).

In any case, I do agree that since S2S is a write-only API, it makes testing more challenging than for something like the LDP. However, I suspect that even just a written description of test cases, organized by requirement levels, would be useful for server developers.
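
As an illustration of what "organized by requirement levels" could mean in practice, here is a small sketch of test case descriptions kept as plain data, independent of any test runner. The `CaseDescription` fields, the case ids, and the two sample entries are assumptions for illustration, not an existing catalogue.

```python
from dataclasses import dataclass

LEVELS = ("MUST", "SHOULD", "MAY")


@dataclass(frozen=True)
class CaseDescription:
    case_id: str       # hypothetical identifier scheme
    level: str         # one of LEVELS
    spec_section: str  # where the requirement is stated
    description: str   # human-readable statement of the expected behavior


def by_level(cases):
    """Group case ids by requirement level, keeping MUST/SHOULD/MAY order."""
    grouped = {level: [] for level in LEVELS}
    for case in cases:
        grouped[case.level].append(case.case_id)
    return grouped


# Illustrative entries only.
CASES = [
    CaseDescription(
        case_id="s2s-forwarding-from-inbox",
        level="MUST",
        spec_section="7.1.2 Forwarding from Inbox",
        description="A reply addressed to a local followers collection is forwarded.",
    ),
    CaseDescription(
        case_id="s2s-shared-inbox-readable",
        level="SHOULD",
        spec_section="sharedInbox",
        description="sharedInbox is a publicly readable OrderedCollection.",
    ),
]
```

Even without any automation, a list like this would let implementers report which requirements they cover.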

As a pragmatic implementer, I’m interested in server interoperability (S2S federation). Testing purely against the AP specification will not be sufficient for my purposes. It’s very possible to create an AP-compliant server that doesn’t federate with any existing implementations.

By domain-specific, I mean there could be test suites that support microblogging federation, image sharing, social bookmarking, event planning, and so on. Each of these may use the AS2 vocabulary in specific ways and possibly have extended the vocabulary. In some of these application domains, there are server implementations that are defining the de-facto “standards” for that domain. Obviously, Mastodon is that application in the microblogging domain. I can envision a test suite for conformance to the Mastodon microblogging S2S protocol (which would have some AP generic component to it).

helge · March 7, 2023, 9:24am

I don’t disagree with this statement as I don’t disagree with anything you said above. However, I want to recontextualize it.

I would view the Mastodon API as a C2S-like implementation. One would need to be very careful on how to work around its idiosyncrasies. However, I don’t believe I have written a single test that Mastodon would not pass (after adapting them to the aforementioned idiosyncrasies).
I want to focus on Mastodon for a second. It also relates to domain-specific FediVerse test suites, which are the second thing we are discussing. One of the Mastodon idiosyncrasies is the requirement for each actor to have a preferredUsername property such that webfinger returns the actor when requesting the account “preferredUsername@domain”. This has consequences:
- If you want to support testing Mastodon, you must be able to satisfy the above constraint. This basically means “Let’s use a full server on one side of the test”.
- You need to add test cases for this behavior. These test cases are a FediVerse TestSuite already, as they don’t describe stuff in the ActivityPub Specification.
- You actually probably want these test cases to be optional. Advocates for things like nomadic identity, onion type vpns, and so … will agree. I’m not sure having such an ID for stuff like a Service Actor makes sense.
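
For reference, a test case for this behavior could look roughly like the sketch below. It assumes the constraint exactly as described above (the webfinger subject is `acct:preferredUsername@domain` and a `self` link points at the actor), which is an interpretation rather than something taken from the ActivityPub specification. The function names and the `banach.example` host are made up, and the webfinger document is passed in as already parsed JSON so that no HTTP is involved.

```python
from urllib.parse import urlparse


def expected_acct(actor):
    """Build the acct: URI the constraint expects for this actor."""
    host = urlparse(actor["id"]).hostname
    return f"acct:{actor['preferredUsername']}@{host}"


def webfinger_resolves_to_actor(jrd, actor):
    """Check a parsed webfinger response (JRD) against an actor profile.

    Assumption: both the subject and a rel="self" link must match.
    A looser suite might only require the self link.
    """
    self_links = [
        link.get("href")
        for link in jrd.get("links", [])
        if link.get("rel") == "self"
    ]
    return jrd.get("subject") == expected_acct(actor) and actor["id"] in self_links
```

Keeping the check in its own function makes it easy to mark the whole test case as optional.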
Writing tests for just ActivityPub is hard. One needs to make assumptions on the underlying Authentication, Authorization, Visibility, and Data Storage models. One ends up at places such as Architecture of the World Wide Web, Volume One if one starts looking into it.

We are in agreement on this point. However, I consider this problem nicely decoupled. The handling of Activities should be done following the ActivityPub specification. The handling of Objects should be domain specific. Back to the Mastodon example, if I send a Create Activity with Note Object with content

<table><tr>
<td>Abel</td><td>Banach</td>
</tr><tr>
<td>Cantor</td><td>Dedekind</td>
</tr></table>

Then Mastodon stores the content as something containing the names “Abel”, “Banach”, “Cantor”, and “Dedekind”. Writing an ActivityPub Test Suite that allows for this behavior is possible. It’s probably even easy.

This will then result in “Mastodon passes the ActivityPub Test Suite that allows Object Modifications”. (Mastodon will pass the Client 2 Server parts, when defined properly; Also this is just about asserting when two Objects are equal). Specifying what Object Modifications are ok, is then the task of the Domain Specific FediVerse Test Suites.
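
One way to express "equal up to Object Modifications" for the table example is to compare only the visible words of the content. This is a minimal sketch using Python's standard `html.parser`; `visible_words` and `notes_equivalent` are hypothetical helper names, and the stored form used in the usage note below is an assumption, not Mastodon's actual output.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the whitespace-separated words found between tags."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def visible_words(html):
    """Return the words of an HTML fragment, ignoring all markup."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.words


def notes_equivalent(sent, stored):
    """Treat two Notes as equal if their content has the same words in the same order."""
    return visible_words(sent["content"]) == visible_words(stored["content"])
```

With this relaxed equality, a sent Note containing the table and a stored Note whose content is, say, `<p>Abel Banach Cantor Dedekind</p>` would compare as equal. Whether that particular relaxation is acceptable is exactly what a domain-specific suite would have to specify.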

stevebate · March 7, 2023, 10:46am

We might not be discussing exactly the same thing (I would not consider the Mastodon client API as anything AP C2S-ish), but the example above is a good one. However, I don’t believe there’s any requirement for the webfinger resource “account” to match the preferredUsername in the dereferenced actor profile. (Some people use this “feature” to support referencing their Mastodon accounts from a webfinger hosted at a different domain.) In any case, it’s definitely required to have a preferredUsername in the actor profile or Mastodon will treat it as an invalid actor.

This is exactly why I think a “de facto standard” interoperability test suite (including generic core AP cases) is valuable for implementers. Probably everyone who’s implemented an “AP” microblogging server intended to be compatible with Mastodon has had to discover this idiosyncrasy the hard way. To be clear, I’m not talking about a single Fediverse test suite, but rather an interrelated family of suites.

Thanks for the discussion. It’s helped to refine my thinking on the topic.

how · March 17, 2023, 2:24pm

Note that the ongoing NGI Zero Core grant seems particularly fit to receive a proposal to fix the ActivityPub test suite. It would indeed be something very nice to have for all kinds of reasons, and one of them is a clear way to develop standards compliance across ActivityPub implementations.

bengo · January 15, 2024, 10:25pm

Anyone looking for a test runner for ActivityPub: let me know what you think of activitypub-testing.

https://bengo.is/activitypub/projects/activitypub-testing/announcement/

mro · July 4, 2024, 9:41am

IMHO npm is too volatile an ecosystem to rely on for long-term conformance tests. Would love to see it in something slower moving.

For today and somebody comfortable with npm, well yes.