FEP-eb22: Supported ActivityStreams types with NodeInfo

Hi everyone, this is a discussion for FEP-eb22. The goal is for servers to advertise what features of the API they support, such as creating a poll or boosting a post. Clients can recognize if a server doesn’t support a feature and hide it from the UI.

This proposal overlaps a little with FEP-9fde and FEP-6481. There has already been some discussion here about FEP-6481. Special thanks to Nik Clayton, who inspired this with his FEP and discussion at FediForum. The main difference is that this FEP attempts to describe what features a server supports based on ActivityStreams types instead of new identifiers or URIs. For that reason, it felt right to start with a new FEP rather than update either of those existing FEPs.

(A little about me: I’m the creator of Micro.blog. I wrote this FEP because I want clients, especially Mastodon API clients, to have more flexibility to adapt their UI for different platforms, but this FEP could be used with the ActivityPub Client to Server API as well.)

ActivityPub was designed with an n:m relationship between servers and clients. This proposal is incompatible with the C2S API because it assumes each user will only ever have a single client that supports a single set of activity types. “Servers” do not support or fail to support a feature, only clients do, so it’s inappropriate to put that information into NodeInfo.

For example, when I access my C2S inbox, I may do it using a book-aggregation client that supports Review activities, or I may do it using a tiktok-like client that only supports Video objects. And my server might have no idea which clients I have installed or may want to use in the future.

The fact that the Mastodon API does not support sending or receiving different activity types is the fundamental limitation that makes this kind of “negotiated support” necessary and it’s an antipattern that I don’t think we should encourage for new implementations going forward.

Let me try to clarify, especially for your statement “it assumes each user will only ever have a single client that supports a single set of activity types”. That is not the intention. You could still have a variety of clients that handle different activity types, or provide different user experiences.

Whether the Mastodon API should be encouraged or not in the future, in practice we are kind of stuck with it because it is so widely supported. This proposal will make it easier for existing clients to be adapted to non-Mastodon servers, which I think helps the fediverse grow beyond Mastodon-like platforms.

At the most basic level, I just want my server to be able to tell a client that a specific feature is not supported. I hope we can do this in a general way that scales to almost anything in C2S or the Mastodon API.

Thanks for your thoughts. If you have questions or if there’s a better way to solve this, let’s discuss it.

I think NodeInfo is not the best choice because it is an auxiliary protocol for exposing server stats. This has associated privacy and security risks, so developers should be allowed to not implement it, and server operators should be able to turn off the NodeInfo endpoint if they want. Instead, supported features can be advertised either using a vendor-specific endpoint (such as /api/v1/instance) or in an AP-native way, using a server actor (which can be discovered using Webfinger - FEP-d556).

I would be fine with using /api/v1/instance, but if we want to encourage more people to use C2S instead of the Mastodon API, that seems like a step in the wrong direction. I get the point that NodeInfo isn’t a perfect fit for this, though.

Are there any examples of using server actors to provide clients with this kind of configuration info? I’m looking through https://mastodon.social/actor and it looks mostly the same as any other actor at first glance.

Perhaps the same info could be mirrored in /api/v1/instance (for Mastodon clients) and server actor JSON (for C2S apps).

I’m still not sure that the concept of a “server actor” being used for metadata about “supported activities” makes sense, or that the general idea itself makes sense, but it could maybe make sense if you think about it in terms of a pre-programmed actor which is not using C2S. This is because from a C2S perspective, all activities are the same, and anything not understood is simply ignored by the client that is reading the inbox. This is what allows extensions to work, for example.

If you are primarily dealing with cases outside of C2S, though, it could be useful to signal that this actor is sort of “externally managed” and that it only understands certain types. The actor might be a bot, for example, or it might be served by a system that doesn’t fully implement ActivityPub. I guess the framing that makes the most sense is “please only send me activities of these shapes”. It would probably make sense on each individual actor rather than on a “server actor”. It might look something like this:

id: <actor>
type: Person
inboxCapabilities:  # this is a bit freeform for now
  - Create Note
  - Create Question  # pet peeve of mine, this should be just Question, but it isn't in mastodon
  - Like Note
  - Like Question
  - Announce Note
  - Announce Question
  - Follow Actor
  - Move Actor
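
As a rough illustration of how a sender might consult a list like this before delivering an activity, here is a minimal Python sketch. It assumes the freeform "Activity Object" strings shown above. The helper names (`parse_capabilities`, `supports`), the sample actor, and the rule that "Actor" stands for any AS2 actor type are all assumptions made for this example and are not defined by any FEP.

```python
# Hypothetical sketch: interpret freeform "Activity Object" capability strings.

# The five actor types defined by the ActivityStreams 2.0 vocabulary.
ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}


def parse_capabilities(entries):
    """Turn ["Create Note", ...] into a set of (activity, object) pairs."""
    pairs = set()
    for entry in entries:
        activity, _, obj = entry.partition(" ")
        pairs.add((activity, obj))
    return pairs


def supports(actor, activity_type, object_type):
    """Return True if the actor advertises this activity/object combination."""
    pairs = parse_capabilities(actor.get("inboxCapabilities", []))
    if (activity_type, object_type) in pairs:
        return True
    # Assumption: "Actor" in the list is a catch-all for any AS2 actor type.
    return object_type in ACTOR_TYPES and (activity_type, "Actor") in pairs


actor = {
    "id": "https://example.com/users/alice",  # placeholder actor
    "type": "Person",
    "inboxCapabilities": ["Create Note", "Create Question", "Like Note", "Follow Actor"],
}

print(supports(actor, "Create", "Note"))
print(supports(actor, "Follow", "Person"))
print(supports(actor, "Create", "Video"))
```

A sender could skip delivery, or downgrade to a supported shape, whenever `supports` returns False. An actor with no `inboxCapabilities` at all is treated here as advertising nothing, which is only one of several possible defaults.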

The thing that is a bit iffy though is that you will probably want some framework for expressing requirements and not just basic support. It depends on how you encode the “shape” of activities, and how granular you want to get.

One other tangential note is that I see the concept of a “server actor” making more sense for server-to-server communication than for expressing metadata in support of capabilities or whatever.

I’d strongly discourage anything that further ingrains the Mastodon API as being the API for ActivityPub or the Fediverse.

As I’ve mentioned to Nik in the past, if there is a desire for a standard client API, especially in the context of S2S, then that is something that someone should write a FEP for — implementing the Mastodon API is no small amount of work, and many implementations have bugs or security problems. (e.g., not asserting permissions correctly: Insufficient authorization allowing elevated access to resources · Advisory · pixelfed/pixelfed · GitHub )

It’s also perfectly okay to have different APIs for different software, trying to force all software to use the same API is likely a fool’s errand, because you end up with different functionality getting shoehorned into the “same” shape.

There is also this issue open on the ActivityPub issue tracker: Standard document/endpoint that defines how a Server federates · Issue #430 · w3c/activitypub · GitHub

No, I haven’t seen server actors used for that purpose. This idea originally belongs to @helge : fep/fep/aaa3/fep-aaa3.md at e1b2a16707b542ea5ea0cfb390ac1abce89f05bb - helge/fep - Codeberg.org

Friendica adds a generator property to all actors, which contains an abbreviated (partial) Application actor. I think this is a good way to establish a relationship between the user and the application.

Though in general I agree that in C2S context such signalling makes little sense. And outside of C2S I prefer vendor-specific endpoints.

Thanks y’all, good replies here. Just to add to @thisismissem’s point about not letting the Mastodon API get more ingrained, I agree. But I also know in practice there are some popular client apps (I’m thinking of iOS apps like Ivory, Mammoth, and Ice Cubes) that will likely be using the Mastodon API for many years to come. I’m okay working within the Mastodon API until we see more traction with a more standards-based API.

To @trwnh, your inboxCapabilities example looks very similar to the proposal in this FEP. Is the main objection that we’re using NodeInfo? If so, let’s move it somewhere else, like the server actor, or a special endpoint. I think we can lift the basic fields from this FEP and put them almost anywhere.

I pretty much copied inboxCapabilities from your FEP actually, so that’s why it looks similar :laughing:

The main objection is that the concept of “server support” is not well-defined in ActivityPub; as nightpool points out above, clients are expected to do most of the work here in interpreting activities, with the server only being responsible for the standardized side-effects documented in the ActivityPub spec (such as managing followers/likes/shares and their associated collections).

The reframing suggested is to signal that a) this actor is generally not using C2S, and b) whatever external mechanism is processing this actor’s inbox expects activities to take one of a set of predefined shapes.

So instead of trying to make this “server-wide”, it should be scoped only to a single actor and their inbox.

Again, one can imagine a situation where “special purpose” actors (typically pre-programmed bots) exist on the same server as more traditional fully-fledged C2S actors. This is possible because there is no mandated mapping between actors and users. You might have an actor that exists only to send and receive reports, for example; it might be presented something like this:

id: <reportingActor>
type: Service
inboxCapabilities:
  - Flag Object

The other critique I have for this FEP is that it’s probably not a good idea to separate activities / objects / properties. You would most likely want to have information about the shape of the activity rather than implying that any combination of activities and objects is supported. Taking the example from the FEP, what would the support be for Follow Image? Since Follow is present in the activities set and Image is present in the objects set, one could reasonably assume that Follow Image is supported when it might not be. I’m not entirely familiar with best practices on expressing “shapes” of activities, but off the top of my head I would think to look into SHACL or something similar. So you might end up with something like this after translation into SHACL:

// ...
"inboxCapabilities": [
  {
    "@context": {
      "as": "https://www.w3.org/ns/activitystreams#",
      "shacl": "http://www.w3.org/ns/shacl#"
    },
    "@type": "shacl:NodeShape",
    "shacl:description": "Flag activities have at least 1 object that is an AS2 Object.",
    "shacl:targetClass": {"@id": "as:Flag"},
    "shacl:property": {
      "shacl:path": {"@id": "as:object"},
      "shacl:minCount": 1,
      "shacl:class": {"@id": "as:Object"}
    }
  },
  // more shapes for whatever is supported
],
// ...
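
To make the Follow Image concern concrete, here is a small Python sketch of what separate activity and object lists imply compared with what a piece of software might really handle. The lists and the `actually_supported` set are invented for illustration and do not describe any real implementation.

```python
from itertools import product

# Hypothetical advertised sets, in the style of separate activity/object lists.
activities = ["Create", "Like", "Follow"]
objects = ["Note", "Image", "Person"]

# What a reader could reasonably infer: every activity works with every object.
implied = set(product(activities, objects))

# What the software handles in this made-up example.
actually_supported = {
    ("Create", "Note"), ("Create", "Image"),
    ("Like", "Note"), ("Like", "Image"),
    ("Follow", "Person"),
}

# Combinations the separate lists suggest but the software does not handle.
false_positives = sorted(implied - actually_supported)
print(false_positives)
```

Listing explicit pairs (or shapes) avoids these false positives, at the cost of a longer advertisement.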

For practical purposes it might make sense to have inboxCapabilities point not to a regular JSON array, but instead point to a SHACL shapes graph (that can be loaded from its own IRI):

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "inboxCapabilities": {
        "@id": "https://w3id.org/fep/eb22/inboxCapabilities",
        "@type": "@id",
        "@container": "@graph"
      }
    }
  ],
  "inbox": "https://example.com/inbox",
  "inboxCapabilities": "https://example.com/what-this-actor-supports-receiving-as-a-shacl-shapes-graph"
}
{
  "@context": {
    "as": "https://www.w3.org/ns/activitystreams#",
    "sh": "http://www.w3.org/ns/shacl#"
  },
  "@id": "https://example.com/what-this-actor-supports-receiving-as-a-shacl-shapes-graph",
  "@graph": [
    // include your shape graph objects here
  ]
}

The rest is left as an exercise to the reader. :stuck_out_tongue:

https://shacl-playground.zazuko.com/ might be helpful here, as would SHACL

Thanks @trwnh. The case of something confusing like “follow” + “image” is interesting. I guess I’m not worried about that because it’s not a thing that currently exists in any fediverse software that I’m aware of. I can imagine other combinations that don’t really make sense either, like “accept” + “video”, even if they are technically allowed by the spec.

SHACL has not been on my radar but I’ll review it. To be honest, I’m very skeptical of introducing anything new with JSON-LD. I think basic JSON like the examples in this FEP is cleaner and easier for a variety of software to deal with.

I understand your skepticism, but SHACL has the advantage of being well-formed and unambiguous while in my opinion it is not that hard to parse. It’s worth a little bit of LD for that. The one potential stumbling point is for LD-unaware consumers to deal with the fact that any extension term can have infinitely many shortnames, but this is true for any extension to AS2. There are some best practices being floated in fep/fep/e229/fep-e229.md at main - fediverse/fep - Codeberg.org regarding making this less painful, the simplest of which is to produce your documents such that any extension is in JSON-LD expanded form, like so:

  {
    "@type": "http://www.w3.org/ns/shacl#NodeShape",
    "http://www.w3.org/ns/shacl#description": "Flag activities have at least 1 object that is an AS2 Object.",
    "http://www.w3.org/ns/shacl#targetClass": {"@id": "https://www.w3.org/ns/activitystreams#Flag"},
    "http://www.w3.org/ns/shacl#property": {
      "http://www.w3.org/ns/shacl#path": {"@id": "https://www.w3.org/ns/activitystreams#object"},
      "http://www.w3.org/ns/shacl#minCount": 1,
      "http://www.w3.org/ns/shacl#class": {"@id": "https://www.w3.org/ns/activitystreams#Object"}
    }
  },

This is in fact the purpose of JSON-LD expanded form, to be unambiguous. LD-aware consumers can compact or flatten against whatever @context they understand locally, while LD-unaware consumers can parse the full IRIs which are constants that do not change.
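
As a sketch of what "parse the full IRIs as constants" can look like for an LD-unaware consumer, the Python below reads the shape with plain dictionary lookups and no JSON-LD library. It assumes the shape is serialized exactly as in the example above (single values rather than the array-wrapped output a JSON-LD expansion algorithm produces). The names `SH`, `AS`, and `summarize` are made up for this example.

```python
# Full IRIs are treated as opaque constant strings; no JSON-LD processing.
SH = "http://www.w3.org/ns/shacl#"
AS = "https://www.w3.org/ns/activitystreams#"

shape = {
    "@type": SH + "NodeShape",
    SH + "description": "Flag activities have at least 1 object that is an AS2 Object.",
    SH + "targetClass": {"@id": AS + "Flag"},
    SH + "property": {
        SH + "path": {"@id": AS + "object"},
        SH + "minCount": 1,
        SH + "class": {"@id": AS + "Object"},
    },
}


def summarize(shape):
    """Pull out target class, property path, and minimum count by exact key match."""
    prop = shape[SH + "property"]
    return {
        "target": shape[SH + "targetClass"]["@id"],
        "path": prop[SH + "path"]["@id"],
        # Assumption for this sketch: a missing minCount means no minimum.
        "min_count": prop.get(SH + "minCount", 0),
    }


summary = summarize(shape)
print(summary["target"], summary["path"], summary["min_count"])
```

This is not a SHACL validator. It only shows that the keys can be matched as fixed strings, whatever shorthand names another producer might have chosen in its own @context.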

In practice though, software like Mastodon wrongly expects all of its peers to share the same @context term mappings and shorthand names. This leads to fragile behavior, but it can still be supported via fep/fep/888d/fep-888d.md at main - fediverse/fep - Codeberg.org if you wish:

{
  "@context": ["https://www.w3.org/ns/activitystreams", "https://w3id.org/fep/xxxx"],
  "type": "NodeShape",
  "description": "Flag activities have at least 1 object that is an AS2 Object.",
  "targetClass": "Flag",
  "property": {
    "path": "object",
    "minCount": 1,
    "class": "Object"
  }
}

Where the context document looks something like this:

{
  "@context": {
    "sh": "http://www.w3.org/ns/shacl#",
    "NodeShape": "sh:NodeShape",
    "description": "sh:description",
    "targetClass": {
      "@id": "sh:targetClass",
      "@type": "@vocab"
    },
    "property": {
      "@id": "sh:property",
      "@type": "@id"
    },
    "path": {
      "@id": "sh:path",
      "@type": "@vocab"
    },
    "minCount": "sh:minCount",
    "class": {
      "@id": "sh:class",
      "@type": "@vocab"
    },
    // and so on -- SHACL doesn't seem to have its own context document hosted anywhere...
  }
}

This does open up the potential for term name conflicts if a term name / shorthand is used in multiple contexts, but again, that’s the problem inherent to assuming everyone shares the same @context as you.

Even if you end up not wanting to use SHACL, it would still be good practice to define a context document for whatever you come up with instead (via FEP-888d). This way, LD-aware consumers don’t get tripped up on your JSON-only properties that will most likely end up being ignored or stripped away.

Related to this discussion, I just found out that Mastodon 4.3 has a new “api_versions” field from /v1/instance. It uses the server name, so if I understand correctly it could look something like:

"api_versions": {
  "mastodon": 2,
  "microdotblog": 1
}

The problem is then each client would effectively need to be hardcoded for lots of potential server types. Seems difficult to keep track of, compared to the proposal here to describe what features are supported.
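
A rough Python sketch of that difference, under stated assumptions: the lookup table, the feature names, and the server name "someserver" are all invented for illustration, and nothing here reflects how any real client reads "api_versions".

```python
# Assumption: a client-side table mapping (server name, API version) to features.
# The feature names and table contents are invented for illustration only.
KNOWN = {
    ("mastodon", 2): {"feature_a", "feature_b"},
}


def features_from_versions(api_versions, known=KNOWN):
    """Hardcoded approach: only servers already in the table yield any features."""
    found = set()
    for name, version in api_versions.items():
        found |= known.get((name, version), set())
    return found


def features_from_advertisement(advertised):
    """Advertised approach: the server lists what it supports; no table needed."""
    return set(advertised)


print(features_from_versions({"mastodon": 2}))
print(features_from_versions({"someserver": 1}))  # unknown to this client
print(features_from_advertisement(["feature_a"]))
```

With the table-based approach, a server the client has never heard of yields an empty set until the client ships an update, while an advertised list works for any server that publishes one.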