Desired changes for a future revision of ActivityPub and ActivityStreams

aschrijver · September 23, 2024, 6:58am

Yes, the message exchange is undefined at protocol level. I want AS/AP to be a protocol for the exchange of Activities that carry a payload of Objects. On the wire these are JSON messages. If you have different services (actor endpoints) that interoperate together, then they’ll do more than just CRUD. They’ll be engaging in more intricate processes together, which boil down to particular expected message exchanges.

As for msg exchange… think eCommerce processes, which are usually taken as the example. If I am a Customer that does Checkout{Basket} at a Webshop, then the webshop may send a subsequent validation request to my Bank. If payment succeeds, shipping starts, and the customer can ask the Delivery service about progress.

trwnh · September 23, 2024, 7:14am

sorry, this is what i meant:

is this document json?

{
  "id": "https://example.com/some-resource",
  "type": "Object",
  "attributedTo": "https://example.com/someone"
}

most people would say yes.

now for that same document: is it jsonld? or more correctly: could it be?

well, if you have two systems both reading this document, how do we know what attributedTo means?

it’s a valid AS2 document, so we could assume it to be AS2. we don’t know for sure, but it could be.

some other consumer could also assume it to be AS2.

so now, for the two of us, this document is AS2. we assume that attributedTo means as it is defined in the Activity Vocabulary.

so here’s the trick: we never actually declared a @context. we never once entered jsonld-land. we just assumed it to be AS2.

but importantly, we assumed the same thing.

we have a shared context. an implicit one, to be sure, but a shared context nonetheless: that attributedTo specifically means the concept defined by AS2-Vocab.

again, jsonld is irrelevant, we are referring only to abstract concepts.

ok, now we introduce just the tiniest bit of ld concept: let’s say that we use the iri https://www.w3.org/ns/activitystreams#attributedTo to refer to the concept of “attributedTo as defined by AS2-Vocab”. so far we should still be on the same page. we haven’t mutated the document or anything, it’s still “plain JSON”, there still is no @context declaration.

now here’s the point i’m trying to make: would it be wrong to say that we both share this implicit jsonld context?

"id": "@id",
"type": "@type",
"attributedTo": {
  "@id": "https://www.w3.org/ns/activitystreams#attributedTo",
  "@type": "@id"
},
"Object": "https://www.w3.org/ns/activitystreams#Object"

again, without actually injecting this context. the document still looks like this:

{
  "id": "https://example.com/some-resource",
  "type": "Object",
  "attributedTo": "https://example.com/someone"
}

is it jsonld or not?

my point is that even though no @context is declared, we both share the same implicit context, in our heads. we have the same understanding of the document. probably by both of us reading the AS2 spec, right?

so, what does injecting the @context actually get us, then? why bother jumping into jsonld-land?

it allows machines to be able to unambiguously know, just like we might assume in our heads, that “attributedTo is attributedTo as defined by AS2-Vocab”.

maybe that machine processing required to get to that knowledge is deemed “not necessary” or “too complex”, but that’s the option that we have. there are, of course, other options.

(this thought experiment can be extended to any arbitrary or proprietary or vendor-specific JSON API, simply by assigning an IRI to every property key or type or id. in other words, for an arbitrary JSON document, chances are good you can come up with some context document that would allow converting it into JSON-LD. this context document would be developed by and represent the knowledge you obtain from reading the API documentation.)
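A minimal sketch of the thought experiment above, written in Python (my choice; the thread itself only shows JSON). It takes the unchanged "plain JSON" document and a hand-written mapping that stands in for the implicit context, then swaps each term for its IRI. The names `IMPLICIT_CONTEXT` and `expand_terms` are made up for illustration, and this is deliberately not the JSON-LD expansion algorithm, just the bare idea of "term maps to IRI".

```python
# Hypothetical stand-in for the context that two AS2 readers share "in their heads".
IMPLICIT_CONTEXT = {
    "id": "@id",
    "type": "@type",
    "attributedTo": "https://www.w3.org/ns/activitystreams#attributedTo",
    "Object": "https://www.w3.org/ns/activitystreams#Object",
}


def expand_terms(document, context):
    """Return a copy of the document with known keys (and type values) mapped to IRIs."""
    expanded = {}
    for key, value in document.items():
        mapped = context.get(key, key)  # unknown keys pass through unchanged
        if mapped == "@type" and isinstance(value, str):
            value = context.get(value, value)  # "Object" -> its IRI
        expanded[mapped] = value
    return expanded


# The document from the post, with no @context declared anywhere.
doc = {
    "id": "https://example.com/some-resource",
    "type": "Object",
    "attributedTo": "https://example.com/someone",
}
expanded = expand_terms(doc, IMPLICIT_CONTEXT)
```

The original `doc` is never modified; the mapping lives entirely outside it, which is the sense in which the context is "implicit".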

SorteKanin · September 23, 2024, 7:26am

I don’t think I want to ignore extensions entirely, so JSON-LD compliance may be required eventually. This worries me. But also JSON-LD just makes the whole spec difficult to follow and more confusing than I think it should be.

This sounds kind of judgemental, like “well you can’t have extensibility but you chose to inflict this problem upon yourself so it’s your own fault. You accepted plain JSON, so deal with it”. I don’t feel that it is my fault that the specification has chosen an exotic data format such as JSON-LD and made it confusing as to whether or not I even need to use it.

I don’t think it is the fault of the developers when they choose not to support JSON-LD in most AP implementations. The fault lies with AP for choosing to use JSON-LD, when JSON-LD is not well supported or well liked by developers.

I honestly don’t think standards can be made machine readable in the sense you describe here, as I’ve already explained above. Or, any such machine readable stuff is not useful for building an actual usable user interface, as explained above. Standards are human documents, as long as we don’t have artificial general intelligence.

I don’t know exactly how Mastodon or Pixelfed works but Lemmy could definitely not be a client. As far as I understand, the C2S mechanism in AP precludes anything but a simple chronological timeline. So it is impossible to build Reddit-like sorting or Twitter-like algorithmic recommendation in a C2S AP client (or at least, I believe my phone would run out of storage and battery very quickly using such a client and even then I am not sure).

I believe this is one of the biggest problems with the C2S idea - it simply excludes a lot of possible social media forms.

I don’t think normal users will ever want to use this, especially not point 2 (which sounds very complicated). Hell, most users don’t even use email clients and just use their email provider’s website in their browser as an email client, much like they would use Mastodon as an AP client. In that sense, this comparison to email kinda falls flat.

As I said above, there is, because you definitely can’t make a reddit-like experience in a client alone. C2S will not save us.

I simply disagree, it is not simple. Throwing out JSON-LD would simplify things. Some of the points I mentioned at first would also help.

As flattering as this is, I unfortunately think that it is just as likely that the people continuing this thread are not the intelligent, but merely the stubborn, patient people with too much time on their hands (including myself among those at the moment, obviously). Which is sad, because we really need more of the people that are actually building stuff to weigh in on these things instead of the people who are just discussing how to build stuff. But they are busy building stuff instead of discussing how to build stuff.

That’s part of why I think we should get rid of JSON-LD. The people that are building stuff have clearly rejected it. They don’t have time to come in here and argue against it, but we can at least look at what they are building and infer some feedback from that.

If the solution to that problem is JSON-LD, then it is clearly not solved, because JSON-LD is poorly supported and clearly nobody wants to implement it.

Also, we don’t need any namespacing as I’ve made clear before. Just use UUID keys! Plain JSON, simple to implement, no coordination between actors required (not even by social means), zero chance of collisions. You don’t need namespacing if the root namespace has 2^{128} equivalent possibilities! This seems to me a much simpler way to solve this problem.
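A rough sketch of what the UUID-key idea could look like in practice, using only Python's standard library. The helper name `mint_property_key` and the example payload are invented for illustration; nothing here comes from an existing implementation.

```python
import json
import uuid


def mint_property_key():
    """Mint a new extension key locally, with no registry or coordination."""
    return str(uuid.uuid4())


# One author mints a key once and documents its meaning somewhere for humans.
key = mint_property_key()

# The key is then used like any other JSON property name.
note = {"type": "Note", "content": "hello", key: "some extension value"}

# Round trip through an ordinary JSON library; no special processing needed.
wire = json.dumps(note)
parsed = json.loads(wire)
```

One caveat on the numbers: a version 4 UUID fixes 6 of its 128 bits for the version and variant, so it carries 122 random bits rather than 128. Collisions are still vanishingly unlikely.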

Now this I can get behind. While I don’t think such a server would be terribly useful for anything else than C2S (and again, that precludes many kinds of social media), a reference implementation is far more specific than a specification (for better or worse).

I think this comes back to the “if a feature is not used by users, it is bad” thing. If nobody is implementing the spec as intended, then clearly the spec doesn’t fulfill what people actually want or need.

JSON-LD is an insanely complicated fashion to arrive at that conclusion, if you ask me. Just use a name that is more unique than attributedTo, like activityStreamsAttributedTo or even 0e122c77-fdc5-47a9-b115-29cf38f2e98e. Done, now you also know unambiguously that you are referring to the same thing.

I personally say, not necessary and too complex and clearly most implementations out there currently agree. If we can just use plain JSON, I don’t see why we would continue to think of these “contexts” and “vocabularies” in JSON-LD terms.

stevebate · September 23, 2024, 7:55am

Hmm, I think someone mentioned JSON Schema as something to consider (even in the text you quoted). I’m curious why you didn’t include it in the possibilities?

Another option, which I’m not currently recommending, is to stop pretending AP is not JSON-LD/RDF, embrace it, understand it, and revive the related machine-readable (OWL) ontology (or create a new one). That opens up SHACL and ShEx as possible tools, instead of JSON Schema, for defining extensions with structural and value constraints (which JSON-LD is not intended to do).

The Extension Policy relies on JSON-LD, so I’d include it in that category.

Here’s a discussion related to using URIs for JSON property names to support extensibility (from 13 years ago)…

Thinking about Namespaces in JSON (mnot.net)

stevebate · September 23, 2024, 8:08am

In an AS2 context, it depends on whether you received it with an AS2 media type (per the Rec). If not, then it’s JSON and not AS2 (and the server could legitimately drop it as an invalid message, although they might decide not to for other reasons). If it was received with an AS2 media type, then you MUST process it as if the AS2 context is there (making it effectively JSON-LD).
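To make the media-type rule concrete, here is a hedged Python sketch. The two media types checked are the ones commonly used for AS2 (`application/activity+json`, and `application/ld+json` with the ActivityStreams profile); the function names are hypothetical, and the header parsing is simplified. It also only handles the case where no `@context` is present at all, leaving an existing one untouched.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def is_as2_media_type(content_type):
    """True for application/activity+json, or application/ld+json with the AS2 profile."""
    parts = [part.strip() for part in content_type.split(";")]
    media_type = parts[0].lower()
    if media_type == "application/activity+json":
        return True
    if media_type != "application/ld+json":
        return False
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "profile":
            # The profile parameter may hold several space-separated URIs.
            return AS2_CONTEXT in value.strip().strip('"').split()
    return False


def with_assumed_context(document, content_type):
    """Return a copy carrying the AS2 context, or None if the message is not AS2."""
    if not is_as2_media_type(content_type):
        return None  # plain JSON: the receiver may drop it
    if "@context" in document:
        return dict(document)  # simplification: an existing @context is left as-is
    return {"@context": AS2_CONTEXT, **document}
```

For example, `with_assumed_context({"type": "Object"}, "application/json")` gives `None`, while the same document received as `application/activity+json` comes back with the AS2 context attached.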

I also never claimed implicit contexts don’t exist, but that there is not an implicit JSON-LD context in every JSON document. Most JSON documents have no implicit JSON-LD context and claiming they do will lead to more confusion, in my opinion.

In any case, you seem to be trying to make the argument that the distinction between JSON and JSON-LD doesn’t matter. If so, then it should be very easy to drop JSON-LD from AP 2.0.

stevebate · September 23, 2024, 8:25am

I feel like this is a very unfair oversimplification. When ActivityPub advocates point to the great success of the protocol, they refer to a “Fediverse” consisting of mostly Mastodon servers. For developers who are interested in interoperating with open, federated messaging apps (and not just microblogging or “twitter clones”), it is reasonable to consider Mastodon as “pragmatic ActivityPub” in practice (versus theoretical ActivityPub).

I don’t know of any pure ActivityPub software that’s fully compliant with the spec. If you do, let me know because I’m sincerely curious what that might look like.

nutomic · September 23, 2024, 1:34pm

Based on my experience developing Lemmy, there is absolutely zero benefit to JSON-LD. So I’m all for switching to simple JSON which is widely understood and supported.

silverpill · September 23, 2024, 2:10pm

Good example! Because I don’t really want Context to mean “conversation container” and would prefer it to be defined separately (e.g. in FEP-7888). Perhaps even as “this collection Adds objects that share the same context”. Implementers of threaded conversations have an incentive to reach consensus on property and type names, so I think “social means” will work quite well in this case.

Sounds good to me. Shared context is already assumed in today’s Fediverse. If someone decides to use same properties but assign a different meaning to them, they would be better off doing that in a separate network.

trwnh · September 23, 2024, 3:39pm

I said “other possibilities”, so I thought it was implied that the bit I was quoting would also be included in the total list of possibilities.

Perhaps. It was argument ad abstractum. The “context” is just the shared understanding between any two implementations. Whereas the JSON-LD @context maps terms to IRIs (and RDF maps IRIs to concepts), the implicit context maps terms to concepts (directly, without the IRI middleman). This is what is described in the JSON-LD spec’s section on “The Context”: JSON-LD 1.1

When two people communicate with one another, the conversation takes place in a shared environment, typically called “the context of the conversation”. This shared context allows the individuals to use shortcut terms, like the first name of a mutual friend, to communicate more quickly but without losing accuracy. A context in JSON-LD works in the same way. It allows two applications to use shortcut terms to communicate with one another

I was extending the argument to say that you could make this implicit context explicit, by coming up with some context document that encodes the mutual understanding of each concept as an IRI.

So for example, the Mastodon API response for my own account on mastodon.social:

{
  "id": "14715",
  "username": "trwnh",
  "acct": "trwnh",
  "display_name": "infinite love ⴳ",
  // ...
}

This has an implicit context based on the Mastodon API docs and codebase, no?

{"@context": {
  "id": "https://docs.joinmastodon.org/entities/Account/#id",
  "username": "https://docs.joinmastodon.org/entities/Account/#username",
  "acct": "https://docs.joinmastodon.org/entities/Account/#acct",
  "display_name": "https://docs.joinmastodon.org/entities/Account/#display_name", 
  // ...
}}

As in, conceptually, each term maps to the concept documented at some IRI. You could go one step further and write an ontology that defined each term in machine-readable format, linking it to equivalent concepts via owl:equivalentProperty (for example, it wouldn’t be far-fetched to say that <https://docs.joinmastodon.org/entities/Account/#display_name> owl:equivalentProperty as:name ., would it?)

I’m not saying anyone has to go further or that anyone should be required to go further, but I’m saying that they could, if they wanted to, if they considered it worth the efforts and tradeoffs. And this can generally be done for any JSON document. All it takes to turn JSON into JSON-LD is to inject the implicit context as an explicit @context declaration. (Bear in mind that anything not representable as RDF can still be represented as literal JSON (rdf:JSON) using the new @json keyword in JSON-LD 1.1.)
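As a small illustration of "inject the implicit context as an explicit @context declaration", here is a Python sketch. The context is built from the documentation URLs quoted above; the names `MASTODON_ACCOUNT_CONTEXT` and `inject_context` are made up for this example, and no JSON-LD library is involved.

```python
# Hypothetical context derived from the Mastodon API docs URLs quoted above.
DOCS = "https://docs.joinmastodon.org/entities/Account/#"
MASTODON_ACCOUNT_CONTEXT = {
    term: DOCS + term for term in ("id", "username", "acct", "display_name")
}


def inject_context(document, context):
    """Return a new document that declares the context explicitly as "@context"."""
    return {"@context": context, **document}


# A trimmed version of the API response shown above.
account = {
    "id": "14715",
    "username": "trwnh",
    "acct": "trwnh",
    "display_name": "infinite love ⴳ",
}
linked = inject_context(account, MASTODON_ACCOUNT_CONTEXT)

# Dropping the "@context" key again yields the original payload unchanged.
stripped = {key: value for key, value in linked.items() if key != "@context"}
```

The payload itself is identical before and after; the only difference is whether the mapping travels with the document or stays in the reader's head.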

If this is confusing to anyone, then they can pretty much just ignore it.

People can make that argument if they want. My argument is that “the distinction between JSON and JSON-LD” is purely in whether the context is implicit (shared in our heads) or explicit (declared in the document). It’s not like they’re completely different things. Logically speaking, “JSON-LD is JSON with declared context” has the inverse statement “JSON is JSON-LD without declared context”.

This whole line of conversation basically boils down to: do you think context should be implicitly shared or explicitly declared?

If you think it should be implicitly shared, then you are likely to favor “plain JSON”, and believe that there is a “network” or “protocol” that consists of all implementers who share that implicit context (or several that overlap partially).
If you think it should be explicitly declared, then you are likely to favor JSON-LD, and believe that there is not a “network” or “protocol” but rather an open-world system of completely unrelated actors and systems.

As silverpill says:

This is the line of reasoning of the former bullet point – that there is a “Fediverse network” with shared context. The alternative would be to follow the AS2 spec recommendation that you MUST support at minimum iri expansion, which enables you to handle documents produced by systems that are not purely AS2/AP.

stevebate · September 23, 2024, 4:13pm

You seem to be redefining “context” on-the-fly, so I’m not sure I can follow your argument accurately. Do you mean JSON-LD context or some abstract philosophical context of communication? I don’t know.

My personal preference is JSON-LD with explicit JSON-LD context with supporting metadata like an OWL Ontology and a schema definition of some kind (SHACL, ShEx). For 99.99% of the AP developers, I prefer JSON with explicit schema definitions (you might call it “context”, I’m not sure).

trwnh · September 23, 2024, 5:04pm

Not my intent, sorry. I’m only describing the tradeoff from your options here:

Stick to AS2/AP properties only (JSON-LD is completely irrelevant)
Partially support extensions (hardcoded, implicitly shared context based on ahead-of-time communication)
Fully support extensions (support IRI expansion at least, not necessarily full JSON-LD, but just the bare minimum to be able to go from term to https://ns.example/term)

Going with the first option and continuing to worry about JSON-LD is what doesn’t make sense to me. Going with the second option and worrying about JSON-LD makes slightly more sense, because there is a chance you might encounter a document whose context you don’t implicitly share. Going with the third option makes a bit more sense, because while you don’t have to deal with the entirety of JSON-LD you do have to deal with the bare minimum.

I’ve already given examples as to how you can dynamically load schema information into a user interface (by showing a tooltip on hover, for example), but I’ll refrain from re-treading this particular line of argument. Suffice to say, these mechanisms are already proven very useful in life sciences and open data, which you might not care about, but it’s clear that you don’t need AGI to parse a schema or ontology. Just some basic logical inferencing. This allows machines to read specs, too.

“Lemmy as an AP client” would work like this:

You go to a Lemmy server
Lemmy lets you browse different sublemmies (acting as a mini Web browser)
You want to post, so you authorize against your AP server
Lemmy lets you notify your followers whenever you make a post (via AP C2S POST to outbox)

This isn’t running on your phone. The Lemmy instance would be an AP client. It could re-expose whatever client API it wanted to, for your phone-based browsing needs (or otherwise just a mobile website). In concrete terms, Lemmy as an actor is doing the aggregation and sorting work for you. C2S isn’t excluding any possibilities here – it’s just doing what it was designed for, which is to send activity notifications to your followers (or anyone else, really).

WordPress powers 43.5% of all sites, so I think it’s not too complicated to understand for the average user.

Also, a webmail interface is just a type of client. “Much like they would use Mastodon as an AP client” is kind of the point here – that Mastodon is acting as an AP client and as a mini Web browser. At no point is the user required to interface directly with ActivityPub – they just log in wherever they need to log in.

How?

ActivityPub can be summarized entirely like so: “POST an activity to your outbox, and it will have some side effects, then it will be POSTed to every addressee’s inbox, where it might have some more side effects.” That’s it. That’s basically the whole spec.
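That one-sentence summary can be sketched as a toy in-memory model in Python. Everything here is an assumption made for illustration: the `ToyNetwork` class, keying inboxes by actor ID instead of inbox URL, using only `to` and `cc` for addressing, and treating both as lists. There is no HTTP, no authentication, and no side effects beyond storing the activity.

```python
from collections import defaultdict


class ToyNetwork:
    """In-memory stand-in for outbox/inbox delivery (no HTTP, auth, or signatures)."""

    def __init__(self):
        self.outboxes = defaultdict(list)
        self.inboxes = defaultdict(list)

    def post_to_outbox(self, actor, activity):
        """Record the activity in the actor's outbox, then deliver to each addressee once."""
        activity = {**activity, "actor": actor}
        self.outboxes[actor].append(activity)
        # Assumption: "to" and "cc" are always lists here; real payloads vary.
        recipients = activity.get("to", []) + activity.get("cc", [])
        for recipient in dict.fromkeys(recipients):  # de-duplicate, keep order
            self.inboxes[recipient].append(activity)
        return activity


network = ToyNetwork()
network.post_to_outbox(
    "https://a.example/alice",
    {
        "type": "Create",
        "object": {"type": "Note", "content": "hello"},
        "to": ["https://b.example/bob"],
        "cc": ["https://b.example/bob", "https://c.example/carol"],
    },
)
```

Bob is addressed twice but receives the activity once. The point of the sketch is how little the core mechanism is; the hard parts sit around it.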

Again, the complexity is not in the basic mechanism of ActivityPub, and it’s not entirely in JSON-LD either. It’s everything else. If you want to build a “network”, ActivityPub is not enough. You need to specify payloads and what you’re going to do with those payloads. You need to implement other specs like WebFinger and HTTP Signatures, you need rules for parsing and sanitizing HTML content, and you need to spend countless hours debugging the possibly undocumented behaviors of every other implementation you want to be compatible with. Even without JSON-LD, you have to account for all manner of edge cases you might not think about, like dealing with properties that can be single items or possibly arrays, like dealing with multiple values where you were expecting only a single one, or dealing with a singular value where you were expecting an array. You have HTTP headers and HTTP status codes. You have multiple different ways to express the same thing. You have to deal with fetching, caching, cache expiry, keys, key rotation, task queues, task failures, redirects, and many more things. You are building a Web browser even if you don’t realize that you are building a Web browser. You are likely also building an embedded mail server at the same time, because no one has really built a standalone mail server yet.
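One of those edge cases, a property that may arrive as a single item or as an array, can be handled with a couple of small helpers. This is only a sketch in Python; the names `as_list` and `first` are hypothetical and not taken from any existing library.

```python
def as_list(value):
    """Normalize a property value to a list: missing -> [], single -> [single]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first(value):
    """Return the first item of a possibly-plural value, or None if there is none."""
    items = as_list(value)
    return items[0] if items else None


# The same property, expressed three different ways by three different senders.
single = {"attributedTo": "https://example.com/someone"}
plural = {"attributedTo": ["https://example.com/someone", "https://example.com/other"]}
absent = {}

authors = [as_list(doc.get("attributedTo")) for doc in (single, plural, absent)]
```

Normalizing at the edge like this means the rest of the code only ever has to deal with one shape.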

I think you’re confusing HTTPS URIs with JSON-LD.

I think this comes across as far more judgemental than anything I’ve said in the thread above. Granted, I’m being patient here and I’m spending more of my time on this conversation than I’d like. But the implication that this is “just discussing how to build stuff” as opposed to “actually building stuff” is something I find somewhat insulting or devaluing. It implies that designers and architects are worth less than programmers and coders. There are plenty of ways to contribute toward a goal or project, and also on a personal level, I’ve done a lot more over the past years than “just discussing”. I’ve written documentation for and provided design input for multiple fediverse projects. And even the “discussing” directly goes into and toward developing (and writing!) specifications, protocols, proposals, reports, etc. that can then be implemented by various fediverse projects. We’re not “just discussing”. We actually are building stuff. It just happens to not necessarily always be code.

trwnh · September 23, 2024, 5:12pm

When I say “context”, I generally mean the abstract concept of shared understanding. When I say @context or “JSON-LD context”, then I mean the explicit declaration. Does that make the argument clearer?

Nah, a schema is a schema. “Context” is the mapping between symbols and concepts. Like when I say Steve to refer to you, that depends on the shared context between us that you’re the only Steve in this conversation. I could instead refer to you as https://www.stevebate.net/ (I assume this is you?), or I could refer to you as https://trwnh.com/people-i-know/socialhub-participants/stevebate, or I could refer to you as Steve Bate, who has contributed at least once to Vocata. All of these symbols are referring to the same individual, yourself.

jenniferplusplus · September 23, 2024, 6:20pm

I don’t think there’s likely to ever be a meaningful version 2 of the AP spec. There’s no mechanism to negotiate it, and no institutional support to collect the implementers and do the work. The best I can imagine is a separate protocol that exists as a successor to activitypub.

That said, what would I want in such a successor?

Remove ld/rdf as a federation format. It’s awful at that. Use something with real schemas that support code generation. RDF can be supported as a query format, if desired.
A protocol-defined federated authentication mechanism.
Absolutely no self-authenticating messages. In fact, no forwardable messages of any kind. It’s an unbelievable security risk, and it’s really distressing to me that I have to keep explaining that.
Specifications for how to reply to an object, how to comment on it, how to locate replies and comments, how to approve and reject those replies and comments. Specifications to determine who has authorization to reply/comment/accept/reject, and how to convey that authorization to third parties. Specifications to convey error information.
A real extension mechanism. LD/RDF handwaved that without actually solving anything. I hope it would be obvious that this is necessary with that aspect of the spec removed.
Specifications for managing distributed ownership of shared resources. Particularly groups, but I expect more uses would flourish if it wasn’t so nearly impossible to do at all.
Distinguish between persistent messages and ephemeral ones like presence and status updates.

I’m sure I could go on, but this seems like plenty for what seems like a theoretical exercise.

SorteKanin · September 23, 2024, 6:30pm

I don’t really understand how this works in practice - or rather maybe, how is this different from how Lemmy works right now? Is it only that you seem to be separating the AP server from the Lemmy “client”?

I would say the people making WordPress sites are already above-average when it comes to technical literacy. In general technical people (everyone here) vastly overestimate other people’s technical skill and literacy.

The spec would already be far easier to understand for the average developer if it used plain JSON. It would also be far easier to implement as you wouldn’t need to worry about any of the JSON-LD particularities and could rely instead on plain JSON libraries, of which there are plenty.

I’m not sure I understand, what does what I said have to do with HTTPS URIs? What I would propose instead of JSON-LD is simply plain JSON with (for instance) UUID keys to ensure anyone can come up with their own new keys in a decentralized manner without worrying about conflicting with other keys.

Sorry, I certainly didn’t mean to judge anyone’s intelligence or worth - it was more to say that anyone who does not reach a certain threshold of stubbornness/patience or available time is simply pushed out of these discussions by their sheer length. It’s a shame but I’m not sure how to make it more accessible. Tbh I think the linear flow of this forum as opposed to a tree of comments makes it seem more daunting (this is part of why I prefer reddit-like comment trees).

On building vs discussing, I am more inclined to an “agile” workflow. It is very difficult if not downright impossible to discuss and design our way to a good solution to any software problem ahead-of-time. Continuous feedback and iteration between the theoretical debate and the practical building is the better approach I think.

The feedback right now from the practical building side is that nobody is using JSON-LD and it doesn’t seem like many even want to. We need to take that feedback seriously. Standing fast on JSON-LD and contexts and ontologies and other (for the average developer, myself included) quite foreign concepts is not a feasible path forward.

SorteKanin · September 23, 2024, 6:35pm

I like all your other suggestions but this one leaves me puzzled. Sorry if this is distressing you further but could you maybe elaborate on how/why it is a security risk? Or link to where you have previously explained it, if you prefer

trwnh · September 23, 2024, 7:00pm

What’s wrong with plain JSON but with URI keys? You can use HTTP(S) URIs as keys without it being “LD”. I see no reason to insist on UUIDs.

So, are we just going to ignore the “everything else” I pointed out?

trwnh:

Again, the complexity is not in the basic mechanism of ActivityPub, and it’s not entirely in JSON-LD either. It’s everything else. If you want to build a “network”, ActivityPub is not enough. You need to specify payloads and what you’re going to do with those payloads. You need to implement other specs like WebFinger and HTTP Signatures, you need rules for parsing and sanitizing HTML content, and you need to spend countless hours debugging the possibly undocumented behaviors of every other implementation you want to be compatible with. Even without JSON-LD, you have to account for all manner of edge cases you might not think about, like dealing with properties that can be single items or possibly arrays, like dealing with multiple values where you were expecting only a single one, or dealing with a singular value where you were expecting an array. You have HTTP headers and HTTP status codes. You have multiple different ways to express the same thing. You have to deal with fetching, caching, cache expiry, keys, key rotation, task queues, task failures, redirects, and many more things. You are building a Web browser even if you don’t realize that you are building a Web browser. You are likely also building an embedded mail server at the same time, because no one has really built a standalone mail server yet.

SorteKanin · September 23, 2024, 7:06pm

I don’t disagree that those things also bring complexity but that doesn’t mean removing JSON-LD won’t make the system simpler. As I said above, I am sure there are other places that can also be simplified but personally and from what I hear, JSON-LD is one of the major pain points.

Nothing really, but I also just don’t really see the benefit, as explained earlier. UUIDs are just a nice simple thing where you don’t even need to think about what keys you choose since they are entirely arbitrary (I should specify I mean v4 UUIDs).

jenniferplusplus · September 23, 2024, 7:58pm

This just crossed my timeline, and it pretty well sums up my concern.

Self authenticating messages, and message forwarding, completely subvert blocking. If we can’t even attempt to block people from having access to us and our posts, then everything else we’re doing here is for nothing. We might as well just use RSS feeds.

SorteKanin · September 23, 2024, 8:23pm

Ah, I see the concern - so when you said “security risk” you meant like the psychological safety of the users and not like a security breach or security problem with the authentication mechanism itself (if I understand correctly).

But how does it help to not have self-authentication? I mean a malicious actor can still forward messages, even if they can’t prove that the message really is from you, but people getting illicit forwarded messages from malicious actors likely won’t care about that. Or is the plausible deniability of such forwarded messages enough to allay your concern? I guess this also only applies to non-public messages, as public messages are easy to get for everyone.

What would this look like? Is this the idea of being able to log into any instance or…?

Just out of curiosity due to the wording here, do you consider replies and comments distinct? How are they different if so?

jenniferplusplus · September 23, 2024, 8:52pm

Ah, I see the concern - so when you said “security risk” you meant like the psychological safety of the users and not like a security breach or security problem with the authentication mechanism itself (if I understand correctly).

No, I mean security risk like it will get people harassed, scammed, doxxed, stalked, and worse. Disclosing messages to people other than the designated recipient is a security breach. The protocol can’t stop people from screenshotting private messages. But it can stop intermediary servers from automatically forwarding those messages, and it can stop 3rd parties from being able to automatically validate them.

This is fundamentally about being able to know and control to whom you are speaking. It’s table stakes for a communication system.