Practices around JSON formatting of JSON-LD messages

ActivityPub is primarily a Linked Data standard. It offers an option for developers to treat it as plain JSON for convenient processing by web technologies.

I wonder about the common practice of using @context to map namespaced terms to bare property names, avoiding namespace prefixes in the body of the JSON message. Aren’t we making things too convenient?

The problem is that ActivityStreams is a much overused and overloaded vocabulary already, where developers try (and are even encouraged) to map their own application and business domains and features to the social primitives that AS offers. And then make extension properties look like they are part of AS too.

Isn’t this a better practice:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "toot": "http://joinmastodon.org/ns#",
      "foobar": "https://example.org/ns/foobar"
    }
  ],
  "type": "Actor",
  "toot:discoverable": true,
  "foobar:discoveryFoo": "bar",
  ...
}

Than this, the common practice for JSON convenience:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "toot": "http://joinmastodon.org/ns#",
      "discoverable": "toot:discoverable",
      "foobar": "https://example.org/ns/foobar",
      "discoveryFoo": "foobar:discoveryFoo"
    }
  ],
  "type": "Actor",
  "discoverable": true,
  "discoveryFoo": "bar",
  ...
}

I’ve seen developers be confused (perhaps because they don’t parse or look up the context) and say things like “yes, discoverable is part of Actor and can be used as a general consent mechanism”. That is incorrect: discoverable is app-specific to Mastodon, and even where it has become a de-facto standard it still relates to discovery of microblogging accounts.

Generally, extension happens mainly through post-facto interoperability, which is not how it was intended: it is done haphazardly, without following good design rules.

Today we have relatively few apps and services, and only a handful of well-adopted namespaces ‘do the rounds’, like Mastodon’s app-specific-but-de-facto-standard toot: and more recently the improvement to refer to fep: namespaces.

What about the future? What if AP-based decentralized social networking becomes ubiquitous, with thousands of apps and services and numerous extensions and namespaces?

It is primarily a JSON-based standard that offers an option to use JSON-LD, for two reasons:

  1. This is how an overwhelming majority of developers use it.
  2. The spec itself states in section 3 (Objects) that @context is recommended, not required.

Are you concerned about name conflicts? Today it is not a problem, but we’ll start using better names once this becomes a problem. I think namespacing is a great idea.

Ironically, JSON-LD may be the thing that stands in the way of adopting namespaces:

The serialized JSON form of an Activity Streams 2.0 document MUST be consistent with what would be produced by the standard JSON-LD 1.0 Processing Algorithms and API Compaction Algorithm using, at least, the normative JSON-LD @context definition provided here […]

https://www.w3.org/TR/activitystreams-core/#jsonld

I’ve heard that compacting removes namespace prefixes.


From whence cometh meaning?

  • JSON alone does not carry any semantics for any terms used in a document.
    • Semantics with JSON-LD are usually provided by term definitions provided via some JSON-LD context.
      • Terms can be anything, but a best practice / recommendation is to have terms which dereference to the term definition.
        • http(s): identifiers are often used as terms because the HTTP(S) protocol provides a convenient default agreed-upon way to obtain term definitions, if you follow the best practice of hosting term definitions at those canonical origins.
          • The URI Definition Discovery Protocol, or UDDP, codifies how to obtain term definitions via HTTP(S) for terms that use http(s): identifiers, relying on HTTP return codes, HTTP resource body content, and HTTP Link headers.
          • However, note that while http(s): terms can have a “canonical” term definition if following the UDDP, it is still possible to load your own term definitions out-of-band.
        • Terms that aren’t a “URI with a canonical dereferencing algorithm” must load term definitions out of band, because UDDP requires a URI to be canonically dereferenced to its URI definition.
          • Because these terms don’t have a canonical definition, peers must agree to load the same term definitions into their processors. So terms need a consensus algorithm to obtain their definitions (that isn’t UDDP).
    • Ultimately, semantics in general are provided by mutual agreement.
      • Mutual agreement in Linked Data relies on UDDP to obtain canonical term definitions for a URI, nominally via HTTP(S) protocol.
      • Mutual agreement outside of Linked Data is usually signaled with IANA media types.
        • application/activity+json is intended to carry the same semantics as "@context": "https://www.w3.org/ns/activitystreams".
          • A processor encountering an AS2 Content-Type can inject the normative AS2 context always, as the last declared context (so that it is not overridden).
            • Since the normative AS2 context is required and is functionally always in effect, JSON-LD processors should arrive at the same semantics as those defined by the AS2 specs.
          • Processors not using JSON-LD should arrive at the same semantics as those defined by the AS2 specs, but they do so by manually reading the spec and manually hardcoding the semantics into their processors.
            • This semantic encoding process results in terms defined by AS2 having known meanings, extracted by whoever read the spec and hardcoded those meanings into the processor.
              • Note that the spec may not be interpreted perfectly or in the same way between peers, which causes problems.
            • This semantic encoding process also results in terms NOT defined by AS2 having no known meaning by default.
      • Thus, peers and processors need a way to arrive at mutually-agreed-upon term definitions for any terms not defined by AS2.
        • AS2 recommends that you SHOULD use JSON-LD to define these “extension” terms.
          • Processors using Linked Data should arrive at a canonical agreed-upon meaning if the terms are defined correctly using UDDP.
          • AS2 allows that you MAY augment the JSON-LD context, but doing this creates an issue for processors that don’t use JSON-LD.
            • By default, naive processors should ignore the "@context" entirely, so terms will not be expanded correctly to their full URIs, and the meaning will be ambiguous.
        • In the absence of JSON-LD definitions for these non-AS2 terms, current fedi implementations just YOLO it and blindly assume that all other peers always agree with the semantics that they hardcoded into their processors.
          • Thus, the de facto consensus algorithm is “just do whatever Mastodon does” or “just do whatever Lemmy does”.
          • Worse, there is no acknowledgement that peers might actually disagree with you.
            • Semantic confusion is therefore basically blindly accepted – whatever breaks is not observable.
            • Semantic attacks are possible by using an expected shorthand which actually expands to something different than what is expected.
        • An alternative to “de facto consensus” is to maintain a central registry of allowed terms and their definitions.
          • This is an idea being attempted by the “AS2 extensions policy”, which in effect removes decentralization from AS2 and forces retroactive updates to any AS2 documents using the normative AS2 context.
        • Another alternative to “de facto consensus” that preserves decentralization is to define profiles which import additional terms and their definitions.
          • For example, a “Mastodon profile” could include all the additional semantics and constraints required by Mastodon processors.
            • Such a profile can also provide its own JSON-LD context for convenience to JSON-LD processors.
            • Terms used by this profile ideally should follow best practices for obtaining canonical term definitions, although a sufficiently constrained profile can be used to derive these term definitions in the same way you’d derive term definitions from AS2 without the JSON-LD context (by reading the spec/profile).
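The context-injection step described above (for processors receiving application/activity+json) can be sketched roughly like this, assuming a simple dict-based representation (helper name is mine):

```python
AS2 = "https://www.w3.org/ns/activitystreams"

def inject_normative_context(doc: dict) -> dict:
    # Normalize "@context" to array form, then append the normative
    # AS2 context last, so its term definitions end up in effect and
    # are not overridden by earlier declared contexts.
    ctx = doc.get("@context", [])
    if not isinstance(ctx, list):
        ctx = [ctx]
    if AS2 not in ctx:
        ctx = ctx + [AS2]
    return {**doc, "@context": ctx}
```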

Practical approaches to extensibility

Now, with regards to what you can practically do with AS2 documents and JSON-LD extensions, this was previously discussed in FEP-e229: Best practices for extensibility as well, and I intend to incorporate all this into the next revision of https://w3id.org/fep/e229.

Option 1: Do not include any additional context.

  • JSON processors that ignore "@context" look for canonical, fully expanded identifiers.
  • Object properties cannot be expressed as a JSON string; they must be expressed as JSON objects using "id".
{
  "http://joinmastodon.org/ns#discoverable": true,
  "http://joinmastodon.org/ns#featured": {"id": "https://mastodon.example/users/alice/featured"}
}

I would recommend this as the most straightforward way to allow processors to ignore "@context" entirely.
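To illustrate, a naive JSON processor consuming Option 1 documents can do plain key lookups on the full IRIs (a sketch; the helper names are mine):

```python
TOOT = "http://joinmastodon.org/ns#"

def is_discoverable(actor: dict) -> bool:
    # Full IRIs as keys mean "@context" can be ignored entirely.
    return actor.get(TOOT + "discoverable") is True

def featured_collection(actor: dict):
    # Object properties arrive as {"id": ...} rather than bare strings.
    value = actor.get(TOOT + "featured")
    if isinstance(value, dict):
        return value.get("id")
    return None
```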

Option 2: Include prefixes only.

  • JSON processors can no longer ignore "@context".
  • Namespaces alone aren’t enough because they need some kind of authority to define terms in that namespace. toot: means nothing on its own and carries no authority, until expanded.
{
  "@context": {
    "toot": "http://joinmastodon.org/ns#"
  },
  "toot:discoverable": true,
  "toot:featured": {"id": "https://mastodon.example/users/alice/featured"}
}
{
  "@context": {
    "mastodon": "http://joinmastodon.org/ns#"
  },
  "mastodon:discoverable": true,
  "mastodon:featured": {"id": "https://mastodon.example/users/alice/featured"}
}
{
  "@context": {
    "foo": "http://joinmastodon.org/ns#"
  },
  "foo:discoverable": true,
  "foo:featured": {"id": "https://mastodon.example/users/alice/featured"}
}
{
  "@context": {
    "toot": {
      "@id": "http://joinmastodon.org/ns#",
      "@prefix": true
    }
  },
  "toot:discoverable": true,
  "toot:featured": {"id": "https://mastodon.example/users/alice/featured"}
}

Expanding using prefixes only can be less complex than expanding using full term definitions, but it doesn’t get you much in return for still requiring expansion, other than making that expansion slightly simpler. So I don’t think this is worth it, really.
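For what it’s worth, prefix-only expansion could be sketched like this (a deliberately simplified sketch handling only string-valued prefix mappings, not the `"@prefix": true` object form):

```python
def expand_prefixed(doc: dict) -> dict:
    # Expand "prefix:suffix" keys to full IRIs using the string-valued
    # prefix mappings from an embedded "@context".
    ctx = doc.get("@context", {})
    if not isinstance(ctx, dict):
        ctx = {}
    prefixes = {k: v for k, v in ctx.items()
                if isinstance(v, str) and not k.startswith("@")}
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        prefix, _, suffix = key.partition(":")
        if suffix and prefix in prefixes:
            key = prefixes[prefix] + suffix
        out[key] = value
    return out
```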

Option 3: Include embedded context with term definitions.

  • JSON processors cannot safely ignore "@context".
  • No need to fetch remote context documents (although see next section on how this can be avoided anyway).
  • String values may or may not expand to ID references, depending on whether a term is defined as @type: @id or not.
{
  "@context": {
    "discoverable": "http://joinmastodon.org/ns#discoverable",
    "featured": {
      "@id": "http://joinmastodon.org/ns#featured",
      "@type": "@id"
    }
  },
  "discoverable": true,
  "featured": "https://mastodon.example/users/alice/featured"
}

I think this can actually make things more complex for anyone not using a JSON-LD processor. It’s also roughly the de facto state of fedi right now, with Mastodon cramming a bunch of term definitions into its embedded context, except that some of those term definitions are actually incorrect. It might actually make more sense to detect software name and version via something like NodeInfo and then inject a corrected context, which is a wild thing to even suggest: all the worst parts of user-agent sniffing.
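The expansion implied by Option 3 could look roughly like this (a simplified sketch that ignores most of the real JSON-LD expansion algorithm):

```python
def expand_terms(doc: dict) -> dict:
    # Replace each term defined in the embedded "@context" with its
    # full IRI; wrap string values of terms declared "@type": "@id"
    # so they read as ID references rather than plain strings.
    ctx = doc.get("@context", {})
    if not isinstance(ctx, dict):
        ctx = {}
    out = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        defn = ctx.get(key)
        if isinstance(defn, str):
            key = defn
        elif isinstance(defn, dict):
            if defn.get("@type") == "@id" and isinstance(value, str):
                value = {"id": value}
            key = defn.get("@id", key)
        out[key] = value
    return out
```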

Option 4: Include a remote context.

  • JSON processors can’t ignore "@context" entirely, but they can avoid JSON-LD processing if they know ahead-of-time what a context identifier means, just like how they might know what "https://www.w3.org/ns/activitystreams" means ahead-of-time.
    • Best practice is to make context identifiers immutable, so that they don’t have to be dereferenced as remote context documents.
    • The JSON-LD context document can be obtained ahead-of-time and preloaded into a JSON-LD processor. Modern JSON-LD based specs actually require this now, with SHA256 hashes of the context documents provided so you know you got the correct document.
    • The context identifier can also content-negotiate to HTML documentation of the terms, and refer back to any profiles that may be in effect.
    • The goal should be to get the context document to agree with the spec/profile completely.
{
  "@context": "https://joinmastodon.org/contexts/v4.5.0",
  "discoverable": true,
  "featured": "https://mastodon.example/users/alice/featured"
}

Note that this is what the more reasonable contexts do, for example security/v1 vs security/v2. This is really the most “idiomatic JSON” approach to JSON-LD contexts, because JSON-LD can be truly optional, assuming the terms are defined correctly. The referenced JSON-LD @context values can even be seen as a sort of profiling of the document’s semantics, similar to what might be done with a Content-Type profile= parameter or a rel=profile Link, but in the body content instead of in the HTTP headers. Those other profiling mechanisms may still be used, but you can’t expect any peer to use a specific mechanism right now. JSON-LD processors that are aware of the profile out-of-band (via HTTP headers) can inject the appropriate context; JSON processors could use any of the three mechanisms, really.
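The “preloaded context, pinned by hash” pattern mentioned above might be sketched as a document loader that refuses anything it wasn’t given ahead-of-time (the shape of the `preloaded` mapping is my own invention):

```python
import hashlib
import json

def pinned_loader(preloaded: dict):
    # preloaded maps a context identifier to (raw_bytes, sha256_hex),
    # both obtained ahead-of-time; the loader never touches the network.
    def load(url: str) -> dict:
        if url not in preloaded:
            raise KeyError(f"context not preloaded: {url}")
        raw, expected = preloaded[url]
        if hashlib.sha256(raw).hexdigest() != expected:
            raise ValueError(f"context {url} does not match pinned hash")
        return json.loads(raw)
    return load
```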


De facto. But the interpretation as JSON-based constitutes protocol decay. ActivityPub is a linked data standard. The question is whether, and then under which conditions, the protocol decay is acceptable. Subsequently standardizing on that removes the protocol decay, possibly at the cost of tech debt in the installed base that must be reconciled.

Misconception, the most costly thing to exist in any developer ecosystem, can only be taken away by providing crystal-clear clarity in specifications and guidance.

The absolutely splendid reply by @trwnh - which must have cost much time to formulate, so kudos and thanks :folded_hands: - shows there are many nuances and important considerations at play. Solving the eternal plain-JSON vs. Linked data conundrum is a MUST HAVE for a healthy fediverse future.

A properly fleshed out protocol extension mechanism and how to build robust interoperable solutions on top of it are “the killer app”, to talk in these terms, for the fediverse. Even more so than nomadic identity that is often mentioned here instead. In other words:

(First, to hook into the above, I would remark that your post addresses aspects of the extensibility mechanism we need. More is needed to robustly model the service interaction between actors on the social graph.)

A great set of options regarding linked data handling.

For the Protosocial AP extension my preference is an Option 5: a remote context (option 4) with a preference for specialized types (with "type" an array), and avoidance of extending existing types with new properties in the aspect-oriented manner we do now (e.g. “existence of these properties makes the Group a de facto LemmyGroup”). Exceptions are generic domain extensions, such as the security vocab. Namespace prefixes are used (option 2).

I may follow up with a Social coding blogpost that expands more broadly on protocol extension and Grassroots standards.

There is a concern that namespaces are generally non-idiomatic JSON, so the JSON-LD world mostly has a preference for more fully defining terms in a way that describes the intended semantics. JSON-LD contexts can actually be scoped to certain types or certain properties, but property scoping is preferred over type scoping. (Property scoping can override protected term definitions at more specific points in the document, but type scoping cannot.)

One pain point is that "@context" as a key can occur anywhere in the document, not just on the top-level object; this is something that makes it harder for JSON processors to keep track of which term definitions are currently in effect (as the declaration of new contexts is by definition additive).

{
  "@context": "foo",
  "something": {
    "@context": "bar"
  }
}

You run into potential hard-to-debug issues if bar redefines a @protected term from foo, but even in the base case it is probably better to try and limit context processing to the level of the document instead of to the level of the node. Because in Linked Data we can include additional information about referenced objects, this may get complicated if the referenced object uses a different context than the current document. Reconciling different contexts can be dealt with in the following ways:

Generate a new combined context at the top-level

The serializer or publisher needs to verify that multiple remote contexts don’t conflict with each other. We might say that a Mastodon 4.5 context combines activitystreams, security v1, and some other terms. This is fine because these contexts don’t conflict with each other. If there was a conflict, the conflict resolution would be done by whoever publishes the combined context.

Avoid embedded information about referenced Linked Data resources

Instead of trying to combine different contexts and dealing with potential conflicts, let Linked Data form a natural boundary between documents/resources.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "actor": "https://alice.example/",
  "type": "Like",
  "object": "https://movie.example/"
}
{
  "@context": "http://schema.org",
  "@id": "https://movie.example/",
  "@type": "Movie",
  "name": "Ghostbusters",
  "actor": {
    "@type": "Person",
    "name": "Bill Murray"
  }
}

The use of “actor” in movie.example violates the requirements of AS2 that you MUST NOT override terms defined by AS2, so merging these two documents naively will cause issues:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "actor": "https://alice.example/",
  "type": "Like",
  "object": {
    "@context": "http://schema.org",
    "@id": "https://movie.example/", // this compacts to "id" in AS2
    "@type": "Movie", // this compacts to "type" in AS2
    "name": "Ghostbusters", // AS2 violation
    "actor": { // AS2 violation
      "@type": "Person", // AS2 violation, also this compacts to "type" in AS2
      "name": "Bill Murray" // AS2 violation
    }
  }
}

If you really want to combine the information in these two documents, you need to not violate AS2’s requirements. This means you need to combine the contexts and reconcile conflicts around the term “actor”. Doing this in a way that is easy to understand for everyone relying on JSON processing can be really difficult, but the simplest approach is to only use the implicit AS2 context.

{
  "actor": "https://alice.example/",
  "type": "Like",
  "object": {
    "id": "https://movie.example/",
    "type": "http://schema.org/Movie",
    "http://schema.org/name": "Ghostbusters",
    "http://schema.org/actor": {
      "type": "http://schema.org/Person",
      "http://schema.org/name": "Bill Murray"
    }
  }
}

Alternatively, you can use only top-level contexts as described before, to avoid naive JSON processors having to track context at each node in the document.

{
  "@context": "https://context.example/", // defines Movie, nameOfThing, actorInCreativeWork, Person2
  "actor": "https://alice.example/",
  "type": "Like",
  "object": {
    "id": "https://movie.example/",
    "type": "Movie",
    "nameOfThing": "Ghostbusters",
    "actorInCreativeWork": {
      "type": "Person2",
      "nameOfThing": "Bill Murray"
    }
  }
}

But then your peers need to recognize this new context and what it entails.

I agree with @silverpill. I used to think ActivityPub was “primarily” a Linked Data standard, but after carefully reading the specifications, reviewing the transcripts of standards meetings on the topic, and discussing it with spec authors, I no longer believe that.

If there was decay, I believe it happened during the writing of the specifications, not afterwards. There was significant opposition to using JSON-LD at all. The strange wording in the AS2 Core spec related to JSON-LD was intended to appease that opposition (one proposed option was to remove any mention of it). I believe two reasons JSON-LD wasn’t removed completely are W3C-related politics and that the authors couldn’t think of a better way to support extensibility. Although JSON-LD alone doesn’t support extensibility in any significant way, it sounded good at the time (and continues to be used, I believe incorrectly, as a selling point).

That said, I think there are many opportunities to improve the JSON-based processing of ActivityPub data. I’d prefer to see JSON-LD removed altogether and for the community to develop an effective extension process (a schema language, maybe documentation standards, profiles, etc.). On the formatting topic, there are many issues not related to JSON-LD. For example, it’s an implementation choice whether to embed referenced objects or to refer to them by URI. Each developer makes their own choices about what they produce and the (typically very small) subset of valid serializations they’ll accept (usually deferring to some variant of the Mastodon choices). Each implementation decides which properties are required or not. In many cases, they decide on implementation-specific cardinality constraints for specific properties. Even semantically simple AS2 object serializations can have hundreds of possible valid representations. This is a big source of whack-a-mole development issues, and none of it is related to JSON-LD per se.

From this perspective, I think efforts to restrict how @context is used are not the best strategy.


Wow. After posting the previous reply, I saw this on the Fediverse from Evan:

He’s recommending the activitystrea.ms JSON-LD parser wrapper which describes itself as “not actively maintained”. I have no idea how this is going to help resolve the bigger issues. Interestingly, his onepage.pub server implementation doesn’t appear to use this library or JSON-LD at all. :person_shrugging:


Removing JSON-LD altogether would result in centralizing around either Social CG (de jure) or Mastodon (de facto) for that “extension process”. I don’t think this should be done. Activity Streams 2.0 and ActivityPub should be generally usable without adopting the idiosyncrasies of the fediverse. (Actually, my hot take is that the fediverse’s approach to distributed publishing generally doesn’t benefit from the activity model at all, but that’s for a different thread.) You can have Mastodon centrally control a “Mastodon profile”, or some other profile with some other governance, but those constraints shouldn’t apply to everyone, especially not if those constraints don’t make sense outside of a Mastodon-like application or would restrict people from making reasonable statements using the Activity Streams 2.0 document format / content type.

One other reason not to remove JSON-LD altogether is that you would need to reinvent a sort of “linked data without Linked Data”, since the data is not all in one document; you need to be able to link to other data somehow. I think there’s a lack of understanding in the fediverse ecosystem around how to properly handle split resources, in large part due to the lack of understanding of web architecture. Think about how many activities have identifiers that don’t dereference over HTTP(S) properly. Think about how additional information is or is not loaded dynamically at runtime, without regard to which document it came from. There are expectations of consistency that aren’t actually valid due to caching and temporal variance. And so on…

This may sound like an appealing narrative but I think that, all things considered, what ended up being produced by the original Social Web WG could have been a lot worse. The problems with the spec right now aren’t really in the JSON-LD parts, they’re in everything else. The JSON-LD requirements are mostly fine, if overly permissive. As you point out, most of the mess comes from inconsistent or insufficiently bounded serialization, which removing JSON-LD would not affect at all. It’s the “plain JSON” representation of AS2 that has the problems. There are no schematic or shape constraints, and I would even say that the concept modeling is incomplete.

It’s less “restrict how "@context" is used” and more “limit the pitfalls that naive processors will encounter”. I think it’s important to recognize that "@context" is descriptive more than it is prescriptive, and that you can take any “plain JSON” document and write a JSON-LD context to describe its semantics. That’s ostensibly what the normative https://www.w3.org/ns/activitystreams.jsonld context document is supposed to do. The problem for extensibility is that it doesn’t describe everything possible in the whole open universe; it only describes what’s in AS2 (with some discrepancy, sure, but those can probably be fixed). “You can assume that terms mean what they mean in AS2” is one of the few guarantees made (although even that is a bit wacky in practice because fedi softwares end up using the AS2 terms not according to the AS2 definitions…)

The open question is, how do you recognize when a term is being used differently than its definition? It’s a question of linguistics first and foremost, because most languages don’t have a single authoritative canonical dictionary (insert joke about the French Academy here). Leave enough things open to interpretation, and all meaning will escape your hands and evolve independently. And secondarily, what do you do when you encounter a term that you don’t know what it means, and also it isn’t defined anywhere you can obtain a definition? Which “dictionary” do you look it up in, if there isn’t a dictionary at all, or if the dictionary is outdated, or if people aren’t using the dictionary definition? You could imagine a world where everyone had their own dictionary and maybe tried to compile “common” dictionaries based on how most people use a term, or whatever consensus exists, a sort of linguistic study in the wild.

Essentially the recommendations I am making above boil down to: “Speak plainly, avoid unnecessary and confusing indirection, be explicit where possible”. It’s more than just "@context", because even if you completely leave out "@context", you still need to recognize when the context changes (logically, while processing a document, that is).

That’s a strawman argument. There are other options. I mentioned at least one in my previous reply.

There’s so much more to Linked Data (RDF) than that. Of course, we know how to link documents, HTML or JSON. Using JSON-LD for “links” without it being based on Linked Data (the predominant plain-JSON usage in the Fediverse) is unnecessary.

@stevebate Do you have a link to that proposal / discussion ?

If there was indeed a significant opposition to JSON-LD, I’d like to draw public attention to any historical documents confirming it.

What do you think about JSON schemas?

@helge started the Schemas for the Fediverse project, which I find very promising.

Which options don’t involve a centrally managed registry of accepted terms? The decentralized option is to allow multiple authorities for term definitions. The centralized option is to establish a single authority for term definitions. Which term definitions get to be added to the normative activitystreams context? Should people depend only on this context? That’s what’s being asked here. You have two categories of terms and their meanings:

  • “defined in activitystreams”, which everyone imports from application/activity+json
  • “not defined in activitystreams”, which gets imported from some other source or mechanism

All AS2 says is that you SHOULD provide definitions via JSON-LD if you’re going to use terms from the latter category. In other words, if you use a term “foo”, then ideally you would also let people know what “foo” means:

{
  "@context": {
    "foo": "https://vocab.example/foo"
  },
  "name": "Something",
  "summary": "A thing with a property foo SHOULD define what foo means.",
  "foo": true
}

If you don’t provide a JSON-LD term definition, then by default “foo” doesn’t mean anything. With JSON-LD, you can expand it to a URI, then use something like UDDP to get a canonical/authoritative URI definition. What alternative do you have? Registering the name of a profile with the IANA, where processors then have to be aware of new profile registrations and detect which profile is currently in effect that happens to define “foo”? Blindly assuming what “foo” means based on what the de facto usage of “foo” happens to be among your most common class of peers? The authoritative definition has to come from some authority that people agree to use. It’s relatively much easier to say “whoever currently controls joinmastodon.org gets to define what https://joinmastodon.org/ns/featured means” than it is to say “featured always means what Mastodon says it means” or “featured means what the profile says it means, but I won’t provide any further information on what the profile entails” or “featured should be added to the normative activitystreams context so everyone is required to use the same definition / forced to agree to that singular definition”.

This discussion is fragmented over a year or two of IRC discussions and meetings. I may have overstated the position a bit. James Snell was pushing for JSON-LD as a MUST and Tantek was pushing for it to be purely optional (MAY?) at most. And there were many positions in the middle.

In Dec 2015, the WG group accepted a Tantek proposal:

RESOLVED: All JSON-LD related details should go into a separate section (e.g. Appendix: Considerations for JSON-LD similar to the section in Annotations WG spec, but open to Editor alternatives) and also allowed in the Extensions section. Both typical publishers and developers should not have to worry about JSONLD.

This happened to some extent, but it wasn’t an appendix, and examples with @context still exist throughout the documents.

Earlier, in Aug 2014, Tantek claimed:

16:26 [tantek] also - “alignment with things like JSON-LD” has always appeared purely political, rather than technical for any actual real-world use-case.

What does the “Old Stuff” section title mean on Helge’s page? Does that mean it’s no longer being maintained?

I think JSON Schemas are the best current option. It existed during the original spec development period, but I don’t know how standardized it was at that point. It was definitely discussed, but I believe the W3C had a bias towards promoting their own standards (JSON-LD, RDF, …).

My experimental FIRM server implementation supports configurable JSON schema validation. Longer term, I’d like to explore LinkML as an alternative that would allow generating JSON Schema or a JSON-LD context (among other formats, like RDF schema languages).


For any specific term or for all term definitions? I’m not sure it matters either way. Communities can develop community-specific extension profiles without requiring W3C or Mastodon authority.

I agree it’s good to let people know what “foo” means, but that’s not “in other words”. The two statements are practically unrelated. JSON-LD per se does not define meaning or domain semantics.

As you are pointing out here, URI expansion alone doesn’t provide much, if any, meaning or definition for a term. And UDDP is not specific to JSON-LD. With a schema language like JSON Schema, you can define the term in the schema itself or provide a URI that points to documentation.

Lots of options. FEPs are one. Putting documentation and schema definitions in a collaborative git repo is another (NodeInfo, for example).

I don’t know. Helge was working on a big site update (and we talked about pulling schemas from other sources: https://codeberg.org/funfedidev/schemas/issues/18), but that work has not been merged yet.

Communities can’t update the normative activitystreams context (like the SWICG “extension policy” proposes) or pressure everyone into adopting a de facto one (like Mastodon does simply by existing). It would be a shame for there to be only one such “extension profile” that everyone expects, instead of being able to pull terms from any authority and having an inherently associated definition. The profile can bundle multiple terms from multiple authorities, but it shouldn’t override the per-term authority. If you have terms defined only by profiles, then that means you require profiles. The much lower barrier to entry is to be able to use terms defined by UDDP without also defining a profile.

JSON-LD is the bridge between more idiomatic JSON keys/ids and the fully expanded URI whose definition could be obtained by UDDP. You can either use pre-expanded JSON-LD or you can use additional context, but the naive JSON processors have a much more realistic chance at handling the former rather than the latter. If the goal is that “typical publishers and developers should not have to worry about JSONLD”, then we can say that at least that much can be accomplished for in-AS2 terms, and we can reduce the burden for out-of-AS2 terms by either making consumer-side expansion unnecessary[1], or by having consumers only recognize “well-known” top-level remote contexts which are preloaded and can be used to infer term meanings like profiles (although there might be confusion if there are multiple such contexts which define the same term).
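To make those two options concrete, here is a hypothetical example (reusing Mastodon’s toot:discoverable term) of the same statement serialized pre-expanded:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Person",
  "http://joinmastodon.org/ns#discoverable": true
}
```

versus compacted with additional context:

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {"toot": "http://joinmastodon.org/ns#", "discoverable": "toot:discoverable"}
  ],
  "type": "Person",
  "discoverable": true
}
```

A naive JSON processor looking for the idiomatic discoverable key only finds it in the second form; the first form requires recognizing the full IRI, but is unambiguous without any context processing.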

Schemas and contexts are not the same thing. You can attempt to indirectly refer to some context via the schema, but that’s not something that a lot of people are going to understand. You can also bundle both a context and a schema via a profile, if the profile comes with specific schematic constraints, but in that case the schema and context are related to the profile and not necessarily directly to each other.


  1. There’s a similar argument around whether to explicitly state redundant information, or whether the consumer can be expected to infer that information for themselves. For example, if we use “actor” which is a subproperty of “attributedTo”, then do we also need to explicitly use “attributedTo”, or do we require consumers to infer the “attributedTo” from the “actor”? If a naive consumer is checking for attribution, it needs to know to check for both properties. A publisher could decide to distill a whole bunch of inferred statements ahead-of-time for the consumer, but being that verbose has its own downsides. ↩︎

I didn’t say they could or should even want to do that.

I didn’t suggest “one such ‘extension profile’”.

That goal failed in any practical sense since it was only possibly true in the case where there are no extensions or external (compacted) vocabulary usage. I challenge you to find a popular federated ActivityPub implementation that does not use extensions (where, for the purposes of discussion, “popular” means the server hosts > 0.0001% of the Fedi MAU).

A few quotes from AS2-related discussions (from future AP authors) that discuss the tight coupling between JSON-LD processing and AP extensibility…

18:43 [rhiaro] … People looking at things from a pure JSON point of view can’t interoperate, because every consumer has to understand JSON-LD

(2014-10-28 IRC Social).

17:42 [cwebber2] yes, a context defined by this group simplifies things enough so you don’t have to use json-ld stuff unless you want to do extensions

17:44 [rhiaro] … But when you are parsing, particularly for extensions, it reqiures an extended context to define the additional pieces. So at a minimum, to reliably consume an AS2.0 document, you need to have the json-ld expansion algorithm

(2015-03-17 IRC Social).

Given the Fediverse heavily relies on extensions and typically does not process JSON-LD, then one might wonder how they do it. They do it through ad hoc community discussions, open source implementations, documentation, etc. This has worked to some extent, but there’s significant room for improvement.

The pseudo-JSON-LD usage that exists is actually quite bizarre. Mastodon (and probably most other servers) requires the AP @context declaration even when it’s not required by the spec (given an AP/AS2 content type). Mastodon requires that context, yet doesn’t require its own extension context. Of course, because of this Mastodon can’t properly handle other context definitions that have overlapping terms.

I didn’t say they were (I assume you mean JSON-LD contexts, but I didn’t make the claim in any case).

Schemas can be far more useful. Most of the commonly claimed JSON-LD (context) benefits are imaginary or only exist if one uses RDF with an ontology and/or schema (“defines semantics”, “is a schema language”, “represents enough information for automated inferencing”, etc.).

I think we’ve learned over the last 10+ years that ActivityPub JSON-LD has not been useful for developers in any practical sense (plain JSON processing remains dominant). Even the RDF-oriented developers who could benefit from AP JSON-LD have mostly moved on to real Linked Data projects like Solid. (That said, apparently AS2 is being used in a few non-AP Linked Data projects.)

James Snell wanted JSON-LD processing to be a MUST in AS2. However, I’m guessing he wasn’t envisioning AS2-based communication at the scale it is today. JSON-LD processing is not cheap!

JSON-LD is really bad for APIs that need sub-millisecond response times at scale. Please stop your enterprise architects from making this mistake just so they gain “cool points” at the enterprise architect retreats. - Unconstrained JSON-LD Performance Is Bad for API Specs | Dr. Chuck's Blog

This was from someone who is otherwise a fan of JSON-LD.


To get back on topic, my feedback is that discussing how to restrict usage of @context is misguided IMO. I think we should be discussing alternatives to JSON-LD given what we’ve learned from 10+ years of community implementation experience. I’m not against JSON-LD in general. It’s useful for its stated purpose as an RDF serialization. However, I don’t believe it was a good fit for ActivityPub.

I also don’t believe there’s any real possibility the W3C will address the issues. However, if ActivityPub developers choose to drop JSON-LD in favor of a more effective extension process, the W3C “Recommendation” becomes less relevant.

Are we really still talking about JSON vs JSON-LD? I thought this was discussed to death already.

Regardless of what side of the debate you are on, you must admit that the controversy around “to JSON-LD or not to JSON-LD” signals a failure of the standards process. Standards should not be controversial. They should seek common ground and thereby eliminate controversy and instead foster cooperation. The inclusion of JSON-LD in the ActivityPub standard is clearly controversial and thus, if you ask me, not standard-worthy material.

I am biased because I am on the “no to JSON-LD” side of the fence, but I think in this situation:

  1. A spec that arguably doesn’t take JSON-LD seriously.
  2. Practically no implementations exist that care about JSON-LD.
  3. Plain JSON is the worldwide de-facto standard for web APIs (not JSON-LD, which I had never even heard about before reading about ActivityPub, and I’ve been a software engineer for many years).
  4. JSON-LD is even controversial among the “theorists” of the spec (see this thread or the linked one earlier, or any of the other threads discussing JSON-LD).

We cannot continue pressing on and saying “But ActivityPub really should be using JSON-LD!”

The support for JSON-LD is just not there, not among implementors and not even (enough) among theorists. A standard cannot go on with a controversial thing like this. Again, standards should not be controversial.

Clearly a different path forward is needed, but I have no answers here as I don’t see a way to change the status quo of “ActivityPub uses JSON-LD but actually it’s optional and nobody uses it anyway so it’s just plain JSON in reality” without doing a breaking change in the ActivityPub standard (as discussed in the above linked thread), which would clearly be a no-go for existing implementations. Unfortunately this is a bit of a weakness of the decentralized approach, as nobody has authority (by design, of course) to put their foot down to set a common path forward.

So I have no answers, but this just makes the JSON vs JSON-LD discussion even more useless. I’m not sure what I’m hoping to achieve with this reply aside from trying to say: Perhaps we should spend our time on discussions more fruitfully rather than going down the same useless path again and again?

I don’t think this thread is about “JSON vs JSON-LD” – notice that the title is “JSON formatting of JSON-LD messages”, not “Should JSON-LD be used”. The question it poses is around best practices on how to write JSON-LD contexts that result in a serialization easier to work with for naive processors parsing JSON-LD using a JSON processor instead of a JSON-LD processor.

Fundamentally, these naive processors need to know how to deal with meaning when AS2 doesn’t define a term. The JSON-LD processor can just use JSON-LD to work with those terms, but the naive JSON processor has no idea what to do because all terms in JSON are by default meaningless. My response to @aschrijver includes an overview of those best practices – either rely ONLY on the normative AS2 context and leave external terms fully expanded, or otherwise try to establish well-known remote contexts that can be preloaded by JSON-LD processors or used like profiles by naive processors. The response by @stevebate instead proposes or advocates for some alternative to be developed which serves the same role as JSON-LD but isn’t JSON-LD. It’s unclear what that alternative would look like. @stevebate prefers some kind of community-based process, possibly involving extracting documentation out of links included in a JSON Schema file? But that’s a separate concern – the same community process could also produce JSON-LD context documents in addition to schemas or shape constraints.

Otherwise, @stevebate points out how many problems with the serialization aren’t actually due to JSON-LD:

Which I agree are problems that are useful to solve, but I disagree with the conclusion:

…simply because the JSON-LD context is exactly what controls the compaction to a document that is probably more idiomatic for JSON parsers. So it’s part of the solution space, because if you can write a JSON-LD context for an extension term, that means its semantics are sufficiently constrained instead of the incredibly vague mostly undefined soup that current fediverse implementations put out there and expect everyone else to agree exactly with.

In other words, if the problem is “simple AS2 object serializations can have 100s of possible valid representations”, then the solution is “they should have exactly 1 canonical representation”. AS2-Core requires that you MUST be consistent with what JSON-LD compaction would produce using the normative AS2 context, so any term defined by AS2 has a correct serialization. The problem is that terms not defined by AS2 depend on whether you provide additional context or not. AS2-Core says that you MAY do this, but arguably the best practice is that you SHOULD NOT do this unless there is a way for peers to understand and agree upon additional context. That’s what’s missing right now in the fediverse – a way to negotiate additional context. Otherwise, you can only guarantee and depend upon solely what’s in AS2 and nothing outside of it. (The secondary aspect of this is that fediverse softwares actually place additional constraints beyond what AS2 provides, so they expect guarantees that are not there.)

JSON-LD provides URI expansion and provides some simple data typing of properties (if someone is using JSON-LD processing, which almost no one in the Fediverse does). I know I’m repeating myself, but… JSON-LD does not provide meaning or semantics. That seems to be a foundational argument of your recommendations. In other words, just adding a @context to a “mostly undefined soup” of JSON will not provide meaning or semantics. It will allow you to serialize the soup to RDF, if someone wants to do that, but that’s not a significant benefit for most Fedi developers.

Ironically, “meaning” in Linked Data (RDF) is often defined using ontologies and/or schema languages (SHACL, ShEx) which you appear to be strongly resisting. JSON Schema seems to be the most popular option for JSON processing, but there are other possibilities.

Using the context hacks being discussed in this thread also won’t provide automatic term disambiguation without JSON-LD processing. For example, even if you constrain the context to a simple list of top-level URIs of external contexts, the order of the URIs will matter. You can see a “compacted” extension term called “frob” and you won’t know the expanded URI without using a JSON-LD expansion algorithm or studying each of the external contexts and determining if there are any clashes in the vocabulary and knowing how to manually resolve them. That negates one of the other claimed benefits of using JSON-LD without JSON-LD processing.
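For example (with hypothetical context URLs), consider a document that lists two remote contexts which both happen to define frob:

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://some.example/ns",
    "https://another.example/ns"
  ],
  "type": "Note",
  "frob": "…"
}
```

Without running the JSON-LD expansion algorithm (where definitions from later contexts in the array override earlier ones), a plain JSON consumer cannot tell whether frob here expands to https://some.example/frob or https://another.example/frob.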

That’s why I don’t believe JSON-LD context constraints/hacks are the best strategy for supporting Fediverse/AP software profiles/extensions (or for any other purpose). If JSON-LD processing ever plays a significant practical role in Fediverse development, I’m willing to reevaluate my position.

To reiterate: this thread is about doing the opposite. We are not discussing expanding JSON to JSON-LD, we are discussing which serialization best practices for JSON-LD will be least painful to parse as JSON. As @aschrijver asks in the first post:

And as I clarify in the 3rd post, “JSON convenience” (idiomaticness) is achieved by using additional context, while not using additional context at least guarantees exactly 1 correct unambiguous representation if you’re willing to sacrifice any/all idiomaticness. So the practical choice for serializing JSON-LD into AS2 ends up being either “use zero additional context” or “use well-known remote contexts which can be treated equivalently to profiles”. But if you used profiles instead, you still depend on your peers to be able to recognize and negotiate something like Content-Type: application/ld+json; profile="https://www.w3.org/ns/activitystreams https://mastodon-profile.example/" or Content-Type: application/activity+json; profile="https://mastodon-profile.example/" in the same way that they would have to recognize and negotiate "@context": "https://mastodon-context.example/" if they weren’t simply ignoring "@context" altogether and looking only for the fully expanded serialization. If you include something in a profile but leave it out of the "@context", you are violating the requirements of application/ld+json: https://www.iana.org/assignments/media-types/application/ld+json

Optional parameters: profile

   A non-empty list of space-separated URIs identifying specific 
   constraints or conventions that apply to a JSON-LD document 
   according to [RFC6906]. A profile does not change the semantics of
   the resource representation when processed without profile 
   knowledge, so that clients both with and without knowledge of a 
   profiled resource can safely use the same representation. The 
   profile parameter MAY be used by clients to express their 
   preferences in the content negotiation process. If the profile 
   parameter is given, a server SHOULD return a document that honors 
   the profiles in the list which it recognizes, and MUST ignore the 
   profiles in the list which it does not recognize. It is 
   RECOMMENDED that profile URIs are dereferenceable and provide 
   useful documentation at that URI. For more information and 
   background please refer to [RFC6906].

If you applied usage of the profile= parameter to application/activity+json you could loosen the requirement for profiled resources to have an equivalent context included in their resource representation, but then every JSON-LD processor needs to recognize the profile and inject an equivalent context, just as they are required to do for the AS2 context in any resource represented as application/activity+json. But if those profiles don’t provide that equivalent context ahead of time, then anyone using JSON-LD ends up having to write an unofficial one for each profile.
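For illustration, a hypothetical exchange that negotiates via the profile parameter (the mastodon-profile.example URL is a placeholder, as above) might look like:

```http
GET /users/alice HTTP/1.1
Host: peer.example
Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams https://mastodon-profile.example/"

HTTP/1.1 200 OK
Content-Type: application/ld+json; profile="https://www.w3.org/ns/activitystreams https://mastodon-profile.example/"
```

Per the registration text quoted above, the server honors the profiles it recognizes and ignores the rest, echoing what it honored in the response.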

If you consider what the fediverse is currently doing here, they are largely hand-rolling their own custom processors which hardcode all the semantics into bespoke serializers hooked up to equally bespoke database formats. Making slight modifications could require forking the whole project or even writing your own from scratch, and those modifications wouldn’t be portable to any other project because they don’t share any sort of common base for processing data. On the other hand, if 2 projects both use JSON-LD processing, then the serialization could be shared as a modified context document which could be loaded into both projects transparently, so that they both produce the same document from the same data using the same context. No need to write your own serializer from scratch, and suffer 100 different ways to serialize the same data because there’s no consistency. But again, most fediverse softwares write off the entire problem by simply assuming (incorrectly) that everyone else shares their exact semantics, and treats the whole thing as a translation exercise where they take their database and map their fields to (their interpretation of) AS2. Then, because JSON-LD is ignored, it results in a situation where your peers have to accommodate your exact understanding of “frob” if they want to be understood. You never actually disambiguate https://some.example/frob from https://another.example/frob, because you never considered the possibility that “frob” could mean different things to different peers. So now you have to get everyone to agree that “frob” means exactly what you think it means, which means you either de jure legislate that “frob” means this for everyone, or you de facto collapse into a state where “frob” means whatever the largest software expects it to mean. For decentralized extensibility to work, you need to qualify terms against some authority, and then that authority gets to authoritatively define what the term means. 
No one gets to define what “frob” means to everyone, but peer1.example gets to define what https://peer1.example/frob means because we collectively agreed that’s how https: should work, and the registration for https: is managed by the IANA which we all collectively agreed to use as a root-of-authority by consensus.

I’m not resisting SHACL if someone wants to apply schematic constraints to validate a document against a certain shape, but you still need the semantic concerns to be addressed beforehand. If you see Content-Type: application/json, then you can’t know what any term means. JSON Schema doesn’t provide semantics for JSON documents by default – it just tells you that a key of “frob” should have a value that is a JSON string, and so on, without ever telling you what “frob” means. The meaning of “frob” has to come from something like JSON-LD "@context" or Content-Type: application/frob or some profile provided as either a media type parameter or as a link with rel=profile. Multiple mechanisms may be used, but they should agree with each other. Just like how multiple schemas can be applied – if someone wants to validate a document against both SHACL and JSON Schema, they can do that. But I’m guessing that the JSON Schema expects a compacted document and no one is writing a JSON Schema assuming an expanded document. Expanded JSON-LD is unambiguous, but also unidiomatic. Compacted JSON-LD can be more idiomatic, but it is also ambiguous modulo context.

This would apply just as much with profiles and media types as it would with contexts, so I don’t see your point.

Also, the goal in this thread is to minimize JSON-LD processing needed to understand JSON-LD documents, as I have reiterated. If you’re not using JSON-LD at all, then you still need to arrive at the same semantics through some other mechanism. Expanding JSON-LD documents using a certain context is only one possible mechanism for arriving at semantics; it just happens to be the one recommended by AS2-Core. Likewise, AS2-Core recommends that you do the bare minimum to expand terms to identifiers, with the understanding that any term that doesn’t expand may as well be undefined (@vocab: _: nominally does this for generalized datasets).
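A minimal sketch of what that bare-minimum, naive-consumer expansion might look like in practice (the preloaded term table and the expand_terms helper are hypothetical, not taken from any real library):

```python
# Hypothetical "naive" consumer: it recognizes only terms preloaded from
# well-known contexts and expands compacted keys to full IRIs, treating
# any key it cannot map as effectively undefined.
WELL_KNOWN_TERMS = {
    # Assumed preloaded from the normative AS2 context
    "type": "@type",
    "name": "https://www.w3.org/ns/activitystreams#name",
    # Assumed preloaded from a well-known extension context
    "discoverable": "http://joinmastodon.org/ns#discoverable",
}

def expand_terms(doc: dict) -> dict:
    """Map known compact keys to IRIs; drop terms we cannot identify."""
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the naive processor ignores the supplied context
        if key in WELL_KNOWN_TERMS:
            expanded[WELL_KNOWN_TERMS[key]] = value
        elif key.startswith(("http://", "https://")):
            expanded[key] = value  # already a fully expanded IRI
        # anything else may as well be undefined, so it is skipped
    return expanded

doc = {"@context": "…", "type": "Person", "discoverable": True, "frob": 1}
print(expand_terms(doc))
```

The design choice here is that the consumer trusts only its preloaded table, which is exactly the well-known-contexts approach described earlier; a term like "frob" that maps to no IRI is silently treated as undefined rather than guessed at.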