Practices around JSON formatting of JSON-LD messages

From whence cometh meaning?

  • JSON alone does not carry any semantics for any terms used in a document.
    • With JSON-LD, semantics are usually supplied by term definitions found in some JSON-LD context.
      • Terms can be anything, but a best practice / recommendation is to have terms which dereference to the term definition.
        • http(s): identifiers are often used as terms because the HTTP(S) protocol provides a convenient default agreed-upon way to obtain term definitions, if you follow the best practice of hosting term definitions at those canonical origins.
          • The URI Definition Discovery Protocol, or UDDP, codifies how to obtain term definitions via HTTP(S) for terms that use http(s): identifiers, relying on HTTP return codes, HTTP resource body content, and HTTP Link headers.
          • However, note that while http(s): terms can have a “canonical” term definition if following the UDDP, it is still possible to load your own term definitions out-of-band.
        • Terms that aren’t a “URI with a canonical dereferencing algorithm” must load term definitions out of band, because UDDP requires a URI to be canonically dereferenced to its URI definition.
          • Because these terms don’t have a canonical definition, peers must agree to load the same term definitions into their processors. So these terms need some consensus algorithm other than UDDP to obtain their definitions.
    • Ultimately, semantics in general are provided by mutual agreement.
      • Mutual agreement in Linked Data relies on UDDP to obtain canonical term definitions for a URI, nominally via HTTP(S) protocol.
      • Mutual agreement outside of Linked Data is usually signaled with IANA media types.
        • application/activity+json is intended to carry the same semantics as "@context": "https://www.w3.org/ns/activitystreams".
          • A processor encountering an AS2 Content-Type can always inject the normative AS2 context as the last declared context (so that it is not overridden).
            • Since the normative AS2 context is required and is functionally always in effect, JSON-LD processors should arrive at the same semantics as those defined by the AS2 specs.
          • Processors not using JSON-LD should arrive at the same semantics as those defined by the AS2 specs, but they do so by manually reading the spec and manually hardcoding the semantics into their processors.
            • This semantic encoding process results in terms defined by AS2 having known meanings, extracted by whoever read the spec and hardcoded those meanings into the processor.
              • Note that the spec may not be interpreted perfectly or in the same way between peers, which causes problems.
            • This semantic encoding process also results in terms NOT defined by AS2 having no known meaning by default.
      • Thus, peers and processors need a way to arrive at mutually-agreed-upon term definitions for any terms not defined by AS2.
        • AS2 recommends that you SHOULD use JSON-LD to define these “extension” terms.
          • Processors using Linked Data should arrive at a canonical agreed-upon meaning if the terms are defined correctly using UDDP.
          • AS2 allows that you MAY augment the JSON-LD context, but doing this creates an issue for processors that don’t use JSON-LD.
            • By default, naive processors should ignore the "@context" entirely, so terms will not be expanded correctly to their full URIs, and the meaning will be ambiguous.
        • In the absence of JSON-LD definitions for these non-AS2 terms, current fedi implementations just YOLO it and blindly assume that all other peers always agree with the semantics that they hardcoded into their processors.
          • Thus, the de facto consensus algorithm is “just do whatever Mastodon does” or “just do whatever Lemmy does”.
          • Worse, there is no acknowledgement that peers might actually disagree with you.
            • Semantic confusion is therefore basically blindly accepted – whatever breaks is not observable.
            • Semantic attacks are possible by using an expected shorthand which actually expands to something different than what is expected.
        • An alternative to “de facto consensus” is to maintain a central registry of allowed terms and their definitions.
          • This is an idea being attempted by the “AS2 extensions policy”, which in effect removes decentralization from AS2 and forces retroactive updates to any AS2 documents using the normative AS2 context.
        • Another alternative to “de facto consensus” that preserves decentralization is to define profiles which import additional terms and their definitions.
          • For example, a “Mastodon profile” could include all the additional semantics and constraints required by Mastodon processors.
            • Such a profile can also provide its own JSON-LD context for convenience to JSON-LD processors.
            • Terms used by this profile ideally should follow best practices for obtaining canonical term definitions, although a sufficiently constrained profile can be used to derive these term definitions in the same way you’d derive term definitions from AS2 without the JSON-LD context (by reading the spec/profile).
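
The point above about injecting the normative AS2 context when you see an AS2 Content-Type can be sketched roughly like this. This is only an illustration: the function name inject_as2_context is made up, and it assumes the document has already been parsed into a dict. It places the AS2 context last so that earlier contexts cannot override it.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS2_MEDIA_TYPE = "application/activity+json"


def inject_as2_context(document: dict, content_type: str) -> dict:
    """Return a copy of `document` with the AS2 context declared last.

    Hypothetical helper: documents served with any other media type
    are returned unchanged (as a shallow copy).
    """
    # Ignore media type parameters such as charset
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != AS2_MEDIA_TYPE:
        return dict(document)

    declared = document.get("@context", [])
    if not isinstance(declared, list):
        declared = [declared]
    # Drop any earlier mention of the AS2 context, then append it last
    contexts = [c for c in declared if c != AS2_CONTEXT] + [AS2_CONTEXT]
    return {**document, "@context": contexts}


doc = {"@context": {"toot": "http://joinmastodon.org/ns#"}, "type": "Person"}
injected = inject_as2_context(doc, "application/activity+json; charset=utf-8")
```

After this, injected["@context"] is a list with the embedded toot prefix first and the AS2 context last, and the original doc is left untouched.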

Practical approaches to extensibility

As for what you can practically do with AS2 documents and JSON-LD extensions: this was previously discussed in FEP-e229: Best practices for extensibility, and I intend to incorporate all of this into the next revision of https://w3id.org/fep/e229.

Option 1: Do not include any additional context.

  • JSON processors that ignore "@context" look for canonical, fully expanded identifiers.
  • Object properties (those whose value refers to another resource) cannot be expressed as a bare JSON string; they must be expressed as JSON objects using "id".
{
  "http://joinmastodon.org/ns#discoverable": true,
  "http://joinmastodon.org/ns#featured": {"id": "https://mastodon.example/users/alice/featured"}
}

I would recommend this as the most straightforward way to allow processors to ignore "@context" entirely.
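
To show how little a plain JSON processor needs in this case, here is a minimal sketch. The helper read_reference is hypothetical and not part of any library; the sketch only assumes the two fully expanded property names from the example above.

```python
import json

TOOT = "http://joinmastodon.org/ns#"

raw = """
{
  "http://joinmastodon.org/ns#discoverable": true,
  "http://joinmastodon.org/ns#featured": {"id": "https://mastodon.example/users/alice/featured"}
}
"""


def read_reference(value):
    """Return the identifier of an object-valued property, or None.

    Hypothetical helper: with no context in play, a bare string is
    treated as a literal rather than as a reference.
    """
    if isinstance(value, dict) and isinstance(value.get("id"), str):
        return value["id"]
    return None


actor = json.loads(raw)
# Look up properties by their full identifiers; no expansion step needed
discoverable = actor.get(TOOT + "discoverable") is True
featured = read_reference(actor.get(TOOT + "featured"))
```

There is no context handling anywhere in this code, which is the whole appeal of this option.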

Option 2: Include prefixes only.

  • JSON processors can no longer ignore "@context".
  • Namespaces alone aren’t enough because they need some kind of authority to define terms in that namespace. toot: means nothing on its own and carries no authority, until expanded.
{
  "@context": {
    "toot": "http://joinmastodon.org/ns#"
  },
  "toot:discoverable": true,
  "toot:featured": {"id": "https://mastodon.example/users/alice/featured"}
}

{
  "@context": {
    "mastodon": "http://joinmastodon.org/ns#"
  },
  "mastodon:discoverable": true,
  "mastodon:featured": {"id": "https://mastodon.example/users/alice/featured"}
}

{
  "@context": {
    "foo": "http://joinmastodon.org/ns#"
  },
  "foo:discoverable": true,
  "foo:featured": {"id": "https://mastodon.example/users/alice/featured"}
}

{
  "@context": {
    "toot": {
      "@id": "http://joinmastodon.org/ns#",
      "@prefix": true
    }
  },
  "toot:discoverable": true,
  "toot:featured": {"id": "https://mastodon.example/users/alice/featured"}
}

Expanding with prefixes only is somewhat simpler than expanding with full term definitions, but you still have to require expansion, and you get very little in return for it. So I don’t think this is worth it, really.
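
For a sense of what “prefix-only expansion” costs a plain JSON processor, here is a rough sketch. It is not a conforming JSON-LD implementation: the function names are made up, it only looks at a single embedded context object and at top-level keys, and it skips the other JSON-LD rules about which terms may act as prefixes. It does show that the prefix name itself is arbitrary.

```python
NS = "http://joinmastodon.org/ns#"
FEATURED = {"id": "https://mastodon.example/users/alice/featured"}


def prefix_map(context) -> dict:
    """Collect prefix -> namespace pairs from an embedded context object."""
    prefixes = {}
    if not isinstance(context, dict):
        return prefixes
    for term, definition in context.items():
        if isinstance(definition, str):
            # Simple form: "toot": "http://joinmastodon.org/ns#"
            prefixes[term] = definition
        elif (
            isinstance(definition, dict)
            and definition.get("@prefix") is True
            and isinstance(definition.get("@id"), str)
        ):
            # Expanded form: {"@id": ..., "@prefix": true}
            prefixes[term] = definition["@id"]
    return prefixes


def expand_keys(document: dict) -> dict:
    """Expand top-level compact keys such as "toot:featured" (sketch only)."""
    prefixes = prefix_map(document.get("@context"))
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        prefix, sep, suffix = key.partition(":")
        # Keys like "http://..." are already absolute, so leave them alone
        if sep and prefix in prefixes and not suffix.startswith("//"):
            key = prefixes[prefix] + suffix
        expanded[key] = value
    return expanded


# The same data written with three different, equally arbitrary prefixes
results = [
    expand_keys({
        "@context": {name: NS},
        name + ":discoverable": True,
        name + ":featured": FEATURED,
    })
    for name in ("toot", "mastodon", "foo")
]
```

All three entries in results come out identical, keyed by the full http://joinmastodon.org/ns# identifiers, which is why the prefix carries no meaning until it is expanded.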

Option 3: Include embedded context with term definitions.

  • JSON processors cannot safely ignore "@context".
  • No need to fetch remote context documents (although see next section on how this can be avoided anyway).
  • String values may or may not expand to ID references, depending on whether a term is defined as @type: @id or not.
{
  "@context": {
    "discoverable": "http://joinmastodon.org/ns#discoverable",
    "featured": {
      "@id": "http://joinmastodon.org/ns#featured",
      "@type": "@id"
    }
  },
  "discoverable": true,
  "featured": "https://mastodon.example/users/alice/featured"
}

I think this can actually make things more complex for anyone not using a JSON-LD processor. It’s also more or less the de facto state of fedi right now: Mastodon crams a bunch of term definitions into its embedded context, and some of those term definitions are actually incorrect. It might even make more sense to detect the software name and version via something like NodeInfo and then inject a corrected context, which is a wild thing to even suggest. All the worst parts of user-agent sniffing.
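
To make the “string values may or may not expand to ID references” point concrete, here is a simplified sketch of what a processor has to do with embedded term definitions. It is not a real JSON-LD expansion algorithm: the function name is made up, it handles a single embedded context object and top-level properties only, and it simply skips terms that have no definition.

```python
def expand_with_terms(document: dict) -> dict:
    """Expand top-level terms using embedded term definitions (sketch only)."""
    context = document.get("@context")
    if not isinstance(context, dict):
        context = {}
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        definition = context.get(key)
        if definition is None:
            continue  # undefined terms are skipped in this sketch
        if isinstance(definition, str):
            iri, coerce_to_id = definition, False
        else:
            iri = definition["@id"]
            coerce_to_id = definition.get("@type") == "@id"
        if coerce_to_id and isinstance(value, str):
            expanded[iri] = {"@id": value}  # string is a reference
        else:
            expanded[iri] = {"@value": value}  # string is a literal
    return expanded


body = {
    "discoverable": True,
    "featured": "https://mastodon.example/users/alice/featured",
}
as_reference = expand_with_terms({
    "@context": {
        "discoverable": "http://joinmastodon.org/ns#discoverable",
        "featured": {"@id": "http://joinmastodon.org/ns#featured", "@type": "@id"},
    },
    **body,
})
as_literal = expand_with_terms({
    "@context": {
        "discoverable": "http://joinmastodon.org/ns#discoverable",
        "featured": "http://joinmastodon.org/ns#featured",
    },
    **body,
})
```

The body is byte-for-byte the same in both cases, but featured ends up as an "@id" reference in as_reference and as a plain "@value" string in as_literal. A processor that ignores "@context" cannot tell which one was meant.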

Option 4: Include a remote context.

  • JSON processors can’t ignore "@context" entirely, but they can avoid JSON-LD processing if they know ahead-of-time what a context identifier means, just like how they might know what "https://www.w3.org/ns/activitystreams" means ahead-of-time.
    • Best practice is to make context identifiers immutable, so that they don’t have to be dereferenced as remote context documents.
    • The JSON-LD context document can be obtained ahead-of-time and preloaded into a JSON-LD processor. Modern JSON-LD based specs actually require this now, with SHA256 hashes of the context documents provided so you know you got the correct document.
    • The context identifier can also content-negotiate to HTML documentation of the terms, and refer back to any profiles that may be in effect.
    • The goal should be to get the context document to agree with the spec/profile completely.
{
  "@context": "https://joinmastodon.org/contexts/v4.5.0",
  "discoverable": true,
  "featured": "https://mastodon.example/users/alice/featured"
}

Note that this is what more reasonable contexts do with versioned identifiers, for example security/v1 vs security/v2. This is really the most “idiomatic JSON” approach to JSON-LD contexts because JSON-LD can be truly optional, assuming the terms are defined correctly. The referenced JSON-LD @context values can even be seen as a sort of profiling of the document’s semantics, similar to what might be done with Content-Type profile= parameters or a rel=profile Link, but in the body content instead of in the HTTP headers. Those other profiling mechanisms may still be used, but you can’t expect any peer to use a specific mechanism right now. JSON-LD processors that are aware of the profile out-of-band (via HTTP headers) can inject the appropriate context. JSON processors could use any of the three mechanisms, really: the @context value, the Content-Type profile= parameter, or the rel=profile Link.
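
Preloading with a hash check could look something like the following sketch. Everything here is an assumption for illustration: the PreloadedContexts class is made up, the context body is invented (the identifier is just the example one used above), and the hash is computed locally as a stand-in for one that a spec or profile would publish.

```python
import hashlib
import json


class PreloadedContexts:
    """Serve context documents from memory instead of the network."""

    def __init__(self):
        self._documents = {}

    def preload(self, identifier: str, body: bytes, expected_sha256: str) -> None:
        # Verify the bytes before trusting them as the context document
        actual = hashlib.sha256(body).hexdigest()
        if actual != expected_sha256.lower():
            raise ValueError(f"hash mismatch for {identifier}")
        self._documents[identifier] = json.loads(body)

    def load(self, identifier: str) -> dict:
        if identifier not in self._documents:
            # Refuse to dereference unknown contexts at processing time
            raise LookupError(f"context not preloaded: {identifier}")
        return self._documents[identifier]


# Hypothetical contents for the example identifier
CONTEXT_ID = "https://joinmastodon.org/contexts/v4.5.0"
CONTEXT_BODY = json.dumps({
    "@context": {
        "discoverable": "http://joinmastodon.org/ns#discoverable",
        "featured": {"@id": "http://joinmastodon.org/ns#featured", "@type": "@id"},
    }
}).encode("utf-8")
# Stand-in for a hash that a spec or profile would publish
PUBLISHED_SHA256 = hashlib.sha256(CONTEXT_BODY).hexdigest()

contexts = PreloadedContexts()
contexts.preload(CONTEXT_ID, CONTEXT_BODY, PUBLISHED_SHA256)
context_document = contexts.load(CONTEXT_ID)
```

Because the identifier is treated as immutable, a plain JSON processor can do the same thing with a hardcoded lookup table and never touch JSON-LD at all.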
