FEP-cb76: Content Addressed Vocabulary

Hello!

Please use this thread to discuss the proposed FEP and any potential problems or improvements that can be addressed.

Summary

JSON-LD context definitions typically live at some URI which gets used as a namespace. It is generally expected that the URI is long-lived, and often the context document is retrievable from that URI, but sometimes these links break due to technical errors, expired domains, and other such issues. This FEP proposes adopting the approach described in [CAV] for any extension terms defined within other FEPs, and optionally for standard vocabulary as well.


I’m very much for the possibility of adding vocabulary extensions and even more so for making vocabularies (and everything) content-addressed.

However, I feel that the way this allows extensions is very limited. This FEP (and the original proposal by @cwebber) allows extension terms: single terms that are content-addressed. A vocabulary, on the other hand, is a collection of terms. I’m convinced that any slightly more complex extension will require multiple terms that potentially refer to each other: a collection of terms, a vocabulary.

I think it would be better if an entire collection of terms with their human readable (and possibly multi-lingual) descriptions can be content-addressed.

I’ve brought up the same objection to the original proposal (Content addressed vocabulary for extensions).


This specification is then used to calculate a SHA256 hash, which can be used as a URN within @context in lieu of a namespaced property.

@trwnh Is the sha256 URN namespace defined somewhere? I found this page – Uniform Resource Names (URN) Namespaces – but sha256 is not listed there.

There’s a self-describing format called multihash (spec) which is used by IPFS. However, generating a multihash is not as simple as echo -n "..." | sha256sum, so it’s probably not a good choice.
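For comparison, here is a minimal sketch of the two options, assuming the definition text is hashed as UTF-8 bytes. The example definition string and the function names are hypothetical, and "urn:sha256:" is used only for illustration since, as noted above, it is not a registered URN namespace. The multihash layout (code 0x12 for sha2-256, then the digest length 0x20, then the raw digest) follows the multihash format; IPFS typically encodes those bytes further (e.g. base58btc), which is omitted here.

```python
import hashlib

# Hypothetical definition text; the FEP would specify the exact text to hash.
definition = "canReply: a hypothetical example definition"


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of the UTF-8 bytes (the hex part of `echo -n "..." | sha256sum`)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sha256_urn(text: str) -> str:
    # Illustrative only: "urn:sha256:" is not a registered URN namespace.
    return "urn:sha256:" + sha256_hex(text)


def sha256_multihash_hex(text: str) -> str:
    # Multihash = varint(function code) + varint(digest length) + digest.
    # sha2-256 is code 0x12 and its digest is 32 (0x20) bytes; both values are
    # below 0x80, so each varint is a single byte.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return (bytes([0x12, len(digest)]) + digest).hex()


if __name__ == "__main__":
    print(sha256_urn(definition))
    print(sha256_multihash_hex(definition))
```

The plain hash needs nothing beyond a standard SHA-256 call; the multihash form only adds a two-byte prefix in hex, but the extra encoding step IPFS normally applies is what makes it less convenient to produce by hand.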

Oh, good point. I’m not entirely sure that @context MUST map terms to IRIs, but it does seem like a generally good idea. So I see a few options for proceeding:

  • Drop the URN prefix and just use the sha256 hash directly
    • This makes the namespace empty after normalization (represented by an underscore _: prefix). Maybe this is okay?
  • Use urn:publicid as defined in RFC3151
    • This pretty much allows any string that would be a valid “public identifier” in XML. We could use urn:publicid:
  • Use DIDs
    • Which DID method would be usable for this?
    • Also, this massively complicates the FEP, as DID methods are “too powerful” for what we need; we don’t need CRUD operations.

A vocabulary is just a collection of terms, no? A context document is basically just that, as well. You can self-host your own context document, or use one hosted on the Codeberg fep repo, or whatever you want. What matters ultimately is that the nodes have consistent naming after JSON-LD normalization.

I’m not aware of any DID methods that could be used to represent content hashes. There’s always some kind of account or key.

I actually like the urn:sha256 namespace, and now I remember where I have seen it before: magnet links. It seems that nobody succeeded in standardizing hash URNs, though there’s at least one IETF draft: draft-thiemann-hash-urn-01 (now archived).

My thinking is that, in the absence of a DID method for managing FEPs and their vocabulary, I’d go with option 1 and just use the hash directly without a namespace – but I don’t know if there are any downsides or disadvantages to this, so I’m hesitant to adopt it definitively.

An alternative method to explore – using urn:publicid with a public identifier based on the FEP that defines the extension term:

{
  "@context": {
    "fep": "urn:publicid:fep:",
    "canReply": "fep:5624:canReply"
  }
}

The above would basically be equivalent to saying:

  • The term fep maps to the prefix urn:publicid:fep:
  • The term canReply maps to the compact IRI fep:5624:canReply which expands to the full IRI urn:publicid:fep:5624:canReply

In essence, we would be using the public identifier fep:5624:canReply and turning it into a URN via the URN scheme urn:publicid.
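The expansion described above can be sketched as follows. This is a simplified illustration covering only the cases in the example context, not the full JSON-LD 1.1 IRI expansion algorithm, and `expand_term` is a hypothetical helper name.

```python
# The example context from above.
context = {
    "fep": "urn:publicid:fep:",
    "canReply": "fep:5624:canReply",
}


def expand_term(term: str, ctx: dict[str, str]) -> str:
    """Simplified expansion of a term or compact IRI to a full IRI.

    Not the full JSON-LD algorithm: it only handles a term mapped to a
    compact IRI whose prefix is another term in the same context.
    """
    value = ctx.get(term, term)  # term -> its mapping (or itself if unmapped)
    prefix, sep, suffix = value.partition(":")
    # JSON-LD 1.1 uses a simple term as a prefix when its IRI ends in a
    # gen-delim character; this sketch only checks for ':'.
    if sep and prefix in ctx and ctx[prefix].endswith(":"):
        return ctx[prefix] + suffix  # compact IRI: prefix IRI + suffix
    return value  # already absolute (e.g. "urn:...") or unknown


print(expand_term("canReply", context))  # urn:publicid:fep:5624:canReply
```

Note that "urn" is not a term in the context, so the mapping for fep itself is left as the absolute IRI urn:publicid:fep: rather than being expanded again.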

  • Pro: no need to hash the definition text. We instead refer directly to the FEP that defines all semantics for the term definition.
  • Con: if an FEP is superseded, would any terms defined by that FEP become deprecated? (I guess this depends on how the FEP process handles follow-up FEPs.)

Maybe that con isn’t really a con? Maybe it would actually be beneficial to have a way to refer to different versions of a term that require different implementations.

In some ways, this mirrors the XEP ecosystem for namespaces and node identifiers. For example, OMEMO as defined in XEP-0384 currently uses the namespace urn:xmpp:omemo:2 and the node identifier urn:xmpp:omemo:2:devices.

But I guess that raises another con…

  • Con: What if the FEP process ends up requesting an official URN namespace from the IETF in the future? That would orphan all existing URNs in existing documents. And there’s no way to define urn:fep as equivalent to urn:publicid:fep, because JSON-LD does not allow a term that is itself an IRI to be mapped to a different IRI. (See Examples 42-44 and the associated warnings: JSON-LD 1.1)

Perhaps that is “too formal”, but it is something that might be proposed or done at some point. So adopting urn:publicid in essence implies a recommendation against getting an “official” urn:fep namespace in the future.
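To illustrate the orphaning concern, here is a minimal sketch of a consumer reading a node in JSON-LD expanded form, where properties are keyed by their full IRI. The urn:fep: IRI is hypothetical (no such namespace exists today), and the property value and helper name are placeholders, not FEP-5624’s actual value shape.

```python
# IRI produced today via the urn:publicid prefix.
LEGACY_IRI = "urn:publicid:fep:5624:canReply"
# Hypothetical IRI, assuming an official "urn:fep" namespace were registered later.
FUTURE_IRI = "urn:fep:5624:canReply"

# A node in expanded form, as a publisher using the urn:publicid context would emit it.
# The value is a placeholder only.
legacy_node = {
    "@id": "https://example.com/objects/1",
    LEGACY_IRI: [{"@id": "https://example.com/placeholder"}],
}


def get_property(node: dict, iri: str) -> list:
    """Look up a property by its full IRI.

    IRIs are compared as plain strings, so two different IRIs are never
    treated as naming the same property.
    """
    return node.get(iri, [])


print(get_property(legacy_node, LEGACY_IRI))  # [{'@id': 'https://example.com/placeholder'}]
print(get_property(legacy_node, FUTURE_IRI))  # []
```

A consumer that only knows the newer IRI finds nothing in older documents, so any equivalence would have to be handled outside JSON-LD by each implementation.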