Content addressed vocabulary for extensions

I wrote a detailed post about using “content addressed vocabulary” for vocabulary extensions. Ideally it would apply to any term, but we already have ActivityStreams, so this is probably best suited to extensions going forward, and we can preserve the existing URIs for old terms.

The general idea is to take the text describing a term, e.g. the sensitive property, and hash it:

$ echo "The sensitive property (a boolean) on an object indicates that some users may wish to apply discretion about viewing its content, whether due to nudity, violence, or any other likely aspects that viewers may be sensitive to. This is comparable to what is popularly called \"NSFW\" (Not Safe For Work) or \"trigger warning\" in some systems. Implementations may choose to hide content flagged with this property by default, exposed at user discretion." | sha256sum
81d98cf83fcf733400ad5d2a25495feeea47f287193a53a9722f4cb025da88f1

Now you have your term: urn:sha256:81d98cf83fcf733400ad5d2a25495feeea47f287193a53a9722f4cb025da88f1
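
For anyone who would rather do this outside a shell, here is a minimal Python sketch of the same step. The function name `term_urn` and its `trailing_newline` flag are made up for illustration. One detail worth noticing: `echo` appends a newline, so the bytes being hashed include it, and hashing the bare text gives a different URN.

```python
import hashlib

def term_urn(description: str, trailing_newline: bool = True) -> str:
    """Return a urn:sha256: identifier for a term description.

    trailing_newline=True mimics `echo "..." | sha256sum`, which hashes
    the text plus the newline that echo appends.
    """
    data = description.encode("utf-8")
    if trailing_newline:
        data += b"\n"
    return "urn:sha256:" + hashlib.sha256(data).hexdigest()

# Shortened placeholder; the real input would be the full description text.
description = "The sensitive property (a boolean) on an object indicates ..."
print(term_urn(description))
print(term_urn(description, trailing_newline=False))  # a different URN
```

So anyone re-deriving a term would need to agree on the exact bytes (encoding, whitespace, trailing newline), not just the wording.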

Map that in your JSON-LD context (or just use it as the property name, if you’re advocating the litepub route, I suppose) and you’re done. This can avoid the governance quagmire the SocialCG hit, requires no centralization of terms, and leaves no “namespace” to worry about going down.
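
A rough sketch of what both options might look like, written as Python dicts and dumped to JSON. The Note object and its content are invented for illustration, and the second form is only my reading of "use it as the property name", so treat that part as an assumption.

```python
import json

# The term URN derived above.
SENSITIVE = "urn:sha256:81d98cf83fcf733400ad5d2a25495feeea47f287193a53a9722f4cb025da88f1"

# Option 1: map a short name to the content-addressed term in the context.
note = {
    "@context": [
        "https://www.w3.org/ns/activitystreams",
        {"sensitive": SENSITIVE},
    ],
    "type": "Note",
    "content": "Example content",  # illustrative value
    "sensitive": True,
}

# Option 2 (assumed reading): no context mapping, the URN itself is the key.
plain = {
    "type": "Note",
    "content": "Example content",
    SENSITIVE: True,
}

print(json.dumps(note, indent=2))
print(json.dumps(plain, indent=2))
```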

Thoughts welcome (though reading the post is encouraged).


I like the idea of using content-addressing. I think it is the way to go.

But why not hash the entire JSON-LD context?

curl -LH "Accept: application/ld+json" https://www.w3.org/ns/activitystreams | sha256sum 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7856  100  7856    0     0  12123      0 --:--:-- --:--:-- --:--:-- 12123
335026432904cf7a433516793b5ff44719c5ae752805fae721a3f0f29310bdfb  -

And then use the uri: urn:sha256:335026432904cf7a433516793b5ff44719c5ae752805fae721a3f0f29310bdfb in the @context field.
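
A small Python sketch of that idea, under some assumptions: `context_urn` is a hypothetical helper, and the byte strings are toy stand-ins for whatever the server actually returns. The hash covers the exact bytes served, so two semantically identical copies with different whitespace end up with different URNs.

```python
import hashlib

def context_urn(context_bytes: bytes) -> str:
    """Return a urn:sha256: identifier for a serialized context document."""
    return "urn:sha256:" + hashlib.sha256(context_bytes).hexdigest()

# Toy stand-ins for a fetched context; not the real ActivityStreams document.
served = b'{"@context": {"as": "https://www.w3.org/ns/activitystreams#"}}'
reformatted = b'{ "@context": { "as": "https://www.w3.org/ns/activitystreams#" } }'

# Reference the context by hash instead of by location.
doc = {"@context": context_urn(served), "type": "Note"}
print(doc["@context"])

# Same JSON content, different bytes, different identifier.
print(context_urn(served) == context_urn(reformatted))  # False
```

A consumer would still need some way to obtain the context given only its hash, which this sketch does not address.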

Reasons I prefer this to just hashing the term description include:

  • In which language should the term description be given? I don’t think agreeing on English is fair.

  • I think there is value in grouping related terms. For example the description of ActivityStreams attributedTo is:

    Identifies one or more entities to which this object is attributed. The attributed entities might not be Actors. For instance, an object might be attributed to the completion of another activity.

    It references the object term. So maybe some kind of context is required in explaining the meaning of individual terms. I wonder if we can always describe terms in isolation?

There are some more problems I see (also with the solution I propose) but I don’t see how they can be solved with JSON-LD in any way.

[2020-02-28 09:07:55+0000] pukkamustard via SocialHub:

I like the idea of using content-addressing. I think it is the way to go.

But why not hash the entire JSON-LD context?

curl -LH "Accept: application/ld+json" https://www.w3.org/ns/activitystreams | sha256sum 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7856  100  7856    0     0  12123      0 --:--:-- --:--:-- --:--:-- 12123
335026432904cf7a433516793b5ff44719c5ae752805fae721a3f0f29310bdfb  -

And then use the uri: urn:sha256:335026432904cf7a433516793b5ff44719c5ae752805fae721a3f0f29310bdfb in the @context field.

One reason I could see against hashing the context is that it rules out non-breaking updates: any change, even a purely additive one, yields a new hash. I think using something like datashards for context(s) would be better.
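
To make the update concern concrete, here is a hedged Python sketch. The helper `context_dict_urn`, the `example.org` term IRIs, and the choice of sorted-key compact JSON as the serialization are all assumptions for illustration; nothing here is a standard canonicalization.

```python
import hashlib
import json

def context_dict_urn(context: dict) -> str:
    """Hash a context after serializing it with sorted keys (an assumed convention)."""
    data = json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "urn:sha256:" + hashlib.sha256(data).hexdigest()

# Version 1 of a hypothetical extension context.
v1 = {"@context": {"sensitive": "https://example.org/ns#sensitive"}}

# Version 2 only adds a term, which would normally be a non-breaking change.
v2 = {"@context": {**v1["@context"], "summary": "https://example.org/ns#summary"}}

# The additive update still produces a different identifier.
print(context_dict_urn(v1) != context_dict_urn(v2))  # True
```

Documents pinned to the first URN never see the added term, so every update is effectively a new context.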

Reasons I prefer this to just hashing the term description include:

  • In which language should the term description be given? I don’t think agreeing on English is fair.

Agreed, but I’ve yet to see standards written in other languages. And having descriptions in multiple languages could lead to diverging meanings.