Desired changes for a future revision of ActivityPub and ActivityStreams

No, it’s not an RDF thing, it’s explicitly stated that all objects (including activities!) “MUST” have a globally unique identifier or else they are intentionally transient and not intended to be looked up. This means that the identifier “must” be a publicly dereferenceable URI (and “SHOULD” be of the https:// scheme), or otherwise the object is anonymous and its identifier “must” be null. Or, as the spec later says when defining id, it “MAY” be omitted (instead of being explicitly null).
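As a rough illustration of the two categories, here is a minimal Python sketch. The function name `classify_id` and the scheme check are assumptions made for this example, not anything the spec defines.

```python
from urllib.parse import urlparse


def classify_id(obj: dict) -> str:
    """Classify an object as 'anonymous' or 'identified' based on its id.

    Hypothetical helper: an absent id and an explicit null are treated
    the same way, matching the "MAY be omitted" reading above.
    """
    value = obj.get("id")
    if value is None:
        return "anonymous"
    # Crude check: anything with a URI scheme counts as an identifier here.
    if not isinstance(value, str) or not urlparse(value).scheme:
        raise ValueError("id must be a URI string, null, or omitted")
    return "identified"
```

Note that this says nothing about whether the URI actually resolves to anything, which is the separate concern.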

This could probably be stated more clearly using MUST instead of must for the part about falling into one of two categories, but the requirement is clearly there.

Linked Data / RDF processing is irrelevant here except in noting that it would elide out the "id": null statement.

Now, there is certainly a separate concern in that assigning an HTTPS URI does not necessarily mean that the URI resolves to a resource when you dereference it… this does seem like an omission in the spec language. I am guessing that the intent was to have all objects be “resolvable” and not simply “dereferenceable”. Otherwise, the id serves little purpose except as an idempotency key for processing activities in the queue, or for linking graph nodes together when a property is defined as @type: @id. It doesn’t really make sense to say a URI is “publicly dereferenceable”, because dereferenceability is a property of the URI scheme and not of the object. In the case of HTTPS, if you have an https: URI, then you can “dereference” it simply by having it or knowing it, and then performing an HTTP GET against it. But you might not be able to “resolve” it successfully to any resource. This is discussed at length in Dereferencing HTTP URIs in which “The act of retrieving a representation of a resource identified by a URI is known as dereferencing that URI”, although the response you receive might not be a representation of that resource. This is especially relevant to non-information resources, for which dereferencing those URIs might result in something like an HTTP 303 “See other” code (although nowadays it’s common to respond with HTTP 200 “OK” and return a resource descriptor, or otherwise identify the non-information resource with a fragment identifier within the returned information resource).

I was talking about identifying activities associated with objects, given only the object URI.

Oh, sorry, I wasn’t clear on what “it” referred to.

If you meant the mechanism to go from an object to any activity related to it, I don’t think that’s currently fleshed out. You could implement a SPARQL endpoint or otherwise use something like FEP-bad1 (fep/bad1/fep-bad1.md in the fediverse/fep repository on Codeberg.org), which I probably need to revisit in concept.

But again, almost no implementations actually treat the objects as JSON-LD, so these benefits are purely theoretical and their benefit/necessity has yet to be truly seen.

Also, I would hardly say that you get this “for free” when using JSON-LD. In my experience, JSON-LD is rarely supported well by libraries while JSON is totally ubiquitously supported. As far as I understand, JSON-LD requires more processing and the switching between various representations seems to me to preclude any kind of zero-copy handling of incoming objects (I could be wrong here but I don’t see how you would do it). JSON-LD is just more complicated than JSON and complexity is the worst cost in software engineering, so it is decidedly not free.

In addition, I think that you could gain the same benefits with pure JSON. It seems to me that all that JSON-LD achieves is a namespace on field names to avoid field/type name collisions. As you say, several different vocabularies could use the type name “Travel” and it could be interpreted in each of their respective ways because the namespace (or context) is different. But, to me, a much simpler solution to that problem would be to just use more specific field/type names in plain JSON - that does not seem hard. E.g. don’t call it “Travel”, call it “activityVocabularyTravel” or whatever, to ensure that it is unambiguous (you could even take a step further and require that names are pre-defined UUIDs or something to ensure absolutely no chance of collisions, but I don’t think going that far is necessary). With that in mind, JSON-LD seems unnecessary.

They immediately become obvious as soon as you have a conflict in term names. How do you propose resolving naming conflicts without a central authoritative body?

If you wanted to avoid complexity but retain an unambiguous representation, then you would use JSON-LD expanded form instead of compacted form. Yes, you lose human readability, but you also completely eliminate the need for @context. It looks like this:

{
  "@id": "https://domain.example/some-actor",
  "@type": ["https://www.w3.org/ns/activitystreams#Person"],
  "https://www.w3.org/ns/activitystreams#name": [{"@value": "Alice P. Hacker"}],
  "http://www.w3.org/ns/ldp#inbox": [{"@id": "https://domain.example/some-inbox"}]
}

so it’s more verbose, but also in some ways simpler – you know exactly what is an ID and what is a Value, you avoid naming conflicts by namespacing your IRIs, and so on.

The primary reason compacted form exists is to “upgrade” plain JSON documents into JSON-LD. Say you have an API that already returns shorthand terms for JSON property names. Rather than converting your entire API to use IRIs everywhere (and breaking all existing consumers), you can just add one more key: @context. IMO, compacted form shouldn’t be used when designing a data interchange format from scratch. It only really makes sense when converting JSON to JSON-LD. The space saving you get from compacting your JSON-LD document against some context is real, but it’s not significant compared to just gzipping everything. And IMO, it introduces more problems when you hide the reality of the system from the users and developers, which I consider “necessary complexity”. It also introduces the possibility that a single attribute or property can suddenly be represented by infinitely many string keys, because that’s what “context” is and what it implies – reconciling different terms for the same thing.

The AS2-Core spec chose to hide this “complexity” by mandating that everyone just use the AS2 context and to not override it. They probably did this because they wanted the JSON of AS1-JSON to be cleanly migrated to AS2, and they thought developers would think that the compacted form is “prettier” or otherwise easier to parse visually and to understand as a human reader. It certainly makes examples easier to follow! But it makes actually working with the data harder than it needs to be once the natural complexity starts creeping back in. Suddenly you need to understand JSON-LD concepts like “context mapping” and “expansion” at minimum. If AS2 had gone with expanded form instead, you could still parse it as plain JSON, just that the parsing would be a bit more roundabout – to get the inbox of a person, you wouldn’t be able to do it via the inbox property’s value, you’d have to get the http://www.w3.org/ns/ldp#inbox property’s value which is an array, and then grab the first item out of the array, and then get the @id property’s value from there. So it’s less direct, but more importantly, it’s completely unambiguous what you ended up with. You don’t need to look in the @context to see that inbox is actually http://www.w3.org/ns/ldp#inbox, or that the @type of its value is @id. You already have that information ahead-of-time.
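To make the inbox lookup concrete, here is a small Python sketch using only plain dict access. The compacted document is an assumed rendering of the same actor under the AS2 context, and the function names are made up for illustration.

```python
# Compacted form, assuming the AS2 context maps "inbox" to ldp:inbox.
compacted = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://domain.example/some-actor",
    "type": "Person",
    "name": "Alice P. Hacker",
    "inbox": "https://domain.example/some-inbox",
}

# Expanded form of the same actor: no @context needed.
expanded = {
    "@id": "https://domain.example/some-actor",
    "@type": ["https://www.w3.org/ns/activitystreams#Person"],
    "https://www.w3.org/ns/activitystreams#name": [{"@value": "Alice P. Hacker"}],
    "http://www.w3.org/ns/ldp#inbox": [{"@id": "https://domain.example/some-inbox"}],
}


def inbox_from_compacted(doc: dict) -> str:
    # Direct, but only correct if "inbox" means what we assume it means.
    return doc["inbox"]


def inbox_from_expanded(doc: dict) -> str:
    # More roundabout: full IRI key, first item of the array, then its @id.
    return doc["http://www.w3.org/ns/ldp#inbox"][0]["@id"]
```

Both functions return the same URL here. The difference is that the second one never has to consult a context to know what it is reading.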

This is just JSON-LD with extra steps.

In practice, conflicts in term names don’t happen that much, at least as far as I’ve seen. Do we have any past examples of naming conflicts that led to serious problems (i.e. led to something bad or weren’t easily/quickly resolved)?

In that sense, the benefits are theoretical and I would not call them “obvious”. And again, conflicts in term names can be easily avoided without a central authority by simply making term names detailed enough to be unambiguous. You do not need JSON-LD for that. If you really want to ensure no chance of collisions, again, just use a UUID key. Once again, no JSON-LD required and you only lose a bit of human readability (which is, if you ask me, not that important anyway; mapping to a human readable format would be trivial).

It is definitely not just JSON-LD with extra steps - it does not require any special handling beyond standard JSON handling. You need no special transformation of the data and you need not read an additional specification. There are not multiple different representations of the data that you need to handle. You do not need a special JSON-LD-aware library - you can just use your standard JSON library without losing any compatibility with other services. You can add and support extensions by still using normal JSON with your usual well-supported JSON library.

This is vastly simpler from an implementation point-of-view. And again, complexity is the worst cost in software engineering, so conversely simplicity is the greatest gain.

There’s also a human development cost to JSON-LD, in the sense that it is confusing if you are not familiar already, and I would say most aren’t. I had personally never heard of JSON-LD prior to reading ActivityPub, but practically every single software engineer knows what JSON is. It’s a cognitive/educational load that could be avoided.

If anything, JSON-LD is the mechanism with “extra steps”, requiring all kinds of special handling that seem entirely unnecessary if you just used plain JSON in the way I suggest here.

This depends on what you think JSON-LD is. If you think of “plain JSON” with the following restrictions:

  • the key of every property must be an IRI
  • the value of every property is a set of either @id or @value nodes

then congratulations, you’ve just reinvented JSON-LD expanded form. The extra complexity only comes in once you want to use something other than IRIs as keys. But why would you? IRIs are better than plain strings or UUIDs or whatever, because they have the chance of returning something when you deref them. You can fetch http://schema.org/actor and it links to documentation on exactly what the term means. You also can immediately tell it’s not the same as https://www.w3.org/ns/activitystreams#actor.
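Those two restrictions are simple enough to check with a plain JSON library. The following Python sketch is deliberately crude and makes several assumptions: it treats any key with a URI scheme as an IRI, allows only `@id` and `@type` as keywords on a node, and ignores things like `@list`, `@language`, and `@graph`. The name `is_expanded_node` is hypothetical.

```python
from urllib.parse import urlparse


def is_expanded_node(node: dict) -> bool:
    """Check the two restrictions: IRI keys, and array values of id/value nodes."""
    for key, value in node.items():
        if key == "@id":
            if not isinstance(value, str):
                return False
        elif key == "@type":
            if not (isinstance(value, list) and all(isinstance(t, str) for t in value)):
                return False
        elif urlparse(key).scheme:
            # Restriction 2: every property value is an array of nodes.
            if not isinstance(value, list):
                return False
            for item in value:
                if not isinstance(item, dict):
                    return False
                if "@value" in item:
                    continue  # value node
                if not is_expanded_node(item):  # id node or embedded node
                    return False
        else:
            # Restriction 1: keys that are neither keywords nor IRIs are rejected.
            return False
    return True
```

For example, the expanded actor shown earlier passes this check, while `{"name": "Alice"}` fails on the first restriction and `{"https://www.w3.org/ns/activitystreams#name": "Alice"}` fails on the second.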

Well, let’s say you still want to use non-IRIs as keys, because you already have an existing API that returns JSON. This is where you would stick a @context in there to make your ambiguous JSON into unambiguous JSON-LD. But this is an optional step. You can continue to work with the unqualified JSON… but the second you want to share data with some other system, you NEED that other system to have the same context as you, the same fundamental understanding of what an actor is. On the scale of the entire Web, you really CANNOT depend on any two random systems to share the same context a priori. Is this document using actor in the AS2 sense or the Schema dot org sense? You have no way of knowing. Suddenly, you’ve fractured your network into two halves: one that understands AS2, and one that understands Schema dot org. This is bad for the Web. The unambiguous thing to do would be to take whatever JSON you have, inject a @context… and then expand it to something unambiguous. But again, we don’t do that, because… it looks prettier to not do that.

This doesn’t require anything other than standard JSON handling unless you want to convert between JSON-LD forms, most likely to get rid of @context and end up with something unambiguous, which is what you need. Or alternatively, you can ignore the JSON-LD bits, but you lose extensibility and you’re forced to stay within whatever worldview you’re assuming the document shares with you. This is what I mean when I say that JSON-LD is the cost of decentralized extensibility, or that you get decentralized extensibility “for free” with JSON-LD – when everything is an IRI, it’s unambiguous. You don’t need a JSON-LD library to work with expanded form. You also don’t need a JSON-LD library to work with compacted form as long as you implicitly share the context of the document. The complexity only starts to creep in once your implicit context differs from that of the document. That’s when you have to start asking questions like:

  • “wait, is the actor of this activity the actor or the activitystreamsActor or the doer_of_activities or…”
  • “i have an actor property but how can i be sure that it refers specifically to the entity that performed an activity? what if it’s an actor in a movie? what if it’s a person or organization responsible for a process in the Internet Of Construction? what if it’s a contextual aspect according to BS EN 17412-1 (2020)? if only there was some way of unambiguously knowing this…”

and these questions start cropping up more and more as you stray further and further outside the bounds of any one specification or problem domain.

In summary, if the goal is to avoid using a special JSON-LD processor and be able to parse a document with standard JSON processors, then that’s something that can be accomplished. But generally this means either “use expanded form” or “assume shared context ahead-of-time”. And AS2 went with the latter, but basically punted all other extensibility considerations to JSON-LD, which I admit can get annoying (and non-trivial!) when you have to process @context. So AS2 ended up in this situation where JSON-LD is the extensibility method, but you’re forced into using the compacted form, which is decidedly not ideal for extensions. I’m going to once again point to the discussion at FEP-e229: Best practices for extensibility for exploration of the implications of extensibility in a world where you’re forced to use compacted form.

My guess is that approximately nobody (other than you?) thinks of “plain JSON” AP as being that. :wink:

I really don’t see this as a big benefit. If you receive an unknown term in a JSON object and you want to find out what it means, I’d say just search the web or an extension repository (like the FEP repository) or just ask whoever gave you the message for what it means. Whatever is returned by the IRI cannot be machine-read anyway, so I don’t see the benefit of having it in the data. Besides, who’s to say that IRI link will work forever?

But all that said, by all means just use IRI keys. That’s fine by me honestly. It certainly qualifies as “detailed enough to be unambiguous”. But IRI keys do not require JSON-LD. Why would you need all the rest of JSON-LD, like different representations of the same data and all the extra complexity that entails? What benefit do I gain from JSON-LD versus just plain JSON with IRI keys? Just the fact that the keys are shorter when I put the JSON-LD in a compacted form with the @context key? That’s not a benefit, that just highlights the problem of how the different forms of JSON-LD make interpretation more complicated.

Why would you not just send the unambiguous thing to start with? For instance, the plain JSON with IRI keys (or whatever other unambiguous keys). “Because it looks prettier” is a terrible reason I think. I mean you kinda say it here:

Why wouldn’t you just work with the expanded form exclusively then? Which as I understand is just plain JSON with IRI keys. I really don’t get the benefit of working with any other form. That would seem to require more processing and complexity for no benefit.

I mean, this just makes it sound not optional to me. If I need JSON-LD to be fully-featured, then that will be required eventually for something that I want to do.

These questions wouldn’t be an issue if terms were unambiguous enough to practically speaking never overlap.

This seems like a supremely bad choice and I don’t see how we can fix it backwards-compatibly.

It’s also not JSON-LD and isn’t what @trwnh seems to be proposing. Note that they wrote:

(Emphasis mine…)

Example 3 from the AP Rec looks like (in “plain JSON”):

{"@context": "https://www.w3.org/ns/activitystreams",
 "type": "Create",
 "id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
 "to": ["https://chatty.example/ben/"],
 "actor": "https://social.example/alyssa/",
 "object": {"type": "Note",
            "id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
            "attributedTo": "https://social.example/alyssa/",
            "to": ["https://chatty.example/ben/"],
            "content": "Say, did you finish reading that book I lent you?"}}

it looks like the following in expanded JSON-LD:

[
  {
    "https://www.w3.org/ns/activitystreams#actor": [
      {
        "@id": "https://social.example/alyssa/"
      }
    ],
    "@id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
    "https://www.w3.org/ns/activitystreams#object": [
      {
        "https://www.w3.org/ns/activitystreams#attributedTo": [
          {
            "@id": "https://social.example/alyssa/"
          }
        ],
        "https://www.w3.org/ns/activitystreams#content": [
          {
            "@value": "Say, did you finish reading that book I lent you?"
          }
        ],
        "@id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
        "https://www.w3.org/ns/activitystreams#to": [
          {
            "@id": "https://chatty.example/ben/"
          }
        ],
        "@type": [
          "https://www.w3.org/ns/activitystreams#Note"
        ]
      }
    ],
    "https://www.w3.org/ns/activitystreams#to": [
      {
        "@id": "https://chatty.example/ben/"
      }
    ],
    "@type": [
      "https://www.w3.org/ns/activitystreams#Create"
    ]
  }
]

I wouldn’t mind plain JSON with IRI keys, or any other unambiguous key but that expanded JSON-LD form looks awful and way more bloated than necessary :sweat_smile:.

IRI keys alone does not a JSON-LD make, but once you add the second restriction then yes, you get expanded form.

You could go with just IRI keys, but that still leaves ambiguity about whether a given URI value refers to an actual resource or to a string literal. So you end up needing to be explicit about @id and @value nodes, still.

I guess there’s also the JSON-LD option of “compacting against an empty context document”, which at least forces your document entrypoint to be a single object instead of an array.

Using the same expanded example as above (but with whitespace removed to make it look less intimidating):

[{"@id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
  "@type": ["https://www.w3.org/ns/activitystreams#Create"],
  "https://www.w3.org/ns/activitystreams#actor": [{"@id": "https://social.example/alyssa/"}],
  "https://www.w3.org/ns/activitystreams#object": [{
    "@id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
    "@type": ["https://www.w3.org/ns/activitystreams#Note"],
    "https://www.w3.org/ns/activitystreams#attributedTo": [{"@id": "https://social.example/alyssa/"}],
    "https://www.w3.org/ns/activitystreams#content": [{"@value": "Say, did you finish reading that book I lent you?"}],
    "https://www.w3.org/ns/activitystreams#to": [{"@id": "https://chatty.example/ben/"}]
  }],
  "https://www.w3.org/ns/activitystreams#to": [{"@id": "https://chatty.example/ben/"}]
}]

If you compacted this against an empty context, it would just remove the brackets from the start and end of every set that contains only one item, as well as coercing literal value nodes with no other metadata (such as @type or @language) into their literal form:

{ "@id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
  "@type": "https://www.w3.org/ns/activitystreams#Create",
  "https://www.w3.org/ns/activitystreams#actor": {"@id": "https://social.example/alyssa/"},
  "https://www.w3.org/ns/activitystreams#object": {
    "@id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
    "@type": "https://www.w3.org/ns/activitystreams#Note",
    "https://www.w3.org/ns/activitystreams#attributedTo": {"@id": "https://social.example/alyssa/"},
    "https://www.w3.org/ns/activitystreams#content": "Say, did you finish reading that book I lent you?",
    "https://www.w3.org/ns/activitystreams#to": {"@id": "https://chatty.example/ben/"}
  },
  "https://www.w3.org/ns/activitystreams#to": {"@id": "https://chatty.example/ben/"}
}

Arguably this looks slightly more familiar to the average JSON dev, but you have now introduced some value ambiguity which wasn’t there before: Instead of everything being an array set, now you have to deal with only some things being arrays. You now have to deal with each property-value being possibly one of the following:

  • Literal value
  • id node
  • value node
  • An array set containing any of these, heterogeneously

where previously you only had to deal with the case of “this is an array set containing either id nodes or value nodes”.
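One way to cope with that ambiguity is to normalize every property value back into an array of nodes before doing anything else. This is a minimal Python sketch of that idea, not the JSON-LD expansion algorithm. The helper name `to_node_list` is made up, and it only applies to property values, not to keywords like `@type`.

```python
def to_node_list(value):
    """Normalize a property value into a list of id nodes / value nodes.

    Handles the four shapes listed above: a bare literal, an id node,
    a value node, or a (possibly heterogeneous) array of these.
    """
    items = value if isinstance(value, list) else [value]
    nodes = []
    for item in items:
        if isinstance(item, dict):
            nodes.append(item)  # already an id node or a value node
        else:
            nodes.append({"@value": item})  # bare literal becomes a value node
    return nodes
```

After this step you are back to the single case of “array of id nodes or value nodes”, at the cost of doing the wrapping yourself.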

So the further you get away from expanded form, the more complexity starts to creep in. Dropping the second requirement to have every value be an array set of id nodes and value nodes? Well, you now have to deal with value ambiguity. Dropping the first requirement to have every key be an IRI? Well, now you have to deal with key ambiguity. If you think the @context stuff is too complicated, then hopefully at least you see why it’s necessary due to the tradeoffs made – familiarity won out over unambiguity. But at least you can continue to ignore it if you stick purely within the AS2 vocabulary and only use “official” terms, whose IRI mappings are consistent and which you are mandated not to override.

Anyone wanting to extend AS2 ends up having to at minimum accept value ambiguity, and if they want to use shorthand terms for their keys, then they are forcing everyone else to accept key ambiguity. This is, again, covered more in the discussion around FEP-e229.

It could be!

Older W3C RDF stuff typically defaults to serving plain-text triples:

Newer stuff uses content negotiation via the HTTP Accept header:

The ActivityStreams 2.0 Terms page (https://www.w3.org/ns/activitystreams) neglects to serve anything other than an HTML document that links out to the relevant specs that define each term, and conneg for JSON-LD leads to the context document at https://www.w3.org/ns/activitystreams.jsonld – I think this is due to the intra-group politics and conflicts when the specs were being developed, where some people were in favor of broader support and compatibility for the Linked Data Web, and other people were not in favor of this. I wasn’t there, so I’m not the best person to summarize it; I’m just going off of what has been said by others regarding this.
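For anyone unfamiliar with conneg, this is roughly what asking for JSON-LD looks like from a client. It is a sketch using Python’s standard `urllib.request`; the helper name `jsonld_request` is an assumption, and the request is only constructed here, not sent.

```python
from urllib.request import Request


def jsonld_request(url: str) -> Request:
    """Build a GET request that asks the server for a JSON-LD representation."""
    return Request(url, headers={"Accept": "application/ld+json"})


# Hypothetical usage: the same URL, but requesting JSON-LD instead of HTML.
request = jsonld_request("https://www.w3.org/ns/activitystreams")
```

Passing `request` to `urllib.request.urlopen` would perform the actual fetch. What comes back depends entirely on what the server chooses to serve for that media type.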

There are some half-solutions, at least, and FEP-e229 gets into them more. Basically, it boils down to “normalize everything” for LD-unaware people, and “consider leaving out context and prefixes for your extensions, just use plain IRIs” for the LD-aware people. This isn’t universal advice, because you might want to use some other context document that’s already “well-known” and part of some other spec, like Web Annotation or ODRL, but you’re going to have to be careful to avoid conflicts and overrides there depending on which order you declare the context documents.

Idunno, I think the spec constrains plain-JSON production and parsing so as to make the JSON in question meet those two requirements without the implementor understanding or even knowing that they are functional requirements. The @context abstracts the former and the “string or array of strings” abstracts the latter, doesn’t it? I’m not trying to make value judgments, but it does strike me as a little cheeky to tease someone for volunteering to explain the tradeoffs in making the wire format compact JSON-LD.

The way I see it, the URI value would just be one of any of the accepted options for a field, so it is just a string, albeit a bit of a long one.

The way I would like plain JSON AP to work would make it look like this:

{
  "@id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
  "@type": "https://www.w3.org/ns/activitystreams#Create",
  "https://www.w3.org/ns/activitystreams#actor": "https://social.example/alyssa/",
  "https://www.w3.org/ns/activitystreams#object": {
    "@id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
    "@type": "https://www.w3.org/ns/activitystreams#Note",
    "https://www.w3.org/ns/activitystreams#attributedTo": "https://social.example/alyssa/",
    "https://www.w3.org/ns/activitystreams#content": "Say, did you finish reading that book I lent you?",
    "https://www.w3.org/ns/activitystreams#to": "https://chatty.example/ben/"
  },
  "https://www.w3.org/ns/activitystreams#to": "https://chatty.example/ben/"
}

In my mind, the object key would be allowed to be only either an ID (i.e. only https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19 which would have to be fetched to get the content and other fields) or an object as displayed. Those would be the only two options. Dealing with that is simple enough and certainly simpler than JSON-LD I would say. Array values would be useful for the type field for extension types but otherwise I don’t see where arrays would be a possibility. I would also be okay with mandating only the ID always and requiring fetching the object separately, for simplicity, although it requires an extra request (if the object is big this might be reasonable anyway and if it’s small then it doesn’t matter much with that extra request anyway).
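A sketch of what handling those two options could look like in Python. The `fetch` callback is a stand-in for whatever HTTP client you would actually use, and `resolve_object` is a hypothetical name.

```python
from typing import Callable


def resolve_object(value, fetch: Callable[[str], dict]) -> dict:
    """Return the embedded object, or fetch it when only an ID is given."""
    if isinstance(value, str):
        return fetch(value)  # ID only: costs one extra request
    if isinstance(value, dict):
        return value  # embedded object: use as-is
    raise TypeError("expected an ID string or an embedded object")


# Hypothetical usage with a fake in-memory "network".
NOTE_ID = "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19"
store = {NOTE_ID: {"@id": NOTE_ID, "@type": "https://www.w3.org/ns/activitystreams#Note"}}
note = resolve_object(NOTE_ID, fetch=store.__getitem__)
```

This only covers the two shapes proposed here. It does not handle arrays, which the current AS2 vocabulary permits for most properties.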

I don’t think so really - when I said “cannot be machine-read” I don’t mean that it is not in a machine-readable format. I just mean that if you encounter an unknown term, you cannot hope to somehow know how to handle it. That term would have to get support implemented and that cannot be done automatically on the fly by the machine inspecting the term’s documentation by following the IRI (at least, until we get some general artificial intelligence to do such a thing on the fly, but then we probably won’t have these problems).

It doesn’t matter whether you write your documentation for the term in plain text or in “machine readable” JSON, your implementation will still not know what to do with it. That’s why I don’t really see IRI keys as a big benefit. So the JSON could as well be this:

{
  "@id": "https://social.example/alyssa/posts/a29a6843-9feb-4c74-a7f7-081b9c9201d3",
  "@type": "activityStreamsCreate",
  "activityStreamsActor": "https://social.example/alyssa/",
  "activityStreamsObject": {
    "@id": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
    "@type": "activityStreamsNote",
    "activityStreamsAttributedto": "https://social.example/alyssa/",
    "activityStreamsContent": "Say, did you finish reading that book I lent you?",
    "activityStreamsTo": "https://chatty.example/ben/"
  },
  "activityStreamsTo": "https://chatty.example/ben/"
}

The AS2 spec requires compliance with JSON-LD compaction, not with expansion. For extensions, it says:

Activity Streams 2.0 implementations that wish to fully support extensions MUST support Compact URI expansion as defined by the JSON-LD specification.

Although it’s an optional MUST, in my mind that statement at least strongly implies that supporting extensions with the current AS2 Rec requires JSON-LD expansion/processing (at least for full support, whatever that means; what is partial support?).

In my opinion, it doesn’t help that some AP thought leaders (“authorities”) claim that AP supports extension without JSON-LD processing.

In order to get that kind of output, you need at least a minimal @context to define which properties are expected to have @id values and which ones are expected to have @value values. By default when compacting, the compaction algorithm assumes @value. So in the example you provided,

"https://www.w3.org/ns/activitystreams#actor": "https://social.example/alyssa/",

is not immediately identifiable as a resource (@id). It could be the literal string URI.

This is why the AS2 context document has term definitions like this:

"as": "https://www.w3.org/ns/activitystreams#",
// ...
"actor": {
  "@id": "as:actor",
  "@type": "@id" // this is the crucial part!
}
// ...

So you know ahead-of-time that actor or its full IRI expansion https://www.w3.org/ns/activitystreams#actor has a value that is specifically an @id node.
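Here is a toy Python sketch of how such a term definition decides between an @id node and a @value node. The `TERMS` table is a hand-written two-entry excerpt for illustration (the `content` entry is an assumption of a plain mapping), and `expand_value` is not the real JSON-LD expansion algorithm.

```python
AS = "https://www.w3.org/ns/activitystreams#"

# Hand-written excerpt of term definitions, for illustration only.
TERMS = {
    "actor": {"@id": AS + "actor", "@type": "@id"},  # value is a resource
    "content": {"@id": AS + "content"},  # no coercion: value is a literal
}


def expand_value(term: str, value: str):
    """Return (full IRI, array of nodes) for a single string value."""
    definition = TERMS[term]
    # "@type": "@id" is the crucial part: it turns the string into an id node.
    node_key = "@id" if definition.get("@type") == "@id" else "@value"
    return definition["@id"], [{node_key: value}]
```

Without the `"@type": "@id"` entry, the actor URL would come out as a `@value` node, which is exactly the ambiguity described above.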

One other note:

They’re a possibility anywhere and everywhere, unless explicitly called out as “functional” in the AS2 vocabulary. You can have multiple actor. You can have multiple object. You can have multiple attachment, tag, attributedTo, and so on. There are about 30-odd “functional” properties (i.e. max cardinality of 1) in the AS2 vocabulary, but everything else can be an array.
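In practice this means a consumer of compacted AS2 has to be ready for a single value or an array on any non-functional property. A minimal Python sketch, with hypothetical helper names `as_list` and `actor_ids`:

```python
def as_list(value):
    """Wrap a single value in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def actor_ids(activity: dict) -> list:
    """Collect the id of every actor, whether given as a string or an object."""
    ids = []
    for actor in as_list(activity.get("actor")):
        ids.append(actor if isinstance(actor, str) else actor.get("id"))
    return ids
```

So `actor_ids` gives a list with one entry for the usual single-actor activity, and more entries when an activity has multiple actors.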

If you define a term with enough detail at its IRI, then you actually can make use of it. The only thing you can’t handle is side effects, but most properties don’t specify side-effects. At least in AP, side-effects are bound to the activity type – things like Create/Update/Delete/Add/Remove/Follow/Like/Announce are defined in-spec with the expected behavior of an ActivityPub server receiving them. But a generic JSON-LD browser (see https://browser.pub as something pretty close to this) could very well use the machine definition to show you additional human-useful information about any IRI – no matter whether it be a subject/object or a predicate. The most useful properties for doing so are rdfs:label and rdfs:comment, which contain natural language summarization of what the property is and what it means. Looking at the definitions for these properties, you see that they are self-describing: https://www.w3.org/2000/01/rdf-schema#

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

rdfs:comment a rdf:Property ;
	rdfs:isDefinedBy <http://www.w3.org/2000/01/rdf-schema#> ;
	rdfs:label "comment" ;
	rdfs:comment "A description of the subject resource." ;
	rdfs:domain rdfs:Resource ;
	rdfs:range rdfs:Literal .

rdfs:label a rdf:Property ;
	rdfs:isDefinedBy <http://www.w3.org/2000/01/rdf-schema#> ;
	rdfs:label "label" ;
	rdfs:comment "A human-readable name for the subject." ;
	rdfs:domain rdfs:Resource ;
	rdfs:range rdfs:Literal .

Imagine in your browser UI, you could render each statement where hovering over the relevant part of it could show you a tooltip that surfaces the label and/or comment. Say you have the following statement

<http://example.com/people/john> <http://example.com/vocab/knows/2> <http://example.com/people/sally>

Using natural language summarization via properties like as:name or rdfs:label, you end up with the following rendering:

John knows Sally

It’s pretty clear how you might get more information on John or Sally – just dereference their IRI, right? Ideally there is both human-readable and machine-readable information there, available in multiple formats as you please. But you can do the same thing for the predicate “knows”! So you hover your mouse over the word “knows” in that rendered statement, and a little tooltip appears with the text from rdfs:comment that tells you exactly what “knows” means:

John knows Sally
     ^^^^^
Indicates that the subject is familiar with the object.

Of course, other vocabularies might have different definitions for what “knows” means. You might even have multiple definitions within the same vocabulary! This depends on the IRI used for the predicate. The LD Web browser would be able to dynamically show you the rdfs:comment of each IRI:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.com/vocab/knows/1> rdfs:comment "Indicates that the subject is aware of the object." .
<http://example.com/vocab/knows/2> rdfs:comment "Indicates that the subject is familiar with the object." .
<http://example.com/vocab/knows/3> rdfs:comment "Indicates that the subject is having sexual intercourse with the object." .

(Definitions taken from Oxford Languages, as returned by a Google search for “define knows” at the time of writing this post.)
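
As a rough sketch of the tooltip lookup described above: the browser only needs a mapping from predicate IRI to its rdfs:comment. The in-memory dictionary and the function name tooltip_for below are assumptions made for illustration; a real LD browser would dereference each IRI rather than hard-code the definitions.

```python
# Hypothetical in-memory vocabulary: predicate IRI -> rdfs:comment text.
# A real LD browser would obtain these by dereferencing each IRI.
RDFS_COMMENTS = {
    "http://example.com/vocab/knows/1": "Indicates that the subject is aware of the object.",
    "http://example.com/vocab/knows/2": "Indicates that the subject is familiar with the object.",
}


def tooltip_for(predicate_iri, comments=RDFS_COMMENTS):
    """Return the tooltip text for a predicate IRI, or None if no comment is known."""
    return comments.get(predicate_iri)


# The example statement from above, as a (subject, predicate, object) triple.
statement = (
    "http://example.com/people/john",
    "http://example.com/vocab/knows/2",
    "http://example.com/people/sally",
)
print(tooltip_for(statement[1]))
```

The same rendered word “knows” gets a different tooltip depending only on which IRI sits behind it.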


Here’s a concrete example: load up one of my latest statuses in browser.pub and you get this: https://browser.pub/https://mastodon.social/@trwnh/113178090250935630

Now click to expand the “more” details element.

Now hover over “cc”.

You should see the documentation for as:cc as sourced from the AS2 vocabulary. This information is likely predefined within the browser.pub application, but if it were served in machine-readable format at https://www.w3.org/ns/activitystreams#cc, then it could be pulled automatically and cached for later use. This could be done for any foreign vocabulary or ontology which you don’t know ahead-of-time.
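
A minimal sketch of the “pull automatically and cache for later use” idea, assuming some fetcher exists that can turn a term IRI into definition text. The class name TermDefinitionCache, the fake_fetch stand-in, and the placeholder definition string are all hypothetical; nothing here reflects how browser.pub actually works or what the AS2 namespace document serves.

```python
from typing import Callable, Dict, Optional


class TermDefinitionCache:
    """Caches definition text keyed by term IRI so each IRI is fetched at most once."""

    def __init__(self, fetch_definition: Callable[[str], Optional[str]]):
        # fetch_definition is a hypothetical callable: term IRI -> definition text, or None.
        self._fetch = fetch_definition
        self._cache: Dict[str, Optional[str]] = {}

    def describe(self, iri: str) -> Optional[str]:
        if iri not in self._cache:
            self._cache[iri] = self._fetch(iri)
        return self._cache[iri]


AS_CC = "https://www.w3.org/ns/activitystreams#cc"
calls = []


def fake_fetch(iri: str) -> Optional[str]:
    """Stand-in for a real network fetch; records each call so caching is observable."""
    calls.append(iri)
    known = {AS_CC: "(placeholder definition text for as:cc)"}
    return known.get(iri)


cache = TermDefinitionCache(fake_fetch)
cache.describe(AS_CC)
cache.describe(AS_CC)  # served from the cache, no second fetch
print(len(calls))
```

Swapping fake_fetch for a real HTTP client is where content negotiation and parsing would have to go, which this sketch deliberately leaves out.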

Now, this all might not be necessary if you “read the spec” ahead-of-time. But the cool thing about LD is that if you define it using linked data, then you don’t actually have to read any specs to know what a single property means. Specs would only be needed as overviews and to describe side-effects. The machine-readable definition of that property can serve as a specification to concisely describe everything you’d want to know about that property. And you can even serve a machine-readable and human-readable definition in the same resource if you just mark up your HTML with a little extra RDFa. More time writing and describing specs in a concisely bounded manner translates into less time reading specs for everyone else.


I take that statement to mean that the AS2 authors wanted implementers to be able to handle not just term, but also ex:term and http://ns.example/term in equal measure. “Full support” means handling the extension correctly. “Partial support” means fragile parsing based on expecting a certain presentation.

For example, consider sensitive before it was adopted into the official context document. If an implementation can handle sensitive but not as:sensitive or https://www.w3.org/ns/activitystreams#sensitive, then it is not fully supporting the extension in a JSON-LD sense. It is only partially supporting the extension. This is less of an issue with that particular term now that it has been adopted into the official context and it would be non-compliant to express it as anything other than sensitive. I suppose it could be an issue for old documents that haven’t been updated.

Same logic applies to something like toot:discoverable. The following are all valid AS2:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Person",
  "http://joinmastodon.org/ns#discoverable": true
}

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "toot": "http://joinmastodon.org/ns#"
    }
  ],
  "type": "Person",
  "toot:discoverable": true
}

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "discoverable": "http://joinmastodon.org/ns#discoverable"
    }
  ],
  "type": "Person",
  "discoverable": true
}

The “problem” with “full support”, as it were, is that plain JSON consumers have to account for every single one of these possibilities and more. There are potentially infinite / unbounded possibilities for what any given producer will produce. I could call the prefix toot, or I could call it mastodon, or I could call it something else entirely. I could call the term discoverable, or mastodonDiscoverable, or discoverableWithinMastodon, or something else entirely. If, as the AS2 spec says, you MUST support CURIEs, then you should be able to handle whatever case I throw at you. It’s not trivial for plain JSON consumers, and it’s not consistent for plain JSON producers.
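
To make the burden concrete, here is a minimal sketch of what a plain JSON consumer would need in order to find one extension property regardless of how the producer spelled it. It is not a JSON-LD processor: it only reads term and prefix definitions written inline in @context, skips remote context URLs entirely, and does not resolve terms that are themselves defined via compact IRIs. The function names local_term_map, expand_key, and get_extension are made up for this example.

```python
def local_term_map(context):
    """Collect inline term/prefix definitions; remote context URLs (strings) are skipped."""
    entries = context if isinstance(context, list) else [context]
    terms = {}
    for entry in entries:
        if isinstance(entry, dict):
            for term, definition in entry.items():
                # Expanded term definitions look like {"@id": "..."}.
                if isinstance(definition, dict):
                    definition = definition.get("@id")
                if isinstance(definition, str):
                    terms[term] = definition
    return terms


def expand_key(key, terms):
    """Expand a property key to a full IRI where the inline context allows it."""
    if key in terms:
        return terms[key]
    prefix, sep, suffix = key.partition(":")
    # "http://..." is already an absolute IRI, not a prefix:suffix pair.
    if sep and not suffix.startswith("//") and prefix in terms:
        return terms[prefix] + suffix
    return key


def get_extension(doc, iri):
    """Return the value of the property identified by iri, however it was spelled."""
    terms = local_term_map(doc.get("@context", []))
    for key, value in doc.items():
        if expand_key(key, terms) == iri:
            return value
    return None


DISCOVERABLE = "http://joinmastodon.org/ns#discoverable"
AS2 = "https://www.w3.org/ns/activitystreams"
docs = [
    {"@context": AS2, "type": "Person", DISCOVERABLE: True},
    {"@context": [AS2, {"toot": "http://joinmastodon.org/ns#"}],
     "type": "Person", "toot:discoverable": True},
    {"@context": [AS2, {"discoverable": DISCOVERABLE}],
     "type": "Person", "discoverable": True},
]
print([get_extension(doc, DISCOVERABLE) for doc in docs])
```

Even this much only covers the three spellings shown above; anything defined in a remote context document would still require fetching and processing that document.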

ActivityStreams supports extension without JSON-LD processing… in a very limited case where an extension term gets adopted or promoted into the “mainline” context document as described in Process for Including Extensions in Activity Streams 2.0 and pretty much in no other cases. Well, there’s a way to get the same effect, at least – just force everyone to implicitly share the same context as you. This is what Mastodon does, right? It’s certainly what Pixelfed does. I wouldn’t be surprised if several other implementers neglect to handle something like http://joinmastodon.org/ns#discoverable or toot:discoverable or mastodon:discoverable or mastodonDiscoverable or discoverableWithinMastodon or so on. They are all equivalent depending on the term or prefix definitions in @context. But this is probably not something that most “plain JSON” implementations bother to consider. So I guess they’re technically not complying with that MUST requirement. But it matters less to them because JSON-LD and @context is more of a cargo cult than something they actually understand. I’ve seen implementers naively try to check for the presence of https://www.w3.org/ns/activitystreams in the @context when they absolutely shouldn’t be doing that at all.

I’m not sure what you mean by an “optional MUST”, but I always interpreted that sentence to mean, “if you are using extensions in a way that breaks JSON-LD expansion, you’re not really extending this polyglot spec properly and shouldn’t expect smooth interop with JSON-LD implementations”. So not literally “you must expand everyone else’s messages to fully be doing it right”, so much as “don’t make extensions that are less polyglot than core” and “don’t implement polyglot extension specs in a way that makes string-or-array fail for everyone else”. But hey maybe that’s just my kooky interpretation.

I think it’s easy in hindsight to say the impossible task of wording a polyglot spec was imperfectly implemented, but I’m not sure how much “authorities” should be expected to be even better at walking that tightrope than the spec itself is? I have trouble imagining what else the party line would be that doesn’t make an overlapping group of people just as cranky. It feels very “damned if you do, damned if you don’t” because anything that people who’ve studied the protocol closely (yourself included) tell someone who doesn’t want to use JSON-LD is going to be unsatisfactory. I sympathize and I don’t think your understanding of the protocol differs that much from mine, but I’m not sure what any of us should say differently on the subject? It is, at its core, a JSON-LD spec written in a “best effort” way to be forgiving of the JSON-only implementer, and many of those workarounds and polyglot ideas have aged badly in the intervening 5 years, particularly since stable, mature, hardened JSON-LD tooling in all major languages has failed to materialize magically without serious investment…

Sidenote, I’m also not sure who these “authorities” are: am I an authority? Are the original members of the 2019 WG? As a matter of W3C policy, anyone active in the CG is equally entitled to speak for the spec, as the CG maintains it and oversees implementation feedback. If you think the authorities are misrepresenting reality in official communications, you’re welcome to join the new Task Force overhauling the websites linked out from the spec. In an unofficial capacity, you can just disagree with authorities anywhere you want. They’re just humans trying to help adoption of a thing they love; I think ulterior or even pecuniary motives are extremely rare here.

No, because in this imagined plain JSON version of AP, JSON-LD would not be used so there would be no @context or @id or @value or other JSON-LD stuff. Fields would have pre-defined types in the spec and you wouldn’t use a context or anything like that to decide what fields mean. So there would be no fields that could be either 1 or many things.

This is an interesting bit of tech, but ultimately the whole idea of presenting generic ActivityPub objects “natively” or even the idea of a client-to-server API has clearly failed. If you ask me, it is simply not feasible to build an interface for generic ActivityPub objects like this. Sure, this might be a quirky technical thing that some hardcore power users may want to use, but 99.9% of users do not like this, do not want it and don’t even understand it. We should not hamper the fediverse by trying to support such esoteric use cases. We should build with the 99% in mind. This is hard as we do not have any of those people involved in building the solution and we never will have. But we need to keep it in mind.

In practice (AFAIK), most implementations do not store ActivityPub objects natively but use a database designed specifically for the implementation’s use case. ActivityPub objects are only used for communication between implementations. With that in mind, unknown fields cannot be used for any practical purpose unless explicitly supported by the implementation’s own database (or if it somehow displays the fields in a generic way, but again, nobody does this and nobody wants this).

I mean, case in point. While technically interesting and certainly some power users might enjoy this almost “brutalist” design, no normal users would like this and would just be majorly confused. We need to remember that almost all users are (relative to us) technically illiterate and not interested in learning how an interface works - the interface needs to be intuitive to such a degree that the user does not need to think. At least, we need this if we ever want the fediverse to present a serious alternative to corporate centralized social media (because they sure will provide such intuitive interfaces and people will flock to them if we don’t do the same).

I appreciate your enthusiasm about JSON-LD and the extensibility it provides but I think that JSON-LD is a technical solution to a theoretical problem that is so far removed from actual user needs that it achieves nothing more than making the whole system more complicated. It seems to build for the 1% of power users and esoteric use cases rather than the 99% of normal users.

Maybe we should acknowledge the fact that implementers are reluctant to use JSON-LD. Maybe we should take that as a sign that JSON-LD is not suited for this purpose. Maybe it is technically suited, but perhaps the cognitive cost is too high or the benefits do not justify the complexity. If implementers are not interested in understanding or supporting JSON-LD, then perhaps we ought to reconsider whether it really should be used for a federated protocol at all.

I dunno, the quoted sentence comes off as somewhat arrogant/elitist. It seems akin to “if only implementers actually took the time to understand and implement JSON-LD properly then we wouldn’t have these problems!”. This is once again about usability, but this time the developers are the writers of the ActivityPub spec and the users are the developers of the implementations using ActivityPub.

If all your users are not using a feature and not interested in that feature, then you should not blame the users - the feature is clearly bad. This feels like a product manager saying “if only the users used the feature properly, everything would work perfectly fine! What are they complaining about?”
