FEP-e229: Best practices for extensibility

silverpill · April 2, 2024, 6:55pm

Hello!

This is a discussion thread for the proposed FEP-e229: Best practices for extensibility.
Please use this thread to discuss the proposed FEP and any potential problems
or improvements that can be addressed.

Summary

Current popular implementations of ActivityPub do not handle extensibility very well. This FEP seeks to highlight some basic requirements for extensibility, and offer suggested advice to implementers who wish to avoid compatibility issues, particularly for LD-unaware consumers.

tesaguri · April 3, 2024, 3:28am

Posting this reply to https://socialhub.activitypub.rocks/t/best-practices-for-ap-vocabulary-extensions/3162/32 here because I’m late to the party…

I think it would be good to also mention the versioned contexts of Activity Streams as an example use case.

Very minor nitpick, but doesn’t the JSON-LD spec also call a value of an entry like "type": "Collection" a “set”? I think the proper term for a plain-JSON syntactic sugar like that is an “array”.

Also, I think the guidance on the handling of sets is not specific to type; it applies equally to any non-functional term, including other core Activity Streams terms. For example, the assumption that the value of the url term is always a single JSON string would be incompatible with FEP-fffd portable objects.
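For LD-unaware consumers, the point about non-functional terms can be sketched as a small normalization step. This is only a minimal Python sketch, not something taken from the FEP: the helper name `as_list` and the sample objects are hypothetical, and it assumes the document has already been parsed as plain JSON.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize a compacted property value to a list.

    A missing property (None) becomes an empty list, a single value becomes
    a one-element list, and an array is returned as a shallow copy.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


# Hypothetical objects: `url` once as a single string, once as an array.
single = {"type": "Note", "url": "https://example.com/notes/1"}
multiple = {
    "type": "Note",
    "url": ["https://a.example/notes/1", "https://b.example/notes/1"],
}

for obj in (single, multiple):
    # The same loop works for both shapes once the value is normalized.
    for url in as_list(obj.get("url")):
        print(url)
```

With a helper like this, the rest of the consumer never has to branch on whether a value arrived as a single item or as an array.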

Best-practices for AP vocabulary extensions?

{
	"@context": "https://www.w3.org/ns/activitystreams",
	"id": "https://example.com/~alyssa",
	"type": "Person",
	"name": "Alyssa P. Hacker",
	"attachment": [
		{
			"type": "http://schema.org/PropertyValue",
			"http://schema.org/name": "Pronouns",
			"http://schema.org/value": "she/her"
		}
	]
}

Another minor nitpick: This also expands the value of the attachment term to an array, which is in violation of the AS2 requirement IIUC.

(Personally speaking, I don’t like the fact that the spec requires compacting values of the attachment term that way, but that is an entirely different problem.)

Also, the name term used by existing implementations refers to as:name, not sc:name, so you should never “expand” the term that way.

As for the use of expanded terms, I think the advisory should be “for new extensions only”, so that it won’t break existing extension terms. While that would leave existing extensions fragile to name collisions, we could at least limit the risk of name collisions to existing extension terms (and future additions to the normative context, if any). After all, the point of the FEP is to reduce interoperability issues, not to cause more churn in the ecosystem.

So the FEP has added a warning against blank node terms since the initial draft. But the description still sounds as if there are a few cases where they are justified, and I don’t think it has fully resolved the concern.

As Evan has pointed out, every ActivityPub document is meant to be shared. I think that not only means sharing across implementations, but also sharing among different documents produced by the same implementation.

IIUC, blank nodes from different datasets never denote the same entity, even if they share a lexically “same” blank node identifier in the surface syntax (cf. https://www.w3.org/TR/rdf11-concepts#h-note-3), which makes the properties nearly meaningless (at least without any entailment regime? I don’t know.).

I guess the default @vocab is merely a “fool-proof” guard against accidental loss of JSON entries during the JSON-LD processing algorithms and is not meant to be used intentionally, is it? After all, coining a disposable IRI isn’t a particularly hard task (you don’t even need to own a domain in order to make a urn:uuid: or a tag: IRI), so there is no reason to avoid it, even for experimental purposes.
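To illustrate how cheap coining a disposable IRI is, here is a minimal Python sketch. The function names, the e-mail authority, and the property name are all hypothetical; the tag: form assumes the `tag:authority,date:specific` layout from RFC 4151, where the authority may be an e-mail address rather than a domain.

```python
import uuid
from datetime import date


def coin_uuid_iri() -> str:
    """Return a fresh urn:uuid: IRI built from a random (version 4) UUID."""
    return uuid.uuid4().urn


def coin_tag_iri(authority: str, day: date, specific: str) -> str:
    """Build a tag: IRI of the form tag:authority,YYYY-MM-DD:specific."""
    return f"tag:{authority},{day.isoformat()}:{specific}"


# Hypothetical experimental property identifiers.
print(coin_uuid_iri())
print(coin_tag_iri("alyssa@example.com", date(2024, 4, 3), "experimentalProperty"))
```

Either IRI could then be used as a property key directly, or mapped to a short term in a context.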

Those whose value is a node on the graph. In uncompacted form, these would use @id.

I feel this might introduce another problem: consumers would need to use different logic for the core Activity Streams terms (compacted) and extension properties (expanded) to determine whether the value is a reference to an external node or an embedded node.

Instead, couldn’t we specify the @type for the property in the context without defining a shorthand, like the following?

{
  "@context": [
    {
      "http://example.com/idProperty": {
        "@type": "@id"
      }
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "http://example.com/valueProperty": "some string or number or boolean",
  "http://example.com/idProperty": "https://example.com/some-resource"
}

You might wonder if this is a valid JSON-LD document. Well, I’m not an expert on the matter, but the Create Term Definition algorithm of JSON-LD 1.1 Processing Algorithms and API has the following step:

Otherwise, term is an IRI or blank node identifier. Set the IRI mapping of definition to term.

So I think a term can also be a (non-compact) IRI. At least, the JSON-LD Playground doesn’t complain about it:

https://json-ld.org/playground/#startTab=tab-expanded&json-ld={"%40context"%3A[{"http%3A%2F%2Fexample.com%2FidProperty"%3A{"%40type"%3A"%40id"}}%2C"https%3A%2F%2Fwww.w3.org%2Fns%2Factivitystreams"]%2C"http%3A%2F%2Fexample.com%2FvalueProperty"%3A"some%20string%20or%20number%20or%20boolean"%2C"http%3A%2F%2Fexample.com%2FidProperty"%3A"https%3A%2F%2Fexample.com%2Fsome-resource"}

By the way, the partially expanded form should use the id term instead of the @id keyword, since the normative context defines id as an alias of @id. Either way, however, I would argue that it is yet another source of ambiguity.
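The ambiguity for plain-JSON consumers can be made concrete with a rough Python sketch. Everything here is an assumption for illustration: the function name `classify`, the `id_typed` flag (standing in for out-of-band knowledge that a term is coerced with "@type": "@id"), and the simplification that arrays and value objects are ignored.

```python
from typing import Any


def classify(value: Any, id_typed: bool) -> str:
    """Classify a property value as 'reference', 'embedded' or 'literal'.

    `id_typed` says whether the term is known to be coerced with
    "@type": "@id"; a consumer that does not process the context has to
    hard-code that knowledge per property.
    """
    if isinstance(value, dict):
        keys = set(value)
        # An object holding nothing but an identifier is only a reference.
        if keys and keys <= {"id", "@id"}:
            return "reference"
        return "embedded"
    if isinstance(value, str) and id_typed:
        # A bare string is a reference only if the term is "@type": "@id".
        return "reference"
    return "literal"


print(classify("https://example.com/some-resource", id_typed=True))
print(classify({"id": "https://example.com/some-resource"}, id_typed=False))
print(classify("some string or number or boolean", id_typed=False))
```

Declaring "@type": "@id" for the expanded property in the context keeps the first, string-based form available for extension properties too, so one code path can serve both core and extension terms.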

The following are random ideas of additional topics:

Prefer `"@container": "@set"` for non-functional property terms

Just the same reason as you described in the following issue:

github.com/w3c/activitystreams: Force non-functional properties to always be arrays
trwnh · opened 02:29PM - 23 Apr 23 UTC · closed 04:48PM - 13 Mar 24 UTC · Next version

Please Indicate One:
* [ ] Editorial
* [ ] Question
* [x] Feedback
* [ …] Blocking Issue
* [ ] Non-Blocking Issue

Please Describe the Issue:

---

### Proposed change

Add `"@container": "@set"` to non-functional properties in order to force them to be arrays even after compaction.

### Motivation

This would save plain JSON consumers from having to account for single-value arrays being expressed as single string values, forcing an array during compaction. Consumers would then only have to account for IRI representations vs inlined representations.

It would also make it clearer which properties are functional and which ones are not. Currently, many implementations wrongly assume that certain properties are functional and can only have a single value, when in reality they are actually sets that only *often* have a single value inside them.

In particular, `items` vs `orderedItems` has recently come up as a source of confusion. `orderedItems` is defined as a `@container: @list` and therefore is always an array. But `items` is not defined explicitly as a `@container: @set`, which makes it valid to use a single value instead of a single-value array. Consequently, GoToSocial (relying on Go-Fed) fell into the mistake of assuming that `orderedItems` could likewise be a single value, because `items` can be a single value.

Similarly commonly, several implementations fail to account for multiple `type` values. This leads to situations where implementations include an extension type that somewhat overlaps with a core type, but then *don't* include the core type as they are normatively required to do in AS2-Core. The following language is repeated 3 times in AS2-Core sections about modeling:

> When an implementation uses an extension type that overlaps with a core vocabulary type, the implementation MUST also specify the core vocabulary type.

I believe it would lead to greater conceptual consistency to do away with the ambiguity around single values vs single-value sets.

The default JSON-LD behavior of coercing single-value sets into single values is not very useful and mainly serves as a pain point, as yet another case to handle for non-LD-aware processors.
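The multiple-`type` case from the issue can be sketched for a plain-JSON consumer as follows. This is a hedged Python illustration only: the helper name `has_type` and the extension type IRI are hypothetical, and the sample object simply follows the AS2-Core rule quoted above by listing the core type next to the extension type.

```python
def has_type(obj: dict, wanted: str) -> bool:
    """Return True if `wanted` is among the object's types.

    Accepts `type` as a single string or as an array of strings, so a
    type-set such as ["Collection", <extension type>] still matches.
    """
    value = obj.get("type")
    types = value if isinstance(value, list) else [value]
    return wanted in types


# Hypothetical object carrying an extension type next to the core type.
album = {
    "type": ["Collection", "http://example.com/ns#Album"],
    "items": "https://example.com/photos/1",
}

print(has_type(album, "Collection"))
print(has_type({"type": "Collection"}, "Collection"))
print(has_type(album, "Note"))
```

A comparison such as `obj["type"] == "Collection"` would reject the first object even though it declares the core type as required.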

stevebate · April 3, 2024, 4:31am

The blank predicates are not serialized into RDF so there’s no dataset issue for those. For example, the following will result in an empty RDF graph.

{
  "@context": {
    "@vocab": "_:"
  },
  "foo": "bar"
}

The AP @vocab protects against loss during JSON-LD expansion and compaction but doesn’t protect against loss when serializing to RDF. Like you mentioned, a non-blank prefix (e.g., urn:x-activitypub:) would be a more “fool-proof” @vocab prefix.

eprodrom · April 5, 2024, 7:31pm

I’d like to see a reference to the SocialCG’s Extension Policy. It’d be good to note that there is a possibility of extensions becoming part of the main AS2 context, if certain easy conditions are met (as listed in that document).

silverpill · September 24, 2024, 7:58pm

A FEP like this definitely should exist, but I don’t agree with its recommendations for LD-unaware implementations.

Normalize types into type-sets

I’m hesitant to implement this because multi-typing is not used in the Fediverse today and the cost of supporting it is not negligible. Furthermore, I don’t think this is a good way to identify a core type. Duck typing also works and doesn’t require changing existing implementations.

(this also applies to Formally define "activity" and "actor" at the spec level · Issue #469 · w3c/activitypub · GitHub)
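As a rough illustration of the duck-typing alternative, here is a minimal Python sketch. It relies on the ActivityPub requirement that actor objects have `inbox` and `outbox` properties; the function name `looks_like_actor` and the sample objects are hypothetical, and a real implementation may well check more than this.

```python
from typing import Any


def looks_like_actor(obj: Any) -> bool:
    """Duck-typing check: treat an object with both inbox and outbox as an actor.

    The `type` property is deliberately ignored, so unknown or multiple
    types do not affect the result.
    """
    return isinstance(obj, dict) and "inbox" in obj and "outbox" in obj


# Hypothetical objects for illustration.
actor = {
    "type": "http://example.com/ns#Robot",
    "id": "https://example.com/~robot",
    "inbox": "https://example.com/~robot/inbox",
    "outbox": "https://example.com/~robot/outbox",
}
note = {"type": "Note", "content": "Hello"}

print(looks_like_actor(actor))
print(looks_like_actor(note))
```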

Declare IRIs for terms that are expected to be shared

Do you mean using full IRIs for every non-AS2 term? I think this is not necessary because in practice conflicts don’t happen and I don’t expect that to change anytime soon.

Also, the explanation of this recommendation is not very helpful to a JSON-LD-unaware person because it is written in a way only a JSON-LD expert could understand.
