Best practices for AP vocabulary extensions?

naturzukunft · May 29, 2023, 1:31pm

It might be interesting to compact your JSON-LD so it is compliant with AP and then post that. It would highlight ambiguous fields like “description” and maybe trigger some discussions about how a plain JSON consumer would (or would not correctly) interpret it.

Sure! But at the moment it's not my priority. I was playing with JSON-LD framing, but was not able to get the “compatible”, “lightweight” JSON that others expect. It would be interesting if somebody who knows JSON-LD framing could define which options need to be set to get something interoperable.

The other thing is that if I want to communicate with other instances, I have to know which types they support and in which format. Then I can put an adapter in between. See this discussion.

stevebate · May 29, 2023, 3:24pm

Vocata is doing it so you may want to take a look at that. I’ve been able to use Vocata (with a few mods not related to JSON-LD) to interact with Mastodon, for example.

I don’t know the specifics of what you’re trying to do, but it will probably be more complex than someone giving you a list of supported message types and fields (which is unlikely in most cases anyway). The expected behavior and side effects related to that information (assuming we’re still talking about extensions rather than core AP messages) will also be needed, at the very least.

eprodrom · June 21, 2023, 1:53am

I’m writing a note on this topic for the SocialCG. Here are, I think, the big steps:

1. Use a permanent namespace and context URL.
2. Create the context document, and write up the standard for discussion in a process like FEP.
3. Get “enough” implementers; ideally at least 2 independent implementations.
4. (Optional) At this point, people can just add "@context": ["https://www.w3.org/ns/activitystreams", "https://your.example/context/"], and it should just work. But if adding the extra context URL gets tiresome, the SocialCG can approve adding the namespace and terms to the main AS2 context document, in which case future developers only need to use ["https://www.w3.org/ns/activitystreams"].
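A minimal Python sketch of step 4 from the producer side. It assumes the placeholder extension context URL https://your.example/context/ from the step above; the helper name `with_extension_context` is hypothetical.

```python
import json

# Normative ActivityStreams 2.0 context URL.
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Placeholder extension context URL (hypothetical, as in step 4).
EXTENSION_CONTEXT = "https://your.example/context/"


def with_extension_context(obj: dict, extension_context: str = EXTENSION_CONTEXT) -> dict:
    """Return a copy of obj whose @context lists the AS2 context first, then the extension."""
    doc = {"@context": [AS2_CONTEXT, extension_context]}
    # Copy all other members unchanged; any @context already on obj is replaced.
    doc.update({key: value for key, value in obj.items() if key != "@context"})
    return doc


note = with_extension_context({"type": "Note", "content": "Hello, world"})
print(json.dumps(note, indent=2))
```

If the extension terms were later merged into the main AS2 context under the same IRIs, in principle only the `@context` value would need to change.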

I think the hardest part is making a good design that other people actually use. The rest is just bookkeeping!

aschrijver · June 21, 2023, 4:52am

The hardest part is the community/social part. Usually when people develop an extension for their app, they a) want to move quickly and get on with coding, and b) are the first or only project that needs the extension. Taking the time to do the “formalities” and chores doesn’t seem worthwhile unless the barrier to doing so is really low. Later on, with an established user base, the custom extension is harder to generalize because it is already used in production. And there still isn’t much incentive to collaborate on it, because a) “I’m very busy keeping it all going” and b) “My app works fine”.

eprodrom · June 27, 2023, 5:20pm

I think the very hard part is if you rush through deploying it for your one codebase, and three other implementers take the time to agree on something very different. Now you’re committed to your custom extension that nobody else uses!

melvincarvalho · June 29, 2023, 12:10am

Yes, but vocabs were precisely designed to solve that problem. Such that different groups, different projects, different use cases can define their own terms, and the fact that there are different vocabularies enable them to work together, interoperate and resolve naming clashes.

What inadvertently happened in ActivityPub, partly due to a less-than-comprehensive understanding and partly due to long-standing bugs, is that ActivityPub diverged from Linked Data in such a way that the context became a point of centralization. That is the complete inversion of the design of the thing.

It can be partially mitigated with the extension process. But we need to keep an eye on centralized artifacts, such as “protected” or “sacred” terms, as we saw with alsoKnownAs etc. We need to add controls so that terms don’t get railroaded into the context in a centralized way. But as long as we are mindful of the risk, it should be fine.

aschrijver · June 29, 2023, 8:32am

But isn’t this also the road towards open-standards complexity? Going deep into OWL territory, the theoretical foundation of the Semantic Web (“we can map universal meaning”), or even following the trend of the Solid project of creating ever more intricate specs to hammer things down. I created a diagram some time ago that I have posted several times before:

Arguably AP diverged because most devs wanted to avoid the complexity of Linked Data, perceived or not, and went with the KISS approach of plain old JSON.

The extension process should be a practical approach that brings the best of both worlds.

melvincarvalho · June 29, 2023, 8:48am

Solid, originally a concise 2-page specification, has admittedly become more complex over time, often due to the incorporation of specific favored features.

The presented diagram offers a general direction, although some elements may need rearrangement, considering the simplicity and effectiveness of plain JSON with hyperlinks.

ActivityPub has inherited the complexities of linked data without fully capitalizing on its advantages. Despite its current intricacy, it’s what we have at our disposal and it does cater to certain use cases.

The use of context in its current form is somewhat redundant. A @specification field directing to examples and documentation could have served the same purpose without becoming a point of centralization, which seems to be the trend in other specifications. We also need a kind of framework for creating vocabularies so that it is an easy devX.

A better understanding and implementation of linked data would have yielded a simpler, more extensible, and interoperable system. For instance, plain JSON and LD could coexist without extra overhead. A term in plain JSON would function like any other web API, whereas a namespaced term would provide access to documentation, examples, types, etc. This approach allows each project to develop its unique aspects while common patterns gradually emerge and are adopted.

A revision of the context and vocabulary in the next iteration of ActivityPub could potentially bring about the discussed features and establish a template for creating new vocabularies in alignment with the original linked data design concept. There have been deviations from this standard over time, and it will indeed be a challenge for those who deviated to revert back. Nevertheless, I look forward to seeing where we land once the outstanding issues have been addressed.

trwnh · June 29, 2023, 12:40pm

the only part of the specs that does this is the part of AS2-Core that says that AS2 docs MUST be compacted against the normative context, and that terms in the normative context cannot be overridden by any other term definitions.

presumably the intention was to allow for “activitystreams 2.0” support without having to support LD/RDF/etc, which is as you say “diverged”. but intentionally so. whether that intention was justified is another matter altogether… the decision could have fallen the other way entirely and had it remain LD.

melvincarvalho · June 29, 2023, 2:28pm

The persistent issue we’ve been discussing is indeed just a bug that, over time, has been accepted as part of the system. The lack of a proper vocabulary in ActivityPub has led to an over-reliance on the context, which serves a different purpose. This has made vocabulary extension, which should have been straightforward, more challenging and confusing. As a result, the context, which ideally should be versioned, has become a point of centralization.

There may come a time when it would be more beneficial to rectify these bugs, allow ActivityPub to be extended through vocabularies, and phase out the original context. This could pave the way for a new generation of interoperable systems and applications that remain backward compatible with the first generation. But let’s see what happens with the bug fixing in the second half of 2023; the problem may solve itself.

trwnh · June 29, 2023, 2:43pm

there is a vocabulary, though… you can “extend” it with LD but your extensions should not override as2. the problem is that implementations want to do extensions without LD. in other words, they want the same “plain JSON fallback” behavior for extensions as there is for the normative context. this is a big ask – it is a big ask primarily because there is no “normative extension context” for the fediverse. LD doesn’t operate on such principles, generally. hence, the desire or need for “best practices for extensions”, which off the top of my head are probably things like “use the full IRI because you can’t ever expect a consistent shorthand”.

trwnh · March 28, 2024, 10:31am

draft of a fep i’m writing up currently…

FEP-e229: Best practices for extensibility

Summary

Current popular implementations of ActivityPub do not handle extensibility very well. This FEP seeks to highlight some basic requirements for extensibility, and offer suggested advice to implementers who wish to avoid compatibility issues, particularly for LD-unaware consumers.

Recommendations

Ignore JSON-LD context if you don’t understand it

LD-unaware consumers MUST NOT attempt naive string comparison against the JSON-LD context declaration. There are several possible reasons why a received document might be valid AS2 but not declare @context in exactly the form a consumer expects. One possibility is that the declared Content-Type is application/activity+json and the producer is LD-unaware, so there is no @context at all. Another possibility is that the producer is LD-aware, but uses a different context IRI that defines the same terms. Yet another possibility is that the producer embeds inline term definitions. Regardless of the reason, the consumer either understands the context or it does not; if it does not, it should ignore the context rather than reject the document because of it.
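A minimal Python sketch of an LD-unaware consumer that follows this rule: it decides whether a response is AS2 purely from its media type and never string-compares @context. The function names are hypothetical; the two media types checked are the ones [AS2-Core] defines for ActivityStreams documents.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def parse_media_type(content_type: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; key=value; ...' into a lowercased media type and its parameters."""
    media_type, *raw_params = [part.strip() for part in content_type.split(";")]
    params = {}
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip().strip('"')
    return media_type.lower(), params


def is_as2_response(content_type: str) -> bool:
    """Classify a response as AS2 from its media type only; @context is never inspected."""
    media_type, params = parse_media_type(content_type)
    if media_type == "application/activity+json":
        return True
    # The profile parameter may hold several space-separated IRIs.
    profiles = params.get("profile", "").split()
    return media_type == "application/ld+json" and AS2_CONTEXT in profiles
```

Whether the body then carries no @context, a different context IRI, or inline term definitions does not affect this decision; a consumer that understands some of those contexts can make use of them, and one that does not can simply ignore them.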

Normalize types into type-sets

It is an unfortunate and erroneous belief that objects in [AS2-Core] or [AP] can have only one type. This assumption breaks proper extensibility. Wherever a generic ActivityStreams consumer needs to know whether it is dealing with an [AS2-Vocab] type or an [AS2-Core] mechanism such as Collections, it cannot do so unless that type is present in the type set. However, extension vocabularies may need to declare additional types as interfaces that the given object fulfills. For this reason, LD-unaware consumers doing type checks need to normalize type into a set, and check that their desired type is contained within that set.

For example, "type": "Collection" would be normalized into "type": ["Collection"].
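A minimal Python sketch of this normalization for an LD-unaware consumer. The helper names and the extension type IRI https://example.com/ns#Gallery are hypothetical.

```python
from typing import Any


def type_set(obj: dict[str, Any]) -> frozenset[str]:
    """Normalize "type" into a set, whether it was a single string, an array, or absent."""
    raw = obj.get("type")
    if raw is None:
        return frozenset()
    if isinstance(raw, str):
        return frozenset({raw})
    return frozenset(t for t in raw if isinstance(t, str))


def has_type(obj: dict[str, Any], wanted: str) -> bool:
    """Check membership in the type set instead of comparing obj["type"] == wanted."""
    return wanted in type_set(obj)


gallery = {"type": ["OrderedCollection", "https://example.com/ns#Gallery"]}
print(has_type(gallery, "OrderedCollection"))  # True
print(has_type(gallery, "Collection"))  # False: subtypes are not inferred
```

The second check illustrates the point above: a plain membership test only knows what is literally in the set, so it cannot tell that OrderedCollection is a kind of Collection.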

Consider producing documents compacted against only the AS2 context document

Since JSON-LD expanded form is unambiguous, it may be a good idea to use it wherever the AS2 context does not already define a term, i.e. to write extension terms as full IRIs. This slightly reduces human readability due to the additional verbosity, but it results in exactly one possible representation of your extension data. LD-unaware consumers may have to learn the structure of JSON-LD expanded form. LD-aware consumers can “simply” re-compact the document against any additional contexts they understand.

For example, consider the current use of “profile fields” prior to [FEP-fb2a] “Actor metadata”. Setting aside that Mastodon currently uses sc as a term prefix with an incorrect definition, such a term prefix would be unnecessary if partially-uncompacted JSON-LD were used:

{
	"@context": "https://www.w3.org/ns/activitystreams",
	"id": "https://example.com/~alyssa",
	"type": "Person",
	"name": "Alyssa P. Hacker",
	"attachment": [
		{
			"type": "http://schema.org/PropertyValue",
			"http://schema.org/name": "Pronouns",
			"http://schema.org/value": "she/her"
		}
	]
}
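A minimal Python sketch of how an LD-unaware consumer could read these profile fields by full IRI, with no prefix handling at all. The function names are hypothetical, and the code matches only the exact http://schema.org/ IRIs used in the example above.

```python
from typing import Any

SCHEMA = "http://schema.org/"

alyssa = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://example.com/~alyssa",
    "type": "Person",
    "name": "Alyssa P. Hacker",
    "attachment": [
        {
            "type": "http://schema.org/PropertyValue",
            "http://schema.org/name": "Pronouns",
            "http://schema.org/value": "she/her",
        }
    ],
}


def as_list(value: Any) -> list:
    """Treat a missing value as empty and a single value as a one-element list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def profile_fields(actor: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (name, value) pairs from schema.org PropertyValue attachments."""
    fields = []
    for item in as_list(actor.get("attachment")):
        if not isinstance(item, dict):
            continue  # e.g. a bare link to an attachment
        if SCHEMA + "PropertyValue" not in as_list(item.get("type")):
            continue
        name = item.get(SCHEMA + "name")
        value = item.get(SCHEMA + "value")
        if isinstance(name, str) and isinstance(value, str):
            fields.append((name, value))
    return fields


print(profile_fields(alyssa))  # [('Pronouns', 'she/her')]
```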

Avoid unnecessary term prefixes

The same IRI can be written with different compact IRI prefixes, depending on which context the producer used for compaction. For example, say we have a prefix for http://example.com/. You may encounter some documents with example:term, some with ex:term, some with http://example.com/term, and so on. LD-aware consumers can “simply” apply JSON-LD expansion to make all terms unambiguous, and then apply JSON-LD compaction against their local preferred context. LD-unaware consumers instead have to deal with an unbounded number of equivalent terms, and will either have to add support for them case by case, or reinvent and reimplement JSON-LD expansion. This issue can be mitigated by reusing existing conventional prefixes; one example of such a set of prefixes is the [RDFa-Context] “initial context”.
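To make that burden concrete, here is a deliberately incomplete Python sketch of the prefix handling an LD-unaware consumer ends up reinventing. It reads only prefixes declared in inline @context objects, ignores remote contexts and every other JSON-LD rule, and the function names are hypothetical; an LD-aware consumer would run real JSON-LD expansion instead.

```python
from typing import Any


def inline_prefixes(doc: dict[str, Any]) -> dict[str, str]:
    """Collect simple prefix definitions such as {"ex": "http://example.com/"} from inline contexts."""
    ctx = doc.get("@context", [])
    prefixes = {}
    for entry in ctx if isinstance(ctx, list) else [ctx]:
        if not isinstance(entry, dict):
            continue  # remote context URLs are not fetched in this sketch
        for term, iri in entry.items():
            if isinstance(iri, str) and not term.startswith("@") and iri.endswith(("/", "#")):
                prefixes[term] = iri
    return prefixes


def expand_key(key: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:suffix' when the prefix is known; leave anything else untouched."""
    prefix, sep, suffix = key.partition(":")
    # A suffix starting with "//" means the key is already an absolute IRI.
    if sep and prefix in prefixes and not suffix.startswith("//"):
        return prefixes[prefix] + suffix
    return key


docs = [
    {"@context": ["https://www.w3.org/ns/activitystreams", {"ex": "http://example.com/"}], "ex:term": 1},
    {"@context": ["https://www.w3.org/ns/activitystreams", {"example": "http://example.com/"}], "example:term": 1},
    {"@context": "https://www.w3.org/ns/activitystreams", "http://example.com/term": 1},
]
for doc in docs:
    keys = [expand_key(k, inline_prefixes(doc)) for k in doc if k != "@context"]
    print(keys)  # ['http://example.com/term'] each time
```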

Only declare IRIs for terms that are expected to be shared

By default, the ActivityStreams context document declares @vocab to be _:, meaning that any term not otherwise defined falls back to the blank-node namespace. LD-unaware producers can therefore use extension types and properties as-is, and the JSON-LD expansion algorithm will expand term to _:term. This may be sufficient for experimental or implementation-specific terms that are not expected to be used by anyone else.
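A rough Python sketch of a producer-side lint for this situation, in the spirit of the “warning if data uses "@vocab": "_:" fallback” issue mentioned later in this thread. It assumes the document's only context is the AS2 context, and KNOWN_TERMS is a hand-picked subset of AS2 terms chosen for illustration; a real check would use the full term list from the context document, and would also look at type values, which this sketch does not.

```python
from typing import Any

# Illustrative subset of the terms defined by the AS2 context document (not the full list).
KNOWN_TERMS = frozenset({"id", "type", "name", "content", "attachment", "actor", "object", "to", "cc", "url"})


def fallback_terms(node: Any, known: frozenset[str] = KNOWN_TERMS) -> set[str]:
    """Return property names the AS2 context does not define, searching nested objects."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # term definitions, not data
            if not key.startswith("@") and ":" not in key and key not in known:
                found.add(key)  # would expand to "_:" + key via the "_:" @vocab fallback
            found |= fallback_terms(value, known)
    elif isinstance(node, list):
        for item in node:
            found |= fallback_terms(item, known)
    return found


note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "Hi",
    "sensitive": True,
    "attachment": [{"type": "Image", "url": "https://example.com/a.png", "blurhash": "LEHV6n"}],
}
print(sorted(fallback_terms(note)))  # ['blurhash', 'sensitive']
```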

References

[AP] Christine Lemmer Webber, Jessica Tallon, ActivityPub, 2018
[AS2-Core] James M Snell, Evan Prodromou, Activity Streams 2.0, 2017
[AS2-Vocab] James M Snell, Evan Prodromou, Activity Vocabulary, 2017
[FEP-fb2a] a, FEP-fb2a: Actor metadata, 2022
[RDFa-Context] Ivan Herman, RDFa Core Initial Context, 2011

Copyright

CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

To the extent possible under law, the authors of this Fediverse Enhancement Proposal have waived all copyright and related or neighboring rights to this work.

trwnh · March 28, 2024, 10:32am

i’d like to hear if anyone else has more ideas, btw!

aschrijver · March 28, 2024, 10:59am

Great to see this being fepped out, @trwnh! One thing I wonder about is the naming… if this becomes FINAL, and then we find another best practice, should that be named “More best practices…”, followed by a “Yet more best practices…” FEP?

Maybe there should be one FEP per best practice, with clear titles indicating the practice, and, as @stevebate suggested, a categorization of FEPs with “Best practice” being one of the categories.

stevebate · March 28, 2024, 12:35pm

I like the direction this draft is going. For this section, maybe the advice could mirror the AS2 Core recommendation to assume the normative AS2 context if it’s not included in the @context (versus ignoring context completely).

AFAIK, Mastodon requires the normative context URL to be in an AP activity @context or it will consider the activity to be invalid. If that’s correct, this unnecessarily reduces interoperability with conformant AP/AS2 publishers (that might not include any @context and use the AS2 content type instead). This isn’t a comment about your draft, but I wonder how developers can be convinced to change their implementations to use this practice.

It might be useful to organize the practices by publisher/consumer roles. For example, the recommended practice for producers might be to always include the normative context URI and for consumers to not assume it is there when an AS2 content type is used.
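A minimal Python sketch of that split by role. The producer helper always emits the normative context; the consumer helper, following the AS2 Core rule for application/activity+json documents, assumes the normative context applies when @context omits it rather than rejecting the document. Both helper names are hypothetical.

```python
from typing import Any

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def context_entries(doc: dict[str, Any]) -> list[Any]:
    """Normalize @context (absent, a single value, or an array) into a list."""
    ctx = doc.get("@context")
    if ctx is None:
        return []
    return list(ctx) if isinstance(ctx, list) else [ctx]


def for_publishing(doc: dict[str, Any]) -> dict[str, Any]:
    """Producer side: always include the normative context, placed first."""
    entries = context_entries(doc)
    if AS2_CONTEXT not in entries:
        entries.insert(0, AS2_CONTEXT)
    return {**doc, "@context": entries[0] if len(entries) == 1 else entries}


def effective_contexts(doc: dict[str, Any]) -> list[Any]:
    """Consumer side (application/activity+json): assume the normative context instead of requiring it."""
    entries = context_entries(doc)
    if AS2_CONTEXT not in entries:
        entries.insert(0, AS2_CONTEXT)
    return entries
```

A consumer written this way keeps interoperating with conformant publishers that omit @context, while a producer written this way still satisfies consumers that (today) insist on seeing the context URL.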

Would it make sense to include advice for documenting extensions (structure, behavior, required/optional properties, functional/nonfunctional properties, cardinality, valid value sets/enums, etc.) and for versioning extensions?
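One way to make such documentation checkable is to publish part of it in machine-readable form as well. The Python sketch below is purely illustrative: TermSpec, check, and the https://example.com/ns/recipe# namespace are hypothetical, and it covers only required/optional properties, functional/non-functional properties, and enumerated value sets.

```python
from dataclasses import dataclass
from typing import Any

RECIPE = "https://example.com/ns/recipe#"  # hypothetical extension namespace


@dataclass(frozen=True)
class TermSpec:
    iri: str
    required: bool = False
    functional: bool = True  # functional: at most one value
    allowed_values: frozenset[str] = frozenset()  # empty means "any value"


SPECS = (
    TermSpec(RECIPE + "servings", required=True),
    TermSpec(RECIPE + "difficulty", allowed_values=frozenset({"easy", "hard"})),
)


def check(obj: dict[str, Any], specs: tuple[TermSpec, ...] = SPECS) -> list[str]:
    """Report where obj deviates from the documented term specs."""
    problems = []
    for spec in specs:
        value = obj.get(spec.iri)
        if value is None:
            if spec.required:
                problems.append(f"missing required {spec.iri}")
            continue
        values = value if isinstance(value, list) else [value]
        if spec.functional and len(values) > 1:
            problems.append(f"{spec.iri} is functional but has {len(values)} values")
        for v in values:
            if spec.allowed_values and (not isinstance(v, str) or v not in spec.allowed_values):
                problems.append(f"{spec.iri} has unexpected value {v!r}")
    return problems
```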

The advice for partially compacted terms is not bad, in general, but if someone followed that advice today (e.g. for the ubiquitous toot context), most servers would not accept the messages.

Even just a categorization of informational versus “spec/rec track” (the latter being part of your process substrate recommendation) would be useful. I think informational FEPs should typically never become FINAL in the way the FEP process defines that state.

eprodrom · March 28, 2024, 1:37pm

It’s a really bad idea to inject terms into the default namespace. It’s a recipe for conflicting terms. Please don’t do it. All ActivityPub documents are meant to be shared – with clients or with other servers.

Also, using a junk drawer namespace based on your application (“https://myprogram.example/ns#someTerm”) is an anti-pattern. Throwing a bunch of terms together into a namespace just because your app uses all of them is going to make it hard to standardize them later. You can easily set up separate namespaces for each group of related terms, like “https://myprogram.example/ns/backgroundColours#” and “https://myprogram.example/ns/jobTypes#”.
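A small Python sketch of that advice: a context document built from one namespace per group of related terms. The two namespace URLs are the examples given above; the term names (backgroundColour, jobType, FullTime, Contract) are hypothetical.

```python
import json

# One namespace per group of related terms, rather than a single app-wide "junk drawer".
NAMESPACES = {
    "https://myprogram.example/ns/backgroundColours#": ["backgroundColour"],
    "https://myprogram.example/ns/jobTypes#": ["jobType", "FullTime", "Contract"],
}


def build_context(namespaces: dict[str, list[str]]) -> dict:
    """Map each term to namespace + term, so each group can be standardized on its own later."""
    definitions = {}
    for namespace, terms in namespaces.items():
        for term in terms:
            if term in definitions:
                raise ValueError(f"term {term!r} defined in more than one namespace")
            definitions[term] = namespace + term
    return {"@context": definitions}


print(json.dumps(build_context(NAMESPACES), indent=2))
```

Because each group lives under its own namespace, the job-type terms could later be proposed for standardization without dragging the unrelated colour terms along.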

trwnh · March 28, 2024, 9:05pm

I’ll note this as an additional best practice for people using an application domain as their prefix

I looked more into this and found a whole bunch of discussion about it:

Use of `"@vocab": "_:"` in ActivityStreams 2.0 (at least) · Issue #183 · w3c/json-ld-syntax
Consider obsoleting use of blank nodes for properties and "generalized RDF" · Issue #37 · w3c/json-ld-syntax
warning if data uses "@vocab": "_:" fallback · Issue #4 · w3c/activitystreams-testing

The consensus seems to be that it is archaic and may be removed in a future version of JSON-LD. Based on this, I will probably remove or at least reword that section to emphasize that this is for LD-unaware producers. The purpose of including it in the first place was to advise against kludging something that “looks like LD” from an application that fundamentally doesn’t understand LD. Or in other words, this is more for the case where there is no @context and the Content-Type is application/activity+json, or the only context declared is the AS2 context document. (The fundamental issue is really that there is a limit to how friendly you can make things for LD-unaware applications without forcing them to reinvent LD from first principles.)

trwnh · March 28, 2024, 9:16pm

Good point. There’s definitely some work that needs to be done by existing implementations in order to be less fragile, and I should probably call those out explicitly. As you say,

I’m thinking of the following classifications:

LD-aware producer
LD-aware consumer
LD-unaware consumer
LD-unaware producer
a secret fifth option for current implementers who are neither aware nor unaware, but rather handle things incorrectly based on their (mis)interpretation of LD

Probably? It would be a separate h2 section, I think. The overall structure of the draft is not final at all, as I’m still trying to gather best practices before trying to come up with a structure.