ActivityPub: A Linked Data spec or JSON spec with Linked Data profile?

I am a bit too lazy to collect the many references to past discussions on how to use Linked Data appropriately for defining ActivityPub extensions (formats + behavior) to the (pluggable?) ActivityPub protocol. But I’ll cross-ref Linked Data: Undersold, Overpromised?

This topic is more in the context of providing rock-solid Integration Guidance and organizing a Standards Process that Guarantees an Open Decentralized Ecosystem on the Fediverse.

Linked Data based extension mechanism

Discussions have been endless on how to use Linked Data (JSON-LD) appropriately to define AP vocabulary extensions, but also the behaviors, business logic, and message exchange patterns that come with them.

Linked Data support is a sensitive, opinionated area. Many devs do not like LD (to put it mildly) and are using plain JSON in their impls, while others feel that LD is a must-have for future versatility of the protocol in terms of all the use cases that AS/AP is good for. However, these benefits of LD on the protocol level (rather than having LD elsewhere in one’s application) for the extension mechanisms have never been clearly expressed AFAIK.

I mentioned on fedidevs chat in context of discussing using JSON Schema for vocab validation, triggered by @stevebate (highlight mine):

Maybe this is a no-go area, given that we want to retain backwards-compat, but suppose we got rid of JSON-LD in favor of some other kind of extension mechanism, what’d we be missing out on?

In other words: Is LD worth all the effort and overcoming the reluctance of devs to broadly adopt it? Because broad adoption is a requirement for a resilient future AS/AP Fediverse.

Alternative non-Linked Data extension mechanism?

Discussion went on (follow the chat links to catch up) and led me to ponder

What if, instead of…

  • ActivityPub is a Linked Data spec, but you can also treat it as JSON

We redefine as:

  • ActivityPub is JSON, with a well-defined extension mechanism (e.g. using JSON-Schema + living docs)
  • ActivityPub has an additional profile that allows you to make it Linked Data, and do untold magic. (i.e. go the extra mile)

Could we formulate a non-Linked Data extension mechanism that is an addition to current specs - so that it can be introduced without breaking backwards-compatibility, and still offer everyone that wants to venture into Linked Data realms the opportunity to do so?

My thinking is we could, if we…

  • Offer something along the lines of Compliance Profiles.
  • Provide a means of message format definition and validation (e.g. with JSON Schema)
    • Where robust namespacing of extensions is a key requirement
  • Have a discoverable place where the behavior docs etc. of an extension can be found.
  • Don’t make it too hard for LD audience to parse msgs…
    • Though: Current spec may see them faced with non-LD custom msgs already.
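To make the list above a bit more tangible, here is a minimal Python sketch of what a JSON-only extension check could look like. Everything in it is an assumption for illustration: the `ExtensionSpec` shape, the `example-reactions` namespace and the `example.org` docs URL are made up, and the hand-rolled check stands in for what a real JSON Schema validator would do.

```python
from dataclasses import dataclass, field


@dataclass
class ExtensionSpec:
    """Hypothetical description of one extension; not part of any spec."""
    namespace: str                        # prefix every extension property must carry
    docs_url: str                         # discoverable place for the behavior docs
    required: frozenset = frozenset()     # property names (without prefix) that must be present
    types: dict = field(default_factory=dict)  # property name -> expected Python type


def validate_extension(message: dict, spec: ExtensionSpec) -> list:
    """Return a list of problems; an empty list means the message passes."""
    prefix = spec.namespace + ":"
    # Only look at properties inside this extension's namespace.
    present = {k[len(prefix):]: v for k, v in message.items() if k.startswith(prefix)}
    problems = []
    for name in sorted(spec.required - set(present)):
        problems.append(f"missing required property {prefix}{name}")
    for name, value in present.items():
        expected = spec.types.get(name)
        if expected is None:
            problems.append(f"unknown property {prefix}{name}")
        elif not isinstance(value, expected):
            problems.append(f"{prefix}{name} should be {expected.__name__}")
    return problems


# Made-up extension and messages, purely for illustration.
REACTIONS = ExtensionSpec(
    namespace="example-reactions",
    docs_url="https://example.org/extensions/reactions",
    required=frozenset({"emoji"}),
    types={"emoji": str, "count": int},
)
good = {"type": "Note", "content": "hi", "example-reactions:emoji": "🐄"}
bad = {"type": "Note", "example-reactions:count": "3"}

print(validate_extension(good, REACTIONS))
print(validate_extension(bad, REACTIONS))
```

Properties outside the namespace are deliberately left alone, which matches the idea that an extension should be an addition rather than a restriction on the rest of the message.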

I would stress that I have nothing against Linked Data, and am even in favor of us finding a good way to move forward with it. This topic is to have open discussion on pros and cons, but also to hopefully find a way where we finally put this subject behind us with a DONE completion status :hugs:

In light of the discussion @helge mentioned:

I think having someone write an explainer of “behavioral extension” vs “vocabulary extension” would already help (better naming might be possible).

My recommended example for “behavioral extension” are Emoji Reactions and the various complications. Can I :cow: and :cow2: the same post? I think I should!

And @codenamedmitri mentioned advantages provided by Linked Data:

@aschrijver: The “what would we be missing out of” part I have never heard a good answer on.

oh, thats easy. we’d miss out on 1) namespacing, 2) self-documenting terms. (there’s a third feature, where it can be used for partial translation between data formats, but that gets rarely used)
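A small sketch may help show what namespacing and self-documenting terms mean in practice. It is only a toy: `expand_term` is a hypothetical helper that handles a flat term map and `prefix:name` pairs, nothing like the full JSON-LD expansion algorithm, and the `ext1`/`ext2` prefixes and `example.org` IRIs are invented. Only the ActivityStreams namespace IRI is real.

```python
AS_NS = "https://www.w3.org/ns/activitystreams#"

# A flat, simplified stand-in for a JSON-LD @context.
CONTEXT = {
    "name": AS_NS + "name",
    "ext1": "https://example.org/ns/one#",   # made-up extension namespace
    "ext2": "https://example.org/ns/two#",   # made-up extension namespace
}


def expand_term(term: str, context: dict) -> str:
    """Resolve a short property name to a full IRI using a flat term map."""
    if term in context:
        return context[term]
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    # Unknown terms are returned unchanged in this sketch.
    return term


# Two extensions can both define "rating" without colliding,
# and each full IRI can point at the documentation for that term.
print(expand_term("ext1:rating", CONTEXT))
print(expand_term("ext2:rating", CONTEXT))
print(expand_term("name", CONTEXT))
```

A JSON-only implementation sees `ext1:rating` and `ext2:rating` as two opaque strings, which works as long as everyone agrees on the prefixes; the context is what makes that agreement explicit.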

A Linked Data spec or JSON spec with Linked Data profile?

what if… we redefine as:… ActivityPub has an additional profile that allows you to make it Linked Data

I think this is framed deceptively. My counter-history of the JSON/JSON-LD compromise would be this:

  1. JSON-LD is already a profile of JSON that exists to give JSON-only devs a set of [honestly, quite hard-to-understand] constraints within which their data is less likely to break “JSON-LD parsers”
    • I would argue the biggest problems here aren’t even specific to AP but are documentation and education and tooling shortcomings of JSON-LD qua JSON profile.
  2. AP is already a loose, robust protocol built around a data model subtly steeped in LD/RDF assumptions about how to build a global graph from heterogeneous data (according to a particular interpretation of the Robustness principle, I might add).
    • I agree with Helge and Christine that the HTTP and instance/server assumptions which crept in over time via the Mastodon-centric Fediverse have hardened into unfortunately widespread dogmata; these assumptions were deliberately kept out of the spec to keep the data architecturally flexible (and HTTP independent, in the best of cases). Keeping straight what is AP and what is “currently widespread behavior” is key.
  3. This broadly-defined AP “protocol” (assuming self-annotated data that needs to be cleaned up a pinch to be graphed globally) is already described in the AP spec, which assumes neither LD tooling nor LD understanding in its reader/implementer, so as to decrease the developer effort to make a simple implementation that does not need fancy graph-traversal/big-data capabilities (i.e. if you’re just publishing and subscribing to and displaying AP objects, you might as well stick to JSON, it’s not like you’re building a search engine or an AI algo).
    • The current spec was the product of volunteer labor on difficult timelines, with plenty of corners and rough edges still poking out at time of “pens down.” There was little support for sanding those down until recently, and it’s thankless maintenance work. I am worried that the people working to improve and refine the spec need to be given a pass NOT to discuss this endlessly and just finish what they’re doing.
    • There is also plenty of tech debt for documentation at this level: There are bugs to be cleaned up in the @context files (people keep finding discrepancies between domain values and the spec, for example); test suites are needed; Dan and Darius have been thinking pragmatically about test data and robustness testing, i.e., does each instance make reasonable guesses and defaults for legacy or slightly-malformed data?

Things weren’t perfect 4 years ago when a version was cut and people went back to their dayjobs. Over those 4 years, however, lots of people muddled through without an accessible test suite or thorough-enough documentation, and now we have both LD-based and LD-free implementations federating despite all those shortcomings. That’s a small miracle.

AP is already what Tess O’Conner from google calls a polyglot spec, for better or for worse, and going in a drastically different direction is not a minor upgrade, it’s rescinding the constitutive compromises of the CG and starting over. That’s why I am lobbying for having test suites and documentation ASAP and punting as far into the future as possible holy wars about the lowest layer of the specs as these can only harm the fragile interop we have today. If anything, we need to fortify that interop by better documenting how JSON-only and LD-native implementations can smoothly interact, where features break, etc.

Plenty of people are happy to blame the polyglot aspects of the spec for everything they don’t like, and these jihadists would be all too happy to tear out any concessions to JSON-only-friendliness OR to JSON-LD-parseability, depending which side of the compromise they sit on. Those voices naturally crowd to the fore at the mention of a possible normatively-empowered working group, seizing on the opportunity to drastically simplify the spec by cutting off the third of it that they think gets in the way of the other 2/3 without justifying itself. But down that path lies a lot of conflict, a lot of breakage, and difficult consensus. I don’t want to lose the JSON-only developers, and I don’t want to lose the Solid/RDF/JSON-LD folks either. I don’t see how we can “resolve” or “rethink” this compromise without losing one or the other. A río revuelto, pescador contento, as we say in Spanish: foregrounding this divides and conquers us into factions, and at least one faction will pack their bags if we turn this into a “debate”, or even if we misframe the terms of this topic when it comes up in adjacent technical discussions.

Personally, I will probably pack my bags and go do community work in another community if the question of tearing out 1/3 of the spec is opened before we have a test suite and a healthy extension adoption process. I want us to keep the momentum and the consensus we have today focused on regularizing and advancing status quo, without getting sidetracked by holy wars over developer frameworks.


I’m not comfortable with some of the phrasing (“holy wars”, “jihadists”, …). I feel like it is describing those who are questioning the status quo in a potentially unfair and misleading way. It may stifle productive debate in the name of unity.

That said, here are some of my current thoughts…

If an AP implementation is not using linked data or RDF, then I see very little value in using JSON-LD. My guess is that a very, very small percentage of the AP Fediverse is using linked data and/or RDF, and of those, an even smaller percentage can federate with the Mastodon-iverse (Vocata is one example, with a total user base size of ~1).

There are places in the AP specification where it’s clear the editors were focused on JSON-only and not JSON-LD. For example, in the C2S Update behavior, a null is used to remove properties. That’s not compatible with JSON-LD (which will remove the null-valued property from the Update message during expansion). Since almost nothing uses C2S (for a variety of reasons), this hasn’t been a big problem.
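The null problem can be sketched in a few lines of Python. This is an illustration under assumptions, not a real implementation: `apply_partial_update` is a hypothetical JSON-only reading of the C2S Update rule, and `drop_nulls` is only a rough stand-in for one effect of JSON-LD expansion (null-valued properties disappearing), not the expansion algorithm itself.

```python
def apply_partial_update(stored: dict, update_object: dict) -> dict:
    """JSON-only reading of a C2S Update: null removes a property, other values replace it."""
    result = dict(stored)
    for key, value in update_object.items():
        if value is None:
            result.pop(key, None)   # null means "remove this property"
        else:
            result[key] = value
    return result


def drop_nulls(obj: dict) -> dict:
    """Rough stand-in for what expansion does to null-valued properties."""
    return {k: v for k, v in obj.items() if v is not None}


stored = {"id": "https://example.org/notes/1", "type": "Note",
          "content": "hello", "summary": "cw"}
update = {"id": "https://example.org/notes/1", "content": "edited", "summary": None}

# Working on the raw JSON, the summary is removed as the client intended.
print(apply_partial_update(stored, update))

# If the message passes through a null-dropping step first,
# the removal instruction is lost and the old summary survives.
print(apply_partial_update(stored, drop_nulls(update)))
```

So an implementation that runs incoming messages through JSON-LD processing before applying the Update would need to look at the raw JSON as well to honor the removal.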

As far as I know, there were no JSON-LD/RDF AP applications used as evidence for the recommendation approval process (one was Mastodon). The applications were JSON-only, non-RDF, microblogging-oriented, and reflect basically what almost all implementations are doing today. I know of one Solid-based AP implementation and it doesn’t federate with the JSON-only Mastodon-iverse.

I personally am a huge fan of RDF and related technologies. I’d love it if AP were really a linked data specification rather than just hand-waving about it. However, even if we lost all the Solid/RDF/JSON-LD developers, it would be an extremely small number of developers compared to the JSON-only developers. If we can make life easier and more productive for the latter group, I think we’d be irresponsible not to consider it.

Yes, there’s been a massive amount of work to get us to where we’re at today. However, we don’t want to fall victim to the sunk cost fallacy as we discuss the best path going forward.


This is a great article, which I overlooked the 1st time you passed it on chat.

:100: for both test suites and docs. Whereby the docs can only go as far as explaining the fundamentals (“core spec”, must-haves) and best-practice guidance for implementing AP extensions.

Here I’m less sure. We already did that and it leads to increasing tech debt and/or extra effort to achieve interop (the double cost of a polyglot spec that the article mentions). Now, while good test suites and docs help make that more bearable, this is asking a JSON-only developer to define a @context they don’t care about.

I agree with this. I’m specifically pondering whether there’s a non-drastic way to go from “JSON-LD first, JSON-only second” to “JSON-only first, JSON-LD second”.

First of all having an alternative means to define an extension (e.g. with JSON-Schema and Compliance Profiles referencing docs) is an addition, not breaking backwards compat.

Second consider how a JSON-LD-first developer in current environment gets to integrate stuff from a JSON-only developer:

  • @context that is an afterthought, full of errors or design flaws.
  • Properties and types not defined that the spec says to ignore.

And the other way round:

  • “Please JSON-only developer master enough of Linked Data to understand what I pass along”

In other words the double cost. The article you linked describes a way to solve this at the end.

Luckily this is not the subject in this thread I aimed to address :smile:

Let’s assume availability of the test suites and docs and consider the 3-stage / concentric circles Standards Process with the W3C at its heart.

| Standard Process stage | Present day | Fediverse Futures |
| --- | --- | --- |
| 3rd-stage, outer circle: Decentralized dev ecosystem | Copy what others do. Ad-hoc extension in codebases, protocol decay and tech debt. Extensions mostly not usable by LD devs. | Slight improvements with best-practices to follow and extensions tested against these. Extension mechanism works for all. Extensions may still not be directly usable by LD devs. |
| 2nd-stage: SocialHub / FEP process | Extension spec is iterated on with more community attention and potential for broader consensus and future adoption. | Extension spec quality is significantly improved in more authoritative (yet still grassroots) process. Compliant extension MUST be Linked Data compatible. LD devs are well served. |
| 1st-stage, inner circle: W3C formalization | W3C is rebooting activity in CG/WG charters. Provides informal guidelines (e.g. wikis), not yet sufficiently fleshed out to resolve the LD/non-LD conundrum. No guaranteed LD compliance. | W3C provides robust guidance. Proven, widely used FEPs from 2nd-stage and specs from well-organized 3rd-stage devhubs are further formalized in W3C artifacts. All work is LD compliant. |

Yess, exactly. We are currently in a special stage of Fediverse AS/AP evolution, where for the first time in years we have sufficient community interest to bring improvements on all fronts. We should use that time to allow us to be open and brainstorm about solutions for challenges that have plagued us so long. And then decide the who/where/how/what to put the solution into place.


hehe, don’t worry, questioning the status quo and considering ways to make things easier for JSON-only are not what I meant by Jihad at all. “JSON-LD is a failure” or “all polyglots need to be removed from W3C” are the jihadist positions here. And they’re not far off once we start slipping down this slippery slope…

So in my post above the future situation would be like:

:point_right:  3rd-stage: LD advised, not guaranteed → 2nd-stage: LD-compliant → 1st-stage: LD-compliant

Which would tackle holy wars at the W3C :slight_smile:

Sounds good and thanks for the clarification. I don’t have any opinions about polyglots, in general, but it is a tough position to defend that AP JSON-LD has been successful (not saying you are doing that, but it’s a general comment). Success and failure are not necessarily either/or, but AP JSON-LD seems pretty close to the failure side of the spectrum to me (in terms of actual deployment and success in the wild, not in theory or concept).

Maybe a specification cleanup is all that’s needed to make it a success, but my gut and observations tell me the issues probably run deeper than that.

I also think that it may have been a mistake to believe “extensibility” came for free with JSON-LD and then mostly ignore the topic during the specification process. It depends on what one means by “extensibility”, but I think JSON-LD supports only a very weak form of it (primarily namespaced property names). That alone is not enough for effective interoperability, in my opinion.

If a SocialWG (AP-focused) were starting greenfield today, with the benefit of what we’ve learned over the last 5 years and knowledge of current federated server deployments, I wonder what different choices they might make.


I think we should maybe hear from the original authors as to the degree to which extensibility was foreseen to come “for free” or “out of the box” for implementations using JSON-LD parsing of inputs (and thus to test their own implementations of someone else’s extensions against a properly-formed @context file) instead of calling them wrong in absentia. As I understand it, the extensibility mechanism was something many of them had hoped to define in more detail, but they simply ran out of time since the WG had a “feature freeze” and a number of specs to finish in a short amount of time. We shouldn’t attribute to malice or unrealistic expectations what is more legible as a failure of [unpaid] project management of [unpaid] labor.

But I wholeheartedly agree that better documentation is needed for JSON-only implementers to document extensions and build adoption momentum without having to “learn LD”. I kind of assumed that people were just asking for help with the @context part when they got that far in the FEP lifecycle, since a malformed @context isn’t that much work to straighten out with a little help. Anyone reading this who is nervous about that part is welcome to tag me in the FEP PR when they get that far :smiley:


I agree with your post, but want to emphasize that I am NOT calling them wrong or suggesting malice / unrealistic expectations. Just unclear to me and others.

(PS. As moderator I can say that anyone that does put prior work in such negative light will be acting against the collaborative nature of this community and even the code of conduct.)


(for some reason the quotes were stripped from my message, but this is responding to @aschrijver and @bumblefudge comments above…)

I completely agree with this. I think it’s fair to voice criticisms about potential mistakes and participate in discussions about how to address those. I don’t believe that implies malice on either side. If I believe someone acted maliciously, I’ll typically say that explicitly.

It’s also not accurate that (all of) the authors/editors and WG members are in absentia. I know @eprodrom and others who were involved in the specification process in some manner have accounts here and are free to comment and correct misperceptions. Based on SocialCG transcripts and recent comments from Evan on Mastodon, he provided a draft of an extension process (a context extension registry and associated approval process?) and then effectively withdrew it because he couldn’t get a vote on it (which, to me, suggests lack of interest or priority from the WG). As far as I can tell from transcripts and related discussions, that extension process was focused on the JSON-LD context extensions, which I think is insufficient (but maybe necessary as part of a broader extensibility/specialization strategy).


An interesting quote from the transcripts that is somewhat related to the general discussion of domain-specific profiles:

cwebber2: If AS2 adoption grows so well that the world converges around a new set of terms that are so good that everybody just wants them in an official recommended way, couldn’t we make a new vocabulary that’s called ActivityStreams-Foo, where Foo is whatever super cool hot new thing in the future that we have no capacity to envision . . . For most extensions, there’s nothing blocking us from doing AS-whatever – SocialCG minutes (2015-12-01)

Yes I like this. Here are some notes:

I think it has to be so that the default context is optional, extensible, interoperable, which would be a small change to the spec.

A new, agreed upon, @context file could be part of a living standard too, and integrated with the test-suite


Hey, Steve. So, there’s a new extension policy under consideration right now:

https://w3c.github.io/activitystreams/draft-extensions-policy.html

In terms of extensions, the AS2 doc says that the SocialCG will maintain a registry of extensions.

I think having a mechanism for reviewing extensions, and a mechanism for including them in the AS2 context if they’re sufficiently popular, is a pretty good extension policy.

Yes, I am aware of it, and thanks for creating it.

I think that one of the challenges is that the word extension is sometimes used in an imprecise way. If I understand correctly, AS2 and your proposal are discussing extensions in the very specific sense of extending the JSON-LD context vocabulary terms. That’s useful, but it’s only a fraction of what’s needed for effective interoperability for federated applications.

The JSON-LD terms define no problem domain semantics and provide no description of expected behaviors and side-effects, for example.

Since the earlier discussions in this thread, I’ve changed my perspective a bit. Especially given the current specifications and the historical context, I think we should continue with JSON-LD as a data model. However, I think the developer community would benefit from better information about how to use the JSON-LD data model without JSON-LD processing. For more effective interop, I think we need a way to describe domain-specific “profiles” that use AP/AS2. Some people call these “extensions”, but I don’t like that label because they typically restrict AP/AS2 more than extend it (although they may do that too). For example, they may only support a few Activity and Object types and probably only support a subset of AS2 properties on those types (possibly making some of them required or making a property functional). However, at the same time, they might extend the vocabulary with terms specific to their problem domain. I think of this as AP specialization rather than extension. An AP profile would be a set of documents (e.g., a natural language description of behavior and side-effects, JSON-LD context, JSON Schema, test case descriptions, …) that describe the specialization.
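As a rough illustration of the "specialization" idea, here is a minimal Python sketch of the machine-checkable part of such a profile. The `Profile` shape, the `microblog` example and its required properties are all made up for this post; a real profile would also carry the prose behavior description, JSON-LD context, JSON Schema and test cases mentioned above. The sketch also assumes `type` is a single string and that the object is embedded, which AS2 does not guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical machine-readable summary of one AP/AS2 specialization."""
    name: str
    activity_types: frozenset
    object_types: frozenset
    required: dict = field(default_factory=dict)  # object type -> required property names


def check_activity(activity: dict, profile: Profile) -> list:
    """Return the ways an activity falls outside the profile (empty list = conforms)."""
    problems = []
    if activity.get("type") not in profile.activity_types:
        problems.append(f"activity type {activity.get('type')!r} is outside the profile")
    obj = activity.get("object")
    if not isinstance(obj, dict):
        problems.append("this sketch only checks embedded objects")
        return problems
    obj_type = obj.get("type")
    if obj_type not in profile.object_types:
        problems.append(f"object type {obj_type!r} is outside the profile")
    for prop in sorted(profile.required.get(obj_type, ())):
        if prop not in obj:
            problems.append(f"{obj_type} is missing required property {prop!r}")
    return problems


# A made-up microblogging profile: few types, some properties made required.
MICROBLOG = Profile(
    name="microblog",
    activity_types=frozenset({"Create", "Update", "Delete"}),
    object_types=frozenset({"Note"}),
    required={"Note": {"content", "attributedTo"}},
)

ok = {"type": "Create", "object": {"type": "Note", "content": "hi",
                                   "attributedTo": "https://example.org/users/alice"}}
outside = {"type": "Create", "object": {"type": "Article", "name": "Long read"}}

print(check_activity(ok, MICROBLOG))
print(check_activity(outside, MICROBLOG))
```

Note that the second activity is perfectly valid AS2; it is just outside what this particular profile promises to handle, which is the "restrict more than extend" point.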

Developing these profiles is a community activity that doesn’t require major changes to the AP/AS2 specifications.

I think the FEP process is a great way to give better definitions for extensions.

I think writing up profiles for different application types is a good idea.

FYI: Mentioned this thread in reaction to @hrefna researching different ways to model extensions as Linked Data and posting about it.

PS. On that thread there is another reaction, by @gugurumbe, who out of frustration with JSON-LD started writing a different way to parse:

ActivityStreams is cool, but json-ld is horribly complex. We believe
that it’s OK to define the NEO-ActivityStreams format by only
accepting JSON-LD that:

  • have exactly one top-level object;
  • have exactly one context object;
  • the context object is directly in the top-level object, at the top
    of the file;
  • the context is an array, a single string, or a single map;
  • if an array, the first elements of the array is a list of guaranteed
    non-overlapping context URIs, that do not change the meaning of the
    definitions of other contexts, and that dereference to constant
    definitions;
  • the last element of the array is optionally a map from prefixes to
    namespaces (strings ending in ‘#’ or ‘/’);
  • there are no two overlapping namespaced extensions.
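For readers who want to see what such a restricted subset means for a parser, here is a minimal Python sketch that checks only the structural rules from the list above. It is an assumption-laden illustration, not the quoted project's code: `check_context_shape` is a hypothetical name, and it does not attempt the rules that need outside knowledge (context position in the file, non-overlapping or constant remote contexts, overlapping namespaced extensions).

```python
def check_context_shape(document) -> list:
    """Check the structural @context rules; an empty list means they all hold."""
    if not isinstance(document, dict):
        return ["top level must be a single JSON object"]
    if "@context" not in document:
        return ["top-level object has no @context"]
    context = document["@context"]
    if isinstance(context, (str, dict)):
        return []                      # a single string or a single map is accepted
    if not isinstance(context, list):
        return ["@context must be an array, a string, or a map"]

    problems = []
    items = list(context)
    # Optionally, the last element is a map from prefixes to namespaces.
    if items and isinstance(items[-1], dict):
        prefix_map = items.pop()
        for prefix, namespace in prefix_map.items():
            if not (isinstance(namespace, str) and namespace.endswith(("#", "/"))):
                problems.append(f"prefix {prefix!r} does not map to a namespace ending in '#' or '/'")
    # Everything before it must be a context URI (a string).
    for item in items:
        if not isinstance(item, str):
            problems.append("only the last array element may be a map")
    return problems


accepted = {
    "@context": ["https://www.w3.org/ns/activitystreams",
                 {"ext": "https://example.org/ns#"}],   # made-up prefix
    "type": "Note",
}
rejected = {
    "@context": [{"ext": "https://example.org/ns#"},
                 "https://www.w3.org/ns/activitystreams"],
    "type": "Note",
}

print(check_context_shape(accepted))
print(check_context_shape(rejected))
```

The second document is legal JSON-LD but falls outside the subset, because the map is not the last element of the array.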

Isn’t that algorithm followed by the comment…

However, this is not very useful. I’m now trying to get the json-ld algorithms to work,

:person_shrugging:

Yes, but it shows what devs are faced with. Frustration, desire for a more comprehensive way, and then realization they can’t do it that way since it wouldn’t be interoperable. The DX of “doing it right” is both horrible and guesswork rn.


I think this is a very, very common experience among developers coming in and trying to create something that will interoperate, even at a very basic level, with pretty much anything that already exists in this space.

Every new layer of trying to build an interoperable system is basically a new set of headaches and well documented problems (as in “the documentation can be found at the bottom of a well”) that don’t quite go all of the way to deal with.

Everything requires so much bespoke, custom work (I mean just look at Pleroma’s Transmogrifier) that at some point one wonders what “ActivityStreams2” and “ActivityPub” are even giving us.
