Linked Data versus ActivityPub

aschrijver · February 14, 2023, 11:07pm

(This is a follow-up to Linked Data: Undersold, Overpromised? )

In the context of FEP-1570: The FEP Ontology Process I bumped into this paper by Samuel M. Smith who expressed his worries on the role of Linked Data in the W3C Verifiable Credentials specification (and also btw extending to the W3C DID recommendation).

I won’t give a summary, other than quoting Sam’s TL;DR, as I could never describe things the way he does:

“The VC standard appears to be an adoption vector for Linked Data, not the other way around.”

I feel the same might be said for ActivityPub. The paper is very critical of using Linked Data (RDF/JSON-LD) for anything other than its original use case, the Semantic Web, and Google’s current proprietary use for SEO (called an edge case in the paper).

I have never had a strong opinion for or against Linked Data in general, but I have always wondered: “When are we going to see the powerful applications that this technology should enable?” So far I have only seen them in academic work and complex prototypes.

And I’ve always worried about the adoption of Linked Data being an issue, something I recently mentioned on the Solid Project Matrix chat and, before that, multiple times on their community forum.

Here’s the paper and I am curious what people think of the arguments it makes…

VC Spec Enhancement Strategy Proposal

Strategic Technology Choices vis-a-vis the Linked Data (JSON-LD/RDF) End State.

2022/04/04 Version 1.2.8

Barriers to Adoption of Linked Data VCs

The purpose of this paper is to capture and convey to a broader audience my increasingly worrisome concerns about the adoption path for Verifiable Credentials (VCs). My concerns began with the security limitations of VCs that use Linked Data (otherwise known as JSON-LD/RDF) and have since extended to the semantic inference limitations of Linked Data. My concerns may be expressed succinctly as, the VC standard appears to be an adoption vector for Linked Data, not the other way around. My overriding interest is that the concept of a VC as a securely attributable statement is a very powerful and attractive one and therefore should be widely adopted. We should therefore be picking the best technologies that best support broad VC adoption, not the other way around.

I know that during the standardization of W3C ActivityPub by @cwebber et al. the Linked Data aspects were a hot and controversial topic. Apparently that is also true for W3C VCs and DIDs. While I don’t want to trigger heated discussions, maybe for an ActivityPub vNext we might reopen the discussion of whether or not it should be a Linked Data specification.

jfinkhaeuser · February 15, 2023, 6:58am

LinkedData feels very much like SOAP. It’s got everything you need in it, and as a result, it’s no longer the most useful.

It’s important to understand that the idea behind LinkedData, RDF – and also SOAP – is/was to write code once that can process anything. To do so, you need to describe your data such that completely generic code understands its meaning well enough to know what to do.

And that’s fine in principle, but it means there will always be simpler ways to express the same data, and simpler, single-purpose code to process it. You “just” need to write a lot more single-purpose code to make sense of all of those data models.

With ActivityPub, actions have a common frame, but individual action types have different fields. Only software that knows how to process and render those fields will support a specific action type. In particular, good rendering means hard-coding an understanding of those types.

Which basically leaves LinkedData’s main feature unused: that it’s possible to do stuff in a generic way with it. So I don’t see too much point in it.
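To make the point about hard-coded type knowledge concrete, here is a minimal Python sketch. The renderer functions and the `RENDERERS` table are hypothetical names invented for illustration, not taken from any ActivityPub implementation; the only assumption about the data is the common frame of `type`, `actor` and `object`.

```python
from typing import Callable

# Hypothetical renderers: each one hard-codes knowledge of one activity type.
def render_create(activity: dict) -> str:
    obj = activity.get("object", {})
    return f'{activity["actor"]} posted: {obj.get("content", "")}'

def render_like(activity: dict) -> str:
    return f'{activity["actor"]} liked {activity["object"]}'

# Dispatch table keyed on the "type" field of the common frame.
RENDERERS: dict[str, Callable[[dict], str]] = {
    "Create": render_create,
    "Like": render_like,
}

def render(activity: dict) -> str:
    handler = RENDERERS.get(activity.get("type", ""))
    if handler is None:
        # Unknown type: only the common frame is usable, so the output is generic.
        return f'{activity.get("actor", "someone")} did something ({activity.get("type")})'
    return handler(activity)
```

Anything outside the table falls back to a generic line, which is roughly what a client can do with a type it was never taught to render.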

That said, it’s nice to have a common basic structure for data, with @id and whatnot, so I have nothing much against it, either.

helge · February 15, 2023, 7:09am

I think my attitude towards JSON-LD can be boiled down with a simple:

What’s the alternative?

Here’s my list of requirements for a JSON-LD replacement:

1. Should be as compatible as possible with plain JSON.
2. Should allow one to assign meaning to properties (beyond their names).
3. Should be extendable.
4. Should support some form of normalization required to do cryptography.

I think that 1-3 are requirements to do FediVerse and covered by JSON-LD. 4 is necessary to do FediVerse+. There are currently no good solutions for 4.
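As a toy illustration of the second requirement (assigning meaning to properties beyond the name), the sketch below maps short property names to full IRIs through a context table. This is not the JSON-LD expansion algorithm; `expand_keys` and the `example.org` entry are hypothetical, and only top-level keys are handled.

```python
AS = "https://www.w3.org/ns/activitystreams#"

# Toy context: short property name -> full IRI.
# The "mood" entry is a made-up extension term for illustration.
CONTEXT = {
    "content": AS + "content",
    "published": AS + "published",
    "mood": "https://example.org/ns#mood",
}

def expand_keys(doc: dict, context: dict) -> dict:
    """Replace known short keys with their IRIs.

    In this toy version, keys that are not in the context are dropped,
    because nothing says what they mean.
    """
    return {context[key]: value for key, value in doc.items() if key in context}

note = {"content": "hi", "mood": "happy", "foo": 1}
expanded = expand_keys(note, CONTEXT)
```

The document stays plain JSON on the wire; the context is what lets two implementations agree that `mood` means the same thing.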

PS: When I tried to expand on the above, I always ended up writing about irrelevant tangents. So this will have to do.

aschrijver · February 15, 2023, 7:29am

One thing in the paper that struck a chord with me, was this part in The False Allure of an RDF Graph “Soup”:

[…] RDF’s idealistic simplicity of expression actually prevents dependency management layering in any meaningful way. Its ideal ends up maximizing the ratio of apparent complexity to real complexity, instead of minimizing it. So it is the wrong kind of simplicity; it’s too simple. As Einstein is reputed to have said: “Everything should be made as simple as possible, but no simpler.” Better yet is Oliver Wendell Holmes: “I would not give a fig for the simplicity this side of complexity, but I would give my life for the simplicity on the other side of complexity.”

Linked Data (RDF/JSON-LD) is simplicity on the wrong side of complexity.

A couple of alternatives were spun off from the original Linked Data discussion and tagged with “protocol”. See: Topics tagged protocol

Yea it is all quite entangled subject matter with so many ins and outs.

helge · February 15, 2023, 8:40am

I actually forgot the key requirement for anything to replace JSON-LD:

Has better python libraries
Has better JavaScript libraries
Has better insert your favorite programming language here libraries

I think that this is an incredibly hard bar to clear. The last sentence was not a statement about the quality of JSON-LD python libraries, more about how hard it is to build an ecosystem.

aschrijver · February 15, 2023, 9:01am

The paper mentions JSON Schema, given its broad adoption via OpenAPI. This last spec has a cousin in the asynchronous messaging space that may be interesting:

melvincarvalho · February 15, 2023, 6:14pm

What is the alternative?

JSON with hyperlinks. We would have saved 10 years if we’d standardized around that, instead of importing RDF/XML and 20 years of tech debt

I took a stab at showing a simpler way:

The aim is to get 90% of the utility of JSON-LD with 10% of the complexity, with full compatibility and an upgrade path.

melvincarvalho · February 15, 2023, 6:17pm

Sure there are, see how nostr simply canonicalizes and signs JSON

github.com

nostr-protocol/nips/blob/master/01.md

NIP-01
======

Basic protocol flow description
-------------------------------

`draft` `mandatory`

This NIP defines the basic protocol that should be implemented by everybody. New NIPs may add new optional (or mandatory) fields and messages and features to the structures and flows described here.

## Events and signatures

Each user has a keypair. Signatures, public key, and encodings are done according to the [Schnorr signatures standard for the curve `secp256k1`](https://bips.xyz/340).

The only object type that exists is the `event`, which has the following format on the wire:

```jsonc
{
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "pubkey": <32-bytes lowercase hex-encoded public key of the event creator>,
```

This file has been truncated.

Developers love it because you can build a client, server or app in an evening
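For readers who want to see what "canonicalizes" amounts to, here is a hedged Python sketch of computing an event `id`. The array layout `[0, pubkey, created_at, kind, tags, content]` comes from my reading of the full NIP-01 text, not from the truncated excerpt above, and `json.dumps` may differ from NIP-01's escaping rules in edge cases. Signing is left out because it needs a Schnorr library; the function names are my own.

```python
import hashlib
import json

def serialize_event(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    # Fixed-position array, no whitespace: there is no key order to normalize.
    return json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )

def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    # The id is the lowercase hex sha256 of the UTF-8 serialized event.
    serialized = serialize_event(pubkey, created_at, kind, tags, content)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Placeholder key, not a real public key.
example_pubkey = "ab" * 32
example_id = event_id(example_pubkey, 1676484000, 1, [], "hello")
```

The trick is that the signed bytes are rebuilt from a fixed list of fields rather than from whatever JSON object arrived on the wire, which sidesteps general-purpose normalization.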

SeriousFun01 · February 16, 2023, 11:38am

It’s a long piece touching so many topics, so just some meta-comments:

The failure of the semantic web to deliver tangible benefits is fairly obvious, but we should not discount that for a very long period it lived in a hostile environment that, in both technological and economic terms, suppressed the development of next-gen “web 3” applications. It is poignant and telling that the only tangible use is JSON-LD for Google SEO. That was the only game in town. The “API” pattern, too, largely reflects the mobile app + cloud design that is the epitome of walled gardens, totally controlled at both the client and server endpoints.
Of course you can reinvent the wheel with JSON Schema and whatnot, and property graphs might be more usable / adopted than RDF triplestores, but it’s a fragmented, ad-hoc landscape and will likely remain so. If people think that fediverse adoption can establish a de facto new standard, that’s fine, but while loathed, all these old standardization efforts were trying to address issues that are as real now as they were then. Again, the nature of the information technology landscape in the past decade was atypical and essentially regressive: it did not solve any of these problems, it just didn’t have them, given that it was predominantly siloed. The wisest course (at least as a long-term project) would seem to be to retrace these original motivations and see how they can be re-expressed with the knowledge of today.

aschrijver · February 16, 2023, 11:51am

I agree with you, and you depict the struggle that exists. I too share the dream of pervasive linked data, from its early inception. But how long should we root for Betamax to become the thing, while VHS is already broadly in use? The main point in the paper is the worry that Linked Data will never gain broad adoption, that the chicken/egg of devs expanding the ecosystem and making more devs love LD for it won’t be solved.

SeriousFun01 · February 16, 2023, 12:36pm

Yes, my feeling is that currently there is a window of opportunity for next-gen web ecosystems, and this should not be missed or hindered by any academic/turf/ego/culture wars. This window won’t last for long. The real race is not which big tech wins the “AI wars”, but whether information exchange, knowledge retrieval, processing using algorithms etc. can be decentralized, democratic and shared by all, or be controlled by a tiny few to general detriment. For reasons that have to do with internal contradictions of the prevailing socioeconomic model, there might be a chance to develop decentralized “killer apps” (I don’t like the word but it captures the point) that are attractive in a positive sense (Look, ma, what I can do with my self-hosted rpi) rather than as a reaction (twitter but decentralized). If VHS can deliver on that front, let’s use it. I think it doesn’t (neither does Betamax for that matter, despite the pompous “semantic” jargon). But at least Betamax worried about valid challenges that VHS ignored. Minimally this gives some shape to what “VHS++” might have to look like. In the end form will follow function.

bengo · February 21, 2023, 2:08am

One can also do something similar using ActivityStreams2 and JCS Data Integrity 1.0
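A rough Python sketch of the idea: canonicalize a plain AS2 JSON object, then hash it (a signature would be computed over the same bytes). Note the assumption: sorting keys and stripping whitespace with `json.dumps` only approximates JCS (RFC 8785), which also pins down number formatting and key ordering by UTF-16 code units, so this is not a conforming implementation. The function names and the example note are hypothetical.

```python
import hashlib
import json

def canonicalize(doc) -> bytes:
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def digest(doc) -> str:
    return hashlib.sha256(canonicalize(doc)).hexdigest()

# The same AS2-style note with different key order.
a = {"type": "Note", "content": "hello", "id": "https://example.org/notes/1"}
b = {"id": "https://example.org/notes/1", "content": "hello", "type": "Note"}

same = digest(a) == digest(b)
```

Both objects canonicalize to the same bytes, so a digest or signature made by one implementation can be checked by another that parsed and re-serialized the JSON along the way.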

bengo · February 21, 2023, 2:10am

if anyone says you should be rooting for one or another, they’re wrong. you don’t need to root for either.

What is ‘VHS’ in this metaphor? JSON? activitypub uses as2 and as2 is already just JSON.

bengo · February 21, 2023, 2:58am

Well, someone has to summarize some of it for us to make sense of your question; otherwise it appears you are making an argument from authority (see Wikipedia).

To be fair, the only application where RDF (Linked Data) may be well suited for inference is the defunct semantic web, which is constrained by the semantic limitations of graph edges that are URI/IRIs. For every other application of semantic inference, labeled property graphs (LPGs) are more convenient and more powerful (social networks)

I agree that labeled property graphs are useful. So does the ‘linked data’ community, which is why the W3C RDF-star Working Group is working rigorously on that, though that work is not mentioned in Sam’s document (I think perhaps it started after the original version was published).

Sam also mentions the importance of automated semantic inference, and makes some claims about how RDF makes this hard. This is true, though I’d argue that automated interoperable semantic inference is hard because epistemology is hard. I think some of Sam’s points about ‘polysemy’ are similar to the mention of the ‘intersubjectivity’ problem here, and I think that paper’s descriptions of ‘situation semantics’ and especially ‘distributional semantics’ may provide good tools for dealing with polysemy (as we are now seeing LLMs like ChatGPT exploit).

On the whole I think a lot of Sam’s points in that article are valid, but they don’t require deprioritizing interoperable semantics to enable automated inference. I think a lot of his points can be addressed by using RDF-star.

Until then, if you don’t really care about automated inference or interoperable semantics (as most Mastodon users don’t), there is always ActivityPub, JSON, and JCS+signatures.

wrt alternatives to JSON-LD, I will say I do like the approach of this.

aschrijver · February 21, 2023, 9:27am

I don’t have a particular question, other than that I have been a long-time fan of the vision/ideas relating to linked data, and pondering the future of this technology and alternatives. Consider this my free-ranging musings, nothing more.

silverpill · February 21, 2023, 2:08pm

There’s a FEP for this: fep/feps/fep-8b32.md at main - fediverse/fep - Codeberg.org

bengo · February 21, 2023, 5:30pm

Something like this also seems a very feasible way of modeling LPGs using RDF, even without RDF-star. https://arxiv.org/ftp/arxiv/papers/2301/2301.01227.pdf
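To show the general shape of the problem, here is a small Python sketch that turns one labeled-property-graph edge (an edge that carries its own properties) into plain subject/predicate/object triples by giving the edge its own node. This is a generic reification-style encoding chosen for illustration; I am not claiming it is the scheme used in the linked paper, and the `ex:` vocabulary and `edge_to_triples` are made up.

```python
def edge_to_triples(edge_id: str, source: str, label: str, target: str, props: dict) -> list:
    # The edge becomes a node of its own, so properties can hang off it.
    triples = [
        (edge_id, "ex:source", source),
        (edge_id, "ex:label", label),
        (edge_id, "ex:target", target),
    ]
    # One extra triple per edge property, in sorted order for stable output.
    triples += [(edge_id, f"ex:{key}", value) for key, value in sorted(props.items())]
    return triples

triples = edge_to_triples("ex:e1", "ex:alice", "follows", "ex:bob", {"since": 2023})
```

The cost is visible right away: one LPG edge with one property becomes four triples, which is the verbosity that RDF-star is meant to reduce.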

jfinkhaeuser · July 16, 2023, 9:15am

Speaking as someone with a deep interest in AP, but no intention of implementing AP as-is, I would just like to point out that there are ways to express linked data that do not involve JSON.

I think the requirements are the wrong way around, and first should focus on 2-4, and then add the requirement that it should be possible to (de-)serialize in JSON.

Just my 2c when reading this.