Apologies for the extremely delayed reply; I missed this back in January, as I was out of action that month.
Anyway, hey, hi! I’m Emelia, and I currently work on the developer tools team over at Inrupt (the company founded by timbl to build Solid for the enterprise).
This thread strikes a chord with me, as I’ve been doing my best to learn RDF & Linked Data over the past year. I don’t come from an RDF, Linked Data, or academic background; instead my background is in working with startups (~23 to date) to build out their products, and in building tools & platforms for developers to be as productive as possible. I also have significant experience building both frontend and backend applications, and a wealth of production knowledge on GraphQL and Node.js.
I could not agree more that the biggest hurdle developers will face in adopting Linked Data is its background in RDF.
Mainstream developers typically choose the option with the least friction: they want the API to work, and to work how they want it to, without having to learn for years to feel proficient in what they’re doing. In reality, code and standards can feel like a barrier between them and their paycheck, getting home at a reasonable time, and keeping their product manager off their back about why something isn’t done yet.
And that’s coming from someone who’s usually working with more modern tech stacks (React, Redux, GraphQL, JSON APIs, webpack, Nest.js, TypeScript, etc.). I’ve spent a bunch of time trying to help developers towards actually using schema-driven APIs, and it’s always a challenge: you’ll often have different teams wanting to do things “their way” rather than working in unison & trusting that the other team knows what they’re doing.
In previous conversations, I’ve kinda jokingly said “RDF is highly structured, schemaless garbage that you may find useful data in”. By which I mean: you’ve no guarantees about what you get in RDF. You might get a single value, multiple values, or none at all; a certain “field” (predicate) might be present, but it may equally not be.
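For instance, here’s a rough sketch using @inrupt/solid-client (the profile URL is a placeholder, and the FOAF predicates are purely for illustration):

```typescript
import {
  getSolidDataset,
  getThing,
  getStringNoLocale,
  getStringNoLocaleAll,
} from "@inrupt/solid-client";

async function readProfile() {
  // Placeholder document URL; any RDF resource behaves the same way.
  const dataset = await getSolidDataset("https://example.org/profile");
  const profile = getThing(dataset, "https://example.org/profile#me");
  if (!profile) return; // The resource itself might not be there at all.

  // Zero or one value: the return type is `string | null`,
  // so absence has to be handled explicitly.
  const name = getStringNoLocale(profile, "http://xmlns.com/foaf/0.1/name");

  // Zero or more values: the same predicate can legitimately appear
  // any number of times, so you always get an array back.
  const nicks = getStringNoLocaleAll(profile, "http://xmlns.com/foaf/0.1/nick");

  console.log(name ?? "(no name present)", nicks);
}
```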
This is very different to how most developers are used to working, whereby they can assume their API returns data in the shape they want, or, if not, they hack through whatever “crap” the backend developers have sent them to make their applications “work” and satisfy the product requirements.
This is where a lot of developers find frustration in working with TypeScript and data coming from APIs, because `JSON.parse` returns an `any` rather than the structure they believe they’re receiving from the API. It’s quite rare to see developers actually validate and check the shape of the data they’ve gotten back from their API: the most you usually see is a `response.status === 200` check, and then just `response.json()`, with maybe a `try/catch` around the asynchronous network call.
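Something like this, where the endpoint and `User` shape are made up purely to illustrate the pattern:

```typescript
interface User {
  id: string;
  name: string;
}

async function getUser(id: string): Promise<User> {
  const response = await fetch(`https://api.example.com/users/${id}`);
  if (response.status !== 200) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // `response.json()` resolves to `any`, so this cast is just an assertion:
  // nothing at runtime checks that the payload actually matches `User`.
  return (await response.json()) as User;
}
```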
Even just getting developers to write consistent GraphQL schemas is challenging, and it often requires a lot of training & guidance to help them understand what they’re working with, and how it can help them, instead of fighting the tool or library.
There are newer approaches that are starting to improve the situation, such as Zod, GraphQL-Codegen, JSON-RPC or tRPC, and some JSON Schema tooling, but adoption is only just barely happening; there are a lot of practices to unlearn.
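To pick on Zod as an example (same made-up `User` shape as above), a single definition gives you both the runtime validator and the static type:

```typescript
import { z } from "zod";

// One schema definition, used for both validation and typing.
const User = z.object({
  id: z.string(),
  name: z.string(),
});
type User = z.infer<typeof User>;

async function getUser(id: string): Promise<User> {
  const response = await fetch(`https://api.example.com/users/${id}`);
  // `parse` throws if the payload doesn't match the schema, so a bad
  // backend response fails loudly here rather than deep inside the app.
  return User.parse(await response.json());
}
```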
I also know in the Linked Data world we have ShEx and SHACL, both of which try to enforce at least some sort of schema on top of RDF data. I definitely think the “killer” platform for getting developers to adopt RDF, Linked Data, etc., is going to be one that takes them from their practices today, and bridges the gaps & hides the complexities. For instance, this might mean using GraphQL schemas & operations (queries & mutations), written in the Schema Definition Language, to generate essentially an API for the data that they want; Jesse Wright, a coworker at Inrupt, actually has a prototype. Another option might be a tool to transform JSON Schemas into SHACL/ShEx, or to do similar code generation as previously described.
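To give a flavour of what that GraphQL direction could look like (this is not the actual syntax of Jesse’s prototype; the `@predicate` directive here is invented purely for illustration):

```typescript
// Hypothetical SDL: `@predicate` is an invented directive that maps a
// GraphQL field onto an RDF predicate, letting codegen hide the RDF layer.
const typeDefs = /* GraphQL */ `
  directive @predicate(iri: String!) on FIELD_DEFINITION

  type Person {
    name: String @predicate(iri: "http://xmlns.com/foaf/0.1/name")
    knows: [Person!]! @predicate(iri: "http://xmlns.com/foaf/0.1/knows")
  }

  type Query {
    person(id: ID!): Person
  }
`;
```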
But I don’t think the majority of developers will seriously work with lower-level interactions with Linked Data (e.g., interacting directly with JSON-LD, Turtle, Quads, etc.). Those will be used more by tooling developers or experts who know what they’re doing (or beginners who don’t yet know what they’re doing at all, and make many mistakes due to not knowing better).
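For a sense of what “lower level” means here, this is a sketch using the N3.js parser, over a made-up Turtle document:

```typescript
import { Parser } from "n3";

const turtle = `
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  <https://example.org/profile#me>
    foaf:name "Jane Doe" ;
    foaf:nick "jane", "jd" .
`;

// Parsing hands you back raw quads; it's entirely on you to know which
// predicates to look for, how many values to expect, and their datatypes.
const quads = new Parser().parse(turtle);
for (const quad of quads) {
  console.log(quad.subject.value, quad.predicate.value, quad.object.value);
}
```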
I hope this adds to the conversation here, and perhaps a different perspective. (And hopefully no one is highly offended by my joking description of RDF; I’ve just seen far too many people struggling to get their heads around its quirks, even people who would normally be writing database migrations & working with actually schema-driven data.)
(these are my own opinions and do not necessarily reflect the standing of my employer)