Linked Data: Undersold, Overpromised?

Yeah, I think the other thing we’re seeing is the adoption bell curve from your chart above colliding with the learning curve. From what I’ve seen, the corporate developers are more or less familiar with things like webpack, React, Next.js, etc.; the community developers, on the other hand, tend not just to be unfamiliar and uncomfortable with that tooling, but to openly reject it, preferring to write “pure” JavaScript (like it’s the early-to-mid 2000s).

As you note, I’m very active on the Solid forums, and between myself and Nic A.S (tech lead for Developer Tools), we try to do a fair bit of community engagement, but we’re both super busy getting the SDKs to work well, be fully tested, stay aligned with standards, and implement new product features. In my opinion, the company that manages to turn Developer Advocacy and Engagement into a perfected art, like that of, say, the Twilio developer advocates (well, pre-layoffs), will likely succeed most.

I’ve been really wanting to dogfood our own APIs and build an app, but I’m very time constrained. The stack I’m particularly excited to try is React, React Router with data loaders, and some sort of SHACL/ShEx-to-TypeScript tooling (I’ve seen a few projects in this space, but those shape languages still don’t make a heap of sense to me; I kinda want Prisma, but for linked data).
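To make the “Prisma, but for linked data” wish concrete, here’s a toy sketch of what such codegen might emit. Everything here is made up for illustration (the `Person` type, the `personFromTriples` helper, the shape it supposedly came from): the idea is that a tool would read a SHACL/ShEx shape and hand you a plain TypeScript type plus typed accessors, so you never touch triples directly.

```typescript
// Imagined output of a hypothetical SHACL→TypeScript codegen step, for a
// shape roughly like:
//   ex:PersonShape sh:property [ sh:path foaf:name ; sh:datatype xsd:string ] .
interface Person {
  name: string; // mapped from foaf:name
}

// A typed accessor that hides the triple store behind it. Triples are
// modelled naively as [subject, predicate, object] string tuples.
function personFromTriples(
  triples: Array<[string, string, string]>,
  subject: string
): Person {
  const name =
    triples.find(
      ([s, p]) => s === subject && p === "http://xmlns.com/foaf/0.1/name"
    )?.[2] ?? "";
  return { name };
}
```

The win would be that app code only ever sees `Person`, and schema drift becomes a compile error instead of a runtime surprise.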

From what I’ve seen, those with the time/energy to do “coding for joy” tend to be on the more novice side of tooling & how they build apps, with a skew especially towards those currently in university (I might be wrong here, but that’s kinda what my “finger in the air” sense is telling me).

I think when it comes to the fediverse, most are just building against Mastodon or forks thereof, and therefore inheriting that codebase’s understanding of ActivityPub & related specifications. You can definitely do a “simple” reading of ActivityPub & discard all the JSON-LD learning curve, which I think a significant chunk of developers will do.
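That “simple” reading looks something like this in practice: treat the JSON-LD payload as plain JSON, pluck the handful of keys your app needs, and ignore `@context` and expansion entirely. This is a hypothetical helper, not from any particular codebase, and it knowingly skips everything JSON-LD processing would give you.

```typescript
// Naive, JSON-only reading of an ActivityPub activity.
interface SimpleActivity {
  type: string;
  actor: string | undefined;
  object: unknown;
}

function readActivity(raw: string): SimpleActivity {
  const doc = JSON.parse(raw);
  // actor may be an IRI string or an embedded object carrying an id
  const actor = typeof doc.actor === "string" ? doc.actor : doc.actor?.id;
  return { type: doc.type, actor, object: doc.object };
}
```

This works fine until a peer sends a document that’s equivalent JSON-LD but shaped differently (aliased terms, arrays where you expected strings), which is exactly the trade-off being made.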

There is definitely a gap in available education bridging “what developers want to do to ship products” and “the knowledge academics have of RDF and standards”. There are also places where the specs are written from HTTP semantics, but not necessarily Web semantics (hello, DPoP + redirects + fetch API! There’s no way to manually follow redirects on requests in the browser).
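To spell out why that DPoP + redirects combination hurts: a DPoP proof is bound to the exact target URI via the `htu` claim (RFC 9449), so each request URL needs its own freshly signed proof. When a browser’s `fetch()` follows a redirect internally, the request is re-sent to a URL your code never saw, so you can’t mint a matching proof for it; and with `redirect: "manual"` the browser hands back an opaque redirect response that hides the `Location` header. A toy sketch of just the URL-binding part (real proofs are signed JWTs; this only shows how `htu` is derived):

```typescript
// htu is the request URI without query or fragment (per RFC 9449),
// so a proof minted for one URL cannot be reused for a redirect target.
function htuClaim(url: string): string {
  const u = new URL(url);
  u.search = ""; // htu excludes the query…
  u.hash = "";   // …and the fragment
  return u.toString();
}
```

So a proof minted for `https://pod.example/resource?x=1` binds to `https://pod.example/resource`; if the server redirects to `https://pod.example/moved`, that’s a different `htu`, a new proof is needed, and the browser never tells you where it went.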

Your average developer will probably skim the specs at best if they can’t find the answer in docs or Stack Overflow / social media; at worst, they get frustrated & give up.

The thing with RDF & Semantic Web tech is that they’re deeply complex, and simplifying them often leads to major shortcomings and issues. And now, after, what, two decades of use in academia, we’re only just starting to see use outside of it. That’s two decades of knowledge to be shared. But developers also want to make their applications maintainable & fast; they don’t want to accidentally execute an unbounded query that takes 5 minutes to return, at best. They want their data and they want it now. They also don’t want to be yelled at by someone telling them they’re wrong just because they took the quick or naive approach.

A nice example of this comes to mind: in the Inrupt SDKs, we currently have an undocumented JSON-LD parser, and it’s been a source of maintenance issues for ages. It’s only used by two downstream SDKs to read fairly standardised files (.well-known documents, or parsing out a few fields that are required in a Verifiable Credential). These are technically JSON-LD, but no one is really going to care if they’re parsed as JSON or JSON-LD, because they only want a few standard keys from whatever data might exist. I’m arguing we should ditch the weight of JSON-LD here and just parse as JSON. I know it’s technically wrong, but it’s also technically expedient: it saves us quite the maintenance troubles and lets us tackle making JSON-LD work well later (it also allows us to shave something like 20% of the weight off our SDKs). But I’ve definitely had people who know RDF and the Semantic Web better than me tell me I’m wrong in that call, without listening to why I might actually be making the right call for right now.
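For a sense of what that shortcut looks like: instead of running a document through a JSON-LD processor, you parse it as JSON and pull the fixed, spec-defined keys. The helper and the exact field names below are illustrative (modelled loosely on an OIDC-style .well-known discovery document), not the actual Inrupt SDK code.

```typescript
// Parse a .well-known discovery document as plain JSON, ignoring
// @context entirely. Only the standard keys we actually need are read.
interface DiscoveryDoc {
  issuer?: string;
  jwks_uri?: string;
}

function parseDiscovery(body: string): DiscoveryDoc {
  const json = JSON.parse(body);
  return { issuer: json.issuer, jwks_uri: json.jwks_uri };
}
```

The trade-off is exactly as stated above: this breaks if a server serves semantically equivalent JSON-LD under aliased keys, but for documents whose shape is pinned down by a spec, that’s a risk you can reasonably accept in exchange for dropping a whole parser from the bundle.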

Another problem I see with most Linked Data tooling is that it expects developers to have a working Java install on their machine, and to understand how to install Java tools. I guess the same could be said for JavaScript tools, but those seem more familiar to developers regardless of their language background. I’d personally love to see more tools written in Rust & Go, so they can ship as a single binary that’s super easy to install & use, without the need for a language environment.

As for the push from Inrupt and others to tackle the commercial enterprise market, it really just comes down to capitalism and money: who has the biggest pockets to fund the development. I don’t think this is the wrong approach, but I do agree we need more on the side of non-enterprise involvement. This is also where I see Solid’s major shortcoming: what incentive do I have to keep my data schema standardised and not make arbitrary breaking changes to the data my application produces? From what I can tell, there’s little commercial value in that: as a company, you want to consume all the data, but you don’t want to be responsible for it or constrained by it (“data is nuclear waste” these days, as someone put it; I forget who), and in fact, having your competition be able to consume your data might be a competitive disadvantage. Almost by definition, companies are looking to make a profit, and they’re incentivised to do everything they can to maximise that value; very few companies are altruistic. Even with my own company (not Inrupt, something else), I registered it as a for-profit company, despite positioning it as a social good technology incubator.

So I definitely think there’s a lot of hurdles to overcome.

Fediverse wishlist though:

  • WebIDs for profiles
  • Sign-in with a WebID / OIDC provider
  • Standardised tools for content moderation (especially an API protocol to control moderation actions taken) (background: I rewrote the moderation tools in Mastodon back in 2017 after being a volunteer moderator on switter.at and realising how bad they were for communities at scale. This is also why I think the fediverse/global timeline is harmful)