Linked Data: Undersold, Overpromised?

Hi Arnold and everyone, creator of m-ld here. Thanks for the shout.

I’m coming at this topic from the point of view of a couple of decades in scientific data management – where sophisticated knowledge management, of the kind that may be possible with Linked Data, could provide a huge boost to scientific productivity. We developed a data-linking approach in parallel (ish) with the early days of RDF, which gave us an edge in the market when it came to search, workflows, and reporting. In the end we never transitioned to RDF (though our system was within a hair's breadth of a 1:1 conceptual mapping), despite many years of championing on my part.

Why?

To cut a long story short, it’s because we didn’t realise we needed it until we had already entrenched our own approach.

I submit to the panel that this is a common problem. I want to build a (social) app to meet a pressing customer need. I take a platform off the shelf and hack together a prototype. I probably have JSON as both a serialisation between distributed components and a readable way to communicate between humans. My prototype is well-received and I get a hundred stories to take me to MVP. Along the way, some suspiciously tricky requirements arise: like internal and external cross-links, a faceted search UI, custom fields. Each of these is addressed with increasingly complex (and, I will stress, fun to invent) solutions involving metadata and query APIs. At no point in this path (or indeed, in the decades to follow) is there any breathing room to take a hatchet to all these custom solutions.

Today I find myself building a software library that will help developers solve another, parallel, hard problem: sharing live state among multiple concurrent editors. RDF data structures are not easy to make live-sharable (article), but I’ve based m-ld on RDF for its natural extensibility (link to paper). I also know that when m-ld is used in real apps, linked data principles are going to be needed, and they’ll give m-ld an edge in an increasingly competitive space.


PS

That’s the plan :100:. I’m not actually trying to be a lite version of LD. I use LD as my base data representation.

The main place where I may appear to be inventing standard 15 is with json-rql, which is conceptually a mid-point between GraphQL and SPARQL. In reality, though, it’s just a serialisation of SPARQL with a lower barrier to entry. Happy to talk more about that, of course.
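Roughly, a json-rql query is the SPARQL pattern written as JSON. A minimal sketch (the exact keywords and context IRI here are illustrative rather than normative):

```typescript
// SPARQL:  SELECT ?name WHERE { ?person <http://schema.org/name> ?name }
// The same pattern, json-rql-style, as a plain JSON object.
const query = {
  '@context': { name: 'http://schema.org/name' },
  '@select': '?name',
  '@where': { '@id': '?person', name: '?name' }
};
```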

2 Likes

I started working with Linked Data and RDF about a year ago.
I’ve been an IT freelancer for > 20 years, but I didn’t study and don’t have a diploma, master’s degree or anything like that.
I admit, at first I had a hard time understanding triples / linked data. But that was not because of their complexity, but because of my rigidity in the old way of thinking!
And for me the confusing part was definitely JSON-LD: I find it mega confusing! I would recommend everyone to learn Turtle first!

Triples are very simple and everyone works with them: each object has an id, properties and values. There is only id, property and value; there are no links as such. This brings some complexity into play, but this is where you find the great advantages.
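As a rough illustration of that pattern in Turtle (the IRIs and data below are made up):

```typescript
// The id / property / value pattern, written as Turtle in a template string.
// (The IRIs and data are made up for illustration.)
const turtle = `
@prefix schema: <http://schema.org/> .

<https://example.org/alice>                    # id
    schema:name  "Alice" ;                     # property and value
    schema:knows <https://example.org/bob> .   # a value that is another id
`;
```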
It gets really complicated when it comes to the definition of ontologies, SHACL, reasoning. I still have a lot to read, understand and experience.
But not everyone has to understand and be able to apply these complex topics.

I had looked at Solid, but at the time it didn’t quite meet my requirements, and it was still too much in development. Another hurdle for me: it’s more of a solution used in the browser with JS, and JS isn’t my language at all.

With my rdf-pub implementation, I want to create a C2S interface that makes it easier for clients to participate in the fediverse. Posting JSON-LD should be easy, but the response may be a JSON-LD document that the client finds frightening. But then you can work with adapters/translators if necessary.
I think we need more adapters/translators. We are talking about “Activity” Pub, and for me an activity is a synonym for event; therefore, for me, ActivityPub / the fediverse is an event-driven architecture, and it is quite normal that there are adapters/translators!

Btw. I have paid work again and will have less time to take care of linkedopenactors and rdf-pub, so rdf-pub will not learn S2S for now.

1 Like

I’ll pop this on here for now

It’s something that came out of a discussion with a friend. Still a one-pager that is an early work in progress.

And not really a use case or a library, yet. Manu Sporny of JSON-LD did however take a quick look at it, and said positive things.

But hopefully some food for thought on how a simpler path to JSON-LD could look.

1 Like

I just wrote a very long post to the Solid companion thread (but I added a TL;DR too :wink: ). What you say, @gsvarovsky, rings very true. The kind of interoperability levels that we want to achieve are ultra-hard, and it is all too easy, from purely practical, short-term needs, to end up with something much less versatile than initially intended.


Rephrasing my summary given to Solid to apply here on the SocialHub and the Fediverse, imho we can only hope to tap into the full potential of ActivityPub et al if we:

  1. Recognize the importance of the social aspects of software development, and the levels of collaboration that are needed to create highly interoperable software.
  2. Focus on the processes that are needed to do so and continue to streamline them, instead of diving directly into code and deeply technical matters.
  3. Make it as easy as possible for people to tag along to our processes, to interact socially and contribute their bit, even when they lack the technical expertise to help with the low-level stuff.

Here’s my summary regarding how I feel about the Solid project, and some recommendations:

[Solid Project] Summary

  • The Solid project seems most focused on a Grand Vision whose success hinges on tackling enormous complexity and widespread adoption of standards and ecosystem projects.

  • There’s no gradual path to adoption with stable, well-productized milestones all along the roadmap and options to choose from. No way for people to get acquainted with Solid without going all-in and facing the brunt of the complexity.

  • There’s little focus on the process of software development, how Solid fits, and what benefits it brings in the short term (i.e. before the Grand Vision is realized).

  • While there is a deep technical focus, there seems to be almost a business myopia as to all the process and design best practices that are also needed to create interoperable apps that satisfy people’s needs. Social aspects of development are neglected.

Recommendations:

  • Focus on all the practical things that help average developers leverage Solid technology right now in their apps. Tools, documentation, different languages supported, etc. And with these people now invested in Solid, entice them towards deeper community involvement.

  • Ensure that not just tech is covered, but that Process is as well. How do I design my app with reasonable expectations for interoperability? How can and should I collaborate with others, and what organization structure and tools can we offer to help with that?

It is noteworthy that unlike Solid, we on the Fediverse don’t seem to have a Grand Vision. We are satisfied to bolt additional features on top of existing microblogging concepts and look in each other’s codebases to see how we may slightly integrate with an app.


Interconnectivity

Lastly I want to mention that the Grand Vision of broad-scale seamless semantic interoperability that Solid wants to achieve has a high risk of never coming to fruition, with the high complexity being the key factor in that.

So I was delighted when I heard the term “interconnectivity” for the first time. It is a perfect companion to interoperability. It was @steffen who introduced me to it with a toot announcing:

See also: Interconnective networks: open development starts today!

2 Likes

Nice post

If we’d done away with RDF and RDF/XML and just standardized on JSON w/ URLs we probably would have saved ourselves 10 years

1 Like

I want to copy part of a discussion I had on Matrix, so as to ‘archive’ it. I guess my overall point is this:

Without well-defined processes and the community organization and tools to facilitate the social interaction to keep them going, an ecosystem will only evolve by Ad-hoc Interoperability, and will suffer the downsides of that (stalling or exponential complexity) in the long term.

openEngiadina matrix chat …

On the openEngiadina matrix room I asked the following question:

There is an, I think, interesting discussion on the Solid forum, ‘Is RDF “hard”?’. It boils down to this: no matter how you turn it, interoperability is really hard, especially ‘semantic interoperability’ (expressing universal semantic meaning). For the Solid community I recommend highlighting more of the Process side of software development and its Social aspects besides the deep technical focus.

In this regard I am curious to hear how openEngiadina envisions the semantic network to be built over time when it is in production. I guess the focus on local knowledge already solves a lot of the problem, and with UI widgets tailored to RDF constructs you can make it easier to crowdsource content aggregation. But still there’s a lot of different knowledge to be modeled and potentially reused across platforms.

To which @pukkamustard gave this answer:

I completely agree. Interoperability is hard, especially semantics.

Agreeing on semantics is the same as agreeing on a certain world view and classification of concepts. It is an intrinsically social endeavor that requires understanding how people think and conceptualize things. Then you need to find a basis that everybody agrees on that is simple enough to be formalized. It’s not easy. In fact, I think trying to find a universal semantics that works for everybody is futile. Luckily, we don’t need universal semantics.

In my opinion, the most important thing in RDF is the open-world assumption: the fact that you never have complete knowledge of anything. If you hold a piece of RDF data and think that it represents a box, you cannot assume that other people also think that the piece of data represents a box; to them it might represent a musical instrument. This might be because other people have different data on the thing or because they have a different semantic understanding of the thing.

Even if two parties have two completely different understandings of something, the thing might still be described with properties that both parties understand. For example, the thing might be annotated with a geographic position using some semantics (vocabulary) that both parties understand (e.g. W3C Semantic Web Interest Group: Basic Geo (WGS84 lat/long) Vocabulary). So even if two parties cannot agree on whether the thing is a box or a musical instrument, they might still be able to agree that the thing is located at a certain place.

A contrived example, but I hope this shows that RDF does not require us to agree on a universal semantics. If we share partial semantics we can already go very far. Luckily there is already an established and rich collection of specialized semantics/vocabularies that we can re-use (e.g. DC Terms, Geo, ActivityStreams, The Music Ontology). These can be used to make the common basis of understanding larger.
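To make the box / musical instrument example concrete, a minimal sketch (the thing’s IRI, the two type IRIs and the coordinates are made up; the geo terms are from the Basic Geo vocabulary):

```typescript
// Two parties describe the same resource with different types, but both use the
// Basic Geo (WGS84) vocabulary, so they can still agree on where the thing is.
// (The thing's IRI, the type IRIs and the coordinates are made up.)
const partyA = {
  '@context': { geo: 'http://www.w3.org/2003/01/geo/wgs84_pos#' },
  '@id': 'https://example.org/thing/42',
  '@type': 'https://example.org/vocab#Box',
  'geo:lat': 46.8,
  'geo:long': 10.3
};
const partyB = {
  '@context': { geo: 'http://www.w3.org/2003/01/geo/wgs84_pos#' },
  '@id': 'https://example.org/thing/42',
  '@type': 'https://example.org/vocab#MusicalInstrument',
  'geo:lat': 46.8,
  'geo:long': 10.3
};
```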

In terms of openEngiadina and local knowledge: Local Knowledge is not only the representation of physical things that are bound to some locality. It is also a conceptualization of things - a semantic - that is shared in a locality. The way I envision things to grow: Start with your own small local semantics and grow by finding larger common semantics.

And my response:

Thank you for this elaboration pukkamustard :pray:
Some time ago I came upon some articles by Kevin Feeney of TerminusDB.com. In one article he advocates (at least starting with) closed-world interpretations of semantic content:

However, if I have a RDF graph of my own and I want to control and reason about its contents and structure in isolation from whatever else is out there, this is a decidedly closed world problem

And in the next part he elaborates on some big issues with Linked Data:

the big problem is that the well-known ontologies and vocabularies such as foaf and dublin-core that have been reused, cannot really be used as libraries in such a manner. They lack precise and correct definitions and they are full of errors and mutual inconsistencies [1] and they themselves use terms from other ontologies — creating huge and unwieldy dependency trees.

Just now I am discussing this with @melvincarvalho. I feel that when we develop apps we naturally start from closed-world vocabularies, and that the incentives to keep these interoperable enough with other apps that are out there should be taken care of by good Processes and Social interaction.

Not taking these into account will mean we either go towards what I call Ad-hoc Interoperability (what the fedi has: learn from code and create your own flavour of interop from that), or strive for Universal Interoperability with all the complexity that comes with that.

@pukkamustard reply:

However, if I have a RDF graph of my own and I want to control and reason about its contents and structure in isolation from whatever else is out there, this is a decidedly closed world problem

I agree with the premise. I don’t agree that means we need to throw out the open-world assumption. I think a solution is that we need to be able to decide finely what data is used for reasoning and what not.

I feel that when we develop apps we naturally start from closed-world vocabularies

I agree and really think this is what we need to change. We should start thinking open-world by default.

@how’s follow-up to that:

I guess there’s a question of perspective: on the one hand, the ‘closed world’ perspective makes sense when you’re manipulating your own data. But when others manipulate your data, and when you describe your vocabulary, and when you conceive of data usage, you should always leave space for other things to happen, i.e., consider the open world assumption.

And finally my response…

I once saw a toot by @dansup saying something like (paraphrasing): “Oh, it is so nice to be working on new federated features that do not yet exist, as I can just “invent” what I need on-the-fly”. A completely understandable notion: focus on your own app’s needs, without having to draft a spec or negotiate with others. There are no others yet. Most fedi devs are open to making changes later if that increases interop opportunities, unless they have become too deeply invested in a certain way of doing things and are unwilling to change. If they are a dominant project in their domain, then that will become an issue.

I agree and really think this is what we need to change. We should start thinking open-world by default.

Yes, I think so too. But the reality is that this takes a significant extra effort, and initially this rests with the dev who 1st needs a new AP extension but is mostly interested in delivering a good MVP app at that time. The potential win-win of going the extra mile is somewhere in the future, when others want to interoperate.

We should streamline the process of ‘offering an extension’ to others as much as possible, lowering the barrier, and include the steps in which it can be iterated on and matured for more open-world application.

Right now the practice shows there’s too little incentive to engage in the process, and ad-hoc interoperability is the only way forward.

I am currently testing my rdf-pub implementation, in which I would like to implement an initial import from “Karte von Morgen” (KVM, “map of tomorrow”) and a one-way sync (KVM -> rdf-pub).
And here the “open world principle” is blocking me at the moment, so that I am losing some motivation. Adapting a closed world to a linked open one is no fun; at the moment I am taking a break :wink:

1 Like

So you are seeking common vocabulary terms to replace an internal entity-relationship model or property graph?

No, that’s already done: Specification of Linked Open Actors (LOA)

There are a lot of other questions regarding versioning, linking…

I assume that in KVM a place has an address as an attribute and does not point to an address. If the attributes of the place change but its address does not, a copy of the address is created.
This is how it looks, in any case, when you look at the REST API.
An address also seems to be duplicated if there are two organisations in one place. These considerations and possibilities of mapping are currently giving me sleepless nights.
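Roughly, the mapping question looks like this (all field names and IRIs below are made up for illustration):

```typescript
// KVM-style: the address is an inline attribute of the place, so it gets copied
// whenever the place changes or a second organisation sits at the same place.
// (All field names and IRIs are made up for illustration.)
const kvmPlace = {
  id: 'kvm:123',
  title: 'Repair Café',
  address: { street: 'Hauptstrasse 1', city: 'Scuol', zip: '7550' }
};

// Linked-data-style: the address becomes its own resource with a stable @id,
// so places (and versions of a place) can point to it instead of duplicating it.
const address = {
  '@id': 'https://example.org/address/7550-hauptstrasse-1',
  '@type': 'http://schema.org/PostalAddress',
  'http://schema.org/streetAddress': 'Hauptstrasse 1',
  'http://schema.org/addressLocality': 'Scuol',
  'http://schema.org/postalCode': '7550'
};
const place = {
  '@id': 'https://example.org/place/123',
  'http://schema.org/name': 'Repair Café',
  'http://schema.org/address': { '@id': address['@id'] }
};
```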

1 Like

Agree with this!

Open World Assumption means “anyone can say anything about anything”

And “no one knows everything about anything”

What it means is that data is always partial, and can be mashed together to form knowledge bases

It’s like the difference between “opinions” and “facts”

As you can imagine it’s a very useful tool in some use cases, but adds complexity in others

This is more about the data that you write, than about open vs closed vocabs. I think all our vocabs are open, really

Open vs Closed is different to local vs global

Let’s take Facebook as an example. That’s a closed world: from there I can’t make a friend on Mastodon

Mastodon is kind of half open / half closed. It lets you do some things and not others. I can make a friend on the fediverse (or follow them). But I can’t make a friend on Facebook.

In Solid you have the open world assumption, so I can link from my profile to a Facebook account and pull in data. I’ve done this in the past. So it embraces the open world assumption.

I can also say that my Facebook id is linked to my profile, even if Facebook doesn’t let me; that claim could be sourced on my home page. So the software now has to collect all the different claims and work out fact from opinion. Valuable in the right places, complex in others.
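A minimal sketch of what collecting such claims could look like, with the source kept alongside each statement (all IRIs are illustrative):

```typescript
// The same kind of claim made in different places, kept apart by source so
// software can later weigh "opinions" against "facts". (All IRIs illustrative.)
const claims: [subject: string, predicate: string, object: string, source: string][] = [
  ['https://me.example/#me', 'http://www.w3.org/2002/07/owl#sameAs',
   'https://facebook.com/some.account', 'https://me.example/profile'],
  ['https://me.example/#me', 'http://xmlns.com/foaf/0.1/knows',
   'https://alice.example/#me', 'https://alice.example/profile']
];
```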

2 Likes

The stupid thing is that in my case the data should really be available as open linked data. That is my goal. Only the path is a bit hard at the moment. I lack experience, and I have the feeling that only very, very few people have this experience.
However, I see LOD as a valuable goal that we should all strive for.

1 Like

Yes, that is an issue I also have. We should try to improve that situation and get some real Linked Data interactions going on our parts of the web. The Solid community is also de facto the Linked Data community, but the real experts are loath to mingle on the forum. The Discord channels (and maybe all those boards they have) may be a way to get some help.

Also I’d like to mention once more the delightful-linked-data curated list I maintain. If you have any resources to add I’d be glad to hear. The better the list, the more I can promote LD on the fedi and beyond. So whenever you bump into something… leave it in an issue (there’s one open for just that)

I don’t think there are actually any real linked data ‘experts’. And anyone that calls themselves one probably isn’t

The Solid discourse area was set up by a commercial entity. From my experience, they are reluctant to help out or support people using Solid. The vibe in this forum is better IMHO

Fundamentally, consider a programming language where every variable you use MUST be a URL, and SHOULD link to another, quite complicated, page of metadata for that URL. And where the only data structure you are allowed to use is a Set. Arrays are an afterthought, shoe-horned in, that no one understands. This programming language does not allow things like addition without specialist servers with atomic updates, which still have not been built

That’s the state of linked data

It’s certainly useful in some situations. But to say it’s useful in all situations is wrong. That’s the mistake that Linked Data “experts” make. They make promises they don’t understand, and for the most part, don’t even use. Then when stuff breaks, there’s often no one to help.

LD should be used to solve a narrow set of problems. Such as merging data from different websites. Or in scenarios where links are under used and high value

At the time of making ActivityPub, the idea was that if everyone used this ‘standard’ it would be possible to create a rich network effect through interop, even with the limitations. Well, understandably, developers struggled with the limitations, and some rejected them. Rather than keep pushing linked data where the earth has been salted, a better way is to accept its usefulness in some situations, explain it, and also accept the limitations

Linked data should be viewed as a variable scope. One higher than global variables. Then programmers have a range of tools to achieve their goals.

Edit: A possible solution.

  1. Recognize JSON-LD as a form of linked data, which has a syntax for representing hyperlinks, for representing things, types, and some common properties
  2. Recognize JSON as a superset of JSON-LD, with all the features, plus more on top, such as Lists or Arrays of typed things
  3. Match slow-changing vocabs to slow-changing software, and allow new types of innovation and interop through JSON
2 Likes

https://linkedopenactors.org/#introduction-to-the-concepts

What do you mean with “the forum” ?

@naturzukunft I already have LOD on the list of candidates to add. I am behind on README maintenance, but keep adding entries to the issue. With “the forum” I refer to the Solid community forum, where hardly anyone from the core team or Inrupt seems to really want to interact.

@melvincarvalho thank you for that elaboration. Some good food for thought for me there.

I did not want to draw attention to LOA, but to the links I collected in the linked chapter.

1 Like

Just bumped into a listing of various ways to serialize RDF Linked Data:

Copying the summary:

TL;DR

  • Use Hex-Tuples if you want high performance in JS with dynamic data.
  • Use JSON-AD if you don’t have to support existing RDF data, but do value JSON compatibility and type safety.
  • Use HDT if you have big, static datasets and want the best performance and compression.
  • Use N-Triples / N-Quads if you want decent performance and high compatibility.
  • Use JSON-LD if you want to improve your existing JSON API, and don’t need performant RDF parsing.
  • Use Turtle if you want to manually read & edit your RDF.
  • Use Notation3 if you need RDF rules.
  • Use RDFa to extend your existing HTML pages.
  • Use RDF/XML if you need to use XML.
  • If you can, support all of them and use content negotiation.

This is a very nice piece.
But I think in a federated world we can leave out a bit:

  • Hex-Tuples (draft) is the format by the article’s author, and ‘high performance’ means billions of triples with dynamic data; data is static in our case, and nobody uses NDJSON yet.

  • JSON-AD solves what AP already solved, Atomic Data (see also ‘Advocacy’ later)

  • HDT – probably for billions and billions, like in a Twitter world

  • RDF/XML - because I can’t think of plain XML use cases


So,

  • JSON-LD is the default anyway

and then

  • “Turtle if you want to manually read & edit your RDF.” This includes e.g. manually reading and editing the vocabulary used in the fediverse, but ‘before JSON-LD’
  • “N-Triples / N-Quads if you want decent performance and high compatibility”
  • “RDFa to extend your existing HTML pages” (e.g. w. AP objects, schema or mf2) but ‘after JSON-LD’.

What is left out here is Advocacy.
This is why we can also use ActivityStreams itself for tuples (doubles and triples) in the form of Profile and Relationship

Usually one software developer defines the @context, but neither ActivityPub instances/groups nor users do.
To keep the @context small and let everyone “extend ad-hoc”, we can use Profile and Relationship as attachments.
The benefit is that each edge can be a reusable public ActivityPub Object owned by anyone.
A Profile could say Alyssa:Portrait describes Bob:Bob, or
a Relationship could say Alyssa:Alyssa wdt:director Universal:NextBigThing, or whatever.
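A rough sketch of one such edge as an ActivityStreams Relationship object (the actor and work IRIs are made up; the predicate is Wikidata’s P57, “director”):

```typescript
// One reusable, public "edge" as an ActivityStreams Relationship object.
// (Actor and work IRIs are made up; the predicate is Wikidata's P57, director.)
const edge = {
  '@context': 'https://www.w3.org/ns/activitystreams',
  type: 'Relationship',
  id: 'https://example.org/relationships/1',
  subject: 'https://example.org/users/alyssa',
  relationship: 'http://www.wikidata.org/prop/direct/P57',
  object: 'https://example.org/works/next-big-thing'
};
```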

Hextuples IMHO is overkill and lots of technical debt, which is a source of bugs

Main thing that’s needed for a social web is JSON with a standardized way of expressing hyperlinks. In JSON-LD that’s using @id or id as a key

The big issue with RDF is that it’s not compatible with plain old JSON, as that has yet to be standardized. There’s not really a will to do it, so we are stuck with hextuples.

However, if the AP community got together we could do that for the social web. What would be required would be a way to take plain old JSON keys and put them in a triple store (which requires URIs).

Something like:

key <–> URI

foo <–> json:key:foo
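For illustration, one way such a mapping could look in JSON-LD terms (the “json:key:” prefix just echoes the example above and is hypothetical):

```typescript
// A JSON-LD @vocab maps plain JSON keys to URIs so they can land in a triple
// store. (The "json:key:" prefix is hypothetical, echoing the example above.)
const doc = {
  '@context': { '@vocab': 'json:key:' },
  '@id': 'https://example.org/post/1',
  foo: 'bar'   // expands to: <https://example.org/post/1> <json:key:foo> "bar"
};
```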

Apologies for the extremely delayed reply; I missed this back in January, as I was out of action that month.

Anyway, hey, hi! I’m Emelia, I currently work on the developer tools team over at Inrupt (the company founded by timbl to build Solid for the enterprise).

This thread strikes a chord with me, as I’ve been doing my best to learn RDF & Linked Data over the past year. I don’t come from an RDF, Linked Data or academic background; instead my background is in working with startups (~23 to date) to build out their products, and in building tools & platforms for developers to be as productive as possible. I also have significant experience with building both frontend and backend applications, and a wealth of production knowledge of GraphQL and Node.js.

I could not agree more that the biggest hurdle developers will face in adopting Linked Data is its background in RDF.

Mainstream developers typically choose the option with the least friction possible: they want the API to work, and work how they want it to, without having to learn for years to feel proficient in what they’re doing. In reality, code and standards are like a barrier between them and their paycheck, getting home at a reasonable time, and keeping their product manager off their back as to why something isn’t done yet.

And that’s coming from someone who’s usually working with more modern tech stacks (react, redux, graphql, json APIs, webpack, nest.js, typescript, etc). I’ve spent a bunch of time trying to help developers towards actually using schema-driven APIs, and it’s always a challenge; you’ll often have different teams wanting to do things “their way” rather than working in unison & trusting that the other team knows what they’re doing.

In previous conversations, I’ve kinda jokingly said “RDF is highly structured, schemaless garbage that you may find useful data in”, by which I mean: you’ve no guarantees with what you get in RDF. You might get a single value, multiple, or none at all; you might get a certain “field” (predicate), but you may also not.

This is very different to how most developers are used to working, whereby they can assume that their API returns data back in the way they want, or, if not, they hack through whatever “crap” the backend developers have sent them to make their applications “work” and satisfy the product requirements.

This is where a lot of developers find frustration in working with TypeScript and data coming from APIs, because JSON.parse returns an any rather than the structure they think they are receiving from the API. It’s quite rare to see developers actually validate and check the shape of the data they’ve gotten back from their API; the most you usually see is a response.status === 200 check, and then just response.json() with maybe a try/catch around the asynchronous network call.
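A small sketch of that pattern (the endpoint and fields are made up):

```typescript
// The usual pattern: status check, then trust whatever shape came back.
interface Profile { id: string; name: string }

async function getProfileUnchecked(): Promise<Profile> {
  const response = await fetch('https://api.example.org/profile');
  if (response.status !== 200) throw new Error('request failed');
  return response.json() as Promise<Profile>;   // really `any`, just asserted
}

// The rarer version: actually validate the shape before assuming it.
function isProfile(value: unknown): value is Profile {
  return typeof value === 'object' && value !== null
    && typeof (value as { id?: unknown }).id === 'string'
    && typeof (value as { name?: unknown }).name === 'string';
}

async function getProfileChecked(): Promise<Profile> {
  const response = await fetch('https://api.example.org/profile');
  if (response.status !== 200) throw new Error('request failed');
  const data: unknown = await response.json();
  if (!isProfile(data)) throw new Error('unexpected response shape');
  return data;
}
```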

Even just getting developers to write consistent GraphQL schemas is challenging, and often requires a lot of training & guiding towards helping them understand what they’re working with, and how it can help them, instead of fighting the tool or library.

There are newer approaches that are starting to improve the situation, such as Zod, GraphQL-Codegen, JSON-RPC or tRPC, and some JSON Schema tooling, but adoption is only just barely happening; there are a lot of practices to unlearn.

I also know in the Linked Data world we’ve got ShEx and SHACL, both of which try to enforce at least some sort of schema on top of RDF data. I definitely think the “killer” platform for getting developers to adopt RDF, Linked Data, etc., is going to be one that takes them from their practices today, and bridges the gaps & hides the complexities. For instance, this might be using GraphQL schemas & operations (queries & mutations) in their Schema Definition Language to generate essentially an API for the data that they want; Jesse Wright, a coworker at Inrupt, actually has a prototype. Another option might be a tool to transform JSON Schemas into SHACL/ShEx, or do similar code generation as previously described.
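For a flavour of the shape-based approach, a rough illustration (the shape and IRIs are made up, not from any real project):

```typescript
// A small ShEx shape of the kind that lets a consumer check "this node has
// exactly one schema:name and zero or more schema:knows links" before using it.
// (The shape and IRIs are made up, not from any real project.)
const profileShape = `
PREFIX schema: <http://schema.org/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

<ProfileShape> {
  schema:name  xsd:string ;
  schema:knows IRI *
}
`;
```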

But I don’t think the majority of developers will seriously work with lower-level interactions with Linked Data (e.g., interacting directly with JSON-LD, Turtle, Quads, etc.). Those will be used more by tooling developers or experts who know what they’re doing (or beginners who don’t yet know what they’re doing at all, and make many mistakes due to not knowing better).

I hope this can add to the conversation here, and perhaps a different perspective. (And hopefully no one is highly offended by my joking description of RDF; I’ve just seen far too many people struggling to get their heads around its quirks, even people who would normally be writing database migrations & actually schema-driven data.)

(these are my own opinions and do not necessarily reflect the standing of my employer)

2 Likes