Linked Data: Undersold, Overpromised?

aschrijver · January 27, 2022, 8:49am

I’ll copy a post I just created on the Solid forum to this topic: Is RDF "hard"? - Solid Community Forum

I feel that something very crucial is touched upon here.

Since the early days of the Semantic Web I’ve felt - like other proponents of the technology - that linked data would allow applications to be taken to a new, higher level than was possible before. And I was eagerly awaiting all the exciting uses that would become available over time. Though there have been some very prominent and successful applications since then, like Google Knowledge Graph, OpenGraph and a couple of others, until now imho the tech has undersold and overpromised. I am still fervently hoping for it to take flight and get more widespread adoption.

One of the problems, I think, lies in the realm of ‘productization’. From the very start the technology has not been easily accessible or developer-friendly, and its true benefits have not been immediately obvious. The answer to the questions that people like @dynamoRando are asking is always something like “If only you take the deep-dive, go through the rabbit hole, you will see the light and the world will be your oyster”.

Most applications I’ve seen to date are firmly in the academic world, or they are for the vast majority fully tech-focused, UX/UI comes later, and product website, good documentation after that.

Linked data / semantic applications should “sell” themselves better to the outside world. That so little of that is happening is imho a very inhibiting factor that slows the evolution of the entire field. When technologists don’t get it, they move on to lower-hanging fruit, greener pastures. When diving into linked data most of what you find stems from the Semantic Web hype that has long passed and where link rot is setting in. There are great developments still - like Solid - but most of the information is at expert level, aimed at insiders of the field: those who ventured into the rabbit hole before.

@jeffz wrote:
Therefore there are now many Solid libraries that abstract away from RDF

This is also the case for the Fediverse, where federated app developers treat ActivityStreams / ActivityPub formats as plain JSON and just throw a @context in at the last moment for good measure and to comply with the spec. With apps evolving like that, the Linked Data parts of the story become a harder sell all the time. And that saddens me, as I believe in its potential.

I’ve said this before on this forum. Much of the Solid Project seems to aim at professional / corporate / commercial adoption first and foremost, and though that strategy might be a good way to come to widespread adoption, my personal impression is that new technologies find adoption fastest if they have a high appeal to the developer community. And that would entail more focus on the community aspects and cross-community collaborations than exists now (as far as I am aware).

melvincarvalho · January 27, 2022, 2:07pm

15 years of Linked data experience here

It’s a useful tool, with trade-offs, is about what I can say

To know it well is a multi-year learning curve. The complexity is very high compared with other technologies. ActivityPub inherited some, but not all, of the benefits, and quite a good chunk of the complexity

I would agree it is expert level technology, which suits a certain audience, mainly academia / PhDs and some enterprises

Most of the time linked data is pitched via appeal to authority. Just yesterday a grass roots developer said to me about LD:

I hope [they] can popularize linked data without the smugness

I’d prefer it if the benefits were told. In Linked Data, everything is a Set. That makes merges cheap, but other things like arrays or arithmetic, hard. I’d actually say more than hard, impractical. However, in JSON you can do both of these things quite easily
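The trade-off described above can be illustrated with a minimal sketch. It models a graph as a plain Python set of (subject, predicate, object) tuples rather than using an RDF library, and every identifier in it (`ex:alice`, `ex:knows`, `ex:score`) is made up for the illustration.

```python
# A graph is modelled as a set of (subject, predicate, object) triples.
# All identifiers below are hypothetical and exist only for this sketch.
graph_a = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
}
graph_b = {
    ("ex:alice", "ex:knows", "ex:bob"),    # same statement as in graph_a
    ("ex:alice", "ex:knows", "ex:carol"),
}

# Merging is plain set union: duplicate statements collapse on their own.
merged = graph_a | graph_b

# A JSON array keeps both order and repeated values.
scores_json = [3, 1, 3]

# Stored naively as one triple per value, order and repeats are lost.
scores_triples = {("ex:alice", "ex:score", s) for s in scores_json}
recovered = sorted(o for (_, _, o) in scores_triples)

print(len(merged))       # number of distinct statements after the merge
print(sum(scores_json))  # arithmetic is trivial on the JSON array
print(recovered)         # what survives the round trip through a set
```

RDF does offer list constructs for ordered data, but they are noticeably more verbose than a JSON array, which is the practical side of the point being made here.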

As someone that coined the phrase “Social Linked Data” (which became Solid) my main goal was to achieve a distributed social web to compete with incumbents such as facebook, but open and where the users own their data. We’re still a long way from that.

We need lite versions of some of these technologies. Take the useful parts, but leave behind the complexity and technical debt. I suppose it’s a work in progress …

aschrijver · January 27, 2022, 4:30pm

Thank you Melvin!

So part of the challenge is crossing the chasm between expert-level and ‘mainstream’ development. I’ve expressed that on the Solid forum before, i.e. that I’ve felt there was a sort of fascination with ‘reinventing’ or ‘rewriting the web’ from scratch and with suggesting all kinds of end-user apps as ‘killer apps’ for the technology, whereas there should be more focus on providing tools & libraries in the developer toolbox that encapsulate the complexity in better ways, so the technology comes within grasp of more people. See:

Yes, this is a deeply inspiring notion to me, and while we are still a long way away from that as you say, it is also what the SocialHub slogan of “Social Networking Reimagined” is about. And I feel a huge opportunity that exists now to make a fist against Big Tech with our own definition of how “social” ought to be.

I received a great response on the Solid forum where I first posted my reaction, and where the author also mentioned:

EDIT: I guess what I’m saying is there needs to be more product champions.

EDIT EDIT: Sorry, the coffee is kicking in. To me, the question becomes for any developer sitting down at their keyboard when starting a project: Why should I use Solid and develop my app to use Solid principles over what I already know? Using the MEAN stack, or SAFE stack, or any other stack really? In my view, if you can convince the general conversation of: I really should be writing my app in a “Solid” manner most of the time, then we’ve reached our goal. (And by Solid, I’m abstracting here to really mean: leveraging socially aware protocols, such as the Semantic Web, etc.)

(Btw, dunno if ‘product champion’ is the word I’d use. Too corporate feeling, but I get the idea).

Similar question we can also ask for the Fediverse in general. Why should a fedi dev take the extra effort to deep-dive into walking the Linked Data path, rather than just slapping a @context in place?

melvincarvalho · January 27, 2022, 5:32pm

Personally I think there is a middle ground, where you have a lite version of Linked Data, with a full upgrade path for those who want it

EDIT: what would a lite version look like?

compatible with plain old JSON
allowing syntax for links/URIs in JSON
additional ability to add @id
ability to add @type
path to full RDF compatibility
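As a rough illustration of the list above, here is a minimal sketch of what such a lite document might look like and how it could be read as triples. It is not the JSON-LD processing algorithm: there is no @context handling, property names are not expanded to IRIs, and the helper `to_triples` and the example.org URLs are hypothetical.

```python
import json

# Plain JSON that any JSON tool can read; @id and @type are the only additions.
doc = json.loads("""
{
  "@id": "https://example.org/alice",
  "@type": "Person",
  "name": "Alice",
  "knows": {"@id": "https://example.org/bob"}
}
""")


def to_triples(node):
    """Yield (subject, property, value) for a JSON object that carries an @id.

    Hypothetical helper for this sketch; a nested object with an @id is
    treated as a link to another resource.
    """
    subject = node["@id"]
    for key, value in node.items():
        if key == "@id":
            continue
        if isinstance(value, dict) and "@id" in value:
            yield (subject, key, value["@id"])
            yield from to_triples(value)
        else:
            yield (subject, key, value)


triples = list(to_triples(doc))

# The document still works as ordinary JSON...
print(doc["name"])
# ...and can also be read as a small graph.
for triple in triples:
    print(triple)
```

The upgrade path to full RDF would then consist of adding a @context that maps short names such as `name` and `knows` to full IRIs, without changing the shape of the data that existing clients rely on.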

aschrijver · January 27, 2022, 7:12pm

That is a good middle ground, yes. What I am also very interested in is all the ways that ease the creation of vocab extensions that facilitate this “path to full RDF compatibility” optimally. In other words the whole developer experience around “methodology & process” that on one hand ensures that developers can scratch the itch to quickly iterate on their own app, while on the other hand staying firmly on the “interoperability & standards” track and ramping up to a more powerful semantic “social fabric” in the future.

I feel there are very few tools and (recent) best-practices to help them in this regard. Though there are a couple of ActivityPub ‘frameworks’ that are well positioned to offer a proper support platform, like @cjs #software:go-fed @ivan and @mayel’s #software:bonfire @pukkamustard #software:openengiadina @naturzukunft #software:rdf-pub and semapps.

It would be lovely if we could come to a more appealing proposition together, and attract more people to the ecosystem with that.

aschrijver · January 27, 2022, 8:57pm

Just added a comment to the Solid topic highlighting some RDF criticism I just bumped into on HN at: The Block Protocol | Hacker News

aschrijver · January 28, 2022, 6:22am

A post was split to a new topic: Atomic Data: Easy way to create, share and model linked data

aschrijver · January 28, 2022, 7:09am

Interestingly, alongside this discussion we have in quick succession already found 5 protocols that are dedicated to being lite versions of Linked Data:

An Introduction to The Nostr Protocol posted by @melvincarvalho
The Block Protocol posted by me
Atomic Data: Easy way to create, share and model linked data posted by @SelectSweet
Meld (m-ld): Live information sharing maintained by @gsvarovsky
RDF-DEV Community Group working on EasierRDF and RDF-star open standards posted by me

They all look like great initiatives to me, with a focus on practical application, reducing linked data complexity. There are parts that are complementary in them, and other parts that are overlapping, different ways of doing things. They all relate to underlying (W3C) linked data standards in different ways. Which brings me to the most quoted XKCD cartoon:

But maybe this is not applicable at all because of sufficient linked data standards compliance of each of these protocols, and they all just serve to give developers more choice in how they design their apps without entering competing ecosystems by doing so.

At least I think that each of these projects would be best served to keep a wary eye on providing that compliance and retain a good level of interoperability.

Probably I should start collecting these protocol projects in the delightful project, and any PR’s / Issues are most welcome:

Update: Interesting follow-ups on the Solid companion thread being posted…

Generate UI components from RDF: GitHub - jeff-zucker/solid-ui-components: generate high level HTML widgets from RDF data
TerminusDB’s TerminusX cloud service for building linked data apps (proprietary addon to OSS db, but conceptually interesting)
LinkML general purpose modeling language for linked data

gsvarovsky · January 28, 2022, 9:00am

Hi Arnold and everyone, creator of m-ld here. Thanks for the shout.

I’m coming at this topic from the point of view of a couple of decades in scientific data management – where sophisticated knowledge management, of the kind that may be possible with Linked Data, could provide a huge boost to scientific productivity. We developed a data-linking approach in parallel (ish) with the early days of RDF, which gave us an edge in the market when it came to search, workflows, and reporting. In the end we never transitioned to RDF (though our system was within a hair’s breadth of a 1:1 conceptual mapping), despite many years of championing on my part.

Why?

To cut a long story short, it’s because we didn’t realise we needed it until we had already entrenched our own approach.

I submit to the panel that this is a common problem. I want to build a (social) app to meet a pressing customer need. I take a platform off the shelf and hack together a prototype. I probably have JSON as both a serialisation between distributed components and a readable way to communicate between humans. My prototype is well-received and I get a hundred stories to take me to MVP. Along the way, some suspiciously tricky requirements arise: like internal and external cross-links, a faceted search UI, custom fields. Each of these is addressed with increasingly complex (and, I will stress, fun to invent) solutions involving metadata and query APIs. At no point in this path (or indeed, in the decades to follow) is there any breathing room to take a hatchet to all these custom solutions.

Today I find myself building a software library that will help developers to solve another, parallel, hard problem, which is sharing of live state information allowing multiple concurrent editors. RDF data structures are not easy to make live-sharable (article), but I’ve based m-ld on RDF for the natural extensibility (link to paper). I also know that when m-ld is used in real apps, linked data principles are going to be needed, and they’ll give m-ld an edge in an increasingly competitive space.

PS

That’s the plan. I’m not actually trying to be a lite version of LD. I use LD as my base data representation.

The main place where I may appear to be inventing standard 15 is with json-rql, which is conceptually a mid-point between GraphQL and SPARQL. However in reality it’s just a serialisation for SPARQL, with a lower barrier to entry. Happy to talk more about that, of course.

naturzukunft · January 28, 2022, 12:20pm

I started working with Linked Data and RDF about a year ago.
I’ve been an IT freelancer for > 20 years, but I didn’t study and don’t have a diploma, master’s degree or anything like that.
I admit, at first I had a hard time understanding triples / linked data. But that was not because of the complexity, but because of my rigidity in the old way of thinking!
and for me definitely json-ld. i find it mega confusing! I would recommend everyone to learn turtle first!

Triples are very simple and everyone works with them. each object has an id, properties and values. Only id, property and value there are no links. This brings some complexity into play, but this is where you find the great advantages.
It gets really complicated when it comes to the definition of ontologies, SHACL, reasoning. I still have a lot to read, understand and experience.
But not everyone has to understand and be able to apply these complex topics.

I had looked at SOLID, but at the time it didn’t quite meet my requirements. and it was still too much in development. another hurdle for me: it’s more of a solution used by a browser in JS. and JS isn’t my language at all.

With my rdf-pub implementation, I want to create a C2S interface that makes it easier for clients to participate in the fediverse. posting json-ld should be easy, but the response may be a json-ld that the client finds frightening. But then you can work with adapters/translators if necessary.
I think we need more adapters/translators. we are talking about “ACTIVITY” pub and for me an activity is a synonym for event. and therefore for me activitypub / fediverse is an event driven architecture. and it is quite normal that there are adapters/translators !

btw. i have paid work again and will have less time to take care of linkedopenactors and rdf-pub. so rdf-pub will not learn S2S for now.

melvincarvalho · January 28, 2022, 12:41pm

I’ll pop this on here for now

It’s something that came out of a discussion with a friend. Still a one-pager that is an early work in progress.

And not really a use case or a library, yet. Manu Sporny of JSON-LD did however take a quick look at it, and said positive things.

But hopefully some food for thought on how a simpler path to JSON-LD could look.

aschrijver · January 29, 2022, 5:43pm

I just wrote a very long post to the Solid companion thread (but I added a TL;DR too). What you say, @gsvarovsky, rings very true. The kind of interoperability levels that we want to achieve are ultra-hard, and it is all too easy, out of purely practical, short-term needs, to end up with something much less versatile than initially intended.

Rephrasing my summary given to Solid to apply here on the SocialHub and the Fediverse, imho we can only hope to tap into the full potential of ActivityPub et al if we:

Recognize the importance of the social aspects of software development. The levels of collaboration that are needed to create highly interoperable software.
Focus on the processes that are needed to do so and continue to streamline them, instead of diving directly into code and deeply technical matters.
Make it as easy as possible for people to tag along to our processes, to interact socially and contribute their bit, even when they lack the technical expertise to help with the low-level stuff.

Here’s my summary regarding how I feel about Solid project and some recommendations:

[Solid Project] Summary

Solid project seems most focused on a Grand Vision where its success hinges on tackling enormous complexity and widespread adoption of standards and ecosystem projects.

There’s no gradual path to adoption with stable well-productized milestones all along the roadmap and options to choose from. No way for people to get acquainted with Solid without going all-in and face the brunt of the complexity.

There’s little focus on the process of software development, how Solid fits, and what benefits it brings in the short term (i.e. before the Grand Vision is realized).

While there is a deep technical focus, there seems to be almost a business myopia, as to all the process and design best-practices that are also needed to create interoperable apps that satisfy people’s needs. Social aspects of development are neglected.

Recommendations:

Focus on all the practical things that help average developers leverage Solid technology right now in their apps. Tools, documentation, different languages supported, etc. And with these people now invested in Solid, entice them towards deeper community involvement.

Ensure that not just tech is covered, but that Process is as well. How do I design my app with reasonable expectations for interoperability? How can and should I collaborate with others, and what organization structure and tools can we offer to help with that?

It is noteworthy that unlike Solid, we at Fediverse don’t seem to have a Grand Vision. We are satisfied to bolt additional features on top of existing Microblogging concepts and look in each other’s codebase to see how we may slightly integrate with an app.

Interconnectivity

Lastly I want to mention that the Grand Vision of broadscale seamless semantic interoperability that Solid wants to achieve, has a high risk of never coming to fruition. The high complexity being the key factor in that.

So I was delighted when I heard the term “interconnectivity” for the first time. It is a perfect companion to interoperability. It was @steffen that introduced me to it with a toot announcing:

See also: Interconnective networks: open development starts today!

melvincarvalho · January 29, 2022, 8:48pm

Nice post

If we’d done away with RDF and RDF/XML and just standardized on JSON w/ URLs we probably would have saved ourselves 10 years

aschrijver · January 31, 2022, 8:52pm

I want to copy part of a discussion I had on Matrix, so as to ‘archive’ it. I guess my overall point is this:

Without well-defined processes and the community organization and tools to facilitate the social interaction to keep them going, an ecosystem will only evolve by Ad-hoc Interoperability and will suffer the downsides - stalling or exponential complexity - from that in the long term.

openEngiadina matrix chat …

On the openEngiadina matrix room I asked the following question:

There is an, I think, interesting discussion on Solid forum ‘Is RDF “hard”?’. It boils down to that no matter how you turn it Interoperability is really hard, especially ‘semantic interoperability’ (expressing universal semantic meaning). For Solid community I recommend highlighting more of the Process side of software development and its Social aspects besides the deep technical focus.

In this regard I am curious to hear how openEngiadina envisions the semantic network to be built over time when it is in production. I guess the focus on local knowledge already solves a lot of the problem, and with UI widgets tailored to RDF constructs you can make it easier to crowdsource content aggregation. But still there’s a lot of different knowledge to be modeled and potentially reused across platforms.

To which @pukkamustard gave this answer:

I completely agree. Interoperability is hard, especially semantics.

Agreeing on semantics is the same as agreeing on a certain world view and classification of concepts. It is an intrinsically social endeavor that requires understanding how people think and conceptualize things. Then you need to find a basis that everybody agrees on that is simple enough to be formalized. It’s not easy. In fact, I think trying to find a universal semantic that works for everybody is futile. Luckily, we don’t need universal semantics.

In my opinion, the most important thing in RDF is the open-world assumption - the fact that you never have complete knowledge over anything. If you hold a piece of RDF data and think that it represents a box, you can not assume that other people also think that the piece of data represents a box - to them it might represent a musical instrument. This might be because other people have different data on the thing or because they have a different semantic understanding of the thing.

Even if two parties have two completely different understandings of something, the thing might still be described with properties that both parties understand. For example the thing might be annotated with a geographic position using some semantics (vocabulary) that both parties understand (e.g. W3C Semantic Web Interest Group: Basic Geo (WGS84 lat/long) Vocabulary). So even if two parties can not agree if the thing is a box or a musical instrument they might still be able to agree that the thing is located at a certain place.

A contrived example, but I hope this shows that RDF does not require us to agree on a universal semantics. If we share partial semantics we can already go very far. Luckily there is already an established and rich collection of specialized semantics/vocabularies that we can re-use (e.g. DC Terms, Geo, ActivityStreams, The Music Ontology). These can be used to make the common basis of understanding larger.
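The box versus musical instrument example can be sketched in a few lines. This is a simplified model using sets of tuples instead of an RDF library. The two predicate IRIs are taken from the W3C Basic Geo vocabulary mentioned above; the thing's URL, the two type IRIs, the coordinates and the helper `understood` are invented for the illustration.

```python
# Predicates from the W3C Basic Geo (WGS84 lat/long) vocabulary and RDF itself.
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resource, type IRIs and coordinates, made up for this sketch.
THING = "https://example.org/thing/1"

party_a = {
    (THING, RDF_TYPE, "https://a.example/vocab#Box"),
    (THING, GEO_LAT, "46.8"),
    (THING, GEO_LONG, "10.3"),
}
party_b = {
    (THING, RDF_TYPE, "https://b.example/vocab#MusicalInstrument"),
    (THING, GEO_LAT, "46.8"),
    (THING, GEO_LONG, "10.3"),
}


def understood(graph, known_predicates):
    """Keep only the statements whose predicate the reader understands."""
    return {t for t in graph if t[1] in known_predicates}


# Both parties understand the geo vocabulary, but not each other's types.
shared_vocab = {GEO_LAT, GEO_LONG}
agreed = understood(party_a, shared_vocab) & understood(party_b, shared_vocab)

# The disagreement about what the thing *is* remains, and that is fine.
types = {o for g in (party_a, party_b) for (_, p, o) in g if p == RDF_TYPE}

print(sorted(agreed))
print(sorted(types))
```

The two parties end up agreeing on the location statements while still holding different types for the same thing, which is the partial agreement the passage describes.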

In terms of openEngiadina and local knowledge: Local Knowledge is not only the representation of physical things that are bound to some locality. It is also a conceptualization of things - a semantic - that is shared in a locality. The way I envision things to grow: Start with your own small local semantics and grow by finding larger common semantics.

And my response:

Thank you for this elaboration pukkamustard
Some time ago I came upon some articles by Kevin Feeney of TerminusDB.com. In one article he advocates (at least to start with) closed world interpretations of semantic content:

However, if I have a RDF graph of my own and I want to control and reason about its contents and structure in isolation from whatever else is out there, this is a decidedly closed world problem

And in the next part he elaborates on some big issues with Linked Data:

the big problem is that the well-known ontologies and vocabularies such as foaf and dublin-core that have been reused, cannot really be used as libraries in such a manner. They lack precise and correct definitions and they are full of errors and mutual inconsistencies [1] and they themselves use terms from other ontologies — creating huge and unwieldy dependency trees.

Just now I am discussing this with @melvincarvalho. I feel that when we develop apps we naturally go from closed world vocabularies, and that the incentives to keep these interoperable enough with other apps that are out there, should be taken care of by good Processes and Social interaction.

Not taking these into account will mean we either go towards - what I call - Ad-hoc Interoperability (what the fedi has: learn from code and create your own flavour of interop from that), or striving for Universal Interoperability and all the complexity that comes with that.

@pukkamustard reply:

However, if I have a RDF graph of my own and I want to control and reason about its contents and structure in isolation from whatever else is out there, this is a decidedly closed world problem

I agree with the premise. I don’t agree that means we need to throw out the open-world assumption. I think a solution is that we need to be able to decide finely what data is used for reasoning and what not.

I feel that when we develop apps we naturally go from closed world vocabularies

I agree and really think this is what we need to change. We should start thinking open-world by default.

@how’s follow-up to that:

I guess there’s a question of perspective: on the one hand, the ‘closed world’ perspective makes sense when you’re manipulating your own data. But when others manipulate your data, and when you describe your vocabulary, and when you conceive data usage, you should always leave space for other things to happen, i.e., consider the open world assumption.

And finally my response…

I once saw a toot by @dansup saying something like (paraphrasing): “Oh, it is so nice to be working on new federated features that do not yet exist, as I can just “invent” what I need on-the-fly”. A completely understandable notion. Focus on your own app’s needs, not having to draft a spec or negotiate with others. There are no others yet. Most fedi devs are open to making changes later if that increases interop opportunities, unless they have become too deeply invested in a certain way of doing things and are unwilling to change. If they are a dominant project in their domain, then that will become an issue.

I agree and really think this is what we need to change. We should start thinking open-world by default.

Yes, I think so too. But the reality is that this takes a significant extra effort, and initially this rests with the dev who 1st needs a new AP extension but is mostly interested in delivering a good MVP app at that time. The potential win-win of going the extra mile is somewhere in the future, when others want to interoperate.

We should streamline the process of ‘offering an extension’ to others as much as possible, lowering the barrier, and include the steps in which it can be iterated and mature for more open world application.

Right now the practice shows there’s too little incentive to engage in the process, and ad-hoc interoperability is the only way forward.

naturzukunft · February 1, 2022, 9:48am

I am currently testing my rdf-pub implementation, in which I would like to implement a “karte von morgen” KVM (map of tomorrow) initial import and a one-way sync (KVM->rdf-pub).
And here the “open world principle” is blocking me at the moment, so that I am losing some motivation. adapting a closed world to a linked open one is no fun. at the moment I am taking a break

aschrijver · February 1, 2022, 9:51am

So you are seeking common vocabulary terms to replace an internal entity-relationship model or property graph?

naturzukunft · February 1, 2022, 10:03am

No, that’s already done: Specification of Linked Open Actors (LOA)

There are a lot other questions regarding versioning, linking…

I assume that in KVM a place has an attribute of an address and does not point to an address. If the attributes of the place change but its address does not, a copy of the address is created.
This is how it looks in any case when you look at the rest-api.
an address also seems to be duplicated if there are two organisations in one place. these considerations and possibilities of mapping are currently giving me sleepless nights.
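One possible way to think about this mapping problem is sketched below. It is only an illustration under stated assumptions: the field names (`name`, `address`, `street`, `city`), the example.org URLs and both helper functions are hypothetical and are not taken from the KVM REST API or from rdf-pub. The idea is to derive a stable identifier from the address content, so that duplicated embedded addresses collapse into one linkable resource.

```python
import hashlib


def address_id(address):
    """Derive a stable identifier from the address content (hypothetical scheme)."""
    canonical = "|".join(str(address[k]).strip().lower() for k in sorted(address))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"https://example.org/address/{digest}"


def link_addresses(places):
    """Split embedded addresses out into shared resources referenced by @id."""
    addresses = {}
    linked_places = []
    for place in places:
        addr = place["address"]
        aid = address_id(addr)
        addresses[aid] = addr  # identical addresses end up under one key
        linked = {k: v for k, v in place.items() if k != "address"}
        linked["address"] = {"@id": aid}
        linked_places.append(linked)
    return linked_places, addresses


# Two made-up organisations at the same place, each with its own address copy.
places = [
    {"name": "Repair Cafe", "address": {"street": "Main St 1", "city": "Exampletown"}},
    {"name": "Food Coop", "address": {"street": "main st 1 ", "city": "Exampletown"}},
]

linked_places, addresses = link_addresses(places)
print(len(addresses))
print(linked_places[0]["address"] == linked_places[1]["address"])
```

Content-derived identifiers are only one design choice. They make a one-way import repeatable, but they also mean that a corrected address becomes a new resource, which ties back into the versioning questions raised above.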

melvincarvalho · February 1, 2022, 10:24am

Agree with this!

Open World Assumption means “anyone can say anything about anything”

And “no one knows everything about anything”

What it means is that data is always partial, and can be mashed together to form knowledge bases

It’s like the difference between “opinions” and “facts”

As you can imagine it’s a very useful tool in some use cases, but adds complexity in others

This is more about the data that you write, than about open vs closed vocabs. I think all our vocabs are open, really

Open vs Closed is different to local vs global

Let’s take facebook as an example. That’s a closed world. I can’t make a friend on mastodon

Mastodon is kind of half open / half closed. It lets you do some things and not others. I can make a friend on the fediverse (or follow them). But I can’t make a friend on facebook.

In solid you have the open world assumption so I can link from my profile to a facebook account. And pull in data. I’ve done this in the past. So it embraces the open world assumption.

I can also say that my facebook id is linked to my profile, even if facebook doesn’t let me; that claim could be sourced on my home page. So the software now has to collect all the different claims and work out fact from opinion. Valuable in the right places, complex in others.
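The difference between the two assumptions, and the extra work of sorting claims by source, can be sketched as follows. This is a simplified model, not how Solid or any RDF store implements it; the claim data, the `ex:` and `fb:` identifiers and both helper functions are invented for the illustration.

```python
from collections import defaultdict

# Each claim records who said it: (source, subject, predicate, object).
# All values are hypothetical.
claims = [
    ("https://me.example/", "ex:me", "ex:account", "fb:12345"),
    ("https://other.example/", "ex:me", "ex:account", "fb:67890"),
]


def ask(claims, subject, predicate, obj, closed_world=False):
    """Check whether anyone asserts the statement.

    Closed world: a missing statement is False.
    Open world: a missing statement is unknown, returned here as None.
    """
    if any((s, p, o) == (subject, predicate, obj) for (_, s, p, o) in claims):
        return True  # asserted by someone, which is still only a claim
    return False if closed_world else None


def sources_for(claims, subject, predicate):
    """Group the competing values for one property by who claimed them."""
    by_value = defaultdict(set)
    for source, s, p, o in claims:
        if (s, p) == (subject, predicate):
            by_value[o].add(source)
    return dict(by_value)


print(ask(claims, "ex:me", "ex:account", "fb:99999", closed_world=True))
print(ask(claims, "ex:me", "ex:account", "fb:99999"))
print(sources_for(claims, "ex:me", "ex:account"))
```

Deciding which of the competing values to trust is left to the application, and that is where the added complexity mentioned above comes in.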

naturzukunft · February 1, 2022, 3:25pm

The stupid thing is that in my case the data should really be available as open linked data. That is my goal. Only the path is a bit hard at the moment. I lack experience. and I have the feeling that only very, very few people have this experience.
However, I see LOD as a valuable goal that we should all strive for.

aschrijver · February 2, 2022, 7:30am

Yes, that is an issue I also have. We should try to improve that situation and get some real Linked Data interactions going on our parts of the web. The Solid community is also de-facto the Linked Data community, but the real experts are loath to mingle on the forum. The Discord channels (and maybe all those boards they have) may be a way to get some help.

Also I’d like to mention once more the delightful-linked-data curated list I maintain. If you have any resources to add I’d be glad to hear. The better the list, the more I can promote LD on the fedi and beyond. So whenever you bump into something… leave it in an issue (there’s one open for just that)