We have written documentation on exactly how Lemmy federation works:

https://join-lemmy.org/docs/en/federation/overview.html


This is some pretty wonderful documentation. Well done!

I’d like to make an observation on the User object

Lemmy Protocol - Lemmy Documentation → User

There are 2 entities described explicitly, and one implicitly, using the id alias:

  1. "id": "#key"

This is great. The PEM format offers a lot of flexibility. FYI: there is also an emerging convention in social coding of publishing SSH keys at username.keys.

  2. "id": "https://enterprise.lemmy.ml/u/picard"

This (implicitly) refers to the HTTP document, so all headers and metadata (the creation date, the ETag, etc.) are tied to it.

  3. "id": "https://enterprise.lemmy.ml/u/picard"

This is the same as (2), but explicitly it refers to the fields in the User object. It’s a convenient way to get started, but mixing 2 and 3 causes problems over time, because automated agents can find it hard to tell which is which. In Solid we future-proofed this, with a clear separation of concerns, by adding #me to the User id field. I’d recommend this as good practice; it will save you pain in the future (been there!)
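To make the suggestion concrete, here is a minimal Python sketch assuming the actor id carried a #me fragment as proposed (the #me suffix is the recommendation above, not what Lemmy currently sends, and split_actor_id is a hypothetical helper). Stripping the fragment yields the HTTP document (case 2), while the full id names the User (case 3).

```python
from urllib.parse import urldefrag


def split_actor_id(actor_id: str) -> tuple[str, str]:
    """Return (document_url, fragment) for an id such as .../u/picard#me (hypothetical helper)."""
    document_url, fragment = urldefrag(actor_id)
    return document_url, fragment


# With the proposed "#me" suffix, the person and the document get distinct ids.
person_id = "https://enterprise.lemmy.ml/u/picard#me"
doc, frag = split_actor_id(person_id)
print(doc)   # https://enterprise.lemmy.ml/u/picard  (headers, ETag, dates belong here)
print(frag)  # me  (the User described by the object's fields)

# The "#key" id lives in the same document, so both fragments share one HTTP resource.
key_doc, key_frag = split_actor_id("https://enterprise.lemmy.ml/u/picard#key")
assert key_doc == doc and key_frag == "key"
```

Because the fragment is never sent over HTTP, fetching either id returns the same document; only the fragment tells a client which node inside it is meant.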

I think the choice of schemas is quite practical and has a large network effect. Personally, I’m going to move much more towards a JSON-first model (with scattered HTML) for schemas, aliasing them to existing fields, both for the self-description and the context. The various linked data formats are great for inclusion but harder for a parser.

Great work in any case; you’ve inspired me to make some fedi-based services and docs on how to do it.

Two questions:

The second link, “ActivityPub API Outline”, is an empty HTML document (here).
Did it move? Is it perhaps Lemmy Protocol - Lemmy Documentation now?
If so, the links in the first doc should be updated too.

And there might be a typo in the first document:

When a new Comment is created for a Post, both the Post ID and the parent Comment ID (if it exists) are written to the in_reply_to field. This allows assigning it to the correct Post, and building the Comment tree. It is then sent to the Community inbox as Create/Note

The ActivityStreams (as) property is called inReplyTo, not in_reply_to.

I also have some observations about the “Lemmy Protocol Federation” doc, which I am summing up here:

Context

[
  "https://www.w3.org/ns/activitystreams",
  {
    "stickied": "as:stickied",
[…]

See e.g. Activity Vocabulary or https://www.w3.org/ns/activitystreams.jsonld

• So I wonder about as:stickied: did you maybe mean toot:featured?
Extensions to the official as namespace (apart from the new “alsoKnownAs”) are documented here: Activity Streams extensions - W3C Wiki

The boolean toot:featured is the proposed sticky/pinned post thing and your documentation also says

“True means that it is shown on top of the community”

@nutomic @dessalines Let us avoid duplicates. I am working on a consolidated vocabulary including all Community Extensions.

• The as:moderators Collection should become YOURNAMESPACE:moderators
• “expires” could be as:endTime which is a native property.

Last but not least, pt is usually the prefix for the PeerTube namespace.


Yes, the link you posted is the correct one now. Unfortunately it seems like I can’t edit the original post, probably because it is too old.

I fixed the name of inReplyTo.

About context, the truth is that I don’t really understand how it works, nor have I found anyone who does. Lemmy just adds it in case it’s needed by other software, but objects and activities are parsed as simple JSON.

The trailing # is easy to add; I didn’t know it was significant.

Mastodon’s toot:featured field contains a collection of all stickied objects, while Lemmy sets as:stickied as a boolean directly on the stickied objects, with no collection. So changing that would require some rewrite, which is low priority for me.

You are right that we should probably define our own namespace. The problem is that I don’t know how to do that, or how to verify that it is valid. By the way, there are also many fields which are not part of the context at all, but those are all optional and can be ignored if you want (same as stickied or moderators).

Thanks for answering fast, I’ve (hopefully) corrected the links in the original post.

About context, the truth is that I don’t really understand how it works

What is important: We speak about @context and not context.
The first is a useful underlying property from the JSON-LD specification.

The second is explained in “Activity Vocabulary”:
it is a native ActivityStreams property meant to group things.

anyway

about @context

Without it, Sir Tim Berners-Lee would award Lemmy 3 of 5 stars :slight_smile:
When you use it, you can earn the 4th star.

The spec tries to explain @context; I recommend reading it in this order:

  1. Section Extensibility in the underlying spec. for “ActivityVocabulary”: Activity Streams 2.0
  2. Section Context in the very underlying spec. JSON-LD 1.1

Let me try: :slight_smile:
The point is to “use URIs to denote things, so that people can point at your stuff”.
See the benefits
In short:
Any property in the JSON document is not a word but a URI.
We do not want to repeat things, so in the @context field we can

  • define a base URI for unprefixed properties (for ActivityStreams it is https://www.w3.org/ns/activitystreams#), unless a property is specified otherwise
  • define prefixes, which work like “shortcuts” and are followed by a colon (e.g. as:)
  • define a property and its behaviour specifically

Then every property becomes a unique URI, which can also point to both a machine-readable and a human-readable definition of the property, with multilingual labels as a bonus (like in Wikidata or redaktor).

Now for example

{ "type": ["adidas:Offer"] } or { "type": ["puma:Offer"] }

can have different meanings (and, given the companies’ history, they probably do).


Which brings me to

Mastodon’s toot:featured field contains a collection of all stickied objects

You are right, sorry. Let’s think about it in federated terms.
Both approaches make sense, so you should use your own namespace.

Let me try to highlight the differences.
On the one hand, when an application shows the Outbox of an Actor (e.g. under the Profile), the Mastodon approach makes sense, because you do not want to parse the whole Outbox just to find all the stickied posts.
On the other hand, when you handle Objects from different Actors, like when viewing your Inbox, Lemmy’s stickied boolean is fine for just showing e.g. an icon or a “sticky” label …
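Since both representations carry the same information, converting between them is mechanical. A minimal Python sketch under that assumption (the collection and post URLs are hypothetical, and this is not how either Lemmy or Mastodon actually implements it):

```python
def build_featured_collection(collection_id: str, posts: list[dict]) -> dict:
    """Gather the ids of posts flagged stickied (Lemmy style) into one collection (Mastodon style)."""
    stickied_ids = [post["id"] for post in posts if post.get("stickied") is True]
    return {
        "id": collection_id,
        "type": "OrderedCollection",
        "totalItems": len(stickied_ids),
        "orderedItems": stickied_ids,
    }


def mark_stickied(posts: list[dict], featured: dict) -> list[dict]:
    """Reverse direction: set a stickied boolean on each post from a featured collection."""
    featured_ids = set(featured.get("orderedItems", []))
    return [{**post, "stickied": post["id"] in featured_ids} for post in posts]


# Hypothetical community posts; the third one has no stickied flag at all.
posts = [
    {"id": "https://lemmy.example/post/1", "stickied": True},
    {"id": "https://lemmy.example/post/2", "stickied": False},
    {"id": "https://lemmy.example/post/3"},
]
featured = build_featured_collection("https://lemmy.example/c/main/featured", posts)
print(featured["orderedItems"])  # ['https://lemmy.example/post/1']
```

Building the collection from booleans requires looking at every post, which is exactly the Outbox cost described above; going the other way only needs the single collection.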

Please also note that id and context are themselves aliases, specified by the “ActivityStreams 2.0 Terms”.
But the @context itself is independent. Since yours is consistent, I could cache it for generator = Lemmy (I recommend using the generator property).
But in the @context itself it must still be @id and @type (note the @ !)
The schema namespace is “http://schema.org/” (no “#”, it exactly replaces the “sc:”).
And if you want to alias things from “as”, you need to specify what “as” is.

tl;dr
proposing:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams#",
    "https://w3id.org/security/v1",
    {
      "as": "https://www.w3.org/ns/activitystreams#",
      "lm": "https://join-lemmy.org#",
      "pt": "https://joinpeertube.org/ns#",
      "sc": "http://schema.org/",
      "comments_enabled": {
        "@type": "sc:Boolean",
        "@id": "pt:commentsEnabled"
      },
      "matrixUserId": {
        "@type": "@id",
        "@id": "as:alsoKnownAs"
      },
      "moderators":  {
        "@type": "@id",
        "@id": "lm:moderators"
      },
      "stickied":  {
        "@type": "sc:Boolean",
        "@id": "lm:stickied"
      }
    }
  ]
}

Basically, to avoid confusion for people not using JSON-LD/@context, I would rename ‘comments_enabled’ to ‘commentsEnabled’ as in PeerTube.
And regarding @type: AP normally uses "xsd": "http://www.w3.org/2001/XMLSchema#" to describe primitive datatypes like xsd:boolean.

@nutomic et al.
please also attend the monthly meetings on the second Tuesday of each month. We have talked, and keep talking, about all the @context and context things …


OT’ish… While the namespacing is an improvement, application-specific namespaces are still non-optimal. Why would Lemmy reference a Peertube namespace to model Commenting features? This relates to the discussion in A namespace for things defined in FEPs and the fact that we never really figured out how to specify AP vocab extension in ways most beneficial for reuse across the ecosystem.

Because it is specified like this here

https://www.w3.org/TR/activitystreams-core/#h-extensibility

I have tried multiple times to get @rhiaro in the loop; meanwhile I am stuck too, which is why I would like to at least keep collecting extensions, even if nobody seems interested.

PS: The lemmy namespace is for the newly introduced lm:stickied feature.
The aliases are just there so that you do not need prefixes anywhere in your document; as Melvin and others noted before, it is easier to consume.

In general, I agree.
The thing is, for Mastodon and PeerTube it is too late, because they have already created their own namespaces (which we can only alias …)

But (cc @nutomic ) I would agree that it would be better to start now with a common namespace.
Do @aschrijver and @acka47 want to help?
I have created a huge RDFS/OWL/SKOS Turtle file that collects everything.
The minimal version alone is already 700 kB (without all the roles in the vocab shown!) …

What do we need? → Presenting the SkoHub Vocabs Prototype | Skohub Blog
Step 2 would then need contact with the repo owner(s), @aschrijver and @acka47, and then I can push it.

/ edit
also pinging @cpmoser because of https://yuforium.com/ns/activitypub

/ edit2
There is also a problem with the PeerTube @context: neither as:dislikes nor as:comments exists.

      "dislikes": {
        "@id": "as:dislikes",
        "@type": "@id"
      },
      "comments": {
        "@id": "as:comments",
        "@type": "@id"
      }

Yes. Though I’d like to restrict to generic procedures for extension and find best-practices for that to document.

Isn’t your SkoHub vocab just one particular example? We should discuss it in a separate topic, and maybe document it in a HedgeDoc pad in parallel for the time being (or alternatively have a wiki post plus a discussion thread).

Well, this was a different SKOS file [just for attributions (Roles) and location]
I have already worked through all the terms that are actually in use.
Not every implementor has replied yet, and I will finish the uncommented items (e.g. Pleroma) after work.

Here is what I have for now
see asSkos.ttl (valid turtle file)

[edit]
When describing @context above, I forgot to mention Manu’s wonderful intro video to JSON-LD.

So, we would need a repo and a name. And first the SkoHub steps.
Each term has an inbox; we can just write messages to them to discuss. It is all federated.
But we can also publish at next meeting.
Finally I can generate a JSON-LD context and maybe JSON Schema out of the turtle file.

PS: Thanks to everyone who has fed it so far.

To be honest, I’m not really interested in learning how @context works. Like I said, Lemmy doesn’t use the field at all, and only sends it for the benefit of other software. It is defined in this file, so I suggest you make a pull request to change it (or I can do that if you prefer).

About featured/stickied items, it is true that the way Mastodon does it makes more sense. Our implementation is simply a reflection of how it’s stored in the database, because it was much easier to implement that way, and no one has complained so far. You can open an issue to change it.

comments_enabled is just a typo in the context, the actual field is called commentsEnabled.

And I don’t really have time for video chats; for me it’s very much preferable to talk like this, via forum posts, GitHub issues or Matrix chat.


Yes, I will make a pull request.
It is really misleading, because people have already assumed that things like
wrongNamespace:stickied (Lemmy) or
wrongNamespace:dislikes or wrongNamespace:comments (PeerTube)
exist in the ActivityStreams namespace.

This is possible too but it must be formally decided in a meeting where a chair of the SocialCG is present.
I just talked with Dr. Amy Guy about it on the fediverse.
Also, there is nothing special about creating your own namespace: it’s just a URI (it doesn’t have to be a URL); it is really just a name associated with a space.

But people federate them onward in the as namespace, others stumble upon them, etc. :wink:


for the benefit of other software, sending an incorrect @context is actually worse than sending no @context at all – all you really need to know is that in a json-ld aware software, the “plain json” property names derive their namespace from the @context property like so:

  • any URI (like https://www.w3.org/ns/activitystreams) gets fetched for an application/ld+json context document (like https://www.w3.org/ns/activitystreams.jsonld); all properties within that document get added to the understood context
  • aliases can be defined by mapping a prefix property to its expanded form, e.g. "as": "https://www.w3.org/ns/activitystreams#" maps the as: prefix to the full URI prefix (you can see this in the context document near the top)

so for example the following are all supposed to be equivalent:

  • Public (when @context includes https://www.w3.org/ns/activitystreams)
  • as:Public (when @context includes "as": "https://www.w3.org/ns/activitystreams#")
  • https://www.w3.org/ns/activitystreams#Public (no @context needed)

the purpose of json-ld normalization is to convert all properties to a fully-qualified URI like https://www.w3.org/ns/activitystreams#Public; this removes all ambiguity. a json-ld parser would not check for Public, it would check for https://www.w3.org/ns/activitystreams#Public in order to be absolutely sure we both mean “the activitystreams definition of public” and not “some other definition of public”.
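A deliberately simplified Python sketch of that comparison, not the full JSON-LD expansion algorithm: the context below is a hand-written stand-in (an assumption) for the parts of the remote activitystreams document we need, including a Public term mapping as described in the list above, and expand_term is a hypothetical helper.

```python
AS = "https://www.w3.org/ns/activitystreams#"

# Hand-written stand-in for the relevant bits of the remote AS context (assumption);
# a real JSON-LD processor would fetch https://www.w3.org/ns/activitystreams instead.
AS_CONTEXT = {"as": AS, "Public": "as:Public"}


def expand_term(term: str, context: dict[str, str]) -> str:
    """Expand a term or compact IRI to an absolute IRI (simplified, not full JSON-LD)."""
    # 1. A plain term defined in the context, e.g. "Public" -> "as:Public".
    if term in context:
        term = context[term]
    # 2. A compact IRI "prefix:suffix" whose prefix is defined in the context.
    prefix, sep, suffix = term.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        return context[prefix] + suffix
    # 3. Anything else (e.g. an absolute IRI such as https://...) is returned unchanged.
    return term


forms = ["Public", "as:Public", AS + "Public"]
expanded = {expand_term(form, AS_CONTEXT) for form in forms}
print(expanded)  # {'https://www.w3.org/ns/activitystreams#Public'}
```

The three spellings collapse to a single IRI, which is why a json-ld aware consumer compares expanded IRIs rather than raw strings. A real processor also handles @vocab, @base, relative IRIs and @id-typed values, which this sketch ignores.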


as far as lemmy’s @context goes, i see the following issues:

  • "stickied": "as:stickied", implies that stickied exists in the as: namespace, but it does not
    • likewise for "moderators": "as:moderators"
  • "pt": "https://join-lemmy.org#", implies that the pt: namespace is owned by / expands to lemmy’s domain; i assume this is supposed to be peertube
  • "matrixUserId" has a nonsensical id; based on a sample user payload it seems like it just maps to a matrix identifier?
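A small Python sketch of how such issues could be caught automatically. KNOWN_AS_TERMS is a hand-picked subset (an assumption; a real check would load the full activitystreams context), EXPECTED_PREFIXES is a hypothetical list of agreed prefix URIs, and the input is only the three entries quoted above, not Lemmy’s complete context.

```python
# Hand-picked subset of terms that exist in the as: namespace (assumption;
# a thorough check would read the full ActivityStreams context document).
KNOWN_AS_TERMS = {"Public", "inReplyTo", "endTime", "attributedTo"}

# Hypothetical agreed-upon prefix URIs for other projects' namespaces.
EXPECTED_PREFIXES = {"pt": "https://joinpeertube.org/ns#"}


def lint_context(ctx: dict) -> list[str]:
    """Flag terms claiming non-existent as: IRIs and prefixes pointing at the wrong owner."""
    problems = []
    for key, value in ctx.items():
        # A term definition is either a plain string or a map with an "@id" entry.
        iri = value["@id"] if isinstance(value, dict) else value
        if key in EXPECTED_PREFIXES:
            if iri != EXPECTED_PREFIXES[key]:
                problems.append(f"prefix {key!r} expands to {iri}, expected {EXPECTED_PREFIXES[key]}")
            continue
        prefix, sep, suffix = iri.partition(":")
        if sep and prefix == "as" and suffix not in KNOWN_AS_TERMS:
            problems.append(f"term {key!r} maps to as:{suffix}, which is not an ActivityStreams term")
    return problems


# The three entries quoted in the list above.
old_entries = {
    "stickied": "as:stickied",
    "moderators": "as:moderators",
    "pt": "https://join-lemmy.org#",
}
for problem in lint_context(old_entries):
    print(problem)
```

A check like this only catches names that claim the wrong namespace; whether a mapping such as matrixUserId means the right thing still needs a human reading the definition.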

using JSON-LD Playground i came up with the following sample:

"@context": [
  "https://www.w3.org/ns/activitystreams",
  {
    "pt": "https://joinpeertube.org/ns#",
    "lemmy": "https://join-lemmy.org/ns#",
    "sc": "http://schema.org/",
    "sensitive": "as:sensitive",
    "stickied": {
      "@type": "sc:Boolean",
      "@id": "lemmy:stickied"
    },
    "matrixUserId": {
      "@type": "sc:Text",
      "@id": "lemmy:matrixUserId"
    },
    "commentsEnabled": "pt:commentsEnabled",
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    }
  },
  "https://w3id.org/security/v1"
]

PR here: Fix: Use correctly parseable JSON-LD context by trwnh · Pull Request #2299 · LemmyNet/lemmy · GitHub


Thanks for the pull request and explanation, it makes more sense to me now. If @Sebastian agrees with the context you proposed, I will merge it.


Yep. A huge thank-you to @trwnh, also for the quick PR.

sending an incorrect @context is actually worse than sending no @context at all

That is really the point.

I got the irony between the lines before; it’s just that I will not give up hope that other people are interested :wink:
Of course, I agree with the post, because it is correct :slight_smile:

Personally, I find it a bit strange to describe datatypes with schema.org; all the AP specifications use the W3C Recommendation XML Schema (XML Schema Part 2: Datatypes Second Edition), so personally I would do:

"@context": [
  "https://www.w3.org/ns/activitystreams",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "sensitive": "as:sensitive",
    "stickied": {
      "@type": "xsd:boolean",
      "@id": "lemmy:stickied"
    },
    "matrixUserId": {
      "@type": "xsd:string",
      "@id": "lemmy:matrixUserId"
    },
    "commentsEnabled": "pt:commentsEnabled",
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    }
  },
  "https://w3id.org/security/v1"
]

I agree that that is better. It was likely chosen because PeerTube also uses schema.org data types in its @context.

I did not understand this. Currently all of these are possible: you could provide a URN or use URLs (dereferenceable or not). The https://www.w3.org/ns/activitystreams namespace itself is a dereferenceable URL that returns different things based on content type: either the machine-readable JSON-LD with application/ld+json, or a nice summary of the standard if the content type is text/html.

that’s fair, i suppose it doesn’t really make a difference whether it’s sc:Boolean or xsd:boolean

this bit is technically not necessary because the activitystreams context document includes xsd


however (and here’s the point where it gets a bit pedantic) – xsd does not have a parseable json-ld context document. but schema.org does: Developers - schema.org

then again: https://www.w3.org/ns/json-ld allows for both prefixes

that’s part of it; the other part is what i mentioned above, in that schema.org is friendlier to json-ld than xmlschema (which is only defined in xml). but also as i said further above: it doesn’t really make a difference. both should be parsed as json true/false literals anyway. frankly, @type is not very important in parsing json-ld because the type system doesn’t really mean anything. it only matters for type coercion and i’m pretty sure that they coerce to the same thing regardless of which definition of boolean/Boolean you choose.


if you still think xsd is better than schema then please make arguments toward the former. i can see how xsd might be “better” because it is included “free” with the activitystreams namespace, but this might not be obvious to a human reading the document (in the same way that ldp and vcard are technically included, but not used – the activitystreams 2.0 spec merely states they SHOULD be used for extensions Activity Streams 2.0 ). i suppose i could ask someone more knowledgeable than me / someone with actual authority or expertise, but my (admittedly amateur) opinion is that it doesn’t matter based on prior usage and evidence – as:manuallyApprovesFollowers and toot:discoverable are both defined in the wild without a @type for what it’s worth.


This is an area where I lack expertise, but some of my musings…

In the JSON-LD 1.1 Data Model I find:

A JSON-LD value is a typed value, a string (which is interpreted as a typed value with type xsd:string ), a number (numbers with a non-zero fractional part, i.e., the result of a modulo‑1 operation, or which are too large to represent as integers (see Data Round Tripping) in [JSON-LD11-API]), are interpreted as typed values with type xsd:double , all other numbers are interpreted as typed values with type xsd:integer ), true or false (which are interpreted as typed values with type xsd:boolean ), or a language-tagged string.
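The rule in that quote can be written out as a small Python function. It is a sketch: implied_datatype is a hypothetical helper, language-tagged strings are left out, and the “too large to represent as integers” cut-off is taken to be 10^21 (our reading of the JSON-LD 1.1 API), applied here to the absolute value.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def implied_datatype(value) -> str:
    """Datatype a native JSON value implies per the quoted rule (sketch; no language tags)."""
    # bool must be checked before int: in Python, True is an instance of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, str):
        return XSD + "string"
    if isinstance(value, int):
        # Assumed cut-off for "too large to represent as integers".
        return XSD + ("double" if abs(value) >= 10**21 else "integer")
    if isinstance(value, float):
        # A zero fractional part (modulo-1 result is 0) still counts as an integer.
        if value.is_integer() and abs(value) < 10**21:
            return XSD + "integer"
        return XSD + "double"
    raise TypeError(f"not a JSON-LD native value: {value!r}")


for sample in [True, "!remember", 42, 2.0, 3.5]:
    print(repr(sample), "->", implied_datatype(sample))
```

This is why a bare JSON true already means xsd:boolean without any @type in the context.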

The xsd: datatypes are, I believe, used in most W3C standards, including other linked data specifications such as RDF. In the JSON-LD specification, schema.org is used in various examples, but never to refer to primitive datatypes; it is instead reserved for more semantically meaningful concepts.

Note that the JSON-LD document of schema.org is 1.4 MB in size. Also, I find its use as a primitive datatype inconsistent; having e.g. schema:Text means I might as well drop in a whole JSON object describing it.

In the GitHub repo for schemaorg there was an issue discussion about ‘reinventing’ new datatypes. There’s one recent comment measuring usage across the web, and this gist giving a more complete overview. XML Schema datatypes are much more common, except for modeling dates, where schema:Date etc. is clearly used most often, though there are a bunch of issues with it, as described in the issue.

At the end of the issue other incompatibilties are highlighted as well. The issue is closed… but automatically by a bot, not because it is resolved.

This leads me to prefer XML Schema Datatypes.


Update:

Looking at who was involved with the definition of Schema.org - which is mostly Big Tech representatives - and looking through the JSON-LD context, where there’s a connection with various W3C ontologies, but zero reference to the xsd namespace that is declared in the top of the @context, I can’t help but think that breaking with the W3C primitive data model is somehow a deliberate move away from common best-practices that weaken open standards overall.

update: i asked christine webber (of activitypub fame) via dm and the answer i got was basically “it doesn’t matter because everyone is going to ignore it anyway”

to be clear: there are two different meanings for “type”. there is node type (things like Person, PropertyValue, and so on) and then there is value type (mostly primitives like string and boolean but also things like datetime).

node types are only rarely checked (for example, mastodon checks actor.attachment for type PropertyValue in order to construct its profile fields)
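A rough Python sketch of that kind of node-type check; it is not Mastodon’s code, profile_fields is a hypothetical helper, and the actor below is made up. Only attachments whose type is PropertyValue become profile fields; everything else is ignored.

```python
def profile_fields(actor: dict) -> list[tuple[str, str]]:
    """Return (name, value) pairs for attachments whose node type is PropertyValue."""
    attachments = actor.get("attachment", [])
    if isinstance(attachments, dict):  # JSON-LD allows a single value instead of a list
        attachments = [attachments]
    fields = []
    for item in attachments:
        node_type = item.get("type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if "PropertyValue" in types:  # the node-type check; other nodes are skipped
            fields.append((item.get("name", ""), item.get("value", "")))
    return fields


# Made-up actor with one profile field and one unrelated attachment.
actor = {
    "type": "Person",
    "attachment": [
        {"type": "PropertyValue", "name": "Website", "value": "https://example.org"},
        {"type": "Image", "url": "https://example.org/banner.png"},
    ],
}
print(profile_fields(actor))  # [('Website', 'https://example.org')]
```

Note that the check is on the node type (PropertyValue); nothing here looks at the value types of name or value, which matches the point that value types are rarely inspected.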

value types are as quoted below, indeed

this i think is more clear and i will change the PR to xsd:boolean – but to be fair, i could just as easily change it to no type at all (like how several properties have no full definition, just an alias), since the only @type that really signifies anything is @id (implying the value is a json-ld object). in the interest of correctness i will not do this.

[
  "https://www.w3.org/ns/activitystreams",
  "https://w3id.org/security/v1",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "sc": "http://schema.org/",
    "commentsEnabled": "pt:commentsEnabled",
    "sensitive": "as:sensitive",
    "matrixUserId": {
      "@type": "xsd:string",
      "@id": "lemmy:matrixUserId"
    },
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    },
    "stickied": {
      "@type": "xsd:boolean",
      "@id": "lemmy:stickied"
    }
  }
]

could just as easily be

[
  "https://www.w3.org/ns/activitystreams",
  "https://w3id.org/security/v1",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "sc": "http://schema.org/",
    "commentsEnabled": "pt:commentsEnabled",
    "matrixUserId": "lemmy:matrixUserId",
    "sensitive": "as:sensitive",
    "stickied": "lemmy:stickied",
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    }
  }
]