We have created documentation on how exactly Lemmy federation works

nutomic · June 6, 2022, 12:34pm

Thanks for the pull request and explanation, it makes more sense to me now. If @Sebastian agrees with the context you proposed, I will merge it.

Sebastian · June 6, 2022, 3:00pm

Yep. Super thank you to @trwnh, also for the quick PR.

sending an incorrect @context is actually worse than sending no @context at all

That is really the point.

I got the irony between the lines before, it is just that I will not give up the hope that other people are
Of course, I agree with the posted context, because it is correct.

Personally, I find it a bit strange to describe datatypes with schema.org; all the AP specifications use the W3C Recommendation XML Schema Part 2: Datatypes Second Edition. So personally I would do:

"@context": [
  "https://www.w3.org/ns/activitystreams",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "sensitive": "as:sensitive",
    "stickied": {
      "@type": "xsd:boolean",
      "@id": "lemmy:stickied"
    },
    "matrixUserId": {
      "@type": "xsd:string",
      "@id": "lemmy:matrixUserId"
    },
    "commentsEnabled": "pt:commentsEnabled",
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    }
  },
  "https://w3id.org/security/v1"
]

aschrijver · June 6, 2022, 3:14pm

I agree that that is better. Schema.org was likely chosen because PeerTube also uses schema.org data types in their @context.

I did not understand this. Currently all are possible: you could provide a URN or use URLs (dereferenceable or not). The https://www.w3.org/ns/activitystreams namespace itself is a dereferenceable URL that returns different things based on the requested content type: either the machine-readable JSON-LD for application/ld+json, or a nice summary of the standard for text/html.

trwnh · June 6, 2022, 5:17pm

that’s fair, i suppose it doesn’t really make a difference whether it’s sc:Boolean or xsd:boolean

this bit is technically not necessary because the activitystreams context document includes xsd

however (and here’s the point where it gets a bit pedantic) – xsd does not have a parseable json-ld context document. but schema.org does: Developers - Schema.org

then again: The JSON-LD Vocabulary allows for both prefixes

that’s part of it; the other part is what i mentioned above, in that schema.org is friendlier to json/ld than xmlschema (which is only defined in xml). but also as i said further above: it doesn’t really make a difference. both should be parsed as json true/false literals anyway. frankly, @type is not very important in parsing json-ld because the type system doesn’t really mean anything. it only matters for type coercion and i’m pretty sure that they coerce to the same thing regardless of which definition of boolean/Boolean you choose.

if you still think xsd is better than schema then please make arguments toward the former. i can see how xsd might be “better” because it is included “free” with the activitystreams namespace, but this might not be obvious to a human reading the document (in the same way that ldp and vcard are technically included, but not used – the activitystreams 2.0 spec merely states they SHOULD be used for extensions). i suppose i could ask someone more knowledgeable than me / someone with actual authority or expertise, but my (admittedly amateur) opinion is that it doesn’t matter based on prior usage and evidence – as:manuallyApprovesFollowers and toot:discoverable are both defined in the wild without a @type for what it’s worth.

aschrijver · June 6, 2022, 7:15pm

This is an area where I lack expertise, but some of my musings…

In the JSON-LD 1.1 Data Model I find:

A JSON-LD value is a typed value, a string (which is interpreted as a typed value with type xsd:string), a number (numbers with a non-zero fractional part, i.e., the result of a modulo‑1 operation, or which are too large to represent as integers (see Data Round Tripping in [JSON-LD11-API]), are interpreted as typed values with type xsd:double, all other numbers are interpreted as typed values with type xsd:integer), true or false (which are interpreted as typed values with type xsd:boolean), or a language-tagged string.
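To make the quoted rule concrete, here is a minimal Python sketch of how a native JSON value maps to an implied xsd datatype. The function name `implied_datatype` is made up for illustration, and the 10^21 cut-off for "too large to represent as integers" is my reading of the JSON-LD 1.1 API spec, so treat it as an assumption rather than an authoritative implementation.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def implied_datatype(value):
    """Return the xsd datatype IRI implied for a native JSON value.

    Hypothetical helper; language-tagged strings and explicitly typed
    values are deliberately out of scope.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, str):
        return XSD + "string"
    if isinstance(value, (int, float)):
        # Non-zero fractional part, or magnitude at or above the assumed
        # 10^21 threshold, is treated as a double.
        if value % 1 != 0 or abs(value) >= 1e21:
            return XSD + "double"
        return XSD + "integer"
    raise TypeError("not a JSON-LD native value: %r" % (value,))


if __name__ == "__main__":
    for sample in (True, "hello", 3, 2.5):
        print(repr(sample), "->", implied_datatype(sample))
```

So a plain JSON true/false is already an xsd:boolean as far as the data model is concerned, without any @type in the context.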

The xsd: datatypes are, I believe, used in most W3C standards, including in the other linked data specifications, such as RDF. In the JSON-LD specification schema.org is used in various examples, but never to refer to primitive datatypes; it is instead reserved for more semantically meaningful concepts.

Note that the JSON-LD document of schema.org is 1.4 MB large. Also I find its use as primitive datatype to be inconsistent, or rather having e.g. schema:Text means I might as well drop in a whole JSON object describing it.

In the GitHub repo for schemaorg there was an issue discussion about ‘reinventing’ new datatypes. There’s one recent comment measuring usages across the web, and this gist giving a more complete overview. XML Schema datatypes are much more common, except for modeling dates, where schema:Date etcetera is clearly most often used, but there are a bunch of issues with it, as described in the issue.

At the end of the issue other incompatibilities are highlighted as well. The issue is closed… but automatically by a bot, not because it is resolved.

This leads me to have a preference to use XML Schema Datatypes.

Update:

Looking at who was involved with the definition of Schema.org - which is mostly Big Tech representatives - and looking through the JSON-LD context, where there’s a connection with various W3C ontologies, but zero reference to the xsd namespace that is declared in the top of the @context, I can’t help but think that breaking with the W3C primitive data model is somehow a deliberate move away from common best practices, one that weakens open standards overall.

trwnh · June 6, 2022, 8:18pm

update: i asked christine webber (of activitypub fame) via dm and the answer i got was basically “it doesn’t matter because everyone is going to ignore it anyway”

to be clear: there are two different meanings for “type”. there is node type (things like Person, PropertyValue, and so on) and then there is value type (mostly primitives like string and boolean but also things like datetime).

node types are only rarely checked (for example, mastodon checks actor.attachment for type PropertyValue in order to construct its profile fields)

value types are as quoted below, indeed

this i think is more clear and i will change the PR to xsd:boolean – but to be fair, i could just as easily change it to no type at all (like how several properties have no full definition, just an alias), since the only @type that really signifies anything is @id (implying the value is a json-ld object). in the interest of correctness i will not do this.

[
  "https://www.w3.org/ns/activitystreams",
  "https://w3id.org/security/v1",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "sc": "http://schema.org/",
    "commentsEnabled": "pt:commentsEnabled",
    "sensitive": "as:sensitive",
    "matrixUserId": {
      "@type": "xsd:string",
      "@id": "lemmy:matrixUserId"
    },
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    },
    "stickied": {
      "@type": "xsd:boolean",
      "@id": "lemmy:stickied"
    }
  }
]

could just as easily be

[
  "https://www.w3.org/ns/activitystreams",
  "https://w3id.org/security/v1",
  {
    "lemmy": "https://join-lemmy.org/ns#",
    "pt": "https://joinpeertube.org/ns#",
    "sc": "http://schema.org/",
    "commentsEnabled": "pt:commentsEnabled",
    "matrixUserId": "lemmy:matrixUserId",
    "sensitive": "as:sensitive",
    "stickied": "lemmy:stickied",
    "moderators": {
      "@type": "@id",
      "@id": "lemmy:moderators"
    }
  }
]
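for illustration, here is a rough python sketch of why the two contexts above behave the same for the property itself: both resolve stickied to the same IRI, and the only difference is whether a datatype is declared. this is a deliberately simplified term lookup and NOT the real JSON-LD context processing algorithm; the helper names and the trimmed-down context dicts are made up for this sketch, and xsd is declared locally only so the snippet is self-contained.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Trimmed-down versions of the two inline contexts (illustrative only).
TYPED = {
    "lemmy": "https://join-lemmy.org/ns#",
    "xsd": XSD,  # normally inherited from the activitystreams context
    "stickied": {"@type": "xsd:boolean", "@id": "lemmy:stickied"},
}
ALIAS_ONLY = {
    "lemmy": "https://join-lemmy.org/ns#",
    "stickied": "lemmy:stickied",
}


def expand_compact_iri(value, context):
    """Expand 'prefix:suffix' using string-valued prefixes in the context."""
    prefix, sep, suffix = value.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str):
        return base + suffix
    # Keywords such as '@id' and already-absolute IRIs pass through unchanged.
    return value


def resolve_term(term, context):
    """Return (expanded IRI, expanded @type or None) for a term."""
    definition = context[term]
    if isinstance(definition, str):
        return expand_compact_iri(definition, context), None
    iri = expand_compact_iri(definition["@id"], context)
    declared = definition.get("@type")
    return iri, (expand_compact_iri(declared, context) if declared else None)


if __name__ == "__main__":
    print(resolve_term("stickied", TYPED))
    print(resolve_term("stickied", ALIAS_ONLY))
```

both calls give the IRI https://join-lemmy.org/ns#stickied; only the first one also carries a declared datatype.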

nutomic · June 6, 2022, 10:27pm

Would it be possible to move the whole context into a separate local file? Then we would only have to set "@context": "https://example.com/context" in each file, and not include that long context everywhere.

Also, there are some additional nonstandard fields used by Lemmy, for example on Block activities. If you want to include those, I can look through the code for any other nonstandard fields.

mro · June 7, 2022, 6:39am

I guess @Sebastian means ‘it’s just a URI (it need not be a URL)’. So it may well be one, but it is not necessarily dereferenceable.

trwnh · June 7, 2022, 10:02am

Yes, it is possible to use a context file instead of embedding the context directly. Pleroma does this:

"@context": [
  "https://www.w3.org/ns/activitystreams",
  "https://letsalllovela.in/schemas/litepub-0.1.jsonld",
  {
    "@language":"und"
  }
]

The context document needs to be available when making a GET request with Accept: application/ld+json at minimum, and the response needs to have content-type: application/ld+json.

If this option is taken then I can help you test for validity using a Python script and/or the JSON-LD playground to ensure that the context document is being fetched correctly.
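As a rough idea of what such a check could look like, here is a minimal sketch using only the Python standard library. The function names are invented for this example, and it only checks the Accept/Content-Type negotiation described above plus the presence of a top-level @context key; it does not validate the context itself.

```python
import json
import urllib.request

JSONLD = "application/ld+json"


def build_context_request(url):
    """Build a GET request that asks for the JSON-LD representation."""
    return urllib.request.Request(url, headers={"Accept": JSONLD})


def is_jsonld_content_type(header_value):
    """True if a Content-Type header names application/ld+json."""
    media_type = (header_value or "").split(";", 1)[0].strip().lower()
    return media_type == JSONLD


def check_context_document(url, timeout=10):
    """Fetch url and report whether it looks like a usable context document."""
    request = build_context_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type")
        body = json.load(response)
    return (
        is_jsonld_content_type(content_type)
        and isinstance(body, dict)
        and "@context" in body
    )
```

Calling check_context_document with the URL of the hosted context (for example the https://example.com/context placeholder mentioned earlier, once it points at something real) would return True only if the server negotiates the content type as described.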

Those should be included if you want JSON-LD implementations to be aware of those properties. If you don’t include them (as is done currently), then they will simply be stripped/ignored. It would probably be best to find any nonstandard fields and make a list describing what their intended purpose is.

mro · June 7, 2022, 9:50pm

how would one query e.g. the amount of likes on a post?

nutomic · June 8, 2022, 10:21am

Good question. For now we only federate likes as activities, but there is no endpoint that could be queried. We could do it similarly to group followers, which are exposed as a collection with totalItems but empty items for privacy reasons (and it’s easier to implement). If you want something like that, please open an issue.

https://join-lemmy.org/docs/en/federation/lemmy_protocol.html#community-followers
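To sketch what such a counts-only collection could look like, here is a small Python example. This endpoint does not exist in Lemmy today; the helper name, the example URL and the exact set of fields are assumptions modelled on the followers collection described above.

```python
import json


def counts_only_collection(collection_id, total):
    """Build a Collection that exposes a count but no members.

    Hypothetical shape: totalItems carries the number, items stays empty
    so that individual likes are not disclosed.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": collection_id,
        "type": "Collection",
        "totalItems": total,
        "items": [],
    }


if __name__ == "__main__":
    # The URL below is a made-up placeholder, not a real Lemmy route.
    likes = counts_only_collection("https://lemmy.example/post/1/likes", 3)
    print(json.dumps(likes, indent=2))
```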

mro · June 8, 2022, 12:08pm

how would one test (automated) if the result cannot be queried? I think of HTTP GET called by e.g. https://seppo.social/demo/seppo.cgi/ status page – updown.io and return yes or no.

At first Like, next Note, Boost, Reply, Follow, Unfollow. Not sure if I need them all outgoing.

nutomic · June 8, 2022, 12:51pm

I don’t think there is any general solution for this, because ActivityPub allows for many different ways to implement the same functionality. So each project might expose this data slightly differently, and you have to look at them one by one.

In case of Lemmy, Like/Dislike are not listed publicly, but that could be added. Announce can be fetched from https://lemmy.ml/c/activitypub/outbox, or received when you follow a Group. Note could be exposed as part of a collection on Page, but so far it’s only sent as an activity. Follow and Unfollow are private. I don’t know what you mean by Reply.

mro · June 8, 2022, 1:05pm

I was prepared for this. Getting the information at all however is inevitable. @chocobozzz has a collection at <post_id>/likes advertised in the actor profile. Such would be ideal.

chocobozzz · June 8, 2022, 1:41pm

Hi,

As a side note, we plan to remove likes/dislikes details in the future to improve users’ privacy. I think we’ll just keep the totalItems counter, without providing rate URLs.

mro · June 8, 2022, 3:06pm

the total counter is perfect, I just need the measurable effect before and after Dislike/Like.
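a minimal python sketch of that before/after check, assuming the collection has already been fetched and parsed into a dict on both occasions (fetching is left out, and both helper names are made up):

```python
def total_items(collection):
    """Read totalItems from a parsed collection, rejecting unusable values."""
    total = collection.get("totalItems")
    # bool is excluded explicitly because it is a subclass of int in Python.
    if isinstance(total, bool) or not isinstance(total, int) or total < 0:
        raise ValueError("collection has no usable totalItems")
    return total


def like_delta(before, after):
    """Difference in totalItems between two snapshots of a collection."""
    return total_items(after) - total_items(before)


if __name__ == "__main__":
    before = {"type": "Collection", "totalItems": 3, "items": []}
    after = {"type": "Collection", "totalItems": 4, "items": []}
    print("Like was counted:", like_delta(before, after) == 1)
```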

Sebastian · June 14, 2022, 11:02pm

addendum:
The namespace not ending in a “#” is probably an error of the spec.
Just found an issue: “Incorrect JSON-LD: Missing '#' from the end of @vocab param in Core Example 2” (Issue #510, w3c/activitystreams on GitHub).
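A small Python sketch of why the trailing “#” matters: JSON-LD builds the IRI of a plain term by concatenating the @vocab value and the term, so a missing separator silently produces a different IRI. The helper name is made up for illustration.

```python
def expand_with_vocab(vocab, term):
    """Expand a plain term against @vocab by simple string concatenation."""
    return vocab + term


if __name__ == "__main__":
    print(expand_with_vocab("https://www.w3.org/ns/activitystreams", "Like"))
    print(expand_with_vocab("https://www.w3.org/ns/activitystreams#", "Like"))
```

Without the “#” the term ends up glued directly onto the namespace (…/activitystreamsLike); with it, the result is the intended …/activitystreams#Like.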

nutomic · June 17, 2022, 1:29pm

@mro Feel free to make a pull request to add such a collection to Lemmy. I can help you with that, but I don’t have any time to implement it myself.

@Sebastian To be clear, the correct one is https://www.w3.org/ns/activitystreams#, and without # is incorrect?

mro · June 17, 2022, 8:33pm

https://www.w3.org/ns/activitystreams#h-introduction says so.

mro · June 17, 2022, 8:38pm

… and https://www.w3.org/TR/activitystreams-core/#extensibility fucks it up immediately.