LinkML for defining ActivityPub extension schemas

I bumped into the LinkML site again, and thought I should mention it here:

Some highlights from the Overview page:

Easy to author schemas

classes:
  Person:
    is_a: NamedThing  ## parent class, defines id, name, ...
    description: >-
      A person (alive, dead, undead, or fictional).
    class_uri: schema:Person
    mixins:
      - HasAliases
    slots:
      - primary_email
      - birth_date
      - age_in_years
      - gender
      - current_address
      - has_employment_history
      - has_familial_relationships
      - has_medical_history
...

Rich modeling language

LinkML offers many features of use to data modelers, while retaining a simple core

Bridge between frameworks

LinkML has many different generators that allow a LinkML schema to be translated to other existing frameworks:

  • Convert to JSON-LD contexts, and instantly port your data to RDF
  • Convert to JSON-Schema and use JSON-Schema validators
  • Convert to SHACL or ShEx and validate your RDF data
  • Convert to Python dataclasses or pydantic for easy use within applications
  • Generate SQL schemas or SQLAlchemy models for use with relational databases
1 Like

Oooh, yeah, LinkML is great! (We use it in various places, including the Verifiable Credentials WG).

2 Likes

Nice! I also cross-linked to the ActivityPub “Step On Board” Integration Guide, as it may be very helpful for documenting AP extensions. If this can help bridge the divide between JSON-only and LD-based AP app developers…

PS. Link to toot about the subject.

I think the current Linked Data discussions miss the point. I didn’t spend too much time looking at LinkML, but it seems to share the same weakness that JsonLD has:

I cannot express that this property has a single value.

So I will need to rely on auxiliary documentation to enforce the simple fact that an actor profile in ActivityPub should contain a single inbox element.


The goal of any change to how we serialize ActivityPub should foremost be to ensure that all the simple stuff is automatically satisfied, so developers can worry about the important stuff.

Slots

Slots operate the same way as “fields” in traditional object languages and the same way as “columns” in spreadsheets and relational databases.

If you have a JSON object that conforms to a LinkML schema, then the keys for that object must correspond to slots in the schema that are applicable to that class.

For example, if we have an object instantiating a Person class:

{"id": "PERSON001",
 "name": "....",
 "email": "....",
 ...
}

then id, email, name should all be valid slots, as in the following schema:

classes:
  Person:
    slots:
      - id
      - name
      - email

If we have tabular data

id          name   email
PERSON0001  …      …

then the same constraints hold.

Slot cardinality

The cardinality of a slot captures two properties: whether the slot is required, and how many values it is allowed to have, i.e. whether it is single-valued or multivalued.

The following list summarizes the possible combinations of cardinality that can be asserted on a slot (a schema sketch follows the list):

  • 1..* - slot is required and multivalued
  • 1..1 - slot is required but not multivalued
  • 0..* - slot is not required but, if provided, it is multivalued
  • 0..1 - slot is not required and not multivalued
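In LinkML schema YAML, these cardinalities map onto the required and multivalued slot attributes. A minimal sketch, reusing slot names from the Person example above (which cardinality each slot has in the real schema is my assumption):

slots:
  primary_email:
    required: true
    multivalued: false    # 1..1 - exactly one value
  birth_date:
    required: false
    multivalued: false    # 0..1 - at most one value
  has_familial_relationships:
    required: false
    multivalued: true     # 0..* - zero or more values
  has_employment_history:
    required: true
    multivalued: true     # 1..* - at least one value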

See also: Dr. jonny phd: "i'll say more about what this is in the morning, …" - Neuromatch Social

Very cool. There’s now one less reason to switch to a JSON-first view of the Recommendations. :wink:

I generally like the LinkML representation more than the corresponding JSON Schema. I’m not sure it can represent the full AS2 semantics/schema though (see IntransitiveActivity, for example), but that’s probably ok. It could still be useful for representing AP/AS2 specializations for domain-specific profiles (which will generally only use a small subset of the full AS2 flexibility).

I’m going to run a LinkML-based validator (based on the model in the repo) against the AS2 test corpus to see what happens.

Also note that, despite the file name, the repo document also doesn’t cover ActivityPub (but maybe that’s because it’s a WIP?). It appears to be generated from the original AS2 OWL document, which is AS2-only. For example, there’s currently no inbox, outbox, endpoints, etc., defined in the LinkML schema.

1 Like

So that’s the cool thing about LinkML! It can express “single value”, and it compiles to JSON Schema (as well as other things).
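For instance, a minimal sketch of pinning an actor’s inbox to exactly one value (the class and slot definitions here are hypothetical, not taken from the WIP schema):

classes:
  Actor:
    slots:
      - inbox

slots:
  inbox:
    required: true        # every actor profile must carry an inbox...
    multivalued: false    # ...and exactly one
    range: uri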

1 Like

Yes, it’s a WIP. I should have named that file activitystreams, because I’m going to define the activitypub definitions in that second, empty schema, which imports definitions from the activitystreams schema. Initially I was going to do it all in one schema, but I figured it would be useful to keep the two separable in the end.
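A rough sketch of what that split could look like, assuming LinkML’s standard import mechanism (the schema id and class names are placeholders):

id: https://example.org/activitypub    # placeholder schema id
name: activitypub
imports:
  - linkml:types
  - activitystreams    # pulls in the AS2 class and slot definitions
classes:
  Actor:
    is_a: Object    # Object assumed to come from the activitystreams schema
    slots:
      - inbox
      - outbox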

3 Likes

@aschrijver already pointed this out to me.

I’m wondering a bit what the envisioned workflow here is. I would suspect it’s something like:

  1. Write LinkML data model
  2. Export to JsonSchema
  3. Use JsonSchema → Data Objects in your preferred programming language

The advantage over using JsonSchema directly would be that you get something like the @context.
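Concretely, the URI mappings in the LinkML source are what a generator like gen-jsonld-context turns into an @context, while gen-json-schema emits the JSON Schema. A hedged sketch of the relevant schema parts (the as: namespace URI is the standard AS2 one; the choice of class and slot is illustrative):

prefixes:
  as: https://www.w3.org/ns/activitystreams#
classes:
  Note:
    class_uri: as:Note      # mapped into the generated @context
slots:
  content:
    slot_uri: as:content    # likewise mapped to the as:content term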

The big problem I see here is the usual one with JsonLD:

If I perform, say, a JsonLD expansion against the serialized Data Object, will my auto-generated parser still be able to parse it?

If not, I think one is better off just ignoring JsonLD and declaring the exported JsonSchema to be the data format.

That would make sense, given that JSON Schema is a data format language (a schema constraint language) and JSON-LD is not. For AS2, the normative data format is defined using natural language in the AS2 Recommendation documents; the AS2 JSON-LD context only defines terms. However, I’d prefer using LinkML directly for my purposes (validation, maybe code generation).

The main thing one gets here is constraints and data models - so the OWL model can be used for validation, if it can be dereferenced from the JSON Schema context, or one can load a given data object and validate it with LinkML tools directly. For handling the different forms of JSON-LD (e.g. the compacted/expanded forms you mention), you would still probably want to use something like rdflib, which can parse and normalize them.

Then, on the data model side, it becomes a toolkit for building activitypub programs - you get SQL schema models and pydantic classes that are (more or less) validating. E.g. I am writing a FastAPI activitypub implementation along the lines of @datatitian’s activitypub-express ( GitHub - p2p-ld/fastapi-activitypub: Ultraminimal Activitypub SDK for FastAPI - still just a stub, almost no work done), and I was like “why am I defining all these models locally, I should just be able to import the models from a shared repo”, which is what led me to make this.

Then, since one can customize the LinkML generation process, I can make modified versions of the data models from the schema - for example, supplementing them with the LD annotations so they can be used in a proper RDF database ( GitHub - p2p-ld/pydantigraph: ORM data models, schemas, and vocabularies for pyoxigraph ). So we can start getting a collection of different model forms that can e.g. be used with sqlalchemy or whatever other tool you want, and make implementing AP easier and less error-prone.

Probably most fundamental is that “there isn’t a computer-readable format of the AS/AP spec”, so this is that - what objects are there again? What properties do they have? That should go a long way towards validation and interop for new projects. And, as the title of this thread would suggest, the reason I’m building a graph-db-based AP implementation is to be able to support schema extensions - something new comes along, it can tell you it’s a subtype of Activity, and it fits in your database without you already needing a table for it.
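As a sketch of that extension story (the Annotate type and ext: prefix are invented for illustration): a new vocabulary term only needs to declare its parent class, and a generic consumer can file it under Activity without prior knowledge of it:

classes:
  Annotate:                  # hypothetical extension type
    is_a: Activity           # declares its place in the AS2 hierarchy
    class_uri: ext:Annotate
    slots:
      - object
      - target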

Of course that doesn’t do the actual “activity” logic, but I think having the models separate from the logic is a virtue in this case - a nice point for information hiding, so the models don’t get too enmeshed in any given language or framework. But if we can figure out how to schematize the actions and side effects as well, then boom, that’s the whole show.

The trick is gonna be the “dereference to ontology” part, and that’ll be the next few years for me as I dig LD out of its IRI rut and make it possible to make assertions about things without needing to be located at that thing in a p2p context :slight_smile:

4 Likes