The road to ActivityPods 2.0

srosset81 · November 24, 2023, 2:15pm

Dear all,

We’re pleased to announce the publication of the first two posts on the ActivityPods blog:

In addition, we’ve taken the time to document our compliance with the ActivityPub and Solid specs, as well as the particularities of ActivityPods.
These documents will be updated as we progress towards ActivityPods 2.0.

We’d be delighted to receive any feedback, either on this forum or in GitHub issues.

Have a nice day!

trwnh · November 25, 2023, 6:24pm

re: creating custom collections…

In the case of an as:OrderedCollection, you must also indicate the apods:sortField and apods:sortOrder.

why is this a “must”? shouldn’t it be possible to create an OrderedCollection with a completely arbitrary ordering? of course, there is a tangential issue of how to manipulate that arbitrary order…

We have added a boolean apods:dereferenceItems in order to declare if the items should be dereferenced or not.

under which circumstances would you need this to be false?

For ordered collections, you should use the as:orderedItems predicate.

i’m not entirely familiar with the RDF implications here, but as:orderedItems doesn’t actually exist. it’s just a JSON-LD alias for as:items being a @list rather than a @set.
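To make the aliasing concrete, here is a minimal Python sketch. It hand-codes simplified versions of the two term definitions from the ActivityStreams 2.0 JSON-LD context (both `items` and `orderedItems` point at `as:items`; only `orderedItems` declares `"@container": "@list"`) and performs a toy expansion of those two terms. It is an illustration under those assumptions, not a JSON-LD processor; the function name `expand_items` is hypothetical.

```python
AS = "https://www.w3.org/ns/activitystreams#"

# Simplified term definitions modeled on the ActivityStreams 2.0 context:
# both compact terms map to the same predicate, as:items.
TERMS = {
    "items": {"@id": AS + "items", "@container": None},
    "orderedItems": {"@id": AS + "items", "@container": "@list"},
}


def expand_items(compact: dict) -> dict:
    """Toy expansion of the items/orderedItems terms only (not a JSON-LD processor)."""
    expanded: dict = {}
    for term, definition in TERMS.items():
        if term not in compact:
            continue
        # Both terms are IRI-typed, so each string becomes a node reference.
        values = [{"@id": iri} for iri in compact[term]]
        predicate = definition["@id"]
        if definition["@container"] == "@list":
            # Order is kept by wrapping the values in a single @list object.
            expanded.setdefault(predicate, []).append({"@list": values})
        else:
            # Without a @list container, the values form an unordered set.
            expanded.setdefault(predicate, []).extend(values)
    return expanded


uris = ["https://example.org/a", "https://example.org/b"]
unordered = expand_items({"items": uris})
ordered = expand_items({"orderedItems": uris})
```

Both results use the same single key, `https://www.w3.org/ns/activitystreams#items`; the only difference is the `@list` wrapper around the ordered values, which is why no `as:orderedItems` predicate survives expansion.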

stevebate · November 25, 2023, 7:42pm

The apods:dereferenceItems predicate seems to be a serialization directive. It doesn’t mean anything related to storing RDF data, and it’s not clear what it would mean for storing some derivation of plain JSON data. Maybe it’s OK to store the default serialization behavior in the collection information, but I’d like to see this as a general HTTP query option for the serialization of any dereferenced object, not just collections (https://server.example/someobject?embed=foo,bar).

As for when it would be false, I can think of examples where one might want that: a collection of collections, for example.
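The per-request embed option suggested above could look roughly like the following Python sketch. The `embed` parameter name comes from the example URL; the in-memory `store` dictionary and the functions `embed_fields` and `serialize` are hypothetical stand-ins, not ActivityPods or SemApps APIs. The point is that the same stored collection can be served either as bare URIs or with its items embedded, depending only on the request.

```python
from urllib.parse import parse_qs, urlparse


def embed_fields(url: str) -> set[str]:
    """Read a hypothetical ?embed=foo,bar query option into a set of property names."""
    query = parse_qs(urlparse(url).query)
    fields: set[str] = set()
    for value in query.get("embed", []):
        fields.update(part for part in value.split(",") if part)
    return fields


def serialize(obj: dict, store: dict[str, dict], embed: set[str]) -> dict:
    """Copy obj, replacing the URIs in each property named in `embed` with the stored objects."""
    out = dict(obj)
    for prop in embed:
        value = out.get(prop)
        if isinstance(value, list):
            # Embed each referenced object we know about; unknown URIs stay as bare references.
            out[prop] = [store.get(v, v) if isinstance(v, str) else v for v in value]
        elif isinstance(value, str):
            out[prop] = store.get(value, value)
    return out


store = {
    "https://server.example/activities/1": {
        "id": "https://server.example/activities/1",
        "type": "Create",
    }
}
inbox = {
    "id": "https://server.example/inbox",
    "type": "OrderedCollection",
    "orderedItems": ["https://server.example/activities/1"],
}

# Same stored collection, two serializations chosen per request.
by_reference = serialize(inbox, store, embed_fields("https://server.example/inbox"))
embedded = serialize(inbox, store, embed_fields("https://server.example/inbox?embed=orderedItems"))
```

With this design, a collection of collections (or a list of actor URIs) is simply requested without `embed`, and nothing about dereferencing needs to be persisted in the stored data.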

That’s a good point. If an application is expanding AP JSON-LD for ingest to an RDF store there will be no orderedItems predicate in the expanded data. Someone can still insert an as:orderedItems in an RDF graph but it does seem like that’s going to make JSON-LD interop more complicated. I’d think one would want to use as:items for both on the RDF side. The ordering vs nonordering can be determined from the RDF data.

srosset81 · November 27, 2023, 10:08am

IMO, if you have an ordered collection but it can have a completely arbitrary ordering, it’s not an ordered collection anymore. We could set as:published as the default sort field, but there are cases where it may not be defined.

The followers collection, for example, doesn’t dereference its items: it is a list of actor URIs. On the other hand, the inbox collection dereferences its activities. If you want to create completely custom collections, it seems important to indicate this information.

Thanks for the information, I didn’t know that.

Persisting order in RDF graphs is unfortunately complicated. See for example this article. For ActivityPods/SemApps, we chose to store OrderedCollection items without order. When we GET the collection, we run a SPARQL query with a sort directive.
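A minimal Python sketch of what such a query-time sort might look like, assuming (hypothetically) that collection membership is stored as plain `as:items` triples and that the sort field is given as a full predicate IRI such as `as:published`. The function name and query shape are illustrative; the actual SemApps query may differ.

```python
def collection_items_query(collection_uri: str, sort_field: str, sort_order: str) -> str:
    """Build a SPARQL SELECT that reads an unordered item set and sorts it at query time.

    Note: IRIs are interpolated without validation or escaping, so this sketch
    must not be fed untrusted input.
    """
    order = sort_order.upper()
    if order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return (
        "PREFIX as: <https://www.w3.org/ns/activitystreams#>\n"
        "SELECT ?item WHERE {\n"
        f"  <{collection_uri}> as:items ?item .\n"
        # OPTIONAL keeps items that lack the sort field (e.g. no as:published).
        f"  OPTIONAL {{ ?item <{sort_field}> ?sortValue . }}\n"
        "}\n"
        f"ORDER BY {order}(?sortValue)"
    )


query = collection_items_query(
    "https://pod.example/alice/inbox",
    "https://www.w3.org/ns/activitystreams#published",
    "desc",
)
```

The `OPTIONAL` clause matters for the case raised earlier where `as:published` is not defined: without it, items lacking the sort field would silently drop out of the result.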

Luckily, we can also look at the @type (Collection or OrderedCollection).

stevebate · November 27, 2023, 11:17am

Yes, I’m aware of the complications. I can understand why you might ignore the JSON-LD @list directive but that’s even more reason to use as:items instead of the non-standard (in an RDF context) as:orderedItems.

I think you’re doing very interesting work. However, storing collections with unordered items (only sorted at query-time) as OrderedCollections in the RDF graph is surprising. I’d think you’d store them as Collections and serialize the items as an OrderedCollection/Page depending on the query sort criteria, if any. I suppose the @type is being used here as a (somewhat redundant) serialization hint, like the sortField and sortOrder predicates?

stevebate · November 27, 2023, 11:17am

Are the developer installation instructions in the next branch accurate for ActivityPods 2.0 development?

srosset81 · November 27, 2023, 2:01pm

You could also see this as an implementation detail, since ActivityPub-compatible applications shouldn’t care about the way data is stored in other apps. But since we also offer a SPARQL endpoint, it indeed creates confusion for those who query data through SPARQL.

We haven’t refactored this, mainly due to a lack of time/funding on SemApps core. But I’m also wondering: if, to find the 1000th activity in an inbox, I need to recursively go through the first 999 activities, won’t performance be awful?

Yes, except that the proposed collection API is not implemented yet, so the current code uses some internal settings for each collection.

The code is on the external-apps branch. But ActivityPods 2.0 is still at an early stage of development; several pieces are still missing, so not much will work at the moment.

For the next and master branches, the instructions should be up to date, but feel free to create issues if you run into trouble.

stevebate · November 27, 2023, 2:43pm

Yes, the SPARQL support was one reason for my comments. I was also thinking about the data-sharing possibilities that come with a Solid Pod storing the AP instance data. Will the AP-related data be available for mixing into other applications that don’t use the AP protocol? For example, will a community-curated (via AP) resource directory be available to a Solid client (browser-based application)?

I don’t know. It might be. I’m not questioning storing all collections as unordered sets as a potential performance optimization. But using the OrderedCollection type for the unordered collection seemed strange.

And, to be clear, I like the idea of specifying the order at query time. I like it so much that I don’t think there should even be sortField and sortOrder predicates in the stored collections. When the query results are serialized, an OrderedCollection can be used if the query uses sorting. The sortField and sortOrder predicates could be added to the serialized result if needed.

This is a similar concept to Mastodon’s “presenters”, which are serializers between the storage representation and the published representation of the data.
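As a rough illustration of the presenter idea applied here, the hypothetical Python function below (Mastodon’s presenters are Ruby classes; this is not their code, nor ActivityPods code) keeps the stored collection as an unordered set and decides the published type at serialization time: a plain `Collection` when no sort is requested, an `OrderedCollection` with `orderedItems` when it is. Comparing sort values directly assumes they are mutually comparable, e.g. ISO 8601 timestamps written in the same format and time zone.

```python
from typing import Optional

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def present_collection(
    collection_id: str,
    items: list[dict],
    sort_field: Optional[str] = None,
    descending: bool = True,
) -> dict:
    """Serialize a stored, unordered collection, picking the AS2 type from the query."""
    if sort_field is None:
        # No ordering requested: publish a plain Collection.
        return {
            "@context": AS_CONTEXT,
            "id": collection_id,
            "type": "Collection",
            "totalItems": len(items),
            "items": [item["id"] for item in items],
        }
    # Sort items that have the field; items missing it (e.g. no published date) go last.
    # Direct comparison assumes comparable values, such as same-format ISO 8601 strings.
    with_field = sorted(
        (item for item in items if sort_field in item),
        key=lambda item: item[sort_field],
        reverse=descending,
    )
    without_field = [item for item in items if sort_field not in item]
    return {
        "@context": AS_CONTEXT,
        "id": collection_id,
        "type": "OrderedCollection",
        "totalItems": len(items),
        "orderedItems": [item["id"] for item in with_field + without_field],
    }


stored = [
    {"id": "https://pod.example/a", "published": "2023-11-24T14:15:00Z"},
    {"id": "https://pod.example/b"},
    {"id": "https://pod.example/c", "published": "2023-11-27T10:08:00Z"},
]
plain = present_collection("https://pod.example/outbox", stored)
newest_first = present_collection("https://pod.example/outbox", stored, sort_field="published")
```

If a client needs to know how the result was ordered, the sort field and order could be added to the serialized output at this same point, rather than being stored with the collection.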

For reverse-chrono RDF lists, the performance is probably ok since you are initially querying from the front of the list. Paging can maintain references to nodes within the list to be more efficient. For other list orderings, it depends on the access pattern. If you really need activity 1000 of a list for some reason (random access, without paging), that might be an issue. I’d think that typically the activity could be found directly in the graph using a query.
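The trade-off can be illustrated with a toy Python model of an RDF list, i.e. a chain of nodes linked by rdf:first (the value) and rdf:rest (the next node), ending in rdf:nil. This in-memory dictionary is a simplified, hypothetical stand-in for a triple store, not how SemApps stores collections. Random access to item n walks n links, while a paging cursor that remembers the next node lets each page resume where the previous one ended.

```python
RDF_NIL = "rdf:nil"

# Hypothetical in-memory RDF list: node -> (rdf:first value, rdf:rest node).
Node = str
ListGraph = dict[Node, tuple[str, Node]]


def build_list(values: list[str]) -> tuple[Node, ListGraph]:
    """Build the rdf:first/rdf:rest chain back to front and return (head, graph)."""
    graph: ListGraph = {}
    head: Node = RDF_NIL
    for index in range(len(values) - 1, -1, -1):
        node = f"_:b{index}"
        graph[node] = (values[index], head)
        head = node
    return head, graph


def nth(graph: ListGraph, head: Node, n: int) -> tuple[str, int]:
    """Random access: follow rdf:rest n times. Returns (value, hops taken)."""
    node, hops = head, 0
    while True:
        if node == RDF_NIL:
            raise IndexError(n)
        if hops == n:
            return graph[node][0], hops
        node = graph[node][1]
        hops += 1


def page(graph: ListGraph, start: Node, size: int) -> tuple[list[str], Node]:
    """Cursor-based paging: read up to `size` values from `start`, return (values, next cursor)."""
    values: list[str] = []
    node = start
    while node != RDF_NIL and len(values) < size:
        value, node = graph[node]
        values.append(value)
    return values, node


head, graph = build_list([f"activity-{i}" for i in range(1, 1001)])
value, hops = nth(graph, head, 999)            # reaching the 1000th item takes 999 hops
first_page, cursor = page(graph, head, 10)
second_page, cursor = page(graph, cursor, 10)  # resumes at item 11, no re-walk
```

With a newest-first list, the common case (reading the first pages) only touches the front of the chain, which is the point about reverse-chronological lists above.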

Thanks for the info. Will you post an update here when the ActivityPods 2.0 code is at a “developer preview / alpha” stage?

melvincarvalho · November 27, 2023, 6:49pm

Cool stuff! I’ll see if I can create modules for your version of Solid and add them to solid-lite.