FEP-5bf0: Collection sorting and filtering

aschrijver · April 10, 2023, 3:49pm

Hello!

This is a discussion thread for the proposed FEP-5bf0: Collection sorting and filtering. Please use this thread to discuss the proposed FEP and any potential problems or improvements that can be addressed.

Summary

This proposal would allow Collections to have a streams property, as Actors do. The streams would be additional Collections that are sorted or filtered versions of the original Collection.

The property used for sorting and/or filtering should be made available, possibly via the context property. The examples below use Schema.org’s PropertyValue for lack of a better vocabulary.

Sorted Collections would always be of the type OrderedCollection. The current property would indicate a sorted Collection with a reverse-ordered sort.

trwnh · April 10, 2023, 4:26pm

probably not; context is intended for logically grouping related activities and objects. perhaps an extension orderedBy?

ActivityPub mandates that OrderedCollection MUST be reverse-chronological. it is, however, still possible to use orderedItems on any Collection.

this seems like a violation of the current definition, which stipulates that the value is either a CollectionPage or a Link. it is semantically like indicating the “latest” items, and does not imply anything about reversed ordering. if it is important to have a reversed sort, then perhaps a reversed or reverseOrder extension property would do better? this would be either part of or in combination with the orderedBy extension proposed above.

i suppose there’s no harm in this… seeing as the definition of streams is “a list of supplemental Collections which may be of interest”, the usage proposed by this FEP seems to fit within that scope. although, i do wonder if it wouldn’t be better placed on the actor’s streams itself rather than on the inbox.streams.

the way this vocabulary is being used is not very semantic… again, if the intent is to have filteredByProperty, then it would make more sense to flesh out signals for the “order” in orderedItems. see discussion of orderedBy above.

mpuckett · April 10, 2023, 8:23pm

Hi @trwnh! Author here. This is wonderful feedback! Thank you so much!

Great to hear!

As far as the example, I agree that it would make more sense to have “filtered” Inbox Likes and Inbox Notes Collections as streams on the Actor.

However, there may be Collections that do not have an associated Actor (such as global Hashtags) so this would handle those cases.

Regarding:

and

It sounds like the suggestion here is to add a new vocabulary term as an extension, instead of overloading context. I think that would be a great idea. If a server isn’t aware of the term, there is no harm, as it is just additional information.

Ah, if that is the case, then that language should be removed.

Similarly, the language about current should be removed, as you pointed out that:

And overloading current for these cases would be confusing.

I agree this would make sense as another vocabulary extension.

So ultimately the proposal should reflect the following changes:

Make streams property allowed on Collections/OrderedCollections
Add orderedBy property as an extension on Collections
Add filteredBy property as an extension on Collections/OrderedCollections
Add reversed property as an extension on Collections

It sounds like the proposal should also indicate these implementation details:

Sorted collections should have a type of Collection
Sorted collections should use orderedItems instead of items
Sorted collections should include { orderedBy: 'keyName' }
Sorted collections that are in reverse order should get { reversed: true }
Filtered collections should include { filteredBy: 'keyName' }
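The implementation details in that list can be sketched as a small helper that derives a sorted and/or filtered view from a source Collection. This is a minimal sketch, assuming ActivityStreams objects are plain dicts, that the un-namespaced `orderedBy`, `reversed` and `filteredBy` keys are used exactly as written in the list, and that filtering is a single equality test. The function name `derive_view` and the example URLs are hypothetical.

```python
from typing import Any, Optional, Tuple

def derive_view(source: dict, view_id: str,
                ordered_by: Optional[str] = None,
                reversed_: bool = False,
                filtered_by: Optional[Tuple[str, Any]] = None) -> dict:
    """Build a Collection that is a sorted and/or filtered copy of `source`.

    Uses the un-namespaced extension keys from the list above
    (orderedBy, reversed, filteredBy); namespacing is left out here.
    """
    items = list(source.get("orderedItems", source.get("items", [])))
    view: dict = {"id": view_id, "type": "Collection"}

    if filtered_by is not None:
        key, value = filtered_by
        # Equality-only filter: a simplifying assumption for this sketch.
        items = [item for item in items if item.get(key) == value]
        view["filteredBy"] = key

    if ordered_by is not None:
        # reversed_=True means largest key first (e.g. newest first).
        items.sort(key=lambda item: item[ordered_by], reverse=reversed_)
        view["orderedBy"] = ordered_by
        if reversed_:
            view["reversed"] = True
        view["orderedItems"] = items  # sorted views use orderedItems
    else:
        view["items"] = items
    view["totalItems"] = len(items)
    return view


# Usage: Likes from an inbox, newest first.
inbox = {
    "type": "OrderedCollection",
    "orderedItems": [
        {"id": "https://example.social/a/1", "type": "Like", "published": "2023-04-10T10:00:00Z"},
        {"id": "https://example.social/a/2", "type": "Note", "published": "2023-04-10T11:00:00Z"},
        {"id": "https://example.social/a/3", "type": "Like", "published": "2023-04-10T12:00:00Z"},
    ],
}
likes = derive_view(inbox, "https://example.social/inbox/likes",
                    ordered_by="published", reversed_=True,
                    filtered_by=("type", "Like"))
```

Sorting the `published` strings directly works here only because every timestamp uses the same ISO 8601 UTC format, so lexicographic order matches chronological order.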

Again, I really appreciate your feedback. Please let me know if I misunderstood anything.

I will try to go ahead and modify the proposal via Git.

trwnh · April 11, 2023, 7:40am

Hashtags (as:Hashtag as collectively falsely defined) are generally not collections, though – objects may be loosely grouped by such tags, but not much more. context is intended for stronger groupings, i.e. those that have a purpose (although at its weakest, it might be just a tag that isn’t meant to resolve)


after having more time to think about this, i have the following thoughts:

Schema.org’s ItemList and ListItem types may be used as prior art for inspiration. in those, ListItem.position acts as an index, and ItemList.itemListOrder acts as the type of ordering – itemListOrderAscending, itemListOrderDescending, and itemListUnordered. activitystreams vocabulary has a distinction between items and orderedItems, so this isn’t a 1:1 mapping, but…
in thinking about this, it occurs to me that this is quite similar to SQL’s ORDER BY instruction… and so the realization hits. isn’t this use-case best handled by a query language?

the rough implementation model might look something like this:

on the activitypub Server, we have a Collection with orderedItems.
an activitypub Client (or some other similarly positioned software) sits on top and provides a query language interface

when querying the latter software, it may or may not be desirable to return activitystreams Collections, or some other schema.

so we can reduce the problem statement thus:

in the case where an activitystreams Collection is returned, you may want to hint what the ordering of orderedItems is.

useful hints may include:

ordering by one or more keys
whether the ordering is Ascending (lowest index first) or Descending (highest index first) – reversed doesn’t really indicate what the “default” ordering is.
whether the current Collection is a subset of another Collection?
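a client receiving hints like these could sanity-check them against the data it actually got. a minimal sketch, assuming the hints arrive as a list of key names plus a single "Ascending"/"Descending" direction applied to all keys, and that items are compared by `id` for the subset check. the names `matches_order` and `is_subset_of` are hypothetical.

```python
from typing import Any, Dict, Sequence

Item = Dict[str, Any]

def key_tuple(item: Item, keys: Sequence[str]) -> tuple:
    # Compare on each hinted key in turn, like a multi-column ORDER BY.
    return tuple(item[k] for k in keys)

def matches_order(items: Sequence[Item], keys: Sequence[str], direction: str) -> bool:
    """True if `items` are sorted by `keys` in the hinted direction."""
    if direction not in ("Ascending", "Descending"):
        raise ValueError(f"unknown direction: {direction!r}")
    for a, b in zip(items, items[1:]):
        ka, kb = key_tuple(a, keys), key_tuple(b, keys)
        if (direction == "Ascending" and ka > kb) or (direction == "Descending" and ka < kb):
            return False
    return True

def is_subset_of(view_items: Sequence[Item], source_items: Sequence[Item]) -> bool:
    """True if every item in the view also appears, by id, in the source."""
    source_ids = {item["id"] for item in source_items}
    return all(item["id"] in source_ids for item in view_items)


source = [
    {"id": "https://example.social/a/1", "published": "2023-04-10T10:00:00Z"},
    {"id": "https://example.social/a/2", "published": "2023-04-11T09:00:00Z"},
    {"id": "https://example.social/a/3", "published": "2023-04-12T08:00:00Z"},
]
view = [source[2], source[0]]  # newest first
```

this only checks consistency; it can't tell *why* items were dropped, which is where a filter hint would come in.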

it might even be useful to directly hint the SQL (or other such query) that generated the filtered collection, if available? perhaps a synthesis of the above…

a property to hint the source collection (analogous to FROM) – if we use CollectionPage, we can reuse partOf. if we use a non-paged Collection, we need something new.
a property to hint the filtering criteria (analogous to WHERE)
a property to hint the ordering criteria (analogous to ORDER BY)
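to make the FROM / WHERE / ORDER BY analogy concrete, here is roughly what a server might do when pregenerating such a view in memory. a minimal sketch, assuming the WHERE part is a dict of property/value equalities and ORDER BY is a list of (property, "ASC"|"DESC") pairs; mixed directions are handled by running Python's stable sort once per key, least significant key first. the name `run_view` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

def run_view(source: List[Dict[str, Any]],
             where: Dict[str, Any],
             order_by: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    # FROM: start from the source collection's items.
    # WHERE: keep items whose properties equal every given value.
    rows = [item for item in source
            if all(item.get(k) == v for k, v in where.items())]
    # ORDER BY: sort on the last key first; list.sort is stable (also with
    # reverse=True), so earlier keys end up taking precedence.
    for key, direction in reversed(order_by):
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"bad direction {direction!r}")
        rows.sort(key=lambda item: item[key], reverse=(direction == "DESC"))
    return rows


source = [
    {"id": "https://example.social/a/1", "type": "Like",
     "actor": "https://example.social/@bob", "published": "2023-04-10T10:00:00Z"},
    {"id": "https://example.social/a/2", "type": "Like",
     "actor": "https://example.social/@alice", "published": "2023-04-10T11:00:00Z"},
    {"id": "https://example.social/a/3", "type": "Note",
     "actor": "https://example.social/@alice", "published": "2023-04-10T12:00:00Z"},
    {"id": "https://example.social/a/4", "type": "Like",
     "actor": "https://example.social/@alice", "published": "2023-04-10T13:00:00Z"},
]
# Roughly: WHERE type='Like' ORDER BY actor ASC, published DESC
rows = run_view(source, {"type": "Like"}, [("actor", "ASC"), ("published", "DESC")])
```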

using SQL

this might look something like this:

id: https://activitypub.example.com/some-collection/page/1
type: CollectionPage
partOf: https://activitypub.example.com/some-collection
totalItems: ...
orderedItems:
  - https://activitypub.example.com/object1
  - https://activitypub.example.com/object2
  - https://activitypub.example.com/object3
  - ...
ex:orderedBy: "ORDER BY id ASC, published DESC"
ex:filteredBy: "WHERE type='Like'"

activitypub Clients or extended Servers with enhanced paging support can generate such pre-filtered/sorted Collections and advertise them via streams as you proposed? i’m thinking that framing this as “enhanced paging support” might be a more successful idea.

a note about SPARQL

it also occurs to me that there exists SPARQL for more powerful querying of linked data and RDF triples… we can note that the above SQL has ambiguity throughout. we have no way of knowing that when we say published, we mean https://www.w3.org/ns/activitystreams#published (the datetime) and not some other vocabulary or ontology’s meaning of “published” (perhaps some boolean). the equivalent SPARQL query might look something like this:

PREFIX as: <https://www.w3.org/ns/activitystreams#>
SELECT ?activity
FROM <https://example.org/some-collection>
WHERE
  {
    ?activity a as:Like ;
      as:published ?published .
  }
ORDER BY ?activity DESC(?published)

disclaimer: i am not familiar with SPARQL, so the above example may not be syntactically valid. the purpose of the example is to illustrate SPARQL’s disambiguation features via PREFIX, and its more powerful approach to structured linked data.

there are of course two alternatives here, if SPARQL is to be avoided for some reason:

assume that any property in orderedBy and filteredBy exactly matches the compacted form of the current document. as activitystreams vocabulary can generally be expected to not be overridden, this makes its usage relatively safe; however, extension terms will require special handling
always use the fully expanded form of term names, with the caveat that such term names will have to be extracted from the string value and then compacted against the current @context?
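the second alternative (fully expanded term names, compacted again on the consumer side) can be illustrated with a toy expander. this is not a JSON-LD processor: it assumes a flat context mapping terms or prefixes straight to IRIs, and ignores @vocab, scoped contexts and most keyword handling. the `CONTEXT` contents and the `ex:` prefix are illustrative assumptions.

```python
from typing import Dict

AS = "https://www.w3.org/ns/activitystreams#"

# Hypothetical flat context: term or prefix -> IRI.
CONTEXT: Dict[str, str] = {
    "as": AS,
    "published": AS + "published",
    "type": "@type",
    "ex": "https://example.org/ns#",
}

def expand_term(term: str, context: Dict[str, str]) -> str:
    """Expand a bare term or a prefix:suffix compact IRI to a full IRI."""
    if term in context:
        return context[term]
    prefix, sep, suffix = term.partition(":")
    # "https://..." also contains ":", so skip anything that looks absolute.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term  # already absolute, or unknown: leave unchanged

def compact_iri(iri: str, context: Dict[str, str]) -> str:
    """Reverse mapping: prefer an exact term, else a prefix:suffix form."""
    for term, value in context.items():
        if value == iri:
            return term
    for prefix, value in context.items():
        if value.endswith(("#", "/")) and iri.startswith(value):
            return f"{prefix}:{iri[len(value):]}"
    return iri
```

with this, `"published"` and `"as:published"` both expand to the same activitystreams IRI, which is exactly the disambiguation the SQL-style strings lack.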

mpuckett · April 21, 2023, 2:04am

Thank you so much for the insight and suggestions!

I think you’re right that in order to accurately describe how a Collection has been filtered, you might need to expose the SQL/SPARQL statement and/or additional processing that’s been done after querying. However, at that point, it would require that the client is able to process such instructions, so it becomes less of a client “hint” and probably too demanding.

Here I think integrating the SHACL W3C standard could provide a solution.

First, a simple example. Below is a “fep:CollectionViewPage” with some basic filtering and sorting applied via “fep:filter” and “fep:sort”. This is a View of Alice’s Outbox. In this case, Alice’s full Outbox contains types of Activities other than “as:Create”, such as “as:Accept” and “as:Delete”.

Filtering is indicated by a SHACL “sh:NodeShape” with a constraint on a top-level property, where the “@type” is “as:Create”.

For sorting, SHACL’s “sh:path” indicates that the property being sorted is “as:published” and “fep:order” indicates the sort direction.

{
  "name": "Recent Outbox Posts by Alice",
  "type": "CollectionView",
  "filter": {
    "type": "NodeShape",
    "property": [{
      "type": "PropertyShape",
      "path": "type",
      "hasValue": "Create"
    }]
  },
  "sort": {
    "type": "SortRule",
    "rule": {
      "type": "PropertyShape",
      "path": "published"
    },
    "order": "Ascending"
  }
}
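To illustrate how a client might read this example as a hint, here is a minimal evaluator for just the pieces it uses: a NodeShape whose PropertyShapes have a string `path` and a `hasValue`, plus a single SortRule. It works on compacted JSON and is not a SHACL engine; treating `hasValue` as plain equality (SHACL's sh:hasValue only requires that one of possibly several values matches) is a simplifying assumption, and `apply_view` is a hypothetical name.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]

def satisfies(item: Item, node_shape: Dict[str, Any]) -> bool:
    """An item conforms only if every PropertyShape holds."""
    return all(item.get(ps["path"]) == ps["hasValue"]
               for ps in node_shape.get("property", []))

def apply_view(view: Dict[str, Any], items: List[Item]) -> List[Item]:
    kept = [item for item in items if satisfies(item, view["filter"])]
    rule = view["sort"]
    kept.sort(key=lambda item: item[rule["rule"]["path"]],
              reverse=(rule["order"] == "Descending"))
    return kept


view = {
    "name": "Recent Outbox Posts by Alice",
    "type": "CollectionView",
    "filter": {"type": "NodeShape", "property": [
        {"type": "PropertyShape", "path": "type", "hasValue": "Create"}]},
    "sort": {"type": "SortRule",
             "rule": {"type": "PropertyShape", "path": "published"},
             "order": "Ascending"},
}
outbox = [
    {"id": "https://example.social/@alice/1", "type": "Create", "published": "2023-04-20T10:00:00Z"},
    {"id": "https://example.social/@alice/2", "type": "Accept", "published": "2023-04-20T11:00:00Z"},
    {"id": "https://example.social/@alice/3", "type": "Create", "published": "2023-04-20T09:00:00Z"},
]
result = apply_view(view, outbox)
```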

Below is a more complex example.

This is a CollectionView of Alice’s Inbox which contains only Blog Posts by her Co-workers. (Alice has a custom stream of mutual friends who she has labeled as Co-workers.)

The first filter is the same.

SHACL’s “sh:path” can be an array, so the second filter is on a nested property, in this case “object.type” matching on “as:Article”.

The third filter is a custom SHACL Shape “fep:isInCollection” which gets provided the URL for the Co-workers Collection as an argument. The underlying SHACL mechanics are basically that this is referencing a JavaScript function that is supposed to perform the check, but this does not necessarily need to be implemented. For “client hint” purposes, this is just a way to provide arguments associated with predetermined domain-specific terms that are not easily represented using the rest of the SHACL vocabulary.

{
  "name": "Blog Posts by Alice's Co-workers",
  "type": "CollectionView",
  "filter": {
    "type": "NodeShape",
    "property": [{
      "type": "PropertyShape",
      "path": "type",
      "hasValue": "Create"
    }, {
      "type": "PropertyShape",
      "path": ["object", "type"],
      "hasValue": "Article"
    }, {
      "type": "PropertyShape",
      "path": "actor",
      "isInCollection": {
        "collectionUrl": "https://example.social/@alice/following/coworkers"
      }
    }]
  },
  "sort": [{
    "type": "SortRule",
    "rule": {
      "type": "PropertyShape",
      "path": "published"
    },
    "order": "Ascending"
  }]
}
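This example adds two things a client would need to interpret: a list-valued `path` that walks into a nested object, and the custom `isInCollection` check. A sketch follows, assuming the `object` is embedded rather than referenced only by URL, that a list `path` is an ordered sequence of steps, and that Collection membership is answered by a caller-supplied function (here a hypothetical in-memory lookup standing in for fetching the Co-workers collection). `sort` is a list, so multiple rules are applied last-to-first with a stable sort.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]

def resolve(item: Item, path: Any) -> Any:
    """A string path names one property; a list path is walked in order."""
    steps = [path] if isinstance(path, str) else path
    value: Any = item
    for step in steps:
        if not isinstance(value, dict):
            return None  # e.g. "object" is only a URL: nothing to inspect locally
        value = value.get(step)
    return value

def satisfies(item: Item, shape: dict, is_member: Callable[[str, Any], bool]) -> bool:
    """All PropertyShapes must hold; other keys in a shape are ignored."""
    for ps in shape.get("property", []):
        value = resolve(item, ps["path"])
        if "hasValue" in ps and value != ps["hasValue"]:
            return False
        check = ps.get("isInCollection")
        if check is not None and not is_member(check["collectionUrl"], value):
            return False
    return True

def apply_view(view: dict, items: List[Item], is_member) -> List[Item]:
    kept = [i for i in items if satisfies(i, view["filter"], is_member)]
    rules = view["sort"] if isinstance(view["sort"], list) else [view["sort"]]
    for rule in reversed(rules):  # stable sorts: the first rule takes precedence
        kept.sort(key=lambda i: resolve(i, rule["rule"]["path"]),
                  reverse=(rule["order"] == "Descending"))
    return kept

# Hypothetical membership data standing in for the Co-workers collection.
COWORKERS = "https://example.social/@alice/following/coworkers"
MEMBERS = {COWORKERS: {"https://example.social/@carol"}}

def is_member(collection_url: str, actor: Any) -> bool:
    return actor in MEMBERS.get(collection_url, set())

view = {
    "type": "CollectionView",
    "filter": {"type": "NodeShape", "property": [
        {"type": "PropertyShape", "path": "type", "hasValue": "Create"},
        {"type": "PropertyShape", "path": ["object", "type"], "hasValue": "Article"},
        {"type": "PropertyShape", "path": "actor",
         "isInCollection": {"collectionUrl": COWORKERS}},
    ]},
    "sort": [{"type": "SortRule", "rule": {"type": "PropertyShape", "path": "published"},
              "order": "Ascending"}],
}

carol, dave = "https://example.social/@carol", "https://example.social/@dave"
inbox = [
    {"id": "https://example.social/x/1", "type": "Create", "actor": carol,
     "object": {"type": "Article"}, "published": "2023-04-21T12:00:00Z"},
    {"id": "https://example.social/x/2", "type": "Create", "actor": carol,
     "object": {"type": "Note"}, "published": "2023-04-21T11:00:00Z"},
    {"id": "https://example.social/x/3", "type": "Create", "actor": dave,
     "object": {"type": "Article"}, "published": "2023-04-21T10:00:00Z"},
    {"id": "https://example.social/x/4", "type": "Create", "actor": carol,
     "object": {"type": "Article"}, "published": "2023-04-21T09:00:00Z"},
    {"id": "https://example.social/x/5", "type": "Announce", "actor": carol,
     "object": {"type": "Article"}, "published": "2023-04-21T08:00:00Z"},
]
result = apply_view(view, inbox, is_member)
```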

Here’s what the “@context” definition might look like for FEP with these additions:

{
    "@context": {
      "@version": 1.1,
      "@vocab": ":_",
      "fep": "https://w3id.org/fep#",
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "sh": "http://www.w3.org/ns/shacl#",
      "as": "https://www.w3.org/ns/activitystreams#"
    },
    "@graph": [
      {
        "@id": "fep:CollectionView",
        "@type": "rdfs:Class",
        "rdfs:subClassOf": "as:OrderedCollection"
      },
      {
        "@id": "fep:CollectionViewPage",
        "@type": "rdfs:Class",
        "rdfs:subClassOf": "as:OrderedCollectionPage"
      },
      {
        "@id": "fep:filter",
        "@type": "rdf:Property",
        "rdfs:domain": "fep:CollectionViewPage",
        "rdfs:range": "sh:NodeShape"
      },
      {
        "@id": "fep:sort",
        "@type": "rdf:Property",
        "rdfs:domain": "fep:CollectionViewPage",
        "rdfs:range": "fep:SortRule"
      },
      {
        "@id": "fep:SortRule",
        "@type": "rdfs:Class"
      },
      {
        "@id": "fep:rule",
        "@type": "rdf:Property",
        "rdfs:domain": "fep:SortRule",
        "rdfs:range": "sh:PropertyShape"
      },
      {
        "@id": "fep:order",
        "@type": "rdf:Property",
        "rdfs:domain": "fep:SortRule",
        "rdfs:range": "fep:SortOrder"
      },
      {
        "@id": "fep:SortOrder",
        "@type": "rdfs:Class"
      },
      {
        "@id": "fep:Ascending",
        "@type": "fep:SortOrder"
      },
      {
        "@id": "fep:Descending",
        "@type": "fep:SortOrder"
      }
    ]
  }
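One practical consequence of declaring `fep:CollectionView` as `rdfs:subClassOf` `as:OrderedCollection` is that a consumer that doesn't know the FEP terms can fall back to the nearest class it does understand. A minimal sketch, assuming part of the `@graph` above has been loaded as a list of dicts with the same compact IRIs and that each class has at most one `rdfs:subClassOf` value; this is plain dictionary walking, not RDFS inference.

```python
from typing import Dict, List, Optional, Set

def superclasses(graph: List[Dict], cls: str) -> List[str]:
    """Follow rdfs:subClassOf links from cls, nearest first."""
    parents = {node["@id"]: node["rdfs:subClassOf"]
               for node in graph if "rdfs:subClassOf" in node}
    chain: List[str] = []
    while cls in parents and parents[cls] not in chain:  # guard against cycles
        cls = parents[cls]
        chain.append(cls)
    return chain

def fallback_type(graph: List[Dict], cls: str, known: Set[str]) -> Optional[str]:
    """Return cls if understood, else the nearest understood superclass."""
    for candidate in [cls, *superclasses(graph, cls)]:
        if candidate in known:
            return candidate
    return None


GRAPH = [
    {"@id": "fep:CollectionView", "@type": "rdfs:Class", "rdfs:subClassOf": "as:OrderedCollection"},
    {"@id": "fep:CollectionViewPage", "@type": "rdfs:Class", "rdfs:subClassOf": "as:OrderedCollectionPage"},
    {"@id": "fep:SortRule", "@type": "rdfs:Class"},
]
KNOWN = {"as:Collection", "as:OrderedCollection", "as:OrderedCollectionPage"}
```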

Please let me know what you think of this idea.

mpuckett · April 21, 2023, 8:46pm

Made a few tweaks from the above. The PR I made is here:

trwnh · April 22, 2023, 8:09am

this is a lot! i’ll have to think about this, and read more about SHACL. but in the meantime, i can make the following comments:

can you explain why you went with CollectionView instead of reusing CollectionPage? which properties can you expect the former to have that the latter will not?
isInCollection seems a bit weird. why does it nest collectionUrl instead of referring to it directly? why not partOf as defined on CollectionPage?
the @context will need some work, but we can deal with that later

specific points:

i’m not sure we “need” to expose the query, but it should probably be expressed somehow. we’re not expecting the client to perform any queries; these are pregenerated and preprocessed by the server, right?

we might use summary to express in natural language something like “Like activities, most recent first.” as this will be human-useful. but for machines, it just needs to be something parseable. as toyed with above:

ex:orderedBy: "ORDER BY id ASC, published DESC"
ex:filteredBy: "WHERE type='Like'"

in this example, we have… better than nothing, i guess? it’s sort of readable. gets enough of the point across. but we can probably do better. we might define an SqlQuery as the generator, but this is an idea i haven’t really thought out.

likely i will need to read more about SHACL before making any more comments or recommendations on this point.

it seems better to stick with filteredBy and sortedBy, as their meaning is less ambiguous.

from my preliminary reading, this seems wrong; @type contains @id nodes, not @value nodes. unless the SHACL spec means something else by “value node”… but it also seems that there might be a better fit in sh:class or sh:ClassConstraintComponent.

there doesn’t seem to be anything disambiguating or providing context that "path": "published" is referring specifically to https://www.w3.org/ns/activitystreams#published. is path defined in some SHACL @context somewhere as @type: @id? and if so, then where is this context definition provided?

this likewise seems like it needs disambiguation. is path defined as @container: @list? if not, then that array is an unordered set. likewise, it probably needs a @type: @id or else object and type are ambiguous. and again, hasValue seems like it might potentially be inappropriate compared to something else like sh:class or sh:ClassConstraintComponent.

i’ll try to read more about SHACL and find out if there is a normative context for it, so that we can address the issues above.

bobwyman · June 28, 2023, 4:49pm

Please note that in today’s Issue Triage, a proposal was made to add an Errata item which clarifies that the reverse-chronological requirement should only apply to several OrderedCollections defined by ActivityPub and need not apply to OrderedCollections defined in extensions. Essentially this means that “An OrderedCollection must be reverse-chronological” should have been written as “These OrderedCollections must be reverse-chronological.”

eprodrom · November 15, 2023, 5:15pm

I like this FEP. However, it doesn’t have a reference to SHACL, which makes reading the document kind of confusing.

https://www.w3.org/TR/shacl/

eprodrom · November 15, 2023, 7:09pm

I have a few notes:

I don’t think an additional type is necessary. A subset of a Collection is still a Collection.
It would be really useful to use the sort property to define how an OrderedCollection is ordered, for example.
It would be useful to identify what the filtered results came from. origin might be good for this.
The streams property is usually part of an actor, not part of a Collection.

So, I’d suggest a slightly modified example:

{
  "@context": { ... },
  "type": "Person",
  "name": "Alyssa P. Hacker",
  "inbox": "https://social.example/users/alyssa/inbox",
  "streams": [
    {
      "id": "https://social.example/users/alyssa/inbox/likes",
      "type": "OrderedCollection",
      "summaryMap": {
        "en": "Likes from Alyssa's inbox"
      },
      "filter": {
        "type": "PropertyShape",
        "path": "type",
        "hasValue": "Like",
        "origin": "https://social.example/users/alyssa/inbox"
      },
      "sort": {
        "type": "SortShape",
        "path": "published",
        "order": "Descending"
      }
    }
  ]
}
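With this shape, a client could discover the views of a given Collection by scanning the actor's `streams` for a matching `origin`. A minimal sketch, assuming `streams` entries are embedded objects as in the example (entries given only as URLs would have to be fetched first, which is omitted) and that `origin` sits inside `filter` as written there; `views_of` and `describe` are hypothetical helper names.

```python
from typing import Any, List

def as_list(value: Any) -> List[Any]:
    """ActivityStreams allows a single value where a list is expected."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def views_of(actor: dict, origin: str) -> List[dict]:
    """Embedded streams whose filter declares `origin` as their source."""
    return [s for s in as_list(actor.get("streams"))
            if isinstance(s, dict) and s.get("filter", {}).get("origin") == origin]

def describe(stream: dict, lang: str = "en") -> str:
    """Human-readable label built from summaryMap and the sort hint."""
    sort = stream.get("sort", {})
    label = stream.get("summaryMap", {}).get(lang, stream.get("id", ""))
    return f"{label} (sorted by {sort.get('path')}, {sort.get('order')})"


actor = {
    "type": "Person",
    "name": "Alyssa P. Hacker",
    "inbox": "https://social.example/users/alyssa/inbox",
    "streams": [{
        "id": "https://social.example/users/alyssa/inbox/likes",
        "type": "OrderedCollection",
        "summaryMap": {"en": "Likes from Alyssa's inbox"},
        "filter": {"type": "PropertyShape", "path": "type", "hasValue": "Like",
                   "origin": "https://social.example/users/alyssa/inbox"},
        "sort": {"type": "SortShape", "path": "published", "order": "Descending"},
    }],
}
for view in views_of(actor, actor["inbox"]):
    print(describe(view))
```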

codegiant · December 14, 2023, 6:26pm

Overall I appreciate the intent of this FEP. Something caught me a bit at the end though:

Servers could in theory make available a templated URL endpoint that allows for arbitrary sorting or filtering. This should be discouraged, as it could lead to database injections. Instead, only predetermined sorted/filtered CollectionViews should be made available via the streams property.

Maybe this is necessary - not an area I know a lot about - but doesn’t this limit client developers to only being able to use the list of options the server developer thinks to provide? What if a nifty new client comes along that wants to do something different but good? They have to pester the server dev just to allow something to be filtered or sorted differently? Seems less than ideal. Is there a way to keep options more open?
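One possible middle ground between fixed views and a fully open templated endpoint is to accept client-chosen sorting and filtering, but only over an allowlist of properties, with every client-supplied value bound as a query parameter. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `activities` table whose columns mirror a few ActivityStreams properties. Column names and sort directions cannot be bound as SQL parameters, which is why they are looked up in the allowlist instead of being interpolated from user input.

```python
import sqlite3
from typing import List

# Hypothetical mapping from ActivityStreams property names to trusted columns.
ALLOWED_COLUMNS = {"type": "type", "published": "published", "actor": "actor"}
ALLOWED_DIRECTIONS = {"ascending": "ASC", "descending": "DESC"}

def query_view(conn: sqlite3.Connection, filter_prop: str, filter_value: str,
               sort_prop: str, order: str) -> List[str]:
    """Return activity ids matching one equality filter, sorted by one property."""
    try:
        where_col = ALLOWED_COLUMNS[filter_prop]
        sort_col = ALLOWED_COLUMNS[sort_prop]
        direction = ALLOWED_DIRECTIONS[order.lower()]
    except KeyError as err:
        raise ValueError(f"unsupported filter or sort option: {err}") from None
    # Identifiers come only from the allowlist; the value is bound with "?".
    sql = (f"SELECT id FROM activities WHERE {where_col} = ? "
           f"ORDER BY {sort_col} {direction}")
    return [row[0] for row in conn.execute(sql, (filter_value,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id TEXT, type TEXT, actor TEXT, published TEXT)")
conn.executemany("INSERT INTO activities VALUES (?, ?, ?, ?)", [
    ("https://example.social/a/1", "Like", "https://example.social/@bob", "2023-12-01T10:00:00Z"),
    ("https://example.social/a/2", "Create", "https://example.social/@bob", "2023-12-02T10:00:00Z"),
    ("https://example.social/a/3", "Like", "https://example.social/@carol", "2023-12-03T10:00:00Z"),
])
print(query_view(conn, "type", "Like", "published", "Descending"))
```

A server could still cap which combinations it serves (or require indexes on allowed columns) for performance reasons; the allowlist only addresses the injection concern raised in the FEP text.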