FEP-bad1: Object history collection

Hello!

This is a discussion thread for the proposed FEP-bad1: Object history collection.
Please use this thread to discuss the proposed FEP and any potential problems
or improvements that can be addressed.

Summary

[AS2-Core] provides examples 18, 19, 32 which represent the “history” of an object.

Particularly in example 32, we see an object being Created, Updated, and Deleted. However, there is no property dedicated to advertising a collection fit for this purpose. This FEP attempts to define one.

cc @trwnh

An object’s history collection will necessarily be ordered chronologically, although whether the ordering should be forward chronological or reverse chronological is an open question; at the time of writing this FEP, [ActivityPub] Section 5 currently contains the following language:

An OrderedCollection MUST be presented consistently in reverse chronological order.

This language indicates that if OrderedCollection is used, the ordering MUST be reverse chronological.

An OrderedCollection defined in an extension is not subject to this requirement. See: ActivityPub errata/Proposed - W3C Wiki

It’s not specific to this FEP, but the ambiguity of reverse chrono ordering is problematic in general. According to the Activity specification:

What property is used to determine the reverse chronological order is intentionally left as an implementation detail.

Especially for an Object’s Activity history, it seems we’d want it to be ordered specifically by the time the Activity was performed, rather than by receive time, store time, or some other time-related property.
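
To make the distinction concrete, here is a minimal Python sketch that orders a history by when each activity was performed. It assumes `published` carries that time; the `receivedAt` field and the sample timestamps are hypothetical and only there to show that a different key would give a different order.

```python
from datetime import datetime

def performed_at(activity):
    # Assumption: `published` records when the activity was performed.
    return datetime.fromisoformat(activity["published"].replace("Z", "+00:00"))

def order_history(activities):
    # Reverse chronological, newest first, keyed on the performed time only.
    return sorted(activities, key=performed_at, reverse=True)

# Hypothetical sample data; `receivedAt` is not an AS2 property.
activities = [
    {"id": "https://example.com/some-file/log/2", "type": "Update",
     "published": "2023-06-22T09:30:00Z", "receivedAt": "2023-06-24T12:00:00Z"},
    {"id": "https://example.com/some-file/log/1", "type": "Create",
     "published": "2023-06-21T10:00:00Z", "receivedAt": "2023-06-21T10:00:05Z"},
    {"id": "https://example.com/some-file/log/3", "type": "Delete",
     "published": "2023-06-23T08:00:00Z", "receivedAt": "2023-06-23T08:00:02Z"},
]

ordered = order_history(activities)
print([a["type"] for a in ordered])
```

Sorting the same list on `receivedAt` would put the Update first, which is the kind of inconsistency a fixed ordering key avoids.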

Side comment: It feels like we are starting to define special properties and collections like this as a workaround for not having a general-purpose Object query capability. This is not a criticism of this FEP, just an observation of a trend I’m noticing. I could see this leading to a plethora of properties over time for special-purpose, inflexible “queries” and associated indexing. However, I also understand that defining a general-purpose query capability (maybe SPARQL-ish with authorization filtering) would be challenging and a topic for another discussion thread.

I suppose it could more or less be reconstructed with a SPARQL-style query where object == some id, which would return all activities targeting that object. You could filter the author’s outbox for this, and really, you could filter any collection that you would expect to contain all activities related to that object. But I do agree that this kind of filtering or querying support is needed more and more, as we are discovering with the submission of several recent FEPs. The closest prior effort is FEP-5bf0, which proposes using streams for pre-filtered sub-collections, but my current thinking is that there should be an endpoint defined in endpoints against which you can submit SPARQL queries. I just don’t know enough about SPARQL yet to come up with a fully fleshed-out proposal. Additionally, endpoints are generally only exposed on actors, so either endpoints needs to be attached to objects as well (which doesn’t make sense for most of the endpoints, such as the OAuth ones), or the specific property needs to be attached to the object directly. Something like Collection.sparqlEndpoint? Or more indirectly via attributedTo.endpoints.sparqlEndpoint? This is the domain of some other FEP, though…
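
As a rough illustration of the "filter any collection" idea (plain Python rather than SPARQL), the sketch below picks out the activities in a collection's items whose object is a given id. The helper names and the sample outbox are hypothetical; it assumes `object` is either an id string or an embedded object with an `id`.

```python
def object_id(activity):
    # `object` may be a bare id string or an embedded object with an `id`.
    obj = activity.get("object")
    if isinstance(obj, dict):
        return obj.get("id")
    return obj

def history_of(object_iri, items):
    # Keep only activities that target the given object, preserving order.
    return [a for a in items if object_id(a) == object_iri]

# Hypothetical outbox contents.
outbox_items = [
    {"id": "https://example.com/activities/3", "type": "Delete",
     "object": "https://example.com/some-file"},
    {"id": "https://example.com/activities/2", "type": "Like",
     "object": "https://example.com/other-post"},
    {"id": "https://example.com/activities/1", "type": "Create",
     "object": {"id": "https://example.com/some-file", "type": "Document"}},
]

history = history_of("https://example.com/some-file", outbox_items)
print([a["type"] for a in history])
```

This only works if the filtered collection really does contain every activity related to the object, which is the caveat raised above.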

For this FEP, I think it’s useful enough to have an object’s history explicitly presented, for use cases where other consumers don’t particularly care to run their own queries.

Per the discussion on the PR, there is also another alternative:

RFC5829 defines the following rels:

  • version-history
  • latest-version
  • predecessor-version
  • successor-version

it sounds like litepub:formerRepresentations is semantically equivalent to version-history, so my preference would be to formally name such a property versionHistory.

the challenge is in storing each revision of an object, or generating it on-the-fly. which ID do you use? do you use an ID at all? i could see something like so:

id: <some-post>
type: Note
content: This post has been edited.
published: 2023-06-21
updated: 2023-06-22
versionHistory:
  - id: <some-post/history>
    type: OrderedCollection
    orderedItems:
      - id: <some-post/history/1>
        type: Note
        content: This post has not been edited.
        published: 2023-06-21
        versionHistory: <some-post/history>

either the Update activity, during processing, should generate <some-post/history/1> as an exact copy of the object at the time of the edit, and at the time of the first edit, this versionHistory should be created. or possibly, the versionHistory and <some-post/history/1> should be created at the time of the original object.
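
a minimal Python sketch of the first option (snapshot at edit time, history created on the first edit). everything here is an assumption for illustration: the `apply_update` helper, the `https://example.com/some-post` id, the numbering scheme for revision ids, and appending in forward chronological order. it also simplifies `versionHistory` to a single embedded collection.

```python
import copy

def apply_update(current, changes, updated):
    # Create the history collection lazily, on the first edit.
    history = current.setdefault("versionHistory", {
        "id": current["id"] + "/history",
        "type": "OrderedCollection",
        "orderedItems": [],
    })
    # Snapshot the object as it was before the edit, under its own id.
    snapshot = {k: copy.deepcopy(v) for k, v in current.items()
                if k != "versionHistory"}
    snapshot["id"] = f'{history["id"]}/{len(history["orderedItems"]) + 1}'
    snapshot["versionHistory"] = history["id"]
    history["orderedItems"].append(snapshot)
    # Apply the edit to the live object.
    current.update(changes)
    current["updated"] = updated
    return current

post = {
    "id": "https://example.com/some-post",
    "type": "Note",
    "content": "This post has not been edited.",
    "published": "2023-06-21",
}
apply_update(post, {"content": "This post has been edited."}, "2023-06-22")
print(post["versionHistory"]["orderedItems"][0]["id"])
```

the second option would simply run the same snapshot step when the object is first created.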

this approach probably works for mastodon API use-cases but not so well for anyone trying to reconstruct or represent actual history (file creation, update, deletion, etc). maybe that’s fine. i still want to explore exposing this via result though…

I was thinking of an “instance actor” endpoint.

Isn’t a version history something different than what this FEP is proposing? Given a time-series of change events (chrono-sorted Activities), one can materialize a version history but my understanding is that the FEP doesn’t represent that explicitly.

I see there is sometimes interesting information in the PR comments for FEPs. Has there been any discussion about that and whether we should just announce FEPs here and discuss them in the related issue comments?

Correct, this FEP doesn’t describe a versionHistory with objects representing revisions, it describes a history collection with Create/Update/Delete activities targeting the object. The versionHistory is presented as an alternative take that would be defined in a separate FEP. I was just reproducing the PR comment here in-thread so it could be more easily tracked.

I think it’s difficult to tell from the FEP how exactly a producer should represent objects in a history collection.

Naively, the producer might embed the objects into activities just as they were at the time of the activities, but that would result in an incoherent RDF dataset for LD consumers, because the objects would all share the same @id, as I’ve mentioned in the following issue:

Both the examples in the Activity Streams Recommendation and the example in the FEP avoid that exact pattern, but with different approaches: the former use Web Annotation while the latter uses blank nodes.

And the example of the FEP is still semantically questionable, because the example asserts the following triples:

<https://example.com/some-file> a as:Tombstone ;
    as:formerType as:Document ;
    as:url "https://example.com/404" .

<https://example.com/some-file/log/3> a as:Delete ;
    as:object <https://example.com/some-file> .

<https://example.com/some-file> as:url "https://example.com/storage/hash2" .

… which indicates that the “deleted” object somehow has two urls: the former is from the Tombstone and the latter is from the Update activity.
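
To show why this happens, here is a small Python sketch that treats the dataset as a plain set of triples, with no RDF library involved. The tuples are the triples quoted above, with `rdf:type` written out for `a`; merging them leaves two `as:url` values on the same subject.

```python
SOME_FILE = "https://example.com/some-file"

# The triples from the example, merged into one dataset.
triples = {
    (SOME_FILE, "rdf:type", "as:Tombstone"),
    (SOME_FILE, "as:formerType", "as:Document"),
    (SOME_FILE, "as:url", "https://example.com/404"),
    ("https://example.com/some-file/log/3", "rdf:type", "as:Delete"),
    ("https://example.com/some-file/log/3", "as:object", SOME_FILE),
    (SOME_FILE, "as:url", "https://example.com/storage/hash2"),
}

# All `as:url` values asserted about the same subject.
urls = sorted(o for s, p, o in triples if s == SOME_FILE and p == "as:url")
print(urls)
```

Nothing in the merged set says which url belonged to which point in time, because both statements hang off the same identifier.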

Also, when viewed as an individual activity, the Update activity’s object in the example doesn’t have any type, whereas the Create activity’s object has the type Document, which seems slightly inconsistent. That might be fine as a partial Update activity, but I doubt that a partial update is appropriate here, since ActivityPub specifies that partial updates are for C2S interactions only.

Perhaps, you can use a blank node for every object, but then, how would a client identify the object when fetching only an individual activity in the collection?

Well, I feel like we are facing the classic RDF problem of representing temporal aspects of resources, and I suspect the solution is not as obvious as it seems at first glance.

I would rather argue that the example is trying to solve a problem that is beyond the scope of the FEP. As already mentioned in this topic, I think that the FEP should focus on its role as a collection of “activities”, instead of representing what the objects of the activities used to look like, and that the example should use "object": "https://example.com/some-file" for every activity without embedding it.

I think the object’s revisions are more naturally represented by an alternative mechanism like the one mentioned in the PR. Or a mechanism based on link relations defined in RFC 7089 (HTTP Framework for Time-Based Access to Resource States – Memento) might also be an alternative, although it’s a random thought and I haven’t read through the specs yet.

Well, but you might not want to do this as it would make the collection significantly less interesting.

An approach like Example 32 of the Activity Streams Recommendation may be an alternative that allows representing the object’s resource state, but that might be a little too, well, Semantic, for wider adoption.

Or we might be able to embed the object, but with a different id (like Example 32 does) or a blank node identifier, and annotate it with an (indirect) link to the original object, like the following:

{
  "id": "https://example.com/some-file/log/1",
  "type": "Create",
  "object": {
    "id": "https://example.com/some-file/log/1#object",
    "type": "Document",
    "href": "https://example.com/storage/hash1",
    "url": {
      "type": "Link",
      "rel": "original",
      "mediaType": "application/ld+json; profile=\"https://www.w3.org/ns/activitystreams\"",
      "href": "https://example.com/some-file"
    }
  }
}