FEP-bad1: Object history collection

helge · July 6, 2023, 12:25pm

Hello!

This is a discussion thread for the proposed FEP-bad1: Object history collection.
Please use this thread to discuss the proposed FEP and any potential problems
or improvements that can be addressed.

Summary

[AS2-Core] provides examples 18, 19, and 32, which represent the “history” of an object.

Particularly in example 32, we see an object being Created, Updated, and Deleted. However, there is no property dedicated to advertising a collection fit for this purpose. This FEP attempts to define one.

cc @trwnh

stevebate · July 6, 2023, 2:09pm

An object’s history collection will necessarily be ordered chronologically, although whether the ordering should be forward chronological or reverse chronological is an open question; at the time of writing this FEP, [ActivityPub] Section 5 currently contains the following language:

An OrderedCollection MUST be presented consistently in reverse chronological order.

This language indicates that if OrderedCollection is used, the ordering MUST be reverse chronological.

An OrderedCollection defined in an extension is not subject to this requirement. See: ActivityPub errata/Proposed - W3C Wiki

It’s not specific to this FEP, but the ambiguity of reverse-chronological ordering is problematic in general. According to the ActivityPub specification:

What property is used to determine the reverse chronological order is intentionally left as an implementation detail.

Especially for an Object’s Activity history, it seems we’d want it to be specifically ordered by the time the Activity was performed versus receive time, store time or some other time-related property.
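As a rough illustration of that ordering, here is a minimal Python sketch that sorts a history by each Activity's `published` timestamp, newest first. Treating `published` as the time the Activity was performed is an assumption; neither the FEP nor ActivityPub mandates it, and the sample activities and function names are hypothetical.

```python
from datetime import datetime


def performed_at(activity: dict) -> datetime:
    # Assumption: `published` records when the Activity was performed.
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    return datetime.fromisoformat(activity["published"].replace("Z", "+00:00"))


def reverse_chronological(activities: list) -> list:
    # Newest activity first, regardless of receive time or store time.
    return sorted(activities, key=performed_at, reverse=True)


history = [
    {"id": "https://example.com/some-file/log/2", "type": "Update",
     "published": "2023-06-22T09:00:00Z"},
    {"id": "https://example.com/some-file/log/1", "type": "Create",
     "published": "2023-06-21T10:00:00Z"},
    {"id": "https://example.com/some-file/log/3", "type": "Delete",
     "published": "2023-06-23T08:30:00Z"},
]
ordered = reverse_chronological(history)
```

Sorting on a single, named property is what removes the ambiguity; which property that should be remains the open question.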

Side comment: It feels like we are starting to define special properties and collections like this as a workaround for not having a general-purpose Object query capability. This is not a criticism of this FEP, just an observation of a trend I’m noticing. I could see this leading to a plethora of properties over time for special-purpose, inflexible “queries” and associated indexing. However, I also understand that defining a general-purpose query capability (maybe SPARQL-ish with authorization filtering) would be challenging and a topic for another discussion thread.

trwnh · July 6, 2023, 8:45pm

I suppose it could be reconstructed with a SPARQL-like query where object == some id, which would return all activities targeting that object. You could filter the author’s outbox for this, and really, you could filter any collection that you would expect to contain all activities related to that object. But I do agree that this sort of filtering or querying support is more and more needed, as we are discovering with the submission of several recent FEPs. The closest prior effort to this is FEP-5bf0 which proposes using streams for pre-filtered sub-collections, but my current thinking is that there should be an endpoint defined in endpoints against which you can submit SPARQL queries. I just don’t know enough about SPARQL yet to come up with a fully fleshed-out proposal. Additionally, endpoints are generally only exposed on actors, so either endpoints needs to be attached to objects as well (which doesn’t make sense for most of the endpoints such as the OAuth ones), or the specific property needs to be attached to the object directly. Something like Collection.sparqlEndpoint? Or more indirectly via attributedTo.endpoints.sparqlEndpoint? This is the domain of some other FEP, though…
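Without any query endpoint, the filtering described here can be approximated client-side. The following is a minimal Python sketch, assuming the collection items are already fetched as plain dicts; the function names and the sample outbox are hypothetical.

```python
from typing import Optional


def object_id(activity: dict) -> Optional[str]:
    # `object` may be an embedded object or a bare id string.
    obj = activity.get("object")
    if isinstance(obj, dict):
        return obj.get("id")
    return obj


def activities_about(items: list, target_id: str) -> list:
    # Keep only the activities whose `object` is the given id.
    return [a for a in items if object_id(a) == target_id]


outbox_items = [
    {"id": "https://example.com/activities/3", "type": "Like",
     "object": "https://example.com/notes/2"},
    {"id": "https://example.com/activities/2", "type": "Update",
     "object": {"id": "https://example.com/notes/1", "type": "Note"}},
    {"id": "https://example.com/activities/1", "type": "Create",
     "object": "https://example.com/notes/1"},
]
about_note = activities_about(outbox_items, "https://example.com/notes/1")
```

This only works if the consumer can page through the whole collection, which is part of why an explicit history collection is still useful.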

For this FEP, I think it’s useful enough to have an object’s history explicitly presented, for use cases where other consumers don’t particularly care to run their own queries.

Per the discussion on the PR, there is also another alternative:

RFC5829 defines the following rels:

version-history

latest-version

predecessor-version

successor-version

it sounds like litepub:formerRepresentations is semantically equivalent to version-history, so my preference would be to formally name such a property versionHistory.

the challenge is in storing each revision of an object, or generating it on-the-fly. which ID do you use? do you use an ID at all? i could see something like so:
id: <some-post>
type: Note
content: This post has been edited.
published: 2023-06-21
updated: 2023-06-22
versionHistory:
  - id: <some-post/history>
    type: OrderedCollection
    orderedItems:
      - id: <some-post/history/1>
        type: Note
        content: This post has not been edited.
        published: 2023-06-21
        versionHistory: <some-post/history>
either the Update activity, during processing, should generate <some-post/history/1> as an exact copy of the object at the time of the edit, and at the time of the first edit, this versionHistory should be created. or possibly, the versionHistory and <some-post/history/1> should be created at the time of the original object.
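The first option (snapshot at edit time) could look roughly like the Python sketch below. It is only an illustration under assumptions: `versionHistory` is the proposed, not yet defined, property; the `<id>/history/<n>` naming follows the example above; the history is simplified to a single embedded collection; and `apply_update` is a hypothetical helper.

```python
import copy


def apply_update(post: dict, changes: dict, updated: str) -> dict:
    # Create the history collection lazily, on the first edit.
    history = post.setdefault("versionHistory", {
        "id": post["id"] + "/history",
        "type": "OrderedCollection",
        "orderedItems": [],
    })
    # Snapshot the object as it was before this edit, under its own id.
    snapshot = {k: copy.deepcopy(v) for k, v in post.items()
                if k != "versionHistory"}
    snapshot["id"] = "{}/{}".format(history["id"], len(history["orderedItems"]) + 1)
    snapshot["versionHistory"] = history["id"]
    history["orderedItems"].append(snapshot)
    # Apply the edit to the live object.
    post.update(changes)
    post["updated"] = updated
    return post


post = {
    "id": "https://example.com/some-post",
    "type": "Note",
    "content": "This post has not been edited.",
    "published": "2023-06-21",
}
apply_update(post, {"content": "This post has been edited."}, "2023-06-22")
```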

this approach probably works for mastodon API use-cases but not so well for anyone trying to reconstruct or represent actual history (file creation, update, deletion, etc). maybe that’s fine. i still want to explore exposing this via result though…

stevebate · July 6, 2023, 11:57pm

I was thinking of an “instance actor” endpoint.

Isn’t a version history something different than what this FEP is proposing? Given a time-series of change events (chrono-sorted Activities), one can materialize a version history but my understanding is that the FEP doesn’t represent that explicitly.

stevebate · July 7, 2023, 12:00am

I see there is sometimes interesting information in the PR comments for FEPs. Has there been any discussion about that and whether we should just announce FEPs here and discuss them in the related issue comments?

trwnh · July 7, 2023, 2:04am

Correct, this FEP doesn’t describe a versionHistory with objects representing revisions, it describes a history collection with Create/Update/Delete activities targeting the object. The versionHistory is presented as an alternative take that would be defined in a separate FEP. I was just reproducing the PR comment here in-thread so it could be more easily tracked.

tesaguri · March 30, 2024, 4:11am

I think it’s difficult to tell from the FEP how exactly a producer should represent objects in a history collection.

Naively, the producer might embed the objects into activities just as they were at the time of the activities, but that would result in an incoherent RDF dataset for LD consumers, because the objects would share the same @id that way, as I’ve mentioned in the following issue:

github.com/w3c/activitypub

Semantics of embedded `object`s of `Update` activities in collections of activities

opened 12:57AM - 06 Jan 24 UTC

tesaguri

The ActivityPub Recommendation implies that `Update` activities must have a set of the changes ([6.3.1][activitypub-partial-updates]) or the whole object ([7.3][activitypub-update-activity-inbox]) embedded as its `object` property value (at least for client-to-server interactions). But I think it's not quite obvious how this representation is to be interpreted in a collection of activities.

For example, suppose an actor's outbox has the following activities:

```jsonld
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.com/actors/1/outbox",
  "type": "OrderedCollection",
  "orderedItems": [
    {
      "id": "https://example.com/activities/2",
      "type": "Update",
      "object": {
        "id": "https://example.com/notes/1",
        "type": "Note",
        "content": "Hello, world!"
      }
    },
    {
      "id": "https://example.com/activities/1",
      "type": "Create",
      "object": {
        "id": "https://example.com/notes/1",
        "type": "Note",
        "content": "Hello, word!"
      }
    }
  ]
}
```

This might look fine as a plain JSON document, but as an RDF dataset, the embedded `object` property values are indistinguishable from each other and the collection would mean something like the following:

```turtle
@prefix as: <https://www.w3.org/ns/activitystreams#> .

<https://example.com/actors/1/outbox> a as:OrderedCollection ;
    as:items (
        <https://example.com/notes/1/history/2>
        <https://example.com/notes/1/history/1>
    ) .

<https://example.com/activities/2> a as:Update ;
    as:object <https://example.com/notes/1> .

<https://example.com/activities/1> a as:Create ;
    as:object <https://example.com/notes/1> .

<https://example.com/notes/1> a as:Note ;
    as:content "Hello, world!", "Hello, word!" .
```

This might not be a problem if the activities are transient and won't show up in the `outbox`, but there are desires for collections of `Update` activities (among others) like [FEP-bad1] (<cite>Object history collection</cite>).

I don't think the problem can be "fixed" by changing the representation in C2S/S2S interactions since doing so would lead to a compatibility hazard, but I still believe that there needs to be some guidance for publishers who want to have non-transient `Update` activities. I suppose a possible approach would be to clarify that the representation is meant for transient activities only, and that publishers of non-transient `Update` activities should use a different representation, without specifying the exact alternative representation just like Activity Vocabulary does in its definition of the [`Update`] activity. Although this wouldn't solve any real-world problem by itself, I think it would at least help publishers make informed decisions.

[activitypub-partial-updates]: <https://www.w3.org/TR/2018/REC-activitypub-20180123/#partial-updates>
[activitypub-update-activity-inbox]: <https://www.w3.org/TR/2018/REC-activitypub-20180123/#update-activity-inbox>
[FEP-bad1]: <https://codeberg.org/fediverse/fep/src/commit/a8c065a93d4509b2460e0a9f4e45194da6bf9d37/fep/bad1/fep-bad1.md>
[`Update`]: <https://www.w3.org/TR/2017/REC-activitystreams-vocabulary-20170523/#dfn-update>

Both the examples in the Activity Streams Recommendation and the example in the FEP avoid that exact pattern, but with different approaches: the former uses Web Annotation while the latter uses blank nodes.

And the example of the FEP is still semantically questionable, because the example asserts the following triples:

<https://example.com/some-file> a as:Tombstone ;
    as:formerType as:Document ;
    as:url "https://example.com/404" .

<https://example.com/some-file/log/3> a as:Delete ;
    as:object <https://example.com/some-file> .

<https://example.com/some-file> as:url "https://example.com/storage/hash2" .

… which indicates that the “deleted” object somehow has two urls: the former is from the Tombstone and the latter is from the Update activity.
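To make the collision concrete, here is a small Python sketch that approximates what an LD consumer does when it merges statements by subject. It uses plain dicts rather than a real JSON-LD or RDF library, so it is only an approximation; `merge_by_id` is a hypothetical helper, and the two nodes are the Tombstone and the Update's embedded object from the triples above.

```python
from collections import defaultdict


def merge_by_id(nodes: list) -> dict:
    # Every statement about the same id lands on the same subject,
    # so differing values accumulate instead of replacing each other.
    merged = defaultdict(lambda: defaultdict(set))
    for node in nodes:
        for key, value in node.items():
            if key != "id":
                merged[node["id"]][key].add(value)
    return merged


nodes = [
    # The Tombstone that the Delete activity refers to.
    {"id": "https://example.com/some-file", "type": "Tombstone",
     "formerType": "Document", "url": "https://example.com/404"},
    # The object embedded in the Update activity, sharing the same id.
    {"id": "https://example.com/some-file",
     "url": "https://example.com/storage/hash2"},
]
merged = merge_by_id(nodes)
urls = merged["https://example.com/some-file"]["url"]
```

After merging, `urls` holds both values, and nothing in the data says which one belongs to which point in time.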

Also, when viewed as an individual activity, the Update activity’s object in the example doesn’t have any type whereas the Create activity’s object has the type of Document, which seems slightly inconsistent. That might be fine as a partial Update activity, but I doubt if the partial update is appropriate here since ActivityPub specifies that the partial update is for C2S interactions only.

Perhaps, you can use a blank node for every object, but then, how would a client identify the object when fetching only an individual activity in the collection?

Well, I feel like we are facing the classic RDF problem of representing temporal aspects of resources, and I suspect the solution is not as obvious as it seems at first glance.

I would rather argue that the example is trying to solve a problem that is beyond the scope of the FEP. As already mentioned in this topic, I think that the FEP should focus on the role as a collection of “activities”, instead of representing what the objects of the activities used to look like, and that the example should use "object": "https://example.com/some-file" for every activity without embedding it.
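For a producer that already stores activities with embedded objects, that suggestion amounts to a small transformation when serving the collection. A minimal Python sketch, with `reference_only` as a hypothetical helper name:

```python
def reference_only(activity: dict) -> dict:
    # Replace an embedded object with its id, leaving the stored activity untouched.
    obj = activity.get("object")
    if isinstance(obj, dict) and "id" in obj:
        return {**activity, "object": obj["id"]}
    return activity


stored = {
    "id": "https://example.com/some-file/log/1",
    "type": "Create",
    "object": {
        "id": "https://example.com/some-file",
        "type": "Document",
        "url": "https://example.com/storage/hash1",
    },
}
served = reference_only(stored)
```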

I think the object’s revisions are more naturally represented by an alternative mechanism like the one mentioned in the PR. Or a mechanism based on link relations defined in RFC 7089 (HTTP Framework for Time-Based Access to Resource States – Memento) might also be an alternative, although it’s a random thought and I haven’t read through the specs yet.

tesaguri · March 30, 2024, 11:13pm

Well, but you might not want to do this as it would make the collection significantly less interesting.

An approach like Example 32 of the Activity Streams Recommendation may be an alternative that allows representing the object’s resource state, but that might be a little too, well, Semantic, for wider adoption.

Or we might be able to embed the object, but with a different id (like Example 32 does) or a blank node identifier, and annotate it with an (indirect) link to the original object, like the following:

{
  "id": "https://example.com/some-file/log/1",
  "type": "Create",
  "object": {
    "id": "https://example.com/some-file/log/1#object",
    "type": "Document",
    "href": "https://example.com/storage/hash1",
    "url": {
      "type": "Link",
      "rel": "original",
      "mediaType": "application/ld+json; profile=\"https://www.w3.org/ns/activitystreams\"",
      "href": "https://example.com/some-file"
    }
  }
}

silverpill · February 19, 2025, 9:58pm

The history stream contains all activities which target the object as object, where the actor matches the attributedTo actor. This might include Create, Update, and/or Delete activities.

How could we define history of a collection?

I think in case of a collection activities would be Add and Remove and the property would be target.
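Both membership rules can be written as predicates. The Python sketch below assumes activities and objects are plain dicts whose references are either bare ids or embedded objects; the collection variant only reflects the suggestion in this post and is not part of the FEP. All function names are hypothetical.

```python
def ref_id(value):
    # A reference may be a bare id string or an embedded object with an id.
    return value.get("id") if isinstance(value, dict) else value


def in_object_history(activity: dict, obj: dict) -> bool:
    # Quoted definition: `object` is the object, and the actor
    # matches the object's attributedTo actor.
    return (ref_id(activity.get("object")) == obj["id"]
            and ref_id(activity.get("actor")) == ref_id(obj.get("attributedTo")))


def in_collection_history(activity: dict, collection_id: str) -> bool:
    # Suggested variant: Add/Remove activities whose `target` is the collection.
    return (activity.get("type") in ("Add", "Remove")
            and ref_id(activity.get("target")) == collection_id)


note = {"id": "https://example.com/notes/1", "type": "Note",
        "attributedTo": "https://example.com/actors/1"}
own_update = {"type": "Update", "actor": "https://example.com/actors/1",
              "object": {"id": "https://example.com/notes/1"}}
foreign_like = {"type": "Like", "actor": "https://example.com/actors/2",
                "object": "https://example.com/notes/1"}
add_to_collection = {"type": "Add", "actor": "https://example.com/actors/1",
                     "object": "https://example.com/notes/1",
                     "target": "https://example.com/collections/1"}
```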

trwnh · February 20, 2025, 1:42am

I need to think more about this, because in the past year or so, I have come to realize that Update activities are kind of busted. See the discussion on w3c/activitypub issue #477 (“Section 6.3.1 C2S Partial Update property deletion behavior is impossible and should be deprecated”), starting from where @stevebate comments, until near the end. Basically, two things:

Partial Updates (and all C2S Updates are specified to be necessarily partial) are actually consumed by the outbox handler and used to generate a full Update, which is what gets stored/delivered/etc.
Partial Updates should be reworked to be more like HTTP PATCH, taking a patchset in some format (json-patch, jsonld-patch, rdf-patch, ldpatch, something else?) and applying it to the resource.
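The first point can be sketched in Python as follows. ActivityPub 6.3.1 describes C2S partial updates as replacing top-level key-value pairs and removing any field whose value is JSON null; the function name `expand_partial_update` and the sample objects are hypothetical, and this is not a sketch of the proposed patch-based rework.

```python
def expand_partial_update(stored: dict, partial: dict) -> dict:
    # Build the full object that a server would store and deliver.
    full = dict(stored)
    for key, value in partial.items():
        if value is None:
            # JSON null (None in Python) removes the top-level field.
            full.pop(key, None)
        else:
            # Any other value replaces the top-level field wholesale.
            full[key] = value
    return full


stored = {"id": "https://example.com/notes/1", "type": "Note",
          "content": "Hello, word!", "summary": "draft"}
partial = {"id": "https://example.com/notes/1",
           "content": "Hello, world!", "summary": None}
full = expand_partial_update(stored, partial)
```

Note that only top-level fields are handled, which matches the scope the specification gives to partial updates.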

If/when this is done, the history collection would maybe contain Patch activities. But some further thought needs to be given to this, because a similar issue applies to Create activities… the Create.object also references the current object.

More broadly, I’m still not entirely sure that we can reconcile “the activity that gets POSTed to the outbox” with “the activity that gets persisted in the outbox”. Confusingly, they are not the same thing. What gets POSTed is technically a blank node. What gets persisted is necessarily modified to at least assign it an id (and also possibly strip bto/bcc, among other things).
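A minimal Python sketch of that difference, assuming a hypothetical `persist` helper and a made-up id scheme; a real outbox handler does considerably more (addressing, wrapping bare objects in a Create, delivery):

```python
import uuid


def persist(posted: dict, activities_base: str) -> dict:
    # Strip the private addressing fields before storing or delivering.
    stored = {k: v for k, v in posted.items() if k not in ("bto", "bcc")}
    # The POSTed activity has no usable id of its own; the server assigns one.
    # The "<base>/<uuid>" scheme is purely illustrative.
    stored["id"] = "{}/{}".format(activities_base, uuid.uuid4())
    return stored


posted = {
    "type": "Update",
    "actor": "https://example.com/actors/1",
    "object": "https://example.com/notes/1",
    "bcc": ["https://example.com/actors/2"],
}
stored = persist(posted, "https://example.com/activities")
```

The stored activity differs from the POSTed one in at least two ways here, which is the mismatch a history collection has to live with.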

That could be an option, and it probably makes sense in a lot of cases, but it depends heavily on which activities get persisted and in what form they get persisted. Ideally, different activities shouldn’t be embedding different claims about the same resource. But we can’t really control what other people publish in their activities; we can only Add them to the history collection, and ask/encourage everyone to not embed non-authoritative objects… I guess?

trwnh · February 20, 2025, 2:38am

I think that the definition could be loosened to apply to either object or target references, but there could be value in limiting it to object references. At least conceptually, the idea behind history is to represent things like file history in a Nextcloud-like application, inspired by AS2-Core Example 32 (“Eric wrote a note” → “Eric edited a note” → “Eric deleted a note”). Honestly, the “actor matches the attributedTo actor” restriction might also be relaxed, but if you loosen it that far, then you might as well use outbox…

The outbox “could potentially [return] all relevant [activities] published”, so the door is open for pretty much any relation to be “relevant” (object, target, tag, context, etc). Especially once you consider inbox forwarding, it might make sense for owners to maintain an outbox that contains all such relevant activities (by id if not authoritative, or perhaps embedded with a signature/proof if one is present). This is a matter of protocol, though – ActivityPub doesn’t mandate anything particular beyond requiring that each of the items of the OrderedCollection is an Activity. So it’d be up to the owner to decide what they want to publish.

In any case, this FEP was submitted over 1.5 years ago and probably needs to be reworked. Giving it a quick glance I see the following issues that I need to address in a PR:

Fix the reference auto-linking
Fix the term definition
Fix outdated reference to FEP-9606 (now FEP-888d)
Rework comparison with FEP-7888 to note that that context “may have” rather than strictly “may be” a collection
Add comparison to outbox
Add brief discussion of conceptual issues with conflicting representations (Update, etc)

silverpill · February 21, 2025, 11:48am

I like the current definition because it captures what modification of an object looks like. Loosening it would make this FEP less useful.

If collections are to be mentioned, it would be better to describe collection history in a different sentence.

Huh? This is not how ActivityPub describes the outbox:

The outbox stream contains activities the user has published
…This could potentially be all relevant objects published by the user

trwnh · February 21, 2025, 12:08pm

“published by a user” is not “performed by the user’s actor”. The reference to objects is an error that should be errata’d — the collection contains strictly activities.

wrt Collection.history it seems weird and wrong to have history suddenly behave differently depending on what it’s attached to. I still need to think about if it’s enough to just include target or if this breaks the concept.