The Update Activity: More Than Caching

cjs · November 2, 2019, 10:56am

Inspired by The Delete Activity And It’s Misconceptions, I’d like to write a bit about the Update Activity and my experience with it on both the C2S and S2S sides.

Following the cache model that kaniini lays out in that article, the Update ActivityStreams type can be thought of as a cache-refetch signal to federated peers. As I will show later, this is not the only thing Update does: it also lets federated peers keep as detailed a historical record as they want, without relying on the granularity of the originating peer.

The Update Activity

As with every other Activity type, the specification lays out an open-ended recommendation for how to handle Update activities. No one has to follow its "should"s, but anyone who doesn’t is relying on a convention outside the specification rather than on the specification’s guidance, which is the only guidance the spec offers. What is that guidance? Why should we care what the specification lays out?

Spec Guidance

The Client-to-Server section of the specification actually lays out both the C2S and the S2S suggestions. This is an instance where reading the ActivityPub spec piecemeal leads to pain points.

C2S

The C2S guidance says that Updates are partial: each top-level field supplied in the Update’s object wholly replaces that field in the server’s representation of the object, and fields not supplied are left as they are. As an example, this Note

{
  "type": "Note",
  "content": "Hello",
  "image": {
    "url": "https://example.com/img1",
    "name": "picture of the word hello"
  }
}

Plus this C2S Update

{
  "type": "Update",
  "object": {
    "type": "Note",
    "image": {
      "url": "https://example.com/img2"
    }
  }
}

Would yield this final Note:

{
  "type": "Note",
  "content": "Hello",
  "image": {
    "url": "https://example.com/img2"
  }
}

A big bummer of this recommendation is its reliance on JSON’s null to delete top-level properties, as certain programming languages don’t distinguish an explicit null from an absent field very well (cough Go).
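To make the C2S rule concrete, here is a minimal Python sketch of the partial-update semantics described above, assuming the Update’s object has already been matched to the stored object (the examples omit ids). The function name `apply_c2s_update` is hypothetical. Python’s `json.loads` keeps absent keys absent and turns `null` into `None`, so the two cases stay distinguishable here.

```python
import json
from typing import Any


def apply_c2s_update(stored: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Apply the object of a C2S Update to the server's stored copy.

    Each top-level key in `partial` wholly replaces the stored value,
    a JSON null (None after json.loads) removes the key, and keys that
    are absent from `partial` are left untouched. Nested objects are
    replaced, never merged.
    """
    result = dict(stored)  # shallow copy so the old version survives
    for key, value in partial.items():
        if value is None:
            result.pop(key, None)  # JSON null: drop the property
        else:
            result[key] = value  # top-level replacement, no deep merge
    return result


note = json.loads('''{
  "type": "Note",
  "content": "Hello",
  "image": {"url": "https://example.com/img1", "name": "picture of the word hello"}
}''')
update_object = json.loads('{"type": "Note", "image": {"url": "https://example.com/img2"}}')

print(apply_c2s_update(note, update_object))
# {'type': 'Note', 'content': 'Hello', 'image': {'url': 'https://example.com/img2'}}
print(apply_c2s_update(note, json.loads('{"image": null}')))
# {'type': 'Note', 'content': 'Hello'}
```

Note that the nested `image` value is replaced rather than merged, which is why its `name` disappears.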

S2S

On the other side, the ActivityPub specification recommends sending federated peers the whole object so that they can replace their internal representation with it. Thus, if the server from the C2S Update example above then federated that change over S2S, it would send:

{
  "type": "Update",
  "object": {
    "type": "Note",
    "content": "Hello",
    "image": {
      "url": "https://example.com/img2"
    }
  }
}

This is the cache-updating part of the Update activity.
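A minimal sketch of this S2S side, assuming every object carries an `id` (omitted in the examples above) and using a hypothetical IRI and hypothetical helper names: the origin wraps the whole current version, and the peer overwrites its cached copy without merging.

```python
import copy
from typing import Any

Object = dict[str, Any]


def make_s2s_update(current: Object) -> Object:
    """Origin side: wrap the *whole* current version of the object."""
    return {"type": "Update", "object": copy.deepcopy(current)}


def receive_s2s_update(cache: dict[str, Object], activity: Object) -> None:
    """Peer side: replace the cached copy with the same id, no merging."""
    obj = activity["object"]
    cache[obj["id"]] = copy.deepcopy(obj)


note_id = "https://example.com/notes/1"  # hypothetical id
peer_cache: dict[str, Object] = {
    note_id: {"id": note_id, "type": "Note", "content": "Hello",
              "image": {"url": "https://example.com/img1",
                        "name": "picture of the word hello"}},
}

# The version the origin holds after applying the C2S Update above.
v2 = {"id": note_id, "type": "Note", "content": "Hello",
      "image": {"url": "https://example.com/img2"}}

receive_s2s_update(peer_cache, make_s2s_update(v2))
print(peer_cache[note_id]["image"])  # {'url': 'https://example.com/img2'}
```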

However! This guidance also has another property: edits are broken out into their smallest units and federated individually. The federating server doesn’t wait, accumulate N Update activities, and send out 1 representing all N updates. This may be an important property for the Fediverse at large to preserve, so that federated peers can keep an accurate record of edit history.

In this way, the ActivityPub spec’s guidance makes Update activities more than the cache validation/invalidation signal that Delete activities are. It also ensures that federated peers can build historical records as edits and changes occur over time, without needing to rely on the originating server to do that for them.
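Because each S2S Update carries a complete version, a peer can build its own history simply by appending what it receives. The `EditHistory` class below is a hypothetical sketch, assuming objects carry an `id`; it is one possible design, not anything the spec prescribes.

```python
import copy
from collections import defaultdict
from typing import Any

Object = dict[str, Any]


class EditHistory:
    """Hypothetical peer-side log: one entry per Create/Update received."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Object]] = defaultdict(list)

    def record(self, activity: Object) -> None:
        # The activity carries the full object, so store it verbatim.
        obj = activity["object"]
        self._versions[obj["id"]].append(copy.deepcopy(obj))

    def versions(self, object_id: str) -> list[Object]:
        return list(self._versions[object_id])


note_id = "https://example.com/notes/1"  # hypothetical id
history = EditHistory()
history.record({"type": "Create", "object": {"id": note_id, "type": "Note", "content": "Hello"}})
history.record({"type": "Update", "object": {"id": note_id, "type": "Note", "content": "Hello, world"}})
print([v["content"] for v in history.versions(note_id)])  # ['Hello', 'Hello, world']
```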

Future

I forked this off of the conversation Notifying remote servers that they should refetch an object because the idea of cache-validation is well known, but I feel the property of historical record is not.

Complying with the ActivityPub spec’s suggestion lets any application in the ecosystem (existing or not) compute its own view of an object’s edit history, empowering users as much or as little as it likes. It’s a property wholly implicit in the specification. So it is up to us as a community to decide whether it is a property we value and want to preserve when we build software that does not comply with this guidance.

nightpool · November 2, 2019, 2:12pm

What part of the spec mentions this? I mean, I think it’s probably good practice to make your Updates quickly and meaningfully, but I would not, for example, suggest that a server break up one logical change that touches many properties into a series of many Update messages—that just sounds like a huge out-of-order/synchronization problem waiting to happen.

kaniini · November 2, 2019, 2:42pm

I don’t see how maintaining object versioning would require an inlined object. The way that JSON-LD interpreters and Litepub implementations work is that any @id typed property that is an IRI is optionally fetched and included into the graph before interpretation occurs. In Litepub implementations, if the implementation does not wish to dereference the IRI, then it must not proceed with processing the full document, discarding it instead. You can see that in Pleroma for example by sending it various kinds of Update activities.
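A rough sketch of the processing rule described here, not Pleroma’s actual code: an inline object is used directly, while an IRI is either dereferenced or the whole activity is discarded. `fetch` is a hypothetical stand-in for an HTTP GET of the IRI, injected so the example runs offline.

```python
from typing import Any, Callable, Optional

Object = dict[str, Any]


def resolve_object(activity: Object,
                   fetch: Optional[Callable[[str], Object]] = None) -> Optional[Object]:
    """Return the activity's object as a dict, or None to discard the activity.

    - Inline object: used as-is.
    - IRI (string): dereferenced with `fetch` if one is given; otherwise the
      whole activity is dropped rather than processed partially.
    """
    obj = activity.get("object")
    if isinstance(obj, dict):
        return obj
    if isinstance(obj, str):
        return fetch(obj) if fetch is not None else None
    return None


store = {"https://example.com/notes/1": {"id": "https://example.com/notes/1",
                                         "type": "Note", "content": "Hello"}}
by_iri = {"type": "Update", "object": "https://example.com/notes/1"}

print(resolve_object(by_iri, fetch=store.__getitem__))
print(resolve_object(by_iri))  # None: no fetcher, so the activity is discarded
```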

cjs · November 2, 2019, 5:16pm

Nothing, it’s a property of the guidance.

I agree with your suggestion; I’m not suggesting it either. Nowhere am I suggesting “breaking things down”.

The implication throughout the guidance and the way C2S and S2S seem to interoperate is:

1 user edit generates 1 C2S Update
That C2S Update is sent to the Server, which generates 1 S2S Update with v2 of object
Server federates that 1 S2S Update with the v2 of object (which represents 1 logical user edit)

That allows the originating Server and federating servers, when displaying their UI, to collapse “meaningless” edit histories (however each of them defines that; it doesn’t matter if they disagree) and show “meaningful” ones.
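One way to picture this (a sketch with hypothetical names): each server keeps the full list of versions and applies its own predicate to decide which edits are worth showing. Two servers with different predicates derive different displays from the same shared history.

```python
from typing import Any, Callable

Object = dict[str, Any]


def collapse(versions: list[Object],
             is_meaningful: Callable[[Object, Object], bool]) -> list[Object]:
    """Keep the first version, then every version that differs meaningfully
    from the last one kept. Each server supplies its own `is_meaningful`."""
    if not versions:
        return []
    kept = [versions[0]]
    for v in versions[1:]:
        if is_meaningful(kept[-1], v):
            kept.append(v)
    if kept[-1] is not versions[-1]:
        kept.append(versions[-1])  # always end on the current state
    return kept


def any_change(old: Object, new: Object) -> bool:
    return old != new


def ignore_whitespace(old: Object, new: Object) -> bool:
    # Example predicate that only looks at `content`, ignoring whitespace.
    def norm(o: Object) -> str:
        return " ".join(o.get("content", "").split())
    return norm(old) != norm(new)


history = [{"content": "Hello"}, {"content": "Hello "}, {"content": "Hello, world"}]
print(len(collapse(history, any_change)), len(collapse(history, ignore_whitespace)))  # 3 2
```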

What I am saying instead is not to bundle edits together C2S, like this:

1 user edit generates 1 C2S Update
That C2S Update is sent to the Server, which sticks it in a processing queue
1 user edit generates another C2S Update
That C2S Update is sent to the Server, which sticks it in a processing queue
Server processes its queue in batch, generating v2 of object with both edits
Server federates that 1 S2S Update with the v2 of object (which represents 2 logical user edits)
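The difference between the two pipelines can be sketched as follows, with hypothetical function names and the same top-level replacement rule as the C2S guidance. Both end in the same final object, but the batched pipeline federates only one Update, so peers never see the intermediate version.

```python
from typing import Any

Object = dict[str, Any]


def apply_patch(obj: Object, patch: Object) -> Object:
    """Top-level replacement; a None value (JSON null) deletes the key."""
    out = dict(obj)
    for key, value in patch.items():
        if value is None:
            out.pop(key, None)
        else:
            out[key] = value
    return out


def one_to_one(obj: Object, edits: list[Object]) -> list[Object]:
    """Each C2S edit yields its own S2S Update carrying the new full object."""
    outbox = []
    for patch in edits:
        obj = apply_patch(obj, patch)
        outbox.append({"type": "Update", "object": obj})
    return outbox


def batched(obj: Object, edits: list[Object]) -> list[Object]:
    """The discouraged variant: queue edits, apply them together, federate once."""
    for patch in edits:
        obj = apply_patch(obj, patch)
    return [{"type": "Update", "object": obj}]


v1 = {"type": "Note", "content": "Hello", "summary": "greeting"}
edits = [{"content": "Hello, world"}, {"summary": None}]
print(len(one_to_one(v1, edits)), len(batched(v1, edits)))  # 2 1
```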

If a peer fetches an IRI after an Update, the user may already have issued a second/third/… Update, so the federating peer has no guarantee it is obtaining the whole edit history.

1 user edit generates 1 C2S Update A
C2S Update A is sent to the Server, which updates object to v2, and generates 1 S2S Update A with IRI
Server federates that 1 S2S Update A with IRI
Peer receives S2S Update A, decides to fetch a bit later
1 user edit generates 1 C2S Update B
C2S Update B is sent to the Server, which updates object to v3
Peer fetches the object due to S2S Update A, but obtains v3 with no way to know whether it’s the accumulation of more than one user edit.
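The race in the steps above can be simulated in a few lines. The `Origin` class and the IRI are hypothetical, and edits use plain top-level replacement without null handling; the point is only that a delayed fetch returns whatever version is current at fetch time.

```python
import copy
from typing import Any

Object = dict[str, Any]


class Origin:
    """Hypothetical originating server that federates Updates by IRI only."""

    def __init__(self, obj: Object) -> None:
        self.obj = obj

    def edit(self, changes: Object) -> Object:
        # Apply one user edit (top-level replacement, null handling omitted)
        # and return an S2S Update that carries only the object's IRI.
        self.obj = {**self.obj, **changes}
        return {"type": "Update", "object": self.obj["id"]}

    def fetch(self, iri: str) -> Object:
        # Dereferencing always returns whatever version is current *now*.
        if iri != self.obj["id"]:
            raise KeyError(iri)
        return copy.deepcopy(self.obj)


note_id = "https://example.com/notes/1"  # hypothetical IRI
origin = Origin({"id": note_id, "type": "Note", "content": "v1"})
peer_history = [origin.fetch(note_id)]      # the peer already holds v1
update_a = origin.edit({"content": "v2"})   # S2S Update A is federated
origin.edit({"content": "v3"})              # Update B lands before the peer fetches
peer_history.append(origin.fetch(update_a["object"]))  # delayed fetch for A
print([v["content"] for v in peer_history])  # ['v1', 'v3']: v2 was never seen
```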

If Peer and originating Server then try to show the edit history, they won’t be in sync.

I don’t want to support “object versioning” natively in ActivityPub, as it opens a whole can of worms, such as edits that later restrict the intended visibility of an object. But in the meantime, following the spec’s guidance does have this property of guaranteed sharing of the complete edit history.

Is it worth preserving? For an application I’ve had in mind (and may soon have the paperwork to actually begin implementing), it’s a property I’d like to have (peers and originating servers able to display the same versioned history). But I’m open to the idea that versioning history may just be a separate problem.

kaniini · November 2, 2019, 6:08pm

Edit history should be considered best effort in a federated network anyway?

nightpool · November 2, 2019, 6:40pm

I guess I’m a little confused, because even the idea of bundling updates like that makes zero sense to me. Activities in ActivityStreams represent actions taken by an actor, and making a “synthetic” Update activity that doesn’t 1:1 represent an action taken by an actor kind of seems like defeating the purpose? And in the full c2s2s spec, mutating a client’s activities like that seems to fall way outside the spec’s requirement that you deliver activities to the addressees. At the least it would be very surprising to clients.

I think viewing Update activities as Activities made by a client and only distributed by a server, just like any other federated activity, might make the disconnect between what you’re saying and what @kaniini is saying clearer. Clients can choose to federate whatever Update activities they wish, or not send Update activities at all. But if they do choose to send one out, the server should respect that and deliver it to everybody who’s been addressed, just like any other activity.

cjs · November 2, 2019, 7:21pm

That’s just because everything being delivered is best effort in a federated network? I don’t see how edit history is anything special.

I think I’ve made clear that the two approaches have fundamentally different emergent properties. If the ecosystem sends Updates following ActivityPub’s guidance, pushing a literal value in object (which receiving servers, like Litepub ones, always have the luxury to ignore), then the ecosystem has the option to build more accurate edit histories, whether Litepub or not. Asking a peer to (maybe) pull the updated object value at a later point in time does not have that property, and retaining it would require a whole new mechanism to be thought through.

If I haven’t, please let me know!

Yet again, I fully agree with you, which is why I explicitly advocated not doing it.

I mention the bundling use case because implementations are primed for queue-based processing, and new applications (whether or not they use go-fed) are exploring using non-human actors (such as Server) and may get the (bad) itch to “optimize”.

I don’t see a disconnect between kaniini and me? My comment was about an emergent property of the ActivityPub guidance on the general ecosystem, and theirs was about how even specific Litepub clients in the ecosystem can ignore these Updates. Plus, I pretty much agree with everything you’ve been saying. Perhaps start giving me better credit?

trwnh · November 3, 2019, 6:36pm

and we’ve already sacrificed network-wide consistency, anyway, in favor of availability and partition-tolerance. https://en.wikipedia.org/wiki/CAP_theorem

cjs · November 3, 2019, 8:08pm

This is a tangent, but I don’t think CAP is applicable. The Fediverse as currently implemented isn’t a distributed datastore. Each server is treated as the sole authoritative data storage location for its data (but it may be cached). When a server goes down, so too does its data (eventually, once caches purge). This could change with something like Datashards in the future.

AceNovo · November 6, 2019, 12:08pm

I can’t remember what got me started reading about CRDTs, but I’m pretty sure it was someone complaining about them being too complicated.

Regardless, if one follows the ActivityPub recommendations, S2S is a state-based CRDT and C2S is an operation-based CRDT. There is a guarantee of eventual consistency, which isn’t the same degree of consistency as the C in CAP, but it’s sufficient for many purposes.

So there’s prior work here for understanding the characteristics of these systems, which might be useful. Also the Wikipedia article is accessible and chock full of search terms:

cjs · November 6, 2019, 4:07pm

I think you mentioned this to me before, thanks for bringing it back up! In those terms:

When an Update carries the object exclusively as a literal, a receiving server (if it is available to receive it and chooses to do so) can diff with certainty to obtain the operation that led to the current CRDT state.
When an Update carries only an IRI, a receiving server (if it is able to receive it and chooses to do so) can only presume the diff that led to the current CRDT state.
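The first case can be sketched as a top-level diff between two consecutive full versions, which recovers a C2S-style partial update (removed keys map to `None`, i.e. JSON null). The function name is hypothetical. With literal objects the two inputs are guaranteed to be adjacent versions; with IRI-only Updates an intermediate version may have been skipped, so the same diff could silently span several edits.

```python
from typing import Any

Object = dict[str, Any]


def diff_top_level(old: Object, new: Object) -> Object:
    """Recover a partial update from two consecutive full versions:
    changed or added top-level keys map to their new value, and
    removed keys map to None (JSON null)."""
    patch: Object = {}
    for key, value in new.items():
        if key not in old or old[key] != value:
            patch[key] = value
    for key in old:
        if key not in new:
            patch[key] = None
    return patch


v1 = {"type": "Note", "content": "Hello",
      "image": {"url": "https://example.com/img1", "name": "picture of the word hello"}}
v2 = {"type": "Note", "content": "Hello",
      "image": {"url": "https://example.com/img2"}}

print(diff_top_level(v1, v2))  # {'image': {'url': 'https://example.com/img2'}}
print(diff_top_level(v2, {"type": "Note", "content": "Hello"}))  # {'image': None}
```

Applied to the Note from the top of the thread, the diff gives back exactly the `image` change from the original C2S Update (the unchanged `"type"` drops out).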

AceNovo · August 24, 2020, 8:13am

I was incorrectly assuming that Update applied to collections and collections only. When you modify a document in situ without changing its identifier, you need a mechanism other than timestamps and serial numbers.

In some situations, you can take a diff, and the version with the longest change list would be the most recent. AFAICT, though, there’s no way interior to the protocol to know which state is more recent.