The Delete Activity And Its Misconceptions

kaniini · October 13, 2019, 6:25pm

Instead of using my personal blog, I’m going to just start writing about ActivityPub here, as I think it is more useful to have a central repository of all knowledge relating to ActivityPub that is easily searchable.

Today, I want to talk about the Delete activity.

When I talk to people who want to learn ActivityPub, or when I audit codebases, I find that people are frequently confused about how the Delete activity actually works and how it should be implemented. Hopefully, after reading this post, it will be clear how it fits together.

Why does ActivityPub have a Delete activity in the first place?

The ActivityPub model is ultimately based on linked data, so a clever person might observe that really, a Delete activity is not necessary in many cases, in a network where the linked data principle is applied correctly. After all, if data is deleted from the location where it exists (its origin), then the graph is incomplete when expanded, since not all parts of the graph can be fetched.

But the fediverse does not really operate using a pure linked data model. Most fediverse applications store data in an internal format, usually based on database tables with columns and rows that get serialized and deserialized to ActivityStreams objects over the wire as needed. Additionally, they maintain copies of remote data in the local database as a cache.

Note I said cache there. This is the first misconception of most developers implementing ActivityPub: they believe remote data is to be treated with the same standards as local data. This is really not the case at all: you are maintaining a cache, and only a cache.

And so we have the Delete activity to perform something very important: cache invalidation. The Delete activity is perhaps an unfortunate name for this in the Server to Server protocol, because in the Client to Server protocol, it exists to delete local data. Since the AP spec is ambiguous at best on the topic of caching in general, there really isn’t much discussion of how Delete maps in the Server to Server space.

But to Delete in a Server to Server context actually means to invalidate (and perhaps evict) locally cached data related to a given object or profile.

Unfortunately, because of this fundamental misunderstanding of how remote data is to be treated, I run into a lot of misconceptions, which result in questions like:

How do we know if the remote actor is an admin or moderator?
Under what criterion should we trust any Delete activity?

In the last example, I added emphasis to Delete, because I believe the very name of the activity is misleading to people who are starting out and not familiar with the architecture of the ActivityPub federated network. I’ve had many developers emphasize that they were processing a Delete and then argue about it (the latest being the GNU Social developers). But at the same time, it is the name of the activity.

What criteria should be applied to Delete activities?

I hope I’ve been able to make a reasonable argument about Delete activities in the Server to Server protocol really being about cache invalidation. These criteria will likely only make sense if you see Delete activities as being about cache invalidation.

In that framing, the criteria are actually pretty simple. There are two stances you can take, depending on your invalidation strategy.

If your invalidation strategy is to immediately evict the data from your cache, then:

Check that any signatures match the actor that is performing the activity. This one should be obvious.
Check that the Delete activity references an object that its actor has control over. In the DNS-based ActivityPub network, this means that admin@social.example does not have control over objects published by chatty.example. In other networks, you may want to check for additional cryptographic proofs.
Optionally try to refetch the object. If the object has been updated, replace your copy of it. Otherwise, delete the object from your cache.
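
To make the three checks above concrete, here is a minimal sketch in Python. It is not taken from any real implementation. The function names, the dict-based cache, the `signature_ok` flag (standing in for whatever signature verification you already do) and the `refetch` callback are all assumptions made for illustration. "Control" is approximated as "same scheme and host", which only fits the DNS-based network described above, and `refetch` is assumed to return None when the origin no longer serves the object.

```python
from urllib.parse import urlsplit


def id_of(value):
    """Return the id of an embedded object, or the value itself if it is already an id."""
    if isinstance(value, dict):
        return value.get("id")
    return value


def same_origin(actor_id, object_id):
    """Assumption: an actor controls an object only if scheme and host match."""
    a, o = urlsplit(actor_id), urlsplit(object_id)
    return (a.scheme, a.netloc.lower()) == (o.scheme, o.netloc.lower())


def handle_delete_evict(activity, cache, signature_ok, refetch=None):
    """Process a Delete under the 'evict immediately' strategy.

    signature_ok: result of your own signature check (hypothetical input).
    refetch: optional callable(object_id) -> dict, or None if the object is gone.
    Returns "rejected", "updated" or "evicted".
    """
    actor_id = id_of(activity.get("actor"))
    object_id = id_of(activity.get("object"))

    # Step 1: the signature must match the actor performing the activity.
    if not signature_ok:
        return "rejected"
    if not isinstance(actor_id, str) or not isinstance(object_id, str):
        return "rejected"

    # Step 2: the actor must have control over the referenced object.
    if not same_origin(actor_id, object_id):
        return "rejected"

    # Step 3 (optional): refetch; replace the copy if the object still exists.
    fresh = refetch(object_id) if refetch is not None else None
    if fresh is not None:
        cache[object_id] = fresh
        return "updated"

    # Otherwise evict the object from the cache.
    cache.pop(object_id, None)
    return "evicted"
```

With this sketch, a Delete sent by an actor on social.example for an object hosted on chatty.example is rejected at step 2 and the cached copy is left alone.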

If your strategy is simply to mark the data as dirty and lazily refetch it later, then that is all you need to do. You should not try to display the data until you’ve refetched it, of course. You don’t even need to trust the Delete at all in this case, since you’re just using it as a hint that an object may or may not be there anymore.
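
The mark-dirty strategy can be sketched in a few lines of Python. Again, this is only an illustration under assumptions: the `LazyCache` class, its method names and the `fetch` callback (assumed to return None when the origin no longer serves the object) are hypothetical, not any project's actual API. Note that `on_delete` performs no trust checks at all, because the Delete is only used as a hint.

```python
class LazyCache:
    """Cache of remote objects where a Delete only marks an entry as dirty."""

    def __init__(self, fetch):
        # fetch: callable(object_id) -> dict, or None if the object is gone.
        self._fetch = fetch
        self._objects = {}
        self._dirty = set()

    def put(self, obj):
        """Store a freshly fetched object and clear any dirty mark on it."""
        self._objects[obj["id"]] = obj
        self._dirty.discard(obj["id"])

    def on_delete(self, activity):
        """Treat the Delete as an untrusted hint: just mark the object dirty."""
        obj = activity.get("object")
        object_id = obj.get("id") if isinstance(obj, dict) else obj
        if isinstance(object_id, str) and object_id in self._objects:
            self._dirty.add(object_id)

    def get(self, object_id):
        """Return the object for display, refetching first if it is dirty."""
        if object_id in self._dirty:
            # If fetch raises, the entry stays dirty and nothing is displayed.
            fresh = self._fetch(object_id)
            self._dirty.discard(object_id)
            if fresh is None:
                self._objects.pop(object_id, None)
            else:
                self._objects[object_id] = fresh
        return self._objects.get(object_id)
```

A forged Delete costs you one extra fetch and nothing else: if the origin still serves the object, the refetch simply restores it.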

Bonus round: What Tombstone objects are actually for!

Anybody who knows me knows that I am quite interested in deniability, so they always ask me about Tombstone objects.

Tombstone objects exist as a cache-invalidation hint. From a deniability perspective, serving them to the public is a bad idea (you should always return 404 instead and serve nothing), but they are useful internally.

Here is what I mean: let’s say that social.example has a new user: https://social.example/~nazi. This user posts materials that you do not want on your server or in your cache. For example, one of his posts may have an id of https://social.example/~nazi/posts/1234.

The way this is solved in Pleroma is to replace the actual post stored in our cache under that id with a Tombstone of the same id. This ensures Pleroma will never try to refetch the object when it detects that it is “missing” from the cache, because the object already exists as a Tombstone. It is the same basic idea with any other implementation that works on a linked data model or “hybrid” model as Pleroma does.
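
The Tombstone idea can be illustrated with a small Python sketch. This is not Pleroma's actual code. The function names, the dict-based cache and the `fetch` callback are assumptions made up for the example. The point is only that a Tombstone occupying the id stops the "missing from cache, so refetch" path from ever running for that object.

```python
def block_with_tombstone(cache, object_id):
    """Replace whatever is cached under object_id with a Tombstone of the same id."""
    cache[object_id] = {"type": "Tombstone", "id": object_id}


def get_or_fetch(cache, object_id, fetch):
    """Return a displayable object, fetching it only if nothing is cached.

    fetch: callable(object_id) -> dict, or None if the origin has no such object.
    """
    cached = cache.get(object_id)
    if cached is not None:
        # A Tombstone counts as "present", so no refetch is attempted,
        # but it is never handed out for display.
        if cached.get("type") == "Tombstone":
            return None
        return cached

    # Genuinely missing from the cache: fetch from the origin and store it.
    fresh = fetch(object_id)
    if fresh is not None:
        cache[object_id] = fresh
    return fresh
```

After `block_with_tombstone` has run for a given id, `get_or_fetch` returns None for it without contacting the origin server again.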

Hopefully this clarifies how Delete activities work in Server to Server and what Tombstone objects are used for, since the spec doesn’t really make any of this very clear.