How to handle incoming Delete activities is already well-discussed elsewhere [1][2].
However, I have a hard time finding material on how to handle outgoing Delete activities, specifically on this question: who do I send such an activity to?
Naively, you might send the activity to the actor’s followers. But imagine this simple scenario:
Alice follows Bob.
Bob posts a Note that Alice receives.
Alice stops following Bob.
Bob deletes the Note.
If we only send Delete activities to the followers of the actor, Alice will never receive the Delete activity and the Note will not be deleted, which is clearly not what Bob intended.
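The first scenario can be modelled in a few lines to make the gap concrete. This is a minimal in-memory sketch; the helper names (`follow`, `post`, `delete_audience_followers_only`) are hypothetical, and it assumes that every recipient of a Create keeps a copy of the object.

```python
from collections import defaultdict

# Hypothetical in-memory model: who follows whom, and who holds a copy of which object.
followers: dict[str, set[str]] = defaultdict(set)
copies: dict[str, set[str]] = defaultdict(set)  # object id -> actors holding a copy


def follow(follower: str, followee: str) -> None:
    followers[followee].add(follower)


def unfollow(follower: str, followee: str) -> None:
    followers[followee].discard(follower)


def post(author: str, object_id: str) -> None:
    # Assumption: a Create reaches the author's current followers, who keep a copy.
    copies[object_id].update(followers[author])


def delete_audience_followers_only(author: str) -> set[str]:
    # Naive strategy: address the Delete to the author's *current* followers only.
    return set(followers[author])


# The scenario above: Alice follows Bob, gets the Note, then unfollows.
follow("alice", "bob")
post("bob", "note-1")
unfollow("alice", "bob")

missed = copies["note-1"] - delete_audience_followers_only("bob")
print(missed)  # {'alice'}
```

Alice still holds a copy but is not in the Delete's audience.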
Another strategy might be to keep a record of all historical followers of an actor, and then send Delete activities to current and past followers. However, I am worried that this is not good enough either:
Alice follows Bob and Bob follows Charlie.
Charlie posts a Note that Bob receives.
Bob shares (Announces) the Note, and Alice receives the Note too because she follows Bob.
Alice stops following Bob and Bob stops following Charlie.
Charlie deletes the Note.
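Extending the same kind of toy model with a record of everyone who ever followed an actor shows why the second scenario still fails. This is a sketch under the same assumptions as before (recipients keep copies, names are hypothetical); an Announce is modelled as the sharer delivering the object to its own current followers.

```python
from collections import defaultdict

followers: dict[str, set[str]] = defaultdict(set)      # current followers
ever_followed: dict[str, set[str]] = defaultdict(set)  # current *and* past followers
copies: dict[str, set[str]] = defaultdict(set)         # object id -> actors holding a copy


def follow(follower: str, followee: str) -> None:
    followers[followee].add(follower)
    ever_followed[followee].add(follower)


def unfollow(follower: str, followee: str) -> None:
    followers[followee].discard(follower)  # ever_followed keeps the historical record


def deliver(sender: str, object_id: str) -> None:
    # Used for both the author's Create and a sharer's Announce.
    copies[object_id].update(followers[sender])


def delete_audience_historical(author: str) -> set[str]:
    # Strategy: send the Delete to everyone who ever followed the author.
    return set(ever_followed[author])


follow("alice", "bob")
follow("bob", "charlie")
deliver("charlie", "note-2")  # Charlie's Create reaches Bob
deliver("bob", "note-2")      # Bob's Announce reaches Alice
unfollow("alice", "bob")
unfollow("bob", "charlie")

missed = copies["note-2"] - delete_audience_historical("charlie")
print(missed)  # {'alice'}
```

Charlie's historical audience is just Bob, so Alice is never reached.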
I am at a loss about what to do in this scenario. I might be missing something, but it seems impossible for Charlie to know that Alice has the Note. The only way I can see this working is if Bob receives the Delete activity (as a past follower) and then graciously re-Announces it to his followers, since he previously announced the deleted Note. Alice would then receive the Delete as well, because the re-Announce of the Delete would also have to be sent to all of Bob's past followers.
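The cooperative re-Announce idea amounts to a graph traversal in which only actors who Announced the object pass the Delete on. The sketch below is hypothetical (state is hard-coded to the end of the second scenario, and `propagate_delete` is not part of any implementation); it shows both the case where Bob cooperates and the case where he does not.

```python
from collections import defaultdict

# Hypothetical state after the second scenario: nobody follows anyone any more,
# but each server remembers who ever followed its actors.
ever_followed: dict[str, set[str]] = defaultdict(set, {"charlie": {"bob"}, "bob": {"alice"}})
announced: dict[str, set[str]] = defaultdict(set, {"bob": {"note-2"}})  # actor -> ids they Announced


def propagate_delete(author: str, object_id: str) -> set[str]:
    """Actors reached if every recipient that Announced the object cooperatively
    re-Announces the Delete to everyone who ever followed it."""
    reached: set[str] = set()
    pending = [author]
    while pending:
        sender = pending.pop()
        for recipient in ever_followed[sender]:
            if recipient in reached:
                continue  # already notified; also guards against follow cycles
            reached.add(recipient)
            if object_id in announced[recipient]:
                pending.append(recipient)  # this recipient re-Announces in turn
    return reached


print(sorted(propagate_delete("charlie", "note-2")))  # ['alice', 'bob']

# If Bob does not cooperate, Charlie's Delete stops at Bob.
announced["bob"].discard("note-2")
print(sorted(propagate_delete("charlie", "note-2")))  # ['bob']
```

Nothing in the model lets Charlie observe whether the second branch happened, which is the concern raised next.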
However, this seems complicated and relies on Bob behaving well and re-Announcing to his past followers, and I see no way to discover whether Bob ever re-Announced the deletion.
Another option is to send Delete activities to all known servers, but I don’t think that necessarily reaches Alice either.
How is this done in existing implementations? This is making me think reliable deletions are impossible, and I’d love to be proved wrong.
It is impossible, yes. You never know who has a copy, as people can fetch public things without authentication, but what you can do is use “best effort” delivery paths. By default, this is “every known actor”. If you do a little bookkeeping ahead of time, such as disabling public access, requiring some kind of authentication on fetch, and tracking delivery recipients, you can narrow the recipient set down from the entire known universe.
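The trade-off described above can be written as a small selection function: fall back to every known actor unless the bookkeeping makes a narrower set trustworthy. This is a sketch; `DeliveryLog` and its fields are hypothetical names for the tracking the reply mentions, not an existing API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeliveryLog:
    """Hypothetical per-object bookkeeping; it only exists if you record it ahead of time."""
    delivered_to: set[str] = field(default_factory=set)     # actors the Create was sent to
    signed_fetchers: set[str] = field(default_factory=set)  # actors that fetched it with authentication
    public_fetch_allowed: bool = True  # anonymous fetches leave no trace at all


def delete_recipients(log: DeliveryLog | None, known_actors: set[str]) -> set[str]:
    # No bookkeeping, or anonymous fetches were possible: anyone might hold a copy.
    if log is None or log.public_fetch_allowed:
        return set(known_actors)
    # Otherwise, the tracked actors are the best available guess at who holds a copy.
    return log.delivered_to | log.signed_fetchers


known = {"alice", "bob", "carol", "dave"}
print(sorted(delete_recipients(None, known)))  # ['alice', 'bob', 'carol', 'dave']
log = DeliveryLog({"alice"}, {"carol"}, public_fetch_allowed=False)
print(sorted(delete_recipients(log, known)))   # ['alice', 'carol']
```

Even the narrow branch is only "best effort": it says nothing about copies that recipients passed on further.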
But this assumes a certain worldview where everyone else is an “instance” that stores a syndicated copy of the status when it receives a Create activity, which might not be the case if your Create is consumed as a regular notification. And of course, people can ignore Delete activities even if they do have a copy, and so on. So this is very much “best effort”, with the emphasis on “effort”. At best, you can strive to notify anyone who was aware of a creation or its byproducts of the deletion, assuming you track those. Otherwise, you either blast it out to everyone, accept that old copies may be floating around and send only to your followers, or use some other heuristic or strategy.
I don’t think the problem goes away even for non-public activities, though perhaps it becomes easier.
Do you have any idea how established implementations tackle this? Keeping track of which actors have fetched which posts, as you suggest, seems cumbersome and complicated. I guess I may just settle for sending to all known instances, even if that seems awfully excessive.
I think Pleroma/Akkoma don’t bother sending Deletes to everyone; they just accept the impossibility. IIRC there might be a patch floating around somewhere that does some kind of tracking on signed fetches, but it’s not part of mainline.
A no-op Delete is generally a very inexpensive activity, so I don’t see a problem with federating it promiscuously. Obviously, it does give away a little metadata, such as the ID of an activity the remote server may not know about, but for public activities the trade-off is generally worth it, and servers could consider sending decoy Delete activities if privacy is super important.
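For reference, an outgoing Delete is a small JSON document; a common shape wraps the deleted id in an ActivityStreams `Tombstone`. The sketch below builds one and, following the decoy suggestion above, mixes it with Deletes for random ids. The `#delete` id suffix and the `/notes/<hex>` URL layout are hypothetical, and decoys are only a suggestion from this thread, not an established practice.

```python
import json
import secrets

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_delete(actor_id: str, object_id: str) -> dict:
    """Build a Delete whose object is a Tombstone carrying only the deleted id."""
    return {
        "@context": AS_CONTEXT,
        "id": f"{object_id}#delete",  # hypothetical id scheme
        "type": "Delete",
        "actor": actor_id,
        "to": [PUBLIC],
        "object": {"id": object_id, "type": "Tombstone"},
    }


def with_decoys(real: dict, actor_id: str, count: int) -> list[dict]:
    """Mix a real Delete with Deletes for random ids that never existed.

    Decoy ids must be indistinguishable from real ones, so the made-up
    /notes/<hex> layout here would have to match your actual URL scheme.
    """
    batch = [real] + [
        make_delete(actor_id, f"{actor_id}/notes/{secrets.token_hex(8)}")
        for _ in range(count)
    ]
    secrets.SystemRandom().shuffle(batch)  # don't leak which one is real by send order
    return batch


actor = "https://example.social/users/bob"
delete = make_delete(actor, "https://example.social/users/bob/notes/1")
print(json.dumps(delete, indent=2))
print(len(with_decoys(delete, actor, 3)))  # 4
```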
It might be “inexpensive” on its own, but it spams inboxes, fills databases, consumes federation workers, and represents significant noise for smaller or less powerful servers. It generally doesn’t make sense unless you assume everyone else is replicating some state machine. The logic that Misskey uses is probably good enough, and sending Deletes to the entirety of the known universe is extraneous.
I think delivering the Delete to all actors who interacted with the object is optimal (i.e. everyone who liked, reacted to, replied to, or reposted it). The same goes for Update activities.
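The "everyone who interacted" strategy is a set union over whatever the server already records about an object, followed by collapsing recipients onto shared inboxes. This is a sketch; `Interactions` and the helper names are hypothetical, while `sharedInbox` is the ActivityPub endpoint that lets one POST serve several actors on the same server.

```python
from dataclasses import dataclass, field


@dataclass
class Interactions:
    """Hypothetical local record of remote actors who interacted with one object."""
    likers: set[str] = field(default_factory=set)
    reactors: set[str] = field(default_factory=set)
    repliers: set[str] = field(default_factory=set)
    announcers: set[str] = field(default_factory=set)


def audience_for_delete_or_update(followers: set[str], addressed: set[str], seen: Interactions) -> set[str]:
    # Everyone the original reached directly, plus everyone who interacted with it.
    return followers | addressed | seen.likers | seen.reactors | seen.repliers | seen.announcers


def to_inboxes(actors: set[str], inbox: dict[str, str], shared_inbox: dict[str, str]) -> set[str]:
    # Prefer an advertised sharedInbox so one server gets one POST, not one per actor.
    return {shared_inbox.get(actor) or inbox[actor] for actor in actors}


seen = Interactions(likers={"dave"}, announcers={"bob"})
actors = audience_for_delete_or_update({"alice"}, set(), seen)
inbox = {
    "alice": "https://a.example/users/alice/inbox",
    "dave": "https://a.example/users/dave/inbox",
    "bob": "https://b.example/users/bob/inbox",
}
shared = {"alice": "https://a.example/inbox", "dave": "https://a.example/inbox"}
print(sorted(to_inboxes(actors, inbox, shared)))
# ['https://a.example/inbox', 'https://b.example/users/bob/inbox']
```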
Using a Bloom filter is an interesting approach. But to @nightpool’s point, I’m not sure it makes much sense as an optimization. I imagine Delete activities already account for a very small percentage of all activities sent, so sending a few too many probably doesn’t have a big impact.
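The Bloom filter proposal itself is not quoted in the thread, so the use below is an assumption: a per-object filter recording which servers received or fetched it. Its key property fits this problem well: a Bloom filter has no false negatives, so every recorded server gets the Delete, while false positives only cause a few extra no-op Deletes. The class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings, using k salted SHA-256 hashes."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def delete_targets(seen_by: BloomFilter, known_servers: list[str]) -> list[str]:
    # Every recorded server is returned (no false negatives); an unrecorded
    # server only slips in on a false positive, costing one no-op Delete.
    return [s for s in known_servers if seen_by.might_contain(s)]


seen_by = BloomFilter()
for server in ("a.example", "b.example"):
    seen_by.add(server)
# Always includes a.example and b.example; c.example only on a false positive.
print(delete_targets(seen_by, ["a.example", "b.example", "c.example"]))
```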
How could a Delete activity “fill” a database, except through an implementation error? Surely a Delete activity should only ever be able to empty a database.
As for spamming inboxes and consuming federation workers, I believe I’ve already addressed that: Delete activities are rare and, if you don’t have any responsive content, extremely inexpensive.
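The "no responsive content" case is what makes a Delete cheap on the receiving side: one lookup, and nothing is written. The sketch below assumes a hypothetical `local_objects` store and a deliberately simplified authorization check; real servers also verify the HTTP signature and compare origins.

```python
from __future__ import annotations

from typing import Any

# Hypothetical local store of remote objects we hold copies of, keyed by id.
local_objects: dict[str, dict[str, Any]] = {}


def object_id_of(activity: dict[str, Any]) -> str | None:
    obj = activity.get("object")
    # "object" may be a bare id or an embedded object such as a Tombstone.
    if isinstance(obj, str):
        return obj
    if isinstance(obj, dict):
        return obj.get("id")
    return None


def handle_delete(activity: dict[str, Any]) -> bool:
    """Apply a Delete; return False when it is a no-op. Nothing is persisted either way."""
    object_id = object_id_of(activity)
    stored = local_objects.get(object_id) if object_id else None
    if stored is None:
        return False  # we never had a copy: one lookup, no write
    # Simplified authorization: only the object's author may delete it.
    if stored.get("attributedTo") != activity.get("actor"):
        return False
    del local_objects[object_id]
    return True
```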
There are some implementations that (perhaps unwisely? Who am I to judge) keep a record, or at least a cache, of all the activities they receive (and maybe send). Or the “database” here could perhaps be a queue for processing incoming activities, which the Deletes would fill.
Again, I don’t think it’s a big concern, but I can definitely see how even Delete activities can cost storage and/or memory.
To reiterate, the idea that a Delete only ever “empties” a database is rooted in the idea that a Delete is a transient activity that is consumed upon receipt once its side effects are handled. If you give a Delete an id, then it may reasonably be persisted. This may be done for several reasons, particularly by implementations that treat the inbox as an inbox and not as an RPC interface: the Delete is a notification message just like any other, and messages in an inbox shouldn’t generally disappear at random. I think the case where an inbox is only ever read by a single client is common on fedi, but it is absolutely not guaranteed by the ActivityPub specification.
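The two views of an inbox differ only in whether a received activity is kept after its side effects run. This hypothetical `Inbox` class sketches both modes; in mailbox mode a Delete with an id becomes a stored message like any other, which is exactly where its storage cost comes from.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any

Activity = dict[str, Any]


@dataclass
class Inbox:
    """Hypothetical inbox that is either a stored mailbox or a transient RPC endpoint."""
    keep_messages: bool
    messages: list[Activity] = field(default_factory=list)

    def receive(self, activity: Activity, apply_side_effects: Callable[[Activity], None]) -> None:
        apply_side_effects(activity)  # e.g. remove the deleted object from local storage
        # Mailbox mode: any activity with an id, a Delete included, stays in the inbox
        # so every client reading it later sees the same messages.
        if self.keep_messages and "id" in activity:
            self.messages.append(activity)


delete = {"id": "https://b.example/notes/1#delete", "type": "Delete"}
rpc, mailbox = Inbox(keep_messages=False), Inbox(keep_messages=True)
for inbox in (rpc, mailbox):
    inbox.receive(delete, lambda activity: None)
print(len(rpc.messages), len(mailbox.messages))  # 0 1
```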
On smaller servers, the majority of incoming activities are Deletes, because Mastodon floods the network with them. Everyone hates it, but there is no way to stop it. I have inboxes that were deleted more than 3 years ago, and Mastodon servers still send thousands of Delete activities to them every hour.
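One cheap receiver-side mitigation, assumed here rather than taken from any named implementation, is to discard an account's self-Delete (where `object` equals `actor`) for accounts the server never knew, before spending work on signature verification or queueing. Such Deletes may not even be verifiable, since a deleted actor's key usually can no longer be fetched. This reduces processing cost only; the network traffic still arrives. `should_enqueue` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def should_enqueue(activity: dict[str, Any], known_actors: set[str]) -> bool:
    """Cheap check run before signature verification and queueing (hypothetical)."""
    if activity.get("type") != "Delete":
        return True
    actor = activity.get("actor")
    obj = activity.get("object")
    obj_id = obj.get("id") if isinstance(obj, dict) else obj
    # An account deleting itself: if we never knew the account, there is nothing to remove.
    if obj_id == actor and actor not in known_actors:
        return False
    return True
```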