Is there a requirement that an activity ID be unique?
The reason I ask is that it seems prudent to save a list of encountered activities and drop those that have been seen before.
However, that caused me to run head-first into a NodeBB regression, because we ourselves don't actually send unique activity IDs.
For example, a Follow-Undo(Follow)-Follow chain would have the two Follows with the same ID, since we just construct them ad-hoc based on request data.
Easy fix is to throw in a timestamp there, but it got me wondering about whether there were uniqueness expectations at all, or whether I was being overzealous in checking for it.
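For illustration, here is a minimal sketch of the "throw in a timestamp" fix. The URL layout and the `follow_activity_id` helper are made up for this example and are not NodeBB's actual scheme; the only point is that two Follows for the same actor/target pair end up with different IDs.

```python
import time
from typing import Optional
from urllib.parse import quote


def follow_activity_id(base_url: str, actor_uid: int, target: str,
                       now_ms: Optional[int] = None) -> str:
    """Build a Follow id that changes over time for the same actor/target pair.

    The path layout is hypothetical. `now_ms` can be injected for testing;
    by default the current time in milliseconds is used.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Percent-encode the target so it is safe to embed as one path segment.
    return f"{base_url}/activity/follow/{actor_uid}/{quote(target, safe='')}/{now_ms}"
```

Two Follows sent within the same millisecond would still collide, so a random component (for example `uuid.uuid4()`) may be a safer suffix than a timestamp alone.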
So the requirement as I understand it is that the id must be publicly resolvable, which would imply a uniqueness constraint.
Still, the verbiage doesn't say that it must be publicly resolvable _to the object in question_, IIRC? I'd have to look this up to confirm it and am not in a position to do so right now, but that's an interesting question.
>All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient
Can I trick anyone thinking about deduplication and identifier schemes into reviewing my FEP and proposing changes or additions? I’m a huge fan of content-identifiers (whether the IPFS kind or others) so understanding dedup and performance needs inherent to the protocol informs some research I’m doing on content-identified AP:
Mastodon does a GET on every ID in the activity, which I think is really really over the top. However, PieFed returns 404 on the inner IDs and that doesn’t stop Mastodon from accepting the post, so idk why they bother.
Akkoma just does a GET on the outermost ID.
Lemmy and PieFed never do any GETs on IDs and just rely on the Signature being valid. But the outermost IDs do need to be unique, otherwise they’ll be detected as an unnecessary retry and dropped.
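As a rough sketch of the "seen before, so drop it" check on the outermost ID: the `SeenActivities` class below is hypothetical and keeps everything in memory. It is not how Lemmy or PieFed actually implement it, and a real server would presumably persist the IDs and expire old ones.

```python
class SeenActivities:
    """Remember outermost activity ids and flag repeats as retries."""

    def __init__(self) -> None:
        self._seen = set()

    def should_process(self, activity: dict) -> bool:
        activity_id = activity.get("id")
        if activity_id is None:
            # Intentionally transient activity: nothing to deduplicate on.
            return True
        if activity_id in self._seen:
            # Same outermost id as before: treat it as an unnecessary retry.
            return False
        self._seen.add(activity_id)
        return True
```

Only the outermost `id` is checked here; inner object IDs are ignored, which matches the "rely on the Signature" approach described above.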
Trust, probably? If the object of an activity has an id, then the information in the activity’s object node might not match the information obtained when dereferencing the id using HTTP GET. There are essentially 3 possibilities:
1. The object has no id, so you use any information about the object as-is.
2. The object has an id, but you use the information about the object as-is, without doing an HTTP GET.
3. The object has an id, but you replace the object node with the latest information obtained from an HTTP GET of the id. If any errors occur here, you can fall back to the previous case.
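The three possibilities could be sketched roughly as follows. `resolve_object` and its `fetch` parameter are hypothetical names; `fetch` stands in for whatever HTTP GET and JSON parsing your implementation uses and is expected to raise on failure. Objects given as a bare ID string are left alone to keep the sketch short.

```python
from typing import Any, Callable


def resolve_object(activity: dict, fetch: Callable[[str], dict],
                   refetch: bool = True) -> Any:
    """Pick which version of the activity's object to trust."""
    obj = activity.get("object")
    if not isinstance(obj, dict):
        # Bare id string or missing object: out of scope for this sketch.
        return obj
    object_id = obj.get("id")
    if object_id is None:
        # Case 1: no id, so the embedded data is all there is.
        return obj
    if not refetch:
        # Case 2: id present, but the embedded data is used as-is.
        return obj
    try:
        # Case 3: replace the embedded data with a fresh copy.
        return fetch(object_id)
    except Exception:
        # Any error while fetching: fall back to case 2.
        return obj
```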
Nothing in ActivityPub has to be dereferenceable; it just has to be globally unique (so receiving the same activity twice can be detected). In practice, however, you may still want to reprocess the same activity. Take for example a sender that doesn’t use unique ids for Follow activities. If they Follow, then Undo Follow, depending on how you process Undo activities, they may not be able to refollow you.
Follow activities should be processed even if their id has already been seen. It may be that the sender has lost their local state, so sending them an Accept Follow to “remind” them can prevent further issues.
Undo activities can be handled in one of two ways:
1. Undo should mark previously processed activities as “unprocessed”, so if the same id is received later, it should be processed as if it were a never-before-seen activity. Otherwise, you would return an error and ask the sender to re-send with a fresh id (and this can cause problems).
   One exception is out-of-order delivery. If you receive an Undo for an activity you don’t know about, then you receive a Follow with the same id as the Undo’s object, you can probably ignore the Follow and assume it was undone.
2. Alternatively, Undo should be stored as-is and any activities received with an id matching an existing Undo’s object id should be dropped. If you can, respond with some kind of indication of this (304? 400? 409? maybe not 202) so that the sender hopefully knows to use a fresh id.
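To make the two Undo strategies concrete, here is a small in-memory sketch. `FollowInbox` and its `tombstone` flag are names invented for this example. With `tombstone=False` an Undo marks the Follow as unprocessed, so the same id can be accepted again; with `tombstone=True` the Undo is remembered and matching Follows are always dropped. Consuming an out-of-order Undo after it has cancelled one Follow is my own assumption, not something stated above.

```python
class FollowInbox:
    def __init__(self, tombstone: bool = False) -> None:
        self.tombstone = tombstone
        self.active = set()  # ids of Follows currently in effect
        self.undone = set()  # ids named by a stored Undo

    def receive(self, activity: dict) -> str:
        kind = activity.get("type")
        if kind == "Undo":
            obj = activity.get("object")
            target = obj.get("id") if isinstance(obj, dict) else obj
            was_active = target in self.active
            self.active.discard(target)  # mark as "unprocessed"
            if self.tombstone or not was_active:
                # Keep the Undo: either we always do, or it arrived
                # before the Follow it refers to (out-of-order delivery).
                self.undone.add(target)
            return "undone"
        if kind == "Follow":
            activity_id = activity.get("id")
            if activity_id in self.undone:
                if not self.tombstone:
                    # Assumption: an out-of-order Undo cancels one Follow only.
                    self.undone.discard(activity_id)
                return "dropped"
            self.active.add(activity_id)
            # Repeats are accepted too, so the sender gets a fresh Accept.
            return "accepted"
        return "ignored"
```

In the `"dropped"` case a real server would ideally answer with an error status, so that the sender knows to retry with a fresh id.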
this is fine if you don’t trust the POST payload for whatever reason – an http Signature doesn’t imply any particular purpose for the signature, so you can’t say with 100% certainty that the controller of the key is the one who delivered the message, or if they are asserting the contents of the message to be true, or some other thing. all you can say is that someone who controlled the key signed the message as it was being formed.
Other protocols like WebMention and WebSub also do a GET when pinged. WebSub also has optionally signed payloads with the implied purpose of authentication via X-Hub-Signature of the content body using the key from hub.secret.
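For comparison, checking a WebSub `X-Hub-Signature` header might look something like the sketch below. The header value has the form `method=hexdigest`, where the digest is an HMAC of the raw request body keyed with `hub.secret`. The function name `verify_hub_signature` is made up for this example.

```python
import hashlib
import hmac

# Hash methods a subscriber is willing to accept in this sketch.
ALLOWED_METHODS = ("sha1", "sha256", "sha384", "sha512")


def verify_hub_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Return True if `header` is a valid HMAC of `body` under `secret`."""
    method, sep, digest = header.partition("=")
    if not sep or method not in ALLOWED_METHODS:
        return False
    expected = hmac.new(secret, body, getattr(hashlib, method)).hexdigest()
    # Compare as bytes, in constant time, to avoid leaking timing information.
    return hmac.compare_digest(expected.encode(), digest.lower().encode())
```

Unlike an HTTP Signature on an ActivityPub delivery, the purpose here is narrow and explicit: it only proves that the hub, which knows the shared secret, produced this body.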