FEP-8fcf: Followers collection synchronization across servers

how · December 1, 2020, 5:31pm

Hello,

I created a new topic to host this discussion so it provides:

a direct link to the repository entry
a clean first post to host the future final version

I described the whole thing in About the Fediverse Enhancement Proposals.

datatitian · December 19, 2020, 6:44pm

I’m just at the stage of implementing shared inbox delivery in activitypub-express, so this is a great discussion for me to follow.

Your option of expanding the collection in the addressing really appeals to me. Could we not take it a step further and expand it to include the items, i.e. the specific recipients on the server being delivered to? Then there would be no need for any hashing or fetching.
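For readers following along, the "hashing" here is the digest that the sending server advertises for the receiving server's slice of its followers collection. As we read FEP-8fcf and Mastodon's implementation, that digest is the XOR of the SHA-256 digests of each follower's actor ID, hex-encoded. The following is a minimal Python sketch under that assumption. The function name and example URLs are hypothetical.

```python
import hashlib


def partial_followers_digest(follower_ids):
    """Hypothetical helper: XOR of the SHA-256 digests of each follower ID, as hex.

    Assumes the FEP-8fcf scheme as we understand it; not Mastodon's actual code.
    """
    acc = bytearray(32)  # 32 zero bytes: XOR identity, so an empty set gives all zeros
    for actor_id in follower_ids:
        digest = hashlib.sha256(actor_id.encode("utf-8")).digest()
        for i, byte in enumerate(digest):
            acc[i] ^= byte
    return acc.hex()


# The receiving server computes the same value from its own view of which local
# accounts follow the sender, then compares it with the advertised digest.
mine = partial_followers_digest([
    "https://b.example/users/bob",
    "https://b.example/users/dan",
])
theirs = partial_followers_digest([
    "https://b.example/users/dan",
    "https://b.example/users/bob",
])
print(mine == theirs)  # True: XOR makes the digest independent of order
```

XOR is order-independent, so neither side has to agree on a sort order. The trade-off is that a duplicated ID cancels itself out, so both sides should hash a set of unique IDs.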

I don’t think I fully grasp all the privacy concerns, but the OP says a server could fetch its portion of the partial collection to sync, so it seems that list is already exposed to the receiving server. I suppose you wouldn’t want the recipients on a server to be aware of one another, but for that you could use an existing AP concept like bto.

In other words:

When delivering to a sharedInbox, the sending server must expand the recipients of any collection it owns, filtered to just those who reside on the receiving server, and add them to the bto field
(Already covered by B.11) The receiving server must not display the bto contents to recipients
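Here is a rough sketch of the sending side of those two rules. It is a minimal Python sketch that assumes dict-based activities and a sender that already knows its collection's members. `expand_for_shared_inbox` and the example URLs are hypothetical, and none of this comes from the FEP itself.

```python
from urllib.parse import urlsplit


def _as_list(value):
    """ActivityStreams addressing fields may hold a single value or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def expand_for_shared_inbox(activity, collection_id, members, receiving_host):
    """Hypothetical helper: copy `activity`, adding the members of an owned
    collection that live on `receiving_host` to its bto field."""
    addressed = _as_list(activity.get("to")) + _as_list(activity.get("cc"))
    if collection_id not in addressed:
        return dict(activity)  # collection not addressed: nothing to expand
    local = {m for m in members if urlsplit(m).hostname == receiving_host}
    expanded = dict(activity)  # shallow copy; the stored activity is untouched
    expanded["bto"] = sorted(set(_as_list(activity.get("bto"))) | local)
    return expanded


followers = "https://a.example/users/alice/followers"
activity = {"type": "Create", "to": [followers]}
members = [
    "https://b.example/users/bob",
    "https://c.example/users/carol",
    "https://b.example/users/dan",
]
for_b = expand_for_shared_inbox(activity, followers, members, "b.example")
print(for_b["bto"])
# ['https://b.example/users/bob', 'https://b.example/users/dan']
```

The delivered copy deliberately keeps bto. The replies below take issue with exactly that point.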

cjs · December 19, 2020, 10:27pm

This results in different peer servers receiving different versions of the same activity. That’s not impossible: I know of some users who have their avatar change depending on which peer server is fetching their actor profile. While that is funny, it is generally hard to reason about when mixed with delivery and authoritative IRIs. If my server at foo.example.com/sharedInbox receives bar.example.com/activity/1, it won’t be the same Activity as the one you get at baz.example.com/sharedInbox. Suppose we defer to the authoritative document served at bar.example.com/activity/1, which presumably just lists followers as the recipients. Then we have no reason to trust our local (specific-recipient) copy, and we’re back to the same problem: do my server’s and your server’s ideas of who the followers are match the authoritative server’s?

Currently, only direct delivery to actors’ inboxes works with bto and bcc semantics. bto doesn’t work with sharedInbox delivery at all. How do you imagine bto working with sharedInbox, given that it is stripped? My gut feeling is that for every message received on sharedInbox, the receiving server would have to ask the sending peer “Hey, are there any bto I need to be aware of, and who are they?”. That defeats the purpose of both stripping bto before delivery and avoiding extra network activity back to the sending peer server. Since this problem sits firmly in the world of sharedInbox, that doesn’t seem like a promising solution space.

Of course, one could say “don’t do sharedInbox with bto in this solution”. In that case my proposal is: why not go one step further and not use sharedInbox at all, avoiding this problem altogether? IMO, developing a different way to optimize delivery over the network while preserving the actor model would be a better use of effort.

datatitian · December 20, 2020, 3:01am

I thought it was obvious, but okay: the delivering server would include in the transmission the contents of the bto field that it was using specifically to direct shared inbox delivery.

That’s not true if you’re clever about how you do it. With the bto option, of course, the activity would appear identical to any viewer on any server due to B.11. With your original example, you’d just include the orderedItems in the collection object. The object would be canonically identical since it’s defined by the collection id. The collection members would differ by server, but that’s the problem that got us here in the first place, so no change.
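The "original example" variant embeds the collection with its items instead of using bto. Below is a minimal Python sketch of that variant, assuming dict-based activities and handling only the `to` field. The helper and URLs are hypothetical. Every server sees the same collection `id`, while `orderedItems` differs per receiving server, which is the trade-off debated in this exchange.

```python
from urllib.parse import urlsplit


def inline_partial_collection(activity, collection_id, members, receiving_host):
    """Hypothetical helper: replace the collection IRI in `to` with an embedded
    OrderedCollection listing only the members on `receiving_host`."""
    local = sorted(m for m in members if urlsplit(m).hostname == receiving_host)
    embedded = {"id": collection_id, "type": "OrderedCollection", "orderedItems": local}
    to = activity.get("to", [])
    to = to if isinstance(to, list) else [to]
    out = dict(activity)  # shallow copy; the stored activity keeps the bare IRI
    out["to"] = [embedded if target == collection_id else target for target in to]
    return out


followers = "https://a.example/users/alice/followers"
activity = {"type": "Create", "to": [followers, "https://www.w3.org/ns/activitystreams#Public"]}
members = ["https://b.example/users/bob", "https://c.example/users/carol"]
copy_for_b = inline_partial_collection(activity, followers, members, "b.example")
print(copy_for_b["to"][0]["id"] == followers)  # True: same collection id everywhere
print(copy_for_b["to"][0]["orderedItems"])  # ['https://b.example/users/bob']
```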

cjs · December 20, 2020, 10:19am

Hey, thanks for this discussion. I’m responding before my morning coffee, so I’d appreciate you overlooking any poor grammar.

I have 4 objections to the regime you describe that I think are worth acknowledging or addressing:

1. Fundamentally changing bto semantics

This is an RDF-style of objection, and I’m not an RDF-style person, so please bear with me as I attempt to (woefully, without the right vocabulary) make myself clear.

Making the effect of a bto field depend on the delivery method (it is no longer stripped for SharedInbox delivery) results in different data outcomes: bto information now lives on a peer server. bto's textual/layman definition may still be conceptually adhered to (so users may or may not be surprised), but the technical definition no longer is. For example, the B11 you cited begins with the 9 words "bto and bcc already must be removed for delivery", so that section becomes completely obsolete. I get the sense that you already have some idea of how it should be rewritten, given you keep referencing it. Again, though, it is not obvious to me (and I am eager for clarifications; I hope I’m not coming across as willfully obtuse).

For example, consider what would happen if a user, in their UI, added a bto recipient, and then your server software added bto entries as part of your solution. There is no good way to tell the "old-semantics, user-added bto" data apart from the "new-semantics, machine-added bto" data. With all these considerations, it feels like this solution hijacks a special property for its function rather than its semantics, repurposing it for something new.

An easy litmus test for this objection: “Instead of using bto, could we just as easily create a new field with the desired delivery semantics?” My gut feeling is that it will be hard to justify why bto specifically is necessary.

2. A Separation of Delivery Algorithms (Direct vs SharedInbox)

Managing the delivery algorithm itself is already tough. Making the bto semantics conditional on the delivery method ("only in SharedInbox delivery are you allowed to not strip bto") fundamentally changes the underlying delivery algorithm. The B11 you mentioned itself cites the spec: "The server MUST remove the bto and/or bcc properties, if they exist, from the ActivityStreams object before delivery". This is unequivocal: these properties have one meaning, and it is not conditioned on any delivery method. So, as currently written, the spec draws no separation, technical or bto-semantic, between Direct and SharedInbox delivery.
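Under the current spec, the stripping step does not depend on how the activity is delivered. Under the proposal, it gains a branch on the delivery method. The following is a minimal Python sketch that makes this concrete. It assumes activities are plain dicts and that only top-level bto/bcc are stripped. The function and the `keep_bto_on_shared_inbox` flag are hypothetical stand-ins, not any server's actual code.

```python
import copy


def prepare_for_delivery(activity, via_shared_inbox, keep_bto_on_shared_inbox=False):
    """Return the copy of `activity` that actually goes on the wire.

    With the default flag this follows the spec rule quoted above: bto and bcc
    are always removed. `keep_bto_on_shared_inbox` is a hypothetical switch
    standing in for the proposal under discussion.
    """
    out = copy.deepcopy(activity)  # never mutate the stored (authoritative) copy
    if not (via_shared_inbox and keep_bto_on_shared_inbox):
        out.pop("bto", None)
    out.pop("bcc", None)  # bcc is stripped under both regimes
    return out


stored = {
    "type": "Create",
    "to": ["https://a.example/users/alice/followers"],
    "bto": ["https://b.example/users/bob"],
    "bcc": ["https://b.example/users/eve"],
}
print(sorted(prepare_for_delivery(stored, via_shared_inbox=True)))
# ['to', 'type']
print(sorted(prepare_for_delivery(stored, via_shared_inbox=True, keep_bto_on_shared_inbox=True)))
# ['bto', 'to', 'type']
```

The extra condition is the "separation of delivery algorithms" in miniature: every code path that prepares outgoing activities now has to know which delivery method it is serving.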

Changing the delivery algorithm isn’t a negative on its own. However, other solutions don’t require this kind of work on the delivery algorithm itself (which is live and serving traffic). By comparison, this is a higher-effort engineering solution, and for those already using SharedInbox delivery, a higher-risk-of-bugs one. In other words: while not impossible, the bar is high for this kind of solution, and I’m not convinced this line of thinking is on a path towards reaching it.

Note: This objection still applies even if one uses a brand-new field with delivery semantics instead of the bto field. That field would then apply only to SharedInbox delivery, which again means a separation of delivery algorithms.

3. Violates B11

I just want to make this objection very clear: B11 is no longer applicable in this solution, and I object to any attempt to re-use it. To paraphrase, you refer to a “clever use of B11” as a way to re-use the bto semantics and address some technical considerations. But that misses the point of why B11 was originally written. It basically states: given that all bto and bcc information resides on the authoritative server and nowhere else, it is solely and exclusively up to the authoritative server how to display the bto and bcc information to its end-users. The spec authors’ intention is that “the authoritative server should only display it to the original author”.

So B11 ensures that the spec authors’ wishes can easily be granted, because making it so is up to only 1 software author (the author of the authoritative server software), and that implementor can have that confidence. Fundamentally, if multiple pieces of software hold bto and bcc data, then those wishes cannot be guaranteed. The implementor cannot have that confidence either, because it relies on the goodwill of their peers.

To put it another way, in formal terms: the solution now violates the antecedent (the “given […]” part), so the consequent no longer holds. Peer servers now hold some bto information. The authoritative server is therefore no longer the sole holder of that information, and so no longer has sole and exclusive power over the bto and bcc data.

Hence my objection: the entire paragraph needs to be rethought from the ground up, and I hope it is clear why I think simply saying “see B11” is insufficient in this case.

4. Does not generally solve the problem for all collections

Hopefully I do not need to elaborate on why messing with bto semantics means that only some collections shared across SharedInbox (notably, followers) can be synchronized. It is not a generic solution for any collection.

Thanks for the great discussion, and for letting me lay out my objections!

grishka · May 13, 2021, 4:22pm

How does this work for synchronizing multiple collections simultaneously? Do I send multiple Collection-Synchronization headers in a single request? Or do I put them into a single header with some separator?

Claire · May 18, 2021, 11:33am

That’s a good question I did not anticipate. The short reply is that the current Mastodon implementation won’t handle that case, at all.

Multiple Collection-Synchronization headers do not seem like a good idea. AFAIK, multiple header fields with the same name are supposed to be equivalent to a single header whose values are concatenated with commas, which, given the chosen syntax, would make the value ambiguous.
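Here is a small illustration of that ambiguity. It assumes the FEP's comma-separated `key="value"` layout with `collectionId`, `url` and `digest` parameters. Two field lines, one per collection, are merged as HTTP allows. A parser that expects a single set of parameters then cannot tell where the first entry ends. The deliberately naive stand-in parser below simply keeps the last one. All names and URLs are hypothetical, and the digests are shortened placeholders.

```python
def combine_field_lines(values):
    # HTTP (RFC 9110, section 5.3) lets a recipient merge repeated field lines
    # by joining their values with a comma.
    return ", ".join(values)


def naive_parse(value):
    """Stand-in parser: read comma-separated key="value" pairs into one dict."""
    params = {}
    for part in value.split(","):
        key, _, raw = part.strip().partition("=")
        params[key] = raw.strip('"')
    return params


alice = ('collectionId="https://a.example/users/alice/followers", '
         'url="https://a.example/users/alice/followers_synchronization", '
         'digest="aa"')
bob = ('collectionId="https://a.example/users/bob/followers", '
       'url="https://a.example/users/bob/followers_synchronization", '
       'digest="bb"')

merged = combine_field_lines([alice, bob])
print(naive_parse(merged)["collectionId"])
# https://a.example/users/bob/followers  (alice's entry was silently overwritten)
```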

nightpool · May 24, 2021, 12:49pm

Maybe in a future revision we should explore using something like Structured Fields for the HTTP header, to make it easier to serialize. I know HTTP Signatures is also moving in this direction: Signing HTTP Messages
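For illustration only, here is what a Structured Fields (RFC 8941) variant of the header might look like: a List with one String Item per collection, carrying `url` and `digest` as parameters. Parameter keys must be lowercase, which is one reason the collection ID is the item value here. This layout is our assumption, not something the FEP or Mastodon defines. The serializer covers only the String type it needs.

```python
def sf_string(text):
    """Serialize an RFC 8941 String: printable ASCII, escaping backslash and double quote."""
    if any(ord(c) < 0x20 or ord(c) > 0x7E for c in text):
        raise ValueError("Structured Field strings must be printable ASCII")
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def serialize_sync_list(entries):
    """Hypothetical header value: an RFC 8941 List with one String Item per
    collection, carrying `url` and `digest` as parameters."""
    members = []
    for entry in entries:
        members.append(
            sf_string(entry["collectionId"])
            + ";url=" + sf_string(entry["url"])
            + ";digest=" + sf_string(entry["digest"])
        )
    return ", ".join(members)


alice = {"collectionId": "https://a.example/users/alice/followers",
         "url": "https://a.example/users/alice/followers_synchronization",
         "digest": "aa"}
bob = {"collectionId": "https://a.example/users/bob/followers",
       "url": "https://a.example/users/bob/followers_synchronization",
       "digest": "bb"}

one_line = serialize_sync_list([alice, bob])
two_lines_merged = ", ".join([serialize_sync_list([alice]), serialize_sync_list([bob])])
print(one_line == two_lines_merged)  # True: merging field lines is harmless for a List
```

In an SF List, commas only separate top-level members. Merging repeated field lines therefore yields the same value as sending one line, which removes the ambiguity Claire described.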

Claire · January 24, 2022, 4:15pm

I’d like this FEP to be finalized, as requested in #11 ([TRACKING] FEP-8fcf: Followers collection synchronization across servers) in the Fediverse-Enhancement-Proposals repository on Gitea.

The code has been included in quite a few Mastodon releases now, starting with 3.3.0, first published in December 2020. To my knowledge, the FEP fits its purpose just fine and so does the Mastodon implementation, although I do not know of any other fediverse software implementing it.

weex · February 7, 2022, 6:20pm

Submitted a PR to finalize this today: #9 (finalize fep-8fcf) in the fep repository on Codeberg.org.

weex · February 7, 2022, 7:49pm

PR has been merged. This FEP is now FINAL!