FEP-8fcf: Followers collection synchronization across servers

Yes, but those properties are only inlined into activities that are private. If those activities get leaked by other servers, you have a much bigger problem: private conversations being leaked.

You’re basically saying: “If non-Mastodon servers don’t understand the non-AP Mastodon-specific private-and-followers-only concept, they’re the ones broken”.

I am saying as a writer of non-Mastodon software that doesn’t have the concept of “private-and-followers-only” that it’s not a reasonable assumption to make. Just because I included followers and omitted as:public does not mean it is private (privacy is not a binary concept), and I don’t want to be accidentally leaking security fields because of a Mastodon-specific interpretation, and then be told my software is the problematic one.

It is true that ActivityPub doesn’t explicitly define anything about who is allowed to see an object, beyond its initial addressing. The only thing it says is that objects addressed to as:Public should be accessible to everyone without authentication. But I’m not sure what you’re getting at. If you consider that software would be right to treat anything it sees as public because AS doesn’t explicitly specify that something should not be displayed to someone it isn’t addressed to… then it means you shouldn’t use ActivityPub for anything private at all ever, regardless of the addressing.

If a post doesn’t have as:public, it shouldn’t be visible to anyone outside of its authorized recipients. That seems like it’s pretty clear from the spec? It talks about this restriction in a couple of different locations (but never in the context of the receiving server in a s2s system, because AP doesn’t specify that kind of “lookup” interface for the receiving server).

There aren’t any rules preventing you from Announcing that post to a different audience or whatever, but it seems like a mistake to inline the full object without being aware of the individual properties. I would expect nearly all systems to just reference the post by id or inline a couple of properties that they know are relevant.

Perhaps inverting my argument will make it clearer: having this synchronization option only be available for a private Activity is problematic; a synchronization solution should work independently of visibility/privacy-model/addressing of the Activity it is attached-to/associated-with (including publicly-available ones).

The fact that the best defense of this security flaw is an appeal to the specifics of this particular solution, which couples concerns that should be separate, will not convince me that it “solves” or “is not” a security problem, as a matter of software engineering principle.

What is “private”? An account that automatically approves its follower requests (I know this too is optional in Mastodon but it is not encoded into the Activities you send out and therefore is not documented as part of the delivery scope) and sends a non-as:public message to potentially thousands of followers does not mean it is “private”, to me. You may think differently. That’s how Mastodon operates today. I believe it is deceiving to tell users it is a “private” message. You’re right, perhaps I’m more paranoid than the average user, but that’s survivorship bias (I treat all my Mastodon content – including DMs and follower-only – as fully public; people who can’t accept that don’t use Mastodon, which skews your average to those that accept it).

Correct, which is not the same as the message being “private”.

Do we really want to go down this privacy discussion road? I’d prefer to just agree to disagree and save everyone’s time, rather than have y’all try to convince me I am wrong. I’m not trying to convince either of you that you’re wrong. It’s just a different interpretation.

I’m just trying to say “here is a security problem Chris identified” and our viewpoint is rejected. I can see your viewpoint, I would just prefer a no-bullshit answer of “we’re not going to address that” than y’all trying to force y’alls particular privacy model on everyone else (“We introduce this generalized AP solution to synchronization but it requires Mastodon’s interpretation of privacy to address security concerns”). My interpretation of the spec may be stricter, but I’m not going to force y’all into my world view either. Woo, peering & federation! Unfortunately, this non-reciprocal listening to viewpoints is a growing source of frustration for me. :frowning: I am sorry if I come across as upset, it is just that the optional AP leeway of having different privacy guarantees in AP is something I would like to effectively preserve in the ecosystem. Mastodon should want that too, or else a different AP-based Fediverse could be made that is fundamentally un-interoperable.

So, I’m sad we still haven’t managed to reach an agreement on such a protocol, especially since, due to issues on our (Mastodon’s) end, it is sorely needed.

To sum up the disagreements with my proposal, I think there are basically two of them:

  • it entrenches the domain-based thing further. I understand this (this goes slightly beyond the existing security considerations, and assumes that software managing one account on a domain manages all accounts on that same domain), and I regret it, but I’m afraid the alternative is too difficult and error-prone to handle. In the extremely unlikely event one handle dispatches accounts to several different software implementations, we can allow disabling that feature, or enabling the implementations to collaborate.
  • privacy concerns regarding implementations that would store/relay “private” activities to platforms/actors that don’t need to know that information. The “fix” for that being to use an HTTP header instead of additional AS attributes. I think that’s also an unlikely edge case, and I’m not happy with pushing extensibility outside of ActivityStreams and using an HTTP header instead, but I have no fundamental opposition to that.

I can work on changing the proposal/PR to use an HTTP header instead of custom AS attributes. Would that work for you?

I have now added a commit to the PR to replace the collectionSynchronization attribute in activities with the Collection-Synchronization HTTP header, with the same grammar as the Signature header and the following fields:

  • collectionId: the collection this header refers to (the only one supported at that point is the sender’s collection)
  • digest: hexadecimal representation of XORed digests of the actor ids on the receiver’s instance, joined by newlines
  • url: the URL of the endpoint returning a Collection
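
The three fields above can be sketched in Python. This is only an illustration, not the PR's code: it assumes SHA-256 as the digest function (the post does not name one), takes the "XORed digests" reading of the digest field, and uses hypothetical function names and example.com URLs. The serialization is a plain comma-separated list of key="value" pairs in the style of the Signature header.

```python
import hashlib


def followers_digest(actor_ids):
    """XOR the SHA-256 digests of each actor id and return lowercase hex.

    SHA-256 is an assumption of this sketch; the post does not name the hash.
    """
    acc = bytes(32)  # 32 zero bytes, the identity element for XOR
    for actor_id in actor_ids:
        h = hashlib.sha256(actor_id.encode("utf-8")).digest()
        acc = bytes(a ^ b for a, b in zip(acc, h))
    return acc.hex()


def build_header(collection_id, url, actor_ids):
    """Serialize the three fields as comma-separated key="value" pairs."""
    fields = {
        "collectionId": collection_id,
        "url": url,
        "digest": followers_digest(actor_ids),
    }
    return ", ".join(f'{key}="{value}"' for key, value in fields.items())


# Hypothetical values, for illustration only.
header = build_header(
    "https://sender.example.com/users/alice/followers",
    "https://sender.example.com/users/alice/followers_synchronization",
    ["https://receiver.example.com/users/bob",
     "https://receiver.example.com/users/carol"],
)
```

Because XOR is commutative, the digest does not depend on the order in which either side enumerates the followers, so neither server has to sort its list before hashing.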

In the current implementation of the PR, the list of followers is the list of followers whose actor id shares the same URI scheme and netloc as the inbox being delivered to. I’m afraid this introduces, again, a new constraint on the inbox and actor id needing to be on the same domain, and I’m not too sure what to do about that.
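
As a rough Python sketch of the filter described above (hypothetical helper names, example.com URLs, and no normalization of host case or default ports, all of which are assumptions of this sketch rather than details from the PR):

```python
from urllib.parse import urlsplit


def same_origin(actor_id, inbox_url):
    """True when both URLs share the same URI scheme and netloc."""
    a, b = urlsplit(actor_id), urlsplit(inbox_url)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


def followers_for_inbox(follower_ids, inbox_url):
    """Keep only the followers whose actor id matches the inbox's origin."""
    return [f for f in follower_ids if same_origin(f, inbox_url)]


followers = [
    "https://a.example.com/users/1",
    "https://b.example.com/users/2",
    "http://a.example.com/users/3",  # same host, different scheme
]
partial = followers_for_inbox(followers, "https://a.example.com/inbox")
```

The constraint mentioned above is visible here: an actor whose id lives on a different host than its inbox would never be included in the partial list for that inbox.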

I am sorry. :frowning: I do want to thank you for enabling this discussion in the first place. I have learned a lot even when the discussions get tough. I know you put a lot of effort into the PR, and didn’t have to come here in the first place, and didn’t have to come back to revisit it. I am also sorry to @nightpool – I know we often see things differently, and I genuinely appreciate your persistent willingness to push back against me.

I’m not intending to block y’all doing what you need to do to fix the problems you’re seeing day-to-day. I just wanted to be sure objections were heard, they were recognized, even if not addressed. I am sorry that I got a bit upset around this, I do think y’all recognized them but in the moment it didn’t feel that way.

In that spirit, I don’t think I have any new objections, and don’t feel the need to revisit old ones. Feel empowered to carry on as you see fit.

Whichever course you choose (submitting the PR as-is with the HTTP header, reverting back to the original proposal, or something else), I hope I haven’t soured the thread so badly that y’all aren’t considering trying out a FEP. Since you’re breaking new ground it would be nice if it existed as a FEP so others could use your solution in an interoperable way.

It’s fine. I’m in the process of writing a FEP regarding this extension. Hopefully I’ll have that ready soon.

Just an update that it has been implemented in the development version of Mastodon and has been formalized as FEP-8fcf: Followers collection synchronization across servers

Thinking about performance, I wonder whether we should introduce a mechanism of hash-tree syncing for large collections of followers?

I mean, if the sending account has 20k followers, is the receiving instance supposed to fetch it all again (or even a subset of 10k for a large instance), while only a single follower mismatches?

I guess this could be a separate generic proposal in order to optimize synchronizing of any collection, as an alternative to collection paging

BTW: Do you have any statistics about the topology of the Fediverse and the performance of ActivityPub? Like:

  • The distribution of number of followers (average, max, percentiles, …)
  • How much do followers tend to be spread among several instances?
  • Number of AP requests for each instance, type of requests that consume most resources (CPU, bandwidth, etc)

I’m not worried about performance wrt. detecting issues (which should remain a rare occurrence), but indeed the current proposal is not very efficient for fixing them when large sets of followers are involved, as the receiving instance will fetch the whole list (filtered so it only contains followers on the receiving instance).
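
To make the cost concrete, here is a minimal Python sketch of the repair step as described: once the digests differ, the receiver fetches the whole filtered list and compares it with its local view. The function name and the result keys are hypothetical, and what the receiver then does with each set is deliberately left out.

```python
def reconcile(local_followers, authoritative_followers):
    """Compare the receiver's local view with the fetched authoritative list."""
    local = set(local_followers)
    remote = set(authoritative_followers)
    return {
        # Known locally but absent from the sender's list.
        "stale_locally": sorted(local - remote),
        # In the sender's list but unknown locally.
        "missing_locally": sorted(remote - local),
    }


result = reconcile(
    ["https://r.example.com/users/a", "https://r.example.com/users/b"],
    ["https://r.example.com/users/b", "https://r.example.com/users/c"],
)
```

The comparison itself is cheap; the expense is that the full filtered list has to be transferred even when only one entry differs.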

Unfortunately, I do not have any statistics about the topology of the Fediverse.

Hello,

I created a new topic to host this discussion so it provides:

  1. a direct link to the repository entry
  2. a clean first post to host the future final version

I described the whole thing in About the Fediverse Enhancement Proposals.

I’m just at the stage of implementing shared inbox delivery in activitypub-express, so this is a great discussion for me to follow.

Your option of expanding the collection in addressing really appeals to me. Could we not take it a step further and expand it to include the items, i.e. the specific recipients on the server being delivered to? Then there would be no need for any hashing or fetching.

I don’t think I fully grasp all the privacy concerns, but the OP says a server could fetch its portion of the partial collection to sync, so it seems that list is already exposed to the receiving server. You wouldn’t want each recipient on the server to be aware of each other, I suppose, but for that you could use an existing AP concept like bto.

In other words:

  1. When delivering to a sharedInbox, the sending server must expand the recipients of any collection it owns, filtered to just those who reside on the receiving server, and add them to the bto field
  2. (Already covered by B.11) Receiving server must not display the bto contents to recipients

This results in different peer servers receiving different versions of the same activity. That’s not impossible: I know of some users whose avatar changes depending on which peer server is fetching their actor profile, and while that is funny, it is generally unsustainable to reason about when mixed with delivery and authoritative IRIs. The bar.example.com/activity/1 that my server receives at foo.example.com/sharedInbox won’t be the same Activity as the one you get at baz.example.com/sharedInbox. If we defer to the authoritative document being served at bar.example.com/activity/1, which presumably just lists followers as the recipients, we have no reason to trust our local (specific-recipient) copy, and so we’re back at the same problem: do my server’s and your server’s ideas of what followers are match the authoritative server’s?

Only directly delivering messages to actors’ inboxes works with bto and bcc semantics currently. bto doesn’t work with sharedInbox delivery at all. How do you imagine bto working with sharedInbox, since it is stripped? My gut feel is that for every message received on sharedInbox, the receiving server would have to ask the sending peer “Hey, are there any bto I need to be aware of, and who are they?”. Which defeats the purpose of both stripping bto before delivery and avoiding extra network activity back to the sending peer server. Since this problem sits firmly in the world of sharedInbox, that doesn’t seem like a promising solution space.

Of course, one could say “don’t do sharedInbox with bto in this solution”, in which case my proposal is: why not just go one step further and not use sharedInbox at all, avoiding this problem altogether in the first place? Instead, developing a different solution towards optimizing delivery over the network while preserving the actor model would be a better use of effort, IMO.

I thought it was obvious, but okay: the delivering server would include in the transmission the contents of the bto field that it was using specifically to direct shared inbox delivery.

That’s not true if you’re clever about how you do it. With the bto option, of course, the activity would appear identical to any viewer on any server due to B.11. With your original example, you’d just include the orderedItems in the collection object. The object would be canonically identical since it’s defined by the collection id, and, while the collection members would differ by server, that’s the problem that got us here in the first place, so no change.

Hey, thanks for this discussion. This is me responding pre-morning coffee so appreciate overlooking any poor grammar.

I have 4 objections to the regime you describe that I think are worth acknowledging or addressing:

1. Fundamentally changing bto semantics

This is an RDF-style of objection, and I’m not an RDF-style person, so please bear with me as I attempt to (woefully, without the right vocabulary) make myself clear.

The different delivery effects of a bto field depending on the delivery method (SharedInbox is no longer stripped) results in different data outcomes: bto information is now on a peer server. While bto's textual/layman definition may still be conceptually adhered to (so users may or may not be surprised), it is now no longer adhering to the technical definition. For example, your cited B11 begins with the 9 words "bto and bcc already must be removed for delivery", so that section is completely obsolete. I get the hint that you already have some idea how it should be rewritten given you keep referencing it, but again it is not obvious to me (and I am eager for clarifications, I hope I’m not coming across as willfully obtuse).

For example, consider what would happen if a user, in their UI, added a bto recipient, and then your machine added the bto fields as part of your solution. There is not a good way to determine the "old-semantics user-added bto" versus the "new-semantics machine-added bto" data. With all these considerations, it feels like this solution is hijacking a special property for its function, and not its semantics, for a whole new purpose.

To put this objection into an easy litmus test: “Instead of using bto, could we just as easily create a new field with the desired delivery semantics?” and I have a gut feeling that it will be hard to justify why bto is specifically necessary, instead of just using a new field with the desired delivery semantics.

2. A Separation of Delivery Algorithms (Direct vs SharedInbox)

Trying to manage the delivery algorithm itself is already tough. When you make the bto semantics conditional on the delivery method ("only in SharedInbox delivery are you allowed to not-strip bto"), that is fundamentally changing the underlying delivery algorithm. As you mentioned B11, it itself cites the spec "The server MUST remove the bto and/or bcc properties, if they exist, from the ActivityStreams object before delivery" which is unequivocal: they have one meaning and it is not conditioned on any delivery method. So the spec as it stands draws little distinction, technical or in bto semantics, between Direct and SharedInbox delivery.

Changing the delivery algorithm isn’t a negative alone, but considering there are other solutions which don’t require this kind of work on the delivery algorithm itself (which is live and serving traffic), it is by comparison a high-effort and (for those already using SharedInbox delivery) higher-risk-of-bugs kind of engineering solution. AKA: while not impossible, the bar is high for these kinds of solutions. And I’m not convinced this line of thinking is on a path towards reaching that bar.

Note: This objection still applies even if one chooses to use a brand-new field with delivery semantics, instead of the bto field specifically. As now that special field only applies in SharedInbox delivery, which results in this objection: a separation of delivery algorithms.

3. Violates B11

I just wanted to make this objection very clear: B11 is no longer applicable for use in this solution, and my objection is for any attempts to re-use it. To paraphrase, you refer to a “clever use of B11” as a way to re-use the bto semantics and address some technical considerations. But that kind of misses the point of why B11 was originally written. It basically states: given that all bto and bcc information resides on the authoritative server and nowhere else, the power is solely and exclusively up to the authoritative server as to how to display the bto and bcc information to its end-users, and the spec-authors’ intentions are that “the authoritative server should only display it to the original author”.

So B11 ensures that the spec authors’ wishes can easily be granted because it is up to only 1 software author (the authoritative server software author) to make it so, so the software implementor themselves can have that confidence. Fundamentally, if multiple pieces of software hold bto and bcc data, then their wishes cannot be granted and the software implementor themselves cannot have that confidence because it relies on the goodwill of their peers.

To put it another way with formal language: since the solution now violates the antecedent (given […]) part, the consequent no longer holds: peer servers now have > 0 bto information, and therefore the authoritative server is no longer the sole holder of such information, and therefore is not the sole and exclusive holder of power over the bto and bcc data.

Hence, my objection that the entire paragraph needs to be re-thought from the ground up, and I hope it is clear why I think simply saying “see B11” is insufficient in this case.

4. Does not generally solve the problem for all collections

Hopefully I do not need to elaborate on why messing with bto semantics means that only some collections shared across SharedInbox can be synchronized (notably, followers), so it is not a generic solution for any collection.


Thanks for the great discussion, and letting me take note of my objections!

How does this work for synchronizing multiple collections simultaneously? Do I send multiple Collection-Synchronization headers in a single request? Or do I put them into a single header with some separator?

That’s a good question I did not anticipate. The short reply is that the current Mastodon implementation won’t handle that case, at all.

Multiple Collection-Synchronization headers do not seem like a good idea, as afaik multiple headers are supposed to be equivalent to a single header with the values concatenated with a comma, which given the chosen syntax would make the value ambiguous.
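
A small Python sketch of that ambiguity follows. The parser is a simplified stand-in for the Signature-style grammar (an assumption of this sketch, not the PR's parser), and the URLs and digest values are made up:

```python
import re

# Simplified key="value" matcher; assumes values contain no double quotes.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_params(value):
    """Return the key/value pairs of a header value, in order of appearance."""
    return PAIR.findall(value)


first = ('collectionId="https://a.example.com/followers", '
         'url="https://a.example.com/sync1", digest="aa"')
second = ('collectionId="https://a.example.com/other", '
          'url="https://a.example.com/sync2", digest="bb"')

# What a recipient may see after an intermediary merges the two header lines.
combined = first + ", " + second

pairs = parse_params(combined)   # six pairs, with no marker between the two sets
as_dict = dict(pairs)            # later keys silently overwrite earlier ones
```

Since the fields inside one value are already comma-separated, nothing in the merged value marks where the first collection ends and the second begins, and a naive key lookup only ever sees the last set.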

Maybe in a future revision, we should explore using something like Structured-Fields for the HTTP header, to make it easier to serialize. I know HTTP-Signatures is also moving in this direction: Signing HTTP Messages