Newbie question: posts & profiles – how to fix the disconnect between local and remote data?

On Mastodon, the disconnect between local and remote data is currently one of its biggest weaknesses (especially compared to ATproto):

  • Posts usually don’t show all replies. Like & boost counts also tend to be lower than on the origin server.
  • Profiles only show subsets of followers and followees. And to see all posts, you have to go to the original server.

Questions:

  • Is there a fundamental way in which this can be solved? Or a series of measures? Or is it something we (mostly) have to put up with?
  • Maybe a timestamp for posts and profiles: if that timestamp is out of date, the local server pulls in all changes from the remote server (the one hosting the post or profile) that it isn’t yet aware of?
    • Downside: That would considerably increase the load on servers, especially for popular posts. That downside could be somewhat mitigated by only checking the timestamp when someone visits a post or a profile.
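To make the idea concrete, here is a minimal sketch of the "only check when someone visits" variant. Everything in it is an assumption for illustration: the names `CachedObject` and `get_on_visit`, the `fetch_remote` callback standing in for an HTTP request to the origin, and the five-minute `max_age`. It is not how Mastodon or any other server actually works.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    data: dict
    fetched_at: float  # Unix seconds when the local copy was last refreshed


def get_on_visit(
    cache: Dict[str, CachedObject],
    object_id: str,
    fetch_remote: Callable[[str], dict],  # hypothetical: asks the origin server
    now: float,
    max_age: float = 300.0,  # assumed freshness window, in seconds
) -> dict:
    """Return the local copy, refreshing it from the origin only if it is stale."""
    entry = cache.get(object_id)
    if entry is None or now - entry.fetched_at > max_age:
        # Stale or missing: one request to the origin, triggered by a visit.
        entry = CachedObject(data=fetch_remote(object_id), fetched_at=now)
        cache[object_id] = entry
    return entry.data
```

With this shape, a popular post costs the origin at most one request per visiting server per `max_age` window, rather than one per change pushed to every server.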

Update:

  • One possible solution (source): resolvable context collections (FEP 7888)

Well… there’s a way, but it would go against the architectural assumptions of most software.

The problem is that only one server is authoritative: the origin. Everyone is fetching and caching from that origin, or getting deliveries from that origin and storing information. But at the same time, you can’t trust the origin. Everything the origin produces is a claim that ought to be verified.

On a basic level, you might trust that the origin is correct about its own content for an object. But when it comes to likes and shares, the “total items” count is likewise only a claim. Are you going to count them yourself? Are you even aware of the existence of all Like and Announce activities? So the existing software has to make a distinction between “the origin claims this post has 12 likes, and i don’t trust that” versus “i am locally aware of only 3 likes, and i trust myself”.

What the origin might be able to do is point to a complete list of relevant Like activities, and if you have a way to obtain a trusted representation of those activities (each against their respective origin), then you can validate and count the likes yourself.
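As a rough sketch of what "validate and count the likes yourself" could look like: the function below takes the list of Like ids the post's origin claims, fetches each one, and only counts those that check out. The `fetch` callback is hypothetical (it stands in for retrieving the activity from the server that hosts that id), and the rule that the actor must live on the same host as the activity id is an assumed same-origin heuristic, not something the source or ActivityPub mandates.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


def count_verified_likes(
    post_id: str,
    claimed_like_ids: Iterable[str],
    fetch: Callable[[str], Optional[dict]],  # hypothetical: fetch from the id's own origin
) -> int:
    """Count distinct actors whose Like of post_id could be confirmed."""
    verified_actors = set()
    for like_id in claimed_like_ids:
        activity = fetch(like_id)
        if activity is None:
            continue  # could not be retrieved, so it stays an unverified claim
        if activity.get("id") != like_id or activity.get("type") != "Like":
            continue
        if activity.get("object") != post_id:
            continue  # a Like, but of something else
        actor = activity.get("actor")
        # Assumed heuristic: the actor must be hosted where the activity is.
        if not isinstance(actor, str):
            continue
        if urlparse(actor).netloc != urlparse(like_id).netloc:
            continue
        verified_actors.add(actor)
    return len(verified_actors)
```

The result is a third number alongside "what the origin claims" and "what i saw delivered": what could be confirmed on demand, at the cost of one fetch per like.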

This overall lack of trust in the network underpins the use of federation as a mechanism to distribute activities to other servers, and then those other servers can do what they want with that information. But you generally can’t just blanket send to every single known server. That would be incredibly inefficient.

At the very least, you might be able to trust servers to maintain the “special collections”: likes contains Like activities, shares contains Announce activities. It gets a little more complicated for other “special collections” because followers doesn’t contain the Follow or Accept Follow activities. Likewise, we have replies and context, which aren’t formally defined in ActivityPub, but could be collections containing… something. FEP-7458 describes the replies collection as generally containing (but not limited to) objects with the same inReplyTo, and FEP-7888 describes the context property, which could be a collection generally containing (but not limited to) objects with the same context.

For the purposes of gathering related information, a browser or user agent (which is what most “instances” operate as) could browse these collections, then fetch each of the related objects and process them as transformed entities — “posts”, “profiles”, “favorites”, “boosts”, whatever. But again, this is generally a significant increase in network traffic. And depending on other considerations, they might not be able to trust that information: in particular, followers and following cannot be trusted without knowledge of an Accept Follow that hasn’t been revoked (and revocation of an accepted follower is something that isn’t formally solved, either — you could formulate Undo Accept Follow, Reject Follow, Remove targeting a followers collection, etc).
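For illustration, browsing such a collection might look roughly like the sketch below. It follows the ActivityStreams paging properties (`first`, `next`, `items`, `orderedItems`); the `fetch` callback and the `max_pages` limit are assumptions, and the sketch does nothing about trust: every yielded item would still need to be fetched and checked against its own origin.

```python
from typing import Callable, Iterator


def _items(node: dict) -> list:
    """Items carried directly on a collection or collection page."""
    return node.get("orderedItems") or node.get("items") or []


def iter_collection_items(
    collection_id: str,
    fetch: Callable[[str], dict],  # hypothetical: dereferences an id to JSON
    max_pages: int = 50,           # assumed safety limit against endless paging
) -> Iterator:
    """Yield every item of a (possibly paged) collection, page by page."""
    collection = fetch(collection_id)
    yield from _items(collection)  # small collections may embed items directly
    page_ref = collection.get("first")
    seen = set()
    for _ in range(max_pages):
        if page_ref is None:
            break
        if isinstance(page_ref, str):
            if page_ref in seen:
                break  # a page pointing back at an earlier one
            seen.add(page_ref)
            page = fetch(page_ref)
        else:
            page = page_ref  # page embedded as an object
        yield from _items(page)
        page_ref = page.get("next")
```

Each page is one more request to the remote server, which is where the extra network traffic comes from.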

So in conclusion, you’d need either more trust (which could be misplaced) or more distribution (which is expensive for the network).

This problem is solved in federated forums that implement FEP-1b12 (such as Lemmy) and in Streams, which implements Conversation Containers. These services keep data synchronized between servers (including reactions and boosts) by making the conversation owner (the group, or the person who started the conversation) distribute related activities to all participants.
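A toy model of that pattern, to show the shape of it: every activity goes to the conversation owner, and the owner relays it to everyone else who has taken part. The `Conversation` class and its `receive` method are made up for this sketch, and it ignores the actual wire format used by FEP-1b12 or Conversation Containers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Conversation:
    owner: str  # the group, or the person who started the conversation
    participants: Set[str] = field(default_factory=set)

    def receive(self, activity: dict) -> List[str]:
        """Record the sender as a participant and return who the owner relays to."""
        sender = activity["actor"]
        self.participants.add(sender)
        # Relay to every known participant except the one who sent it.
        return sorted(p for p in self.participants if p != sender)
```

Because everything passes through one place, each participant's server ends up with the same set of replies, likes and boosts, at the price of trusting the owner to relay faithfully.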
