Anonymous Likes/Dislikes or other activities via proxy user

There’s been some discussion in the lemmy.world fediverse community about the privacy (or lack thereof) of votes.

PieFed is now experimenting with a solution where votes are federated via proxy users, i.e. not federated directly from the actual user who voted. You can read some details here.

I’d like to hear from a general ActivityPub standpoint: What implications might this have? Clearly this is a workaround and not a true solution, but is it a good workaround or does it have bad edges? I’d love to hear what you think.

I feel it is prudent to lay out the concerns first, before evaluating possible solution(s):

  • Automated voting manipulation.

  • Undesired voting score influence by “outer” groups (e.g. astroturfing, brigading, etc.)

  • Undesired publishing of the vote issuer (individual privacy concerns, e.g. witch-hunts and undesirable social consequences based on votes).

  • (Non-consensual) mass collection of data and profiling.

It’s not a solution, but from a general ActivityPub standpoint, I think the audience targeting (the to, cc, bto and bcc properties) of Like activities should represent the privacy of the likes, just as it does for Create activities; unconditionally exposing every Like is an (unfortunately common) misimplementation.
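For illustration, here is a sketch of a Like whose addressing deliberately omits as:Public (all URLs are made-up placeholders, and the activity is shown as a plain Python dict rather than real JSON-LD):

```python
# The special "Public" collection IRI defined by ActivityStreams.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

# A Like addressed only to the author of the liked object. Because
# as:Public is absent from the audience, recipients should treat this
# Like as private and not display its actor publicly.
private_like = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Like",
    "id": "https://example.social/likes/1",
    "actor": "https://example.social/users/alice",
    "object": "https://other.example/posts/42",
    "to": ["https://other.example/users/bob"],
}

assert PUBLIC not in private_like.get("to", []) + private_like.get("cc", [])
```

A recipient honoring the addressing would count this Like but never attribute it publicly.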

So there’s nothing really in the way of having a single actor send multiple Like activities, spec-wise – the one exception is that the optional liked collection of the actor is idempotent. The likes collection of the object is not necessarily guaranteed to have at most one Like per actor. You could therefore send/publish multiple Like activities, each with a different id, from the same actor. This will be unexpected behavior for some implementations that enforce a uniqueness check on the likes collection by actor.
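Concretely, one proxy actor could emit several Like activities, each with a fresh id, along these lines (a sketch; the actor and object URLs are invented):

```python
import itertools

# Monotonic counter so every Like gets a distinct id.
_counter = itertools.count(1)

def make_like(actor: str, obj: str) -> dict:
    """Build a Like with a fresh id so repeated Likes from one actor
    remain distinguishable over federation."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Like",
        "id": f"{actor}/likes/{next(_counter)}",
        "actor": actor,
        "object": obj,
    }

proxy = "https://example.social/users/vote-proxy"
a = make_like(proxy, "https://other.example/posts/42")
b = make_like(proxy, "https://other.example/posts/42")

# Same actor, different activity ids -- valid spec-wise, though some
# implementations deduplicate the likes collection by actor.
assert a["actor"] == b["actor"] and a["id"] != b["id"]
```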

I think there’s a big difference when it comes to Like activities and Create activities with regard to their audience.

For instance, a Create activity that is only addressed to Alice@alice.com and Bob@bob.com need only be sent to alice.com and bob.com. That also means that you only need to trust those instances to keep the activity private (i.e. only Alice and Bob can see it on those instances). That’s fine because Alice and Bob clearly already trust their respective instances, otherwise they wouldn’t have accounts there.

However, a Like/Dislike activity needs to be broadcast to basically all instances that are aware of the post in question, if you want an accurate vote count. The desire here is not to have the Like count be private (the count should be visible to all), but to keep it private who exactly voted. You can’t do that if you don’t broadcast out the Likes to all the different instances. But you can’t trust all those instances to keep the actors that Liked hidden - it only takes one of them to make the votes public for them to be essentially public.

I’m not sure how you see using the to, cc, bto and bcc properties in order to keep the actor behind the Like private while still keeping the total count of Like/Dislike public.

The desire here is not to send multiple likes from one actor - PieFed ensures that a user still only votes once on each post (although of course it is not possible for an external instance to verify this). The desire is simply to not reveal who votes on what while still keeping the vote count accurate.

Well, if you don’t address the Like to/cc as:Public, then the recipient shouldn’t be showing that Like in public. A count is fine (if unverifiable).

And having a single proxy actor send multiple likes is just one way of doing it — internally, you can do whatever you want, but over federation, you need an actor on the Like. I don’t see a need to create one proxy actor per real user, is all.

In practice though, there is no way to enforce this, unless you want to manually vet all the (hundreds, potentially thousands of) instances you federate with. Any new instance could also display all the incoming Likes regardless of audience, so you’d need to be constantly vigilant or use an allowlist. How do you protect against that? It doesn’t seem like ActivityPub provides any method aside from “just trust that other instances respect the audience fields” which I feel isn’t good enough.

Yeah, the same issue is present for any other “private” activity. It’s also present for other protocols – who’s to say that anyone to whom you send an email will not republish that email?

I disagree. I don’t think it is comparable to email or private notes shared between few actors in ActivityPub.

With email, you are sending a message from your email server (a server you trust) directly to the recipient’s server (a server the recipient trusts). In this scenario, it is only necessary to trust those two servers, and they are easy to trust because you (sender and receiver) already trust them with your email and with hosting your accounts.

The same goes for ActivityPub private messaging between two actors - the activity is only sent from the sender’s instance to the receiver’s instance. Both of these instances are effectively trusted already.

But a Like activity on a public post that needs to be sent out to practically all instances that your instance knows of - it is much harder to trust that that will stay private because a whole lot of “third-party” instances are involved. It is not just your instance and the instance of the post you Liked.

I don’t think this could be called “the same issue”. That would be like if I sent an email to all email servers in existence, but then I also attached a note at the end saying “Please don’t share that it was me who sent this email, but include it in your statistics for how many emails you received”. I cannot reasonably expect that request to be upheld by everyone.

There’s no difference between a Like activity and any other activity.

I wouldn’t say that “a Like activity on a public post […] needs to be sent out to practically all instances that your instance knows of” – existing implementations can and do send Likes to a more limited audience. Mastodon sends Likes only to the author of the liked post, and you could also send Likes to followers but not to as:Public.

Along the same lines, you could send a followers-only post or a DM to someone, and if they don’t understand that addressing, then they might interpret the post wrongly as a “public post” – and Wildebeest did this for quite a while, they just assumed all posts were public.

You have to trust the recipient server in any case where you deliver something to one of their inboxes. There’s no sidestepping that.

But there is a difference between an activity sent to a few actors and one sent to a lot of actors, I would say.

As for your suggestions, none of those work for community/forum-based social media (anything that tries to emulate Reddit, essentially), such as Lemmy, Mbin or PieFed. Those suggestions may work for microblogging, but they are not a good fit for content aggregation or mass voting akin to what you see on Reddit.

For a feed like that, you need Likes to be broadcast out across basically the whole network, otherwise small instances would not be able to properly sort posts (on that note, how do small Mastodon instances show the most liked posts or even an accurate like count if only the author receives the Likes?).

The only difference is how many actors you send it to. You have to trust each and every recipient. This is a constant even outside of the issue of “anonymous likes”.

At the protocol level, no such distinction exists. The only primitives you have are Web resources and delivering activities. You are free to send your Like activity to as many or as few recipients as you wish. Again, you should generally trust those recipients before sending them activities.

That’s the thing – they don’t! Any like count can be faked. Mastodon’s philosophy is to show only what it can verify, and “what it can verify” usually amounts to local likes only + whatever it explicitly receives, provided that it relates to some local object – Like activities for remote objects are discarded as they are considered irrelevant. The relevant code is here: mastodon/app/lib/activitypub/activity/like.rb at 2da687a28b509025343d3d8ca17753de9b128e8f · mastodon/mastodon · GitHub

return if original_status.nil? || !original_status.account.local? || delete_arrived_first?(@json['id']) || @account.favourited?(original_status)

As far as Mastodon is concerned, !original_status.account.local? is excluding any Like activity where the object’s author is not a local user. There is an open request to remove this conditional check: Properly federate "like" objects · Issue #11339 · mastodon/mastodon · GitHub

One other thing:

This is a fool’s errand; it makes far more sense to track such Likes at the origin, via the likes collection. This collection can expose a totalItems count based on whatever the origin is claiming, and items/orderedItems can expose the raw Like activities provided that the fetcher is authorized to see them (i.e., that the fetcher is included in the audience).
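Serving the likes collection that way might look roughly like this (a sketch; the function and helper names are my own, and "authorized" is reduced to a simple audience check):

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def serve_likes(likes: list, viewer: str) -> dict:
    """Expose the origin's claimed total, but only list the individual
    Like activities the fetcher is authorized to see (i.e. those
    addressed to as:Public or to the viewer directly)."""
    def visible(like: dict) -> bool:
        audience = like.get("to", []) + like.get("cc", [])
        return PUBLIC in audience or viewer in audience

    return {
        "type": "Collection",
        "totalItems": len(likes),  # count is always exposed
        "items": [like for like in likes if visible(like)],
    }

likes = [
    {"actor": "https://a.example/u/1", "to": [PUBLIC]},
    {"actor": "https://b.example/u/2", "to": ["https://c.example/u/3"]},
]

# An anonymous fetcher sees the full count but only the public Like.
page = serve_likes(likes, viewer=None)
assert page["totalItems"] == 2 and len(page["items"]) == 1
```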

As @trwnh says, there is no difference between any type of activity on the protocol level. If Lemmy is using the type of the activity to determine who it should / shouldn’t be displayed to, then it needs to stop doing that immediately.

It is not appropriate to assume that Likes are public if they do not have addressing specifying as:Public.
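The display decision, in other words, should hinge on addressing rather than on activity type. A minimal sketch of such a check (the function name is mine):

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

def may_display_publicly(activity: dict) -> bool:
    """An activity may be shown publicly only if as:Public appears in
    its audience -- regardless of whether it is a Like, a Create, or
    anything else."""
    audience = activity.get("to", []) + activity.get("cc", [])
    return PUBLIC in audience

assert may_display_publicly({"type": "Like", "to": [PUBLIC]})
assert not may_display_publicly({"type": "Like", "to": ["https://x.example/u/1"]})
```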

I am well aware that Create and Like are just activities and the protocol doesn’t care in that sense. But to say that a private Note sent between very few Actors has just the same privacy implications as a Like broadcast widely to basically all instances - that seems very reductionist and not very pragmatic to me.

I don’t know the inner workings of Lemmy in detail, but let’s give Lemmy and similar apps the benefit of the doubt - let’s assume they only display Likes publicly when as:Public is in the audience (while still showing a total count of all votes, private or public). Even in that case, can you trust that all instances you send Likes to run an unmodified Lemmy that respects as:Public or lack thereof? Also, some instances obviously run different software than Lemmy - can you trust those to respect this too?

I get that in principle all instances should respect this, but there is no mechanism in ActivityPub to enforce it. It seems easy for a malicious instance to gather Likes and make them public for all to see, regardless of as:Public. Sure, you can defederate such an instance once you find out, but then the cat is out of the bag and all votes collected while the instance was up are still leaked and thus public. Not to mention that these Likes are obviously always public for instance admins, who can just examine the activities that come in. So in that sense you can never Like something privately, admins of other instances will always know.

It really seems to me that there is some kind of enforcement missing in ActivityPub around this use case. The mechanism that PieFed is now experimenting with is a bit of a hacky workaround to enforce that privacy - but is there a better way? Are there problems with this approach? More generally, should ActivityPub strive to support this use case better?

Doesn’t this lead to a much worse experience for any small Mastodon instances? I mean a small instance would basically not see any Likes in that case while larger instances would, no? Or am I misunderstanding something? Doesn’t this encourage centralization in large instances? That feels like the opposite of what you want to encourage on the fediverse, and it doesn’t sound like a good UX to me either. Also surely the like count can’t be “faked” if you receive the Likes directly from the instance that liked a post?

Why is this a “fool’s errand”? It is at least what Lemmy, PieFed and Mbin do AFAIK. Are you saying that the Lemmy, PieFed and Mbin developers are fools? I assume not, but your language is a bit harsh here. I haven’t studied their code in detail so maybe I’m misunderstanding something but I don’t see how else it would work.

How else would you obtain an accurate Like count across instances? It sounds like you are suggesting that an instance should pull the Likes collection from the instance with the post in question, but that doesn’t seem scalable - when would you stop pulling? You can’t just keep updating Likes by pulling from the origin for all old posts forever.

You need some kind of push mechanism - does ActivityPub provide any way to push updates to collections for external posts? I.e. when the origin instance receives a Like, it could broadcast that Like (or updated Like count) out to all instances so they could update their counts. But that isn’t much different from sending the Like directly from the liking instance (in fact, it puts more load on the instance with the Liked post, which probably isn’t what you want).
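For concreteness, the kind of push I have in mind might look like the origin sending an Update whose object carries only a refreshed likes count, never the individual voters (purely hypothetical; I don’t know of any implementation doing this, and all names here are invented):

```python
def count_update(object_id: str, like_count: int, followers: str) -> dict:
    """Hypothetical: the origin pushes a refreshed Like count via an
    Update activity, exposing only totalItems and no actors."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        "actor": "https://origin.example/communities/c",
        "to": [followers],
        "object": {
            "id": object_id,
            "likes": {"type": "Collection", "totalItems": like_count},
        },
    }

u = count_update(
    "https://origin.example/posts/42",
    17,
    "https://origin.example/communities/c/followers",
)
assert u["object"]["likes"]["totalItems"] == 17
```

But as noted above, this just moves the broadcast burden onto the origin instance.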

I’m very curious how you suggest Lemmy, PieFed, Mbin and other similar Reddit-style apps should otherwise attain an accurate vote count across instances, if they are not to broadcast the votes out to all instances. Is there any easier way?

I am a little confused reading this-- if a proxy-voter actor exists on each server that also hosts the actual-human who controls it, and the server effectively keeps those secret (and ideally obfuscates it to a certain acceptable threshold, i.e. not publicly/obviously deleting the proxy-voter the day the account gets deleted, etc etc), AND likes are collected on notes controlled by GROUPS rather than by individuals, couldn’t this work?
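For concreteness, one way a server could derive stable but unlinkable proxy-voter identities is keyed hashing with a server-side secret (this scheme is my own sketch, not what PieFed actually does; the secret and URLs are made up):

```python
import hashlib
import hmac

# Hypothetical server-side secret; must never leave the server.
SERVER_SECRET = b"keep-this-out-of-the-database-dump"

def proxy_actor(user_id: str) -> str:
    """Derive a pseudonymous actor IRI for a user. The mapping is
    deterministic (so each user still votes at most once per post),
    but without the secret an observer cannot link the proxy actor
    back to the real account."""
    tag = hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256)
    return f"https://example.social/voters/{tag.hexdigest()[:16]}"

assert proxy_actor("alice") == proxy_actor("alice")
assert proxy_actor("alice") != proxy_actor("bob")
```

The obfuscation points raised above (not deleting the proxy the same day the account is deleted, etc.) would still need to be layered on top of any such derivation.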

On the original thread I noticed a trust/reputation question (how to detect and ignore/drop malicious, ill-behaved, or inauthentic voter-proxies by age, posting history, up/down ratio, etc.), but I think this stuff has to be baked into the same FEP before being harmonized on, because it creates all kinds of corner cases and privacy gaps to have PieFed and Lemmy using one minimum threshold for counting votes and kbin using a totally different one (e.g. someone could programmatically compare like-counts across the two and partially de-anon accounts that are differentially trusted…)

It could work - it just has some implications. Like for instance, what if you ban the voting pseudo-user from an external instance? The instance where the real user sits could obviously detect this and know that that user is banned, but from the external instance’s point of view, that user is not banned and there’s no way for the banning instance to discover that (but I mean, maybe that’s good? You shouldn’t know who it is after all).

I’m also curious whether there’s a better mechanism or maybe something that could be in-built more natively in ActivityPub without needing these hacky extra users.

You could imagine taking this to a crazy extreme and generate a new anonymous voting user for every single vote you cast (or use a rotating number of them or something) to make it even more anonymous. But that feels like it’s kinda getting out of hand.

I’m not sure what you’re saying here, can you elaborate? :slight_smile:

on privacy of activities

What I’m saying is that you could just as well have a private Like and a widely-broadcasted Create Note. There’s nothing inherent to Likes and Notes that says either of them have to be private or public. Every single activity is exactly as private or public as its author decides it to be, based on whom and how many addressees are included. Your responsibility as a recipient of any activity is to not leak that information to a wider audience. This necessitates trust.

There is no mechanism in any protocol to prevent someone from republishing information to a wider audience. The most you can do is add encryption into the mix, which means key management and revocation and all that other “fun” stuff. And even that won’t prevent someone from leaking the information if they really want to. The most you can do is to allow for repudiability and deniability by not attaching a signature to the information. Again, you have to trust the recipient.

ActivityPub explicitly doesn’t mandate any specific mapping between actors and users. It’s perfectly fine to have multiple actors associated with a single user, and it isn’t a hack. Whether you have a single “proxy actor” or whether you maintain one proxy actor per real user, that is up to the implementation.

You could. At the end of the day, an actor is nothing more than an IRI, after all. If an implementation wanted to anonymize every single vote, then they could do that. Or they could do any number of things.


on breadth of distribution

This assumes you follow the “instance” model at all. I don’t think it’s everyone’s responsibility to mirror or replicate the entire social web or its social graph. Yes, smaller “instances” will have a less complete view of the entire graph. What you should do about that is to provide a sync mechanism for any external observer to verify the information that they’re interested in – this is what the likes collection does for Like activities, filtered by your authorization to see any given Like activity. In general, I’d advocate less thinking in terms of “instances” and more thinking in terms of “social web browsers”. The former is a subset of the latter, and you have mechanisms available for staying in sync, such as following actors, polling collections, etc.

It’s a figure of speech: Fool's errand Definition & Meaning - Merriam-Webster

Let me rephrase this bit. Working backwards from the goal of “Likes [need] to be broadcast out across basically the whole network”, you arrive at an inherent assumption that “the whole network” needs to be aware of all this activity… and they very much do not need to be aware of all of it. If you want stronger consistency guarantees, then you need tighter peering between “instances”; think along the terms of IRC networks. This is not the prevailing model of the fediverse. In the current fediverse, we do not have a single global network; we instead have a network-of-networks where each constituent network has its own locality of information. Trying to have a global view of everything that happens on the Internet is what I was referring to as the “fool’s errand”.

Oh, I just meant that there is a natural coupling between the shadow-voter mechanism and the eligibility requirements for a shadow-vote to be cast – or, for that matter, any other assumptions implied by optional/extension features like filtering out votes on the basis of anything at all. You can’t mix shadow-votes between servers that assume/guarantee X and servers that don’t, basically. So whereas I normally advocate for thin, composable FEPs, here it’s kinda monolithic: everyone needs to share all the same assumptions and guarantees to form ONE federation of shadow-votes that works smoothly (and doesn’t create corner cases that get really hard to fix post-facto).

^I love this, and it didn’t even cross my mind that it could be seen as hacky for servers to pseudonymize users into shadow-voters and to guarantee/preserve the “herd privacy” of that linkage. But my point about a monolithic system is that every assumption – including the assumptions made by every feature you’re expected to support across that federation – must be made extremely explicit in one testable FEP :sweat_smile:

@trwnh makes a great point that it’s still a candidate solution and maybe the exercise of mapping out all the assumptions and features you’d need this to support will actually make one-proxy-per-vote more reasonable. I would recommend thinking through:

  • users deleting their accounts without deanonymizing their votes (though maybe the perfect is the enemy of the done when thinking through corner-case deanonymization against an obsessive and well-resourced snoop)
  • what happens to the up/down votes on a given object if they’re banned by the object’s host-server, or the originator’s host-server
  • revoking votes without deanonymizing
  • trust-establishment, like i mentioned

Probably more stuff I’m forgetting, but it’s late here and I’m half asleep.

Well obviously not, but you could imagine for instance adding an “AnonymousLike” activity to ActivityPub that would not include any information about what actor actually performed the Like, but just that a Like happened from someone (no encryption required). You can’t republish something you didn’t receive. At the moment, ActivityPub doesn’t provide any anonymous activities in that way. Yes, you can’t prevent someone from leaking what should be private, but you can definitely design a protocol to not even expose the information you need hidden in the first place.
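To make the idea concrete, such a hypothetical activity might be serialized with no actor property at all (to be clear, “AnonymousLike” is an invented type that does not exist in ActivityPub or ActivityStreams; this is only a sketch of the idea):

```python
# Hypothetical "AnonymousLike": because no actor is ever serialized,
# there is nothing for a recipient to leak. (An invented type -- not
# part of any current vocabulary. URLs are placeholders.)
anonymous_like = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "AnonymousLike",
    "id": "https://example.social/anon-likes/1",
    "object": "https://other.example/posts/42",
    # Deliberately no "actor" property.
}

assert "actor" not in anonymous_like
```

Of course, without an actor there is nothing to authenticate the activity against, so recipients would presumably have to trust whichever origin delivered it.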

I understand that you are coming at this from a very generalized (I would almost say theoretical) perspective on ActivityPub. However, I’d love to hear a more pragmatic response.

I mean yes, of course any implementation can do whatever they want. That is not what I am questioning. I am asking, in this particular case, with this particular implementation, in the current ActivityPub landscape that actually exists (not theoretical), particularly around the “threadiverse” of Lemmy, Mbin, PieFed and similar ActivityPub-powered Reddit-like social media, what implications does the practice of shadow voting accounts have? Is it a good path for achieving anonymous voting or are there footguns? Could ActivityPub evolve to support this use case better than they currently do? (for instance, the AnonymousLike activity I mentioned above as an example)

I’m not sure if I’m completely understanding what you’re suggesting here, but to me this sounds completely incompatible with Reddit-style social media or link/content aggregators if you will. The idea of content aggregating is to collect content and let people vote on the content so that all the content can be sorted in order to get the most interesting stuff at the top of the feed. I don’t see how you can perform this sorting if you don’t have a pretty wide view of the votes that have occurred across the network.

A syncing mechanism (not even sure what that would look like) sounds to me like it would be too late - you need the votes before the user even looks at their feed, otherwise you don’t know what to display to them. The content aggregator loses all value if you don’t have all (or at least most I guess) of the votes, since you will not be able to surface the most interesting content then.

Does ActivityPub just not support content-aggregation social media very well in this way? It just seems sad that, for all its flexibility and extensibility, the protocol doesn’t seem very suited to replicating something as popular as Reddit.

But as I said above, a content aggregator definitely needs an awareness of all votes that occur. Lemmy doesn’t federate all content - communities (magazines, subforums or subreddits, whatever you want to call them) only federate posts once someone from your own instance subscribes to that community. But after that subscription, you will get posts from that community, and then you definitely need all the votes for that community, otherwise you are powerless to sort the content, which is the whole point of the content aggregator.