Anonymous Likes/Dislikes or other activities via proxy user

In this very limited case, then no, there aren’t really any implications that I’m aware of that are worth noting. Piefed is not doing anything wrong, hacky, or even particularly unusual by having proxy actors. The use of proxy actors is a broadly-implemented design pattern that is used for all manner of things, like flagging/reports, HTTP signing, and so on. Using proxy actors for voting is a natural way to separate the PII of the voter from the actual voting behavior itself. There isn’t really anything that “ActivityPub” needs to do to support this use case – it is already supported. There is no need to define an AnonymousLike when you could simply decide to drop as:Public from your addressing and/or use proxy actors. I suppose you could even just not declare a value for actor, but this is probably going to trip up activity validators.
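To make the proxy-actor pattern concrete, here is a minimal sketch. The URL scheme, the hash-based derivation, and all names are hypothetical illustrations (not how PieFed actually does it): the activity names a derived proxy id instead of the user's real actor, and the addressing omits as:Public.

```python
# Hypothetical sketch of voting through a proxy actor: the activity carries a
# derived proxy id instead of the user's real actor id, and no as:Public.
import hashlib

def proxy_actor_id(real_actor_id: str, server: str, secret: str) -> str:
    # Stable per-user proxy: the same user always maps to the same proxy id,
    # but the real actor id cannot be recovered without the server secret.
    digest = hashlib.sha256((secret + "|" + real_actor_id).encode()).hexdigest()[:16]
    return f"https://{server}/proxy/{digest}"

def proxied_like(real_actor_id: str, object_id: str, server: str, secret: str) -> dict:
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Like",
        "actor": proxy_actor_id(real_actor_id, server, secret),
        "object": object_id,
        # note: no as:Public in the addressing, and no PII in the payload
    }
```

Whether the proxy is per-user (stable, as here) or per-vote is exactly the design choice debated later in this thread.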

This aggregating necessarily happens within at least one origin, no? And that origin may or may not be replicated across one or more peers. There doesn’t need to be a single global network. Currently, each “instance” has a localized view of whatever it is aware of. Nothing is stopping instances from peering with each other in a similar way to IRC networks, except for a lack of implementation. This peering could be managed at a “relay” level by having either a central actor that distributes activities to its constituent peers, or by having a many-to-many relationship where every peered “instance” follows some actor from every other “instance”. This allows for as wide a view as you want – want it wider? Just follow more actors!

The syncing would happen before displaying anything to the user, in the same manner that a Web browser must fetch a Web resource before displaying it to the user. This is how any user agent works. What a lot of people don’t realize is that under the “instance” model, your “instance” is your user-agent, and is operating like a Web browser or ActivityPub client – it is not strictly or purely an S2S server. It is a monolithic application that elides out the C2S API in favor of internal mechanisms.

A more practical example: your “instance” could fetch a collection of objects, and then fetch the likes collection for each object to get the totalItems count. Or the authoritative origin could inline-embed a partial representation of the likes collection, like so:

id: <some-object>
likes:
  id: <some-object/likes>
  type: OrderedCollection
  totalItems: 47
  first: <some-object/likes/page-1>
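A sketch of how an instance might consume that shape (the `fetch` callback is a stand-in for an HTTP GET returning parsed JSON; everything beyond the AS2 `likes`/`totalItems` properties is assumed for illustration):

```python
# Sketch: read a like count from an object, preferring an inline-embedded
# partial collection and falling back to dereferencing the likes URL.
# `fetch` is a stand-in for an HTTP GET that returns parsed JSON.

def like_count(obj: dict, fetch) -> int:
    likes = obj.get("likes")
    if isinstance(likes, dict) and "totalItems" in likes:
        return likes["totalItems"]                # inline partial representation
    if isinstance(likes, str):
        return fetch(likes).get("totalItems", 0)  # dereference the collection
    return 0
```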

All of this is inherent to the way the Web works. The data model of ActivityPub is Activity Streams 2.0, which defines activities and objects and collections, which are resources, and those resources are Web resources, and are also described via the Resource Description Framework (RDF), serialized as JSON-LD. ActivityPub uses all these building blocks to provide a pubsub mechanism for being notified of activities that occurred somewhere.

ActivityPub “supports” this use case but is not designed (solely) for this use case. The Activity Vocabulary can certainly be used to describe the primitives that make up a Reddit-like experience – the submission of Links and/or Notes, with some being potentially inReplyTo other objects or threaded/grouped within a context, where objects can have a likes collection and by extension you could similarly define a dislikes collection. Frankly, I’m not sure what shortcomings you see in the protocol here – what about it is not “very suited” for this purpose? To me, it seems like any lack of fitness-for-purpose can be addressed by thinking more generally in terms of Web resources. After all, a Reddit-like content aggregator is just one of the many specific ways you can present the same generic data. There isn’t actually as tangible of a split between “microblogging” and “content aggregation” as one might think – it’s all just the Web. You publish resources and other actors can subscribe to those resources. What you do with those resources after-the-fact is up to you. Browsing the likes collection and calculating some score, then presenting a bunch of Objects or Links in a Collection? That’s just one of the many things you can do.


Just to operationalize a bit, I think an exhaustive/verbose documentation of the user stories enabled would make it a lot easier to debate candidate solutions. So far I’ve heard:

  1. “instance-locked proxy” and
  2. “per-vote proxy”, but maybe there are other possible paths worth exploring like
  3. “regular Like but as:Private” or
  4. “Upvote marked as:PseudonymizedLocally”

Sketching out the goals (including features on top like reputation or filtering out suspicious/weak votes) sounds like a lot of annoying upfront work, but it’s really a stitch in time that makes identifying additional candidate solutions faster and easier. It also makes it much easier, when validating candidates and comparing their pros and cons holistically, to show all the steps in a hypothesis. It also helps get help from people like me who are anything but power-users of Reddit (I barely use any reddit-shaped thing; my sense of humor gets me downvoted everywhere).

That’s good. For reporting, is that to keep the reporter anonymous? What apps do this? And with HTTP signing are you referring to the use of instance actors, or is there something else?

I don’t think I understand the mechanism you’re talking about in this paragraph. As it is right now (as I understand it at least), Lemmy and similar apps federate content and votes to each other directly between instances. Each instance then gathers the collected content and their votes data in their own database. This database is then used to sort content and display that content to the users of that specific instance. Is that what you mean with “aggregating within one origin”?

But who would do this? Do you mean that an instance would automatically follow any relevant actors? Or would users on each instance manually do it? Manually doing it on each instance sounds infeasible, as small instances would not have a lot of content or votes to display (which ruins the point of content aggregation).

But as I said before, fetching likes (pulling) is infeasible. Likes obviously are added all the time, for new and old posts. You can’t just keep pulling votes periodically for old posts. Also, sorting posts is not a quick task for the server; it is not really feasible to include a whole lot of network requests in that process. You must have some kind of push mechanism that makes other federated servers aware of the likes without them having to reach out themselves to find out.

I think I’m having a hard time understanding the concrete mechanism you’re proposing here. I’m a software engineer so maybe I’m thinking too practically and implementation-oriented, but I don’t really understand how what you’re saying would be done in practice.

Well, for one, the ActivityPub model is very focused around actors following other actors, which is not really how most content aggregation functions at all. Instead you follow subforums or categories or subreddits or whatever you want to call them. Of course, to work inside ActivityPub, Lemmy and Co. have gotten around this by using Groups to model these categories. However, this is a bit of an unnatural usage I would say - most people would consider a category to be a collection of posts all related to the same topic, not a Group of all the people subscribed to that category (for instance, you do not even need to be subscribed to a category to post in it or anything like that). On that note, is there a mechanism for following a collection, rather than an actor?

To me, ActivityPub seems primarily designed for (micro)blogging, is all I’m saying. Any other category of social media (or at least content aggregation) seems to have to jump through various semantic and technical hoops in order to use it for their use case.

I again think this is a bit reductionist - these things are quite different in how they function concretely for the user, the kind of social culture they foster, and the technical challenges involved in implementing them.

Following individual accounts on Mastodon and displaying a chronological timeline poses much different technical challenges than collecting posts and votes from a wide array of sources and sorting them according to votes (does Mastodon have any mechanisms for showing posts in a non-chronological fashion?).

I think we may have strayed a little bit from the original topic of the post but I do find this discussion interesting, if you’ll continue to indulge me.

What’s the difference between 3 and 4?

I think I don’t like option 2. It makes it too hard to spot mass-downvoting and other kinds of vote abuse. It’s also quite the explosion of actors instead of just 1 extra actor per user.

Of course option 1 doesn’t provide the same level of confidentiality, as you will probably be able to find out who is behind the proxy voter with some statistics on the votes. However, from what I’ve heard, people are mostly worried about others attacking them for their voting behaviour, and I’m guesstimating that most such harassers would not bother too hard trying to figure out who is behind some vote just to call them an idiot or something? Then again, some trolls are very persistent :sweat_smile:

They were just examples off the top of my head! Hard to help until it’s all written out in a structured document; the convo is way too multithreaded and assumes way too much context otherwise.

I’m not familiar with the so-called Threadiverse, but I guess it’s not desirable in the Threadiverse to notify even the author of the object about the Like activity, unlike in the blogosphere where the Like activity is typically used as a manner of communication. So targeting the Like activity to the attributedTo actor of the object may not be an option either.

Then, how about setting up a proxy actor (a reverse proxy actor?) on the receivers’ side instead and making the Page objects declare that proxy? I think this keeps an acceptable level of privacy (given that you trust the receiver’s server, but you wouldn’t need to trust anything else; I think that trust is needed anyway if you want “accountable” moderation instead of random machine-learning oracles) and addresses the “reputation” and filtering questions.

For example, if the receiver supports the extension mechanism, the Page object would look like so:

{
  "@context": [
    {
      "bikeshed": "http://example.org/bikeshed#",
      "dislikes": {
        "@id": "bikeshed:dislikes",
        "@type": "@id"
      },
      "voteAggregator": {
        "@id": "bikeshed:voteAggregator",
        "@type": "@id"
      }
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "id": "https://example.com/post/42",
  "attributedTo": "https://example.com/u/1",
  "audience": "https://example.com/c/1",
  "name": "Hello",
  "content": "Vote me plz",
  "likes": {
    "id": "https://example.com/post/42/likes",
    "type": "Collection",
    "totalItems": 12
  },
  "dislikes": {
    "id": "https://example.com/post/42/dislikes",
    "type": "Collection",
    "totalItems": 123
  },
  "voteAggregator": {
    "id": "https://example.com/aggregator",
    "type": "Application",
    "inbox": "https://example.com/aggregator/inbox",
    "endpoints": {
      "sharedInbox": "https://example.com/inbox"
    }
  }
}

If the voter’s server supports the extension mechanism, a Like activity would look like so:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Like",
  "id": "https://example.social/likes/1234",
  "actor": "https://example.social/users/1",
  "object": "https://example.com/post/42",
  "to": "https://example.com/aggregator"
}

…and the receiver wouldn’t forward the Like to third-party servers as a contract of the extension mechanism (technically, the “contract” shouldn’t be needed, but in practice, we need a mechanism like this to cancel the existing convention).

If the sender doesn’t support the mechanism, it would send the activities in the existing way, and the receiver could process them just fine.

If the sender supports the extension mechanism but the receiver doesn’t, well, the sender’s server might be able to warn the user of the privacy risk at least, or fall back to the (non-reverse?) proxy mechanism (at the potential risk of being assigned zero reputation).

If it’s desired to have a pushing mechanism for notifying the audience about the changed vote count, the receiver’s server could distribute an Update activity for the likes collection. As a bonus, the server could even throttle the frequency of the distribution when the bandwidth is overloaded.
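A rough sketch of that throttling idea (the interval and all names are made up; `deliver` stands in for addressing and sending an Update activity for the likes collection):

```python
# Sketch: coalesce Update(likes-collection) deliveries so a hot post does not
# produce one outgoing activity per vote. The interval threshold is illustrative.
import time

class ThrottledUpdates:
    def __init__(self, min_interval, deliver):
        self.min_interval = min_interval  # seconds between deliveries per collection
        self.deliver = deliver            # callback that pushes the Update activity
        self.last_sent = {}
        self.pending = set()              # collections awaiting a later timer flush

    def on_vote(self, collection_id, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_sent.get(collection_id, float("-inf")) >= self.min_interval:
            self.last_sent[collection_id] = now
            self.pending.discard(collection_id)
            self.deliver(collection_id)
        else:
            self.pending.add(collection_id)
```

A background timer would periodically flush whatever remains in `pending`, so quiet periods still get a final count.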


I don’t think pushing alone would be sufficient either, because, that way, a newly joined server has no way of knowing the count of votes made before it participated in the network. There needs to be a pulling mechanism either way, even if that’s not the only mechanism of federating votes.

And if you have a pulling mechanism, the sender’s server would be the source of truth for the vote count anyway (because, how would you pull the counts otherwise?). This makes it less convincing to me to require the senders’ servers (instead of the receiver) to notify other servers about the changed vote count.

What do you mean notify? As in a notification? No, you don’t want that. The Like obviously needs to be somehow communicated to the instance of the author of the Liked object so that it can do a +1 to the vote count, but you definitely don’t want a notification for every single (Dis)Like. A single popular post or comment could get you hundreds of notifications if that was the case haha. But just because the like is targeted via to or something like that, I don’t think that necessarily means that the actor in question should get a notification.

I don’t really see how that changes the privacy situation in any way - you’re still dependent on the remote server respecting the privacy of the Like.

I don’t think this happens currently? Does it? My understanding is that activities like Likes are sent directly from instance A to instance Z - no instance B is forwarding one activity from instance A to instance Z as a “middle-man”.

That is true, but the pulling mechanism would only be necessary for getting information for posts that you have not seen before. You would only use it once at the start and then you’d rely on pushing afterwards.

Not necessarily and this could be used maliciously. You don’t have to trust the sender’s server, you can instead go through the Likes collection and fetch each Like activity in the collection - but of course this requires 1 request per vote instead of just 1 request per Likes collection and there can be a lot of votes so… yea it’s kind of a pain if you need to do that. But in principle you have to do that if you don’t want to trust the origin server’s Like count.

You’d need to filter the Like count from the origin in any case, as some Likes that it counts may be from instances that you have defederated and thus you would not want to count those votes.
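Both of those points can be sketched together: recounting by walking the collection instead of trusting `totalItems`, while dropping votes from defederated domains. The `fetch` callback stands in for an HTTP GET returning parsed JSON; note this is exactly the one-request-per-vote cost mentioned above when items are not embedded.

```python
# Sketch: recount likes by walking the (possibly paged) collection instead of
# trusting totalItems, skipping votes from defederated instances.
from urllib.parse import urlparse

def verified_like_count(collection_id, fetch, blocked_domains):
    count = 0
    page = fetch(collection_id)
    if "first" in page:
        page = fetch(page["first"])            # move to the first page
    while page is not None:
        for item in page.get("orderedItems", page.get("items", [])):
            like = fetch(item) if isinstance(item, str) else item  # deref bare ids
            if urlparse(like["actor"]).hostname not in blocked_domains:
                count += 1
        nxt = page.get("next")
        page = fetch(nxt) if nxt else None
    return count
```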

In my mind, pulling would only be for “bootstrapping” new posts, afterwards you’d rely on the pushing mechanism.

My wording was misleading (it’s poorly mixed up with Linked Data Notification jargon). I was thinking about sending unanonymized Likes to the author of the original post rather than to the server (or possibly moderators of the community) alone.

Also, I was thinking in terms of the protocol rather than UX. At the protocol level, delivering an activity to an actor’s inbox implies the addition of the activity to the inbox collection, which is why I think it’s not a good idea to target (unanonymized) Likes to the author of the object. At the UI level, yes, you don’t necessarily want to generate a “notification” for every vote. That’s completely up to the implementation.

But in this way, you would only trust the server of the original object, which explicitly opts into the extension mechanism (and its contracts). I thought your concern was that you cannot practically trust arbitrary audience servers to respect the targeting of Like activities, and I think that limiting the recipient of the activity to a single server of the sender’s choice could reasonably enforce the targeting.

If an implementation opts into the extension mechanism and still ignores the targeting of the activities, you could safely defederate the implementation for its malicious behavior, just as you’d defederate an implementation that intentionally exposes private replies.

I was conservatively assuming the likes collection and an inbox forwarding-like convention, but if there’s no such convention in the real-world Threadiverse (again, I’m not familiar with the Threadiverse), that’s just fine.

Yes, you’d need to get the Like activities from the voters’ servers if you don’t trust the server of the original object, but what’s the point of that exactly? If the sending users of the Likes are anonymized by the sending servers, the only information the audience servers can get is the “voting patterns” (as the piefed.social post linked in the OP puts it) with no “reputation” information. To me, this seems to rather increase the number of SPoFs that can manipulate the vote count.

You might be able to detect obviously bad voting patterns (like mass-voting in a narrow time window, though note that malicious servers could easily fake the published timestamps of Like activities), but I expect (well, it’s just a wild guess) that much of the bad behavior is not that obvious from the voting patterns alone. Perhaps statistical analysis helps with reasonable accuracy, but ought that to be the only solution for moderation?

Also, I had the impression that Threadiverse implementations tend to put strong trust in the community, partly because Lemmy trusts the contents of posts forwarded by followed communities without any cryptographic verification (not that I like the idea), and PieFed’s Page objects seemingly embed remote objects in the replies collection (well, its replies seems to be a plain array of Notes instead of a Collection as specified in the Activity Vocabulary, but I digress) without cryptographic proofs. But, again, I’m not familiar with the Threadiverse and I might be misunderstanding the trust model there.

As a random idea, sending an unanonymized Like to the server of the original post and distributing an anonymized Like to the audience might be viable at the same time, by dropping the actor property (inspired by @trwnh’s idea) of the activity when fetched by an unauthorized client. The anonymized activity here would entail the fact that the sending server claims that an (unnamed) actor hosted by it has Liked the object. This wouldn’t expose the voting pattern information, for better or worse, but it would let the audience verify the count of votes claimed by the voting servers while allowing fine-grained moderation by the original server.
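A sketch of that dual-representation idea: serve the full Like (with `actor`) only to the authoritative server of the liked object, and an actor-less, server-level claim to everyone else. Authorization is reduced to a bare domain check here purely for illustration; a real server would verify the fetcher's HTTP signature.

```python
# Sketch: drop the `actor` property when the fetcher is not the origin server
# of the liked object, turning the Like into an anonymous server-level claim.
from urllib.parse import urlparse

def render_like(activity: dict, requester_domain: str) -> dict:
    origin_domain = urlparse(activity["object"]).hostname
    if requester_domain == origin_domain:
        return activity                # unanonymized, for fine-grained moderation
    anonymized = dict(activity)
    anonymized.pop("actor", None)      # "some actor on this server liked it"
    return anonymized
```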


I’d add a privacy consideration for small and single-user servers.


There’s another candidate-solution worth considering:

Votes could put the “Server Actor” in the "actor" property of a Vote activity, but a pseudonym in the "attributedTo" property. Note a detailed proposal for doing server-pseudonymized-yet-stable pseudonyms for moderation reports here. Insofar as “Instance Actors” are a thing, and maybe they aren’t in the Foraverse, I think this is actually a cleaner modeling; it’s worth noting that in Emelia’s use case (distributed moderation records), it’s also important to get updates when a pseudonym’s controller deactivates, or to be able to moderate/filter/judge on both an individual-actor and a domain/server/collective level…
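A sketch of that shape (the URLs, the `Dislike` type usage, and the HMAC derivation are illustrative assumptions, not the linked proposal's exact scheme): the instance actor goes in `actor`, while `attributedTo` carries a pseudonym that is stable per user, so per-pseudonym moderation remains possible, yet reveals nothing without the server-held secret.

```python
# Sketch: instance actor in `actor`, stable HMAC-derived pseudonym in
# `attributedTo`. Same user -> same pseudonym; the mapping back to the
# user requires the server's secret key.
import hashlib
import hmac

def pseudonymized_vote(kind, user_id, object_id, server, secret: bytes):
    tag = hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()[:20]
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": kind,                              # e.g. "Like" or "Dislike"
        "actor": f"https://{server}/actor",        # the instance/server actor
        "attributedTo": f"https://{server}/pseudonym/{tag}",
        "object": object_id,
    }
```

This supports both levels of moderation mentioned above: filter by `actor` to judge the whole server, or by `attributedTo` to judge one pseudonymous voter.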