Ability to distinguish individual users behind Flag activities in a privacy preserving way

thisismissem · May 13, 2024, 11:39pm

At present, a common (and good) practice in the Fediverse is that moderation reports sent from one server to another are sent as the “instance actor” to protect the underlying reporting user from malicious or retaliatory server administrators.

This is generally good; however, it means there's no way to reject reports from a specific bad remote actor. Instead, you can only reject reports from that entire server.

So if a particular remote actor decides to harass via reports (and the remote server's moderators aren't watching), the receiving server has no method of recourse short of rejecting every report from that server.

For example, a Flag activity from Mastodon looks something like the following:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "http://social.example/a9e86dbe-3173-4c4d-a9b8-dd031f26aebc",
  "type": "Flag",
  "actor": "http://social.example/actor",
  "content": "This account harassed me",
  "object": [
    "http://social.example/users/sid_ebert1",
    "http://social.example/users/sid_ebert1/statuses/112436359884862090"
  ]
}

I’d like to propose that a privacy-preserving way to do this, whilst still enabling rejection of reports, may be to use the attributedTo property with a URL that contains a salted hash of the user’s username, handle, or ID.

So we’d get something like:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "http://social.example/a9e86dbe-3173-4c4d-a9b8-dd031f26aebc",
  "type": "Flag",
  "actor": "http://social.example/actor",
  "content": "This account harassed me",
  "attributedTo": "http://social.example/report_user/13abbd03326d4e5f7777267365301409df7eaae66888a675a18dcaf668bfd1b3",
  "object": [
    "http://social.example/users/sid_ebert1",
    "http://social.example/users/sid_ebert1/statuses/112436359884862090"
  ]
}

For this example, I used the following hacked into Mastodon’s codebase:

URI.join(root_url, 'report_user/', OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'), Rails.application.secret_key_base, object.account.username))

Where Rails.application.secret_key_base is near-guaranteed to be unique to the server, and looks something like:

57b233bc6b8904a94bee43c36d6168b6a71c2d6be1ade992cea08317695eeca31b830e4f398bc689fa168de333a9ee36a80f45b282be63dedae5e6d3c0f12b16

Sure, you wouldn’t be able to look up the “attributedTo” actor (it’s a “fake URL”), but it would be consistent between reports by the same user, as long as the server’s secret key doesn’t change.
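
To make the idea easier to try outside of Mastodon, here is a minimal standalone sketch of the same derivation, plus one way a receiving server might act on it. The names report_user_uri and reject_flag? are hypothetical helpers made up for illustration, and the secret argument merely stands in for Rails.application.secret_key_base; none of this is existing Mastodon code.

```ruby
require 'openssl'
require 'uri'

# Sending side: derive a stable pseudonymous URL for the reporting user.
# `secret` stands in for Rails.application.secret_key_base (an assumption).
def report_user_uri(root_url, secret, username)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha256'), secret, username)
  URI.join(root_url, 'report_user/', digest).to_s
end

# Receiving side (hypothetical): reject a Flag whose attributedTo value
# appears in a locally maintained list of blocked reporters.
def reject_flag?(flag, blocked_reporters)
  blocked_reporters.include?(flag['attributedTo'])
end

secret = 'example-secret-not-a-real-key'
alice  = report_user_uri('http://social.example/', secret, 'alice')
bob    = report_user_uri('http://social.example/', secret, 'bob')

flag = {
  'type' => 'Flag',
  'actor' => 'http://social.example/actor',
  'attributedTo' => alice
}

# Same user and same secret always give the same URL; different users differ.
puts alice == report_user_uri('http://social.example/', secret, 'alice')
puts alice == bob
puts reject_flag?(flag, [alice])
```

The receiving server never learns the username. It only sees that two reports share an attributedTo value, which is enough to reject further reports from that one reporter.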

thisismissem · May 13, 2024, 11:46pm

Of course, you don’t need to use a hash or anything. It just needs to be a unique identifier for the user that isn’t public and doesn’t reveal information about the user (other than a unique identity). For instance, you could store a UUID in a reporter_id column on your accounts table, and that’d be fine to use since you never return it to the public.
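
As a rough sketch of that alternative: the ReporterIds class below is a hypothetical in-memory stand-in for an accounts table with a reporter_id column, and reporter_url is an assumed helper name, so treat it as an illustration rather than a real schema.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical in-memory stand-in for an accounts table
# with a private reporter_id column.
class ReporterIds
  def initialize
    @ids = {}
  end

  # Return the account's private reporter ID, generating it on first use.
  def for_account(account_id)
    @ids[account_id] ||= SecureRandom.uuid
  end
end

# Build the pseudonymous URL placed in attributedTo (assumed URL layout).
def reporter_url(root_url, reporter_id)
  URI.join(root_url, 'report_user/', reporter_id).to_s
end

ids = ReporterIds.new
puts reporter_url('http://social.example/', ids.for_account(42))
```

Unlike the HMAC approach, this identifier survives a change of the server secret, but it has to be stored and kept out of every public API response.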

trwnh · May 14, 2024, 3:00am

you wouldn’t need both “actor” and “attributedTo” since “actor” is already a subproperty of “attributedTo”. i’d just make the actor the report_user directly.

thisismissem · July 21, 2024, 11:52am

It can’t be, since report_user doesn’t expose any details other than sharing a base URI with the server actor.

bumblefudge · August 29, 2024, 3:14pm

I think the intention here is for actor to be a server Actor (or a moderation-team Actor?) and the attributedTo to be a pseudonymizing euphemism for the actual reporting end-user, which is reused across all reports from a given account but can only be de-pseudonymized by the server/moderation team. I checked, and while Actor is a { Link | Object } that has to { dereference to | be } an Actor object, attributedTo is a { Link | Object } that a note in the ontology specifically mentions does not need to be. So I think a URI (which could, perhaps, be dereferenced to the original Actor by another method or endpoint?) would make sense here?

trwnh · August 29, 2024, 11:30pm

this came up in another topic (FEP-0391 currently proposes using attributedTo pointing to activities that resulted in the current activity) but erincandescent pointed out that attributedTo has a lot of problems with being too vague and circularly defined. for the specific use case you describe, i would consider an extension property along the lines of onBehalfOf or similar. trying to stay within the existing vocab is not always a good idea, and indeed there are other properties that at first glance could work too — generator comes to mind.

the thing to be wary of is that in RDF, as:actor implies as:attributedTo, due to it being a subproperty. so you have three statements being made instead of two — the actor is the actor, but the actor is also the attributedTo, and then the third statement would declare that the pseudonymous actor is also the attributedTo:

<activity> as:attributedTo <p> .
<activity> as:actor <a> .
<activity> as:attributedTo <a> .  # this is implied by the immediately above statement
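
a toy sketch of how that third statement appears, assuming a consumer that applies RDFS-style subproperty entailment (plain JSON-LD expansion does not do this on its own). the SUBPROPERTY_OF table and the infer method are made up for illustration; only the actor/attributedTo relationship comes from the AS2 vocabulary:

```ruby
require 'set'

# Subproperty relationships known to the reasoner (child => parent).
SUBPROPERTY_OF = { 'as:actor' => 'as:attributedTo' }.freeze

# Return the input triples plus every triple implied by subproperty rules.
def infer(triples)
  result = Set.new(triples)
  triples.each do |subject, predicate, object|
    parent = SUBPROPERTY_OF[predicate]
    while parent
      result << [subject, parent, object]
      parent = SUBPROPERTY_OF[parent]
    end
  end
  result
end

stated = [
  ['<activity>', 'as:attributedTo', '<p>'],
  ['<activity>', 'as:actor', '<a>']
]

# Two statements go in, three come out.
infer(stated).each { |triple| puts triple.join(' ') }
```

after inference, both <p> and <a> are values of attributedTo, so a consumer reading only attributedTo can no longer tell the pseudonymous reporter apart from the server actor.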

bumblefudge · August 30, 2024, 7:15am

I believe you that this would create problems for canonicalization or RDF parsers but I neither understand why this is the case nor what those problems might be. can you ELI5, bearing in mind I have a shaky grasp of what “subproperty”, “imply” and “statement” mean in this context? does every Actor without an attributedTo have an “implicit” one pointing to itself if none other is set when expanded or something? I feel like we need a FEP to explain to non-RDF folks how overloading existing terms can create this kind of ambiguity if anything marked as a subproperty does this “implicit” thing!