FEP-5e53: Opt-out Preference Signals

dmarti · June 17, 2024, 2:39pm

Hello, this is a discussion thread for FEP-5e53: Opt-out Preference Signals

Some of the common concerns that I have seen among Fediverse users are about the ability to opt out of certain uses of their content and/or personal information. For example, some users do not want the content they created to be used for training generative AI systems, and some users do not want to have their personal information shared or sold.

There is existing relevant work for cases where the user visits an information-collecting site with a browser, or where the user puts content on a site that is then visited by an information-collecting crawler. Several opt-out preference signals (OOPSs) have been standardized or proposed in the form of HTTP headers that can apply to a connection between a user and a central server. In some jurisdictions, companies that administer web sites are required to process and act on OOPSs.

This FEP would extend ActivityPub to support passing OOPSs along with the content and user information to which they apply. It is limited to translating existing OOPSs to the Fediverse, and does not propose new ones. This should make it more straightforward to implement: sites and crawlers already read and act on OOPSs for information that arrives from a browser or that is crawled directly from a server, so this FEP would only require the same behavior for information that arrives by way of federation.
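
As a rough illustration of "passing OOPSs along with the content", here is a minimal Python sketch of a server copying a browser-supplied signal onto the object it federates. The `Sec-GPC: 1` request header is the existing Global Privacy Control signal; the property name `optOutSignals` and the helper functions are assumptions made up for this example and are not the vocabulary defined by the FEP.

```python
from typing import Mapping

# Hypothetical property name, used only for this sketch.
# The FEP defines its own vocabulary for carrying signals.
OPT_OUT_PROPERTY = "optOutSignals"


def signals_from_headers(headers: Mapping[str, str]) -> list[str]:
    """Collect opt-out signals sent by the author's browser."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    signals = []
    # Global Privacy Control arrives as the request header "Sec-GPC: 1".
    if normalized.get("sec-gpc") == "1":
        signals.append("gpc")
    return signals


def attach_signals(obj: dict, signals: list[str]) -> dict:
    """Return a copy of an ActivityPub object that carries the signals."""
    tagged = dict(obj)
    if signals:
        tagged[OPT_OUT_PROPERTY] = sorted(set(signals))
    return tagged


note = {"type": "Note", "content": "Hello, Fediverse"}
outgoing = attach_signals(note, signals_from_headers({"Sec-GPC": "1"}))
```

A receiving server could then read the same property and treat it as it would treat the header on a direct browser request.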

stevebate · June 19, 2024, 5:43am

I think this proposal would benefit from more information about specific scenarios and expected use cases. For example, I’m not sure how the SEC-GPC property works with ActivityPub S2S push behaviors. If it’s set, then it seems like a server wouldn’t be allowed to share/deliver that post at all (at least that’s one interpretation), which isn’t consistent with social network communication.

It seems like this should also apply to actor documents and to any HTML (or other) representations of the objects served from a fediverse server. Otherwise, robots that want to use the information will just scrape the web content (which they mostly do already) and that content won’t have the privacy directives.

dmarti · June 20, 2024, 1:57pm

Hi @stevebate, thank you for pointing out the loophole for scraping web content.

There is currently no standard way to apply GPC to web content, which I think is another missing piece. I think it’s possible to add it to the X-Robots-Tag HTTP response header and to the meta name=robots tag.

So maybe: a Fediverse server that receives any objects with one or more OOPSs SHOULD apply the same OOPSs to any HTML page that includes any content from those objects?
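
A minimal Python sketch of that rule, under stated assumptions: each received object lists its signals in a hypothetical `optOutSignals` property, and each signal maps to a token for `X-Robots-Tag` and `meta name=robots`. The token names are assumptions as well: `noai` follows the opt-out that art sharing sites send, and `gpc` is invented here because there is no standard robots token for GPC.

```python
from html import escape

# Hypothetical property name and token mapping, for illustration only.
OPT_OUT_PROPERTY = "optOutSignals"
DIRECTIVES = {"noai": "noai", "gpc": "gpc"}


def page_directives(objects):
    """Union of the directives for every object shown on one HTML page."""
    found = set()
    for obj in objects:
        for signal in obj.get(OPT_OUT_PROPERTY, []):
            if signal in DIRECTIVES:
                found.add(DIRECTIVES[signal])
    return sorted(found)


def robots_header(objects):
    """Return an (header name, value) pair, or None if nothing applies."""
    directives = page_directives(objects)
    if not directives:
        return None
    return ("X-Robots-Tag", ", ".join(directives))


def robots_meta(objects):
    """Return a meta tag for the page head, or an empty string."""
    directives = page_directives(objects)
    if not directives:
        return ""
    content = escape(", ".join(directives), quote=True)
    return '<meta name="robots" content="%s">' % content


timeline = [
    {"type": "Note", OPT_OUT_PROPERTY: ["noai"]},
    {"type": "Note", OPT_OUT_PROPERTY: ["gpc", "noai"]},
    {"type": "Note"},
]
header = robots_header(timeline)
meta = robots_meta(timeline)
```

Taking the union means a page that mixes content from several authors carries the strictest combination of their signals.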

If a server operator interprets GPC as a signal not to allow normal federation, then that would also be a problem when a user of that server posts content using a browser with GPC on. The intent of this FEP is to give OOPSs the same effect when they arrive server-to-server as when they arrive directly from the browser.
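
One way to picture "the same effect" is a single policy function that both arrival paths feed into. The Python sketch below is only an illustration: the `optOutSignals` property, the signal names, and the policy itself (GPC limits selling or sharing, "noai" limits AI training, neither blocks federation) are assumptions for this example, not requirements taken from the FEP or from any law.

```python
from typing import Iterable, Mapping


def signals_from_browser(headers: Mapping[str, str]) -> set[str]:
    """Signals on a direct request; GPC is sent as "Sec-GPC: 1"."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    return {"gpc"} if lowered.get("sec-gpc") == "1" else set()


def signals_from_federation(obj: Mapping[str, object]) -> set[str]:
    """Signals carried on a federated object (hypothetical property)."""
    value = obj.get("optOutSignals", [])
    return set(value) if isinstance(value, list) else set()


def allowed_uses(signals: Iterable[str]) -> dict[str, bool]:
    """Assumed policy: opt-outs restrict specific uses, not delivery."""
    active = set(signals)
    return {
        "federate": True,
        "sell_or_share": "gpc" not in active,
        "train_ai": "noai" not in active,
    }


direct = allowed_uses(signals_from_browser({"Sec-GPC": "1"}))
federated = allowed_uses(signals_from_federation({"optOutSignals": ["gpc"]}))
```

Because both paths reduce to the same set of signals before the policy runs, `direct` and `federated` come out identical.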

dmarti · June 20, 2024, 2:20pm

Two of the possible user stories for this feature:

Users discover that the Fediverse server they have accounts on is federated with the server of a company that does surveillance advertising and/or AI training. Some users ask the server administrator to defederate from the company’s server to protect their content and personal info, while other users want to stay federated because they have friends on there or for other reasons. After checking with the company and reading their ToS, the server administrator finds that the company has committed to act on OOPSs, so both sets of users are able to muddle through. I’m not saying that OOPSs will enable all Fediverse servers to federate with all AI or surveillance companies, just that OOPSs are, among other things, a tool to help figure out a way through for server operators who can see both sides of the issue.

A Fediverse service offers art sharing features. The operator of the service points out to artists that they will be able to reach not just the users of the service itself, but a larger audience including users of services that are federated with it. Artists ask how they would be able to have their opt-out from AI training apply to those other services, or if they have to stick with a non-federated service in order to send “noai”. (Examples of art sharing sites already offering “noai” opt-outs: DeviantArt, Cara)

dmarti · June 27, 2024, 6:46pm

I have put in a pull request to change the name of the privacy opt-out to just “SPC”.

It is intended to express the same opt-out signal as GPC. If this FEP is accepted, I will go through the registration process in Colorado to have it added to the same list of required opt-outs as the current GPC.

dmarti · August 21, 2024, 3:00pm

Besides SPC for ActivityPub, I am also working on SPC for web server to client message passing.

This paper was recently accepted for the Internet Architecture Board’s workshop on AI control:

The main point that I would like to get across is that people should be able to have their privacy rights and rights in their own work respected – whether they choose to communicate and share their creative work on someone else’s site, on their own site, or in a federated way.

There is also a practical consideration. Given the difficulty of explaining how an LLM generated a particular output, it is impractical for LLM operators to comply with the requirement under some privacy laws to disclose “inferences”. Enabling people to exclude private information up front would reduce compliance issues for LLM operators later.