This is a discussion thread for the proposed FEP-96ff: Explicit signalling of ActivityPub Semantics.
Please use this thread to discuss the proposed FEP and any potential problems
or improvements that can be addressed.
Summary
A number of vulnerabilities have occurred in ActivityPub implementations due to
“type confusion” attacks, where unrelated files on the same hostname as an ActivityPub
implementation are processed as objects with ActivityPub semantics.
Such attacks have been mitigated by carefully validating the Content-Type header (and
by implementations ensuring that users cannot create files with the application/activity+json
or application/ld+json content types), but it would bolster such defences if messages
intended to be processed with ActivityPub semantics were explicitly signalled as such.
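For reference, the Content-Type check that current mitigations rely on might look roughly like the minimal Python sketch below. It accepts application/activity+json, or application/ld+json carrying the ActivityStreams profile parameter, and rejects everything else. The function name is hypothetical, and it uses the stdlib email.message.Message only as a convenient media-type parser.

```python
from email.message import Message

AS_NAMESPACE = "https://www.w3.org/ns/activitystreams"

def is_activitystreams_content_type(header_value: str) -> bool:
    """Return True if a Content-Type value names an ActivityStreams document (sketch)."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()  # lowercased "type/subtype"
    if media_type == "application/activity+json":
        return True
    if media_type == "application/ld+json":
        # The JSON-LD profile parameter may be a space-separated list of URIs.
        profile = msg.get_param("profile")
        return isinstance(profile, str) and AS_NAMESPACE in profile.split()
    return False
```

Note that this check says nothing about *who* decided to serve the bytes with that type, which is why it depends on implementations also preventing users from creating files with these content types.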
Additionally, ActivityPub nominally supports transfer syntaxes other than JSON-LD (such
as any other RDF syntax like Turtle, or potentially a more bandwidth-efficient syntax such
as a hypothetical CBOR-LD). Strict content type filtering permanently prevents use of
such syntaxes in the future.
The RFC 2119 text got a bit mangled when it was copied over; perhaps that paragraph should be deleted?
The primary purpose of this document is to (eventually) make all use of ActivityPub semantics explicitly signalled - effectively, semantics should be opt-in.
The ActivityStreams 2 syntax can be used independently of ActivityPub, and non-ActivityPub systems such as Cohost produce ActivityStreams 2 documents.
Isn’t the problem of spoofed attributions also applicable to Activity Streams in general? Or, do you have another concern unique to ActivityPub?
My pet theory is that one of the points of a common data model like Activity Streams is interoperability between different ecosystems (e.g. ActivityPub consumers can share content from static Activity Streams servers), and I fear that completely rejecting non-ActivityPub producers defeats that merit.
Yes, but the exact semantics expected may differ. This doesn’t preclude interoperability with non-ActivityPub systems, but it does say “Hey, this isn’t ActivityPub, and you have to be (potentially differently) careful what you do with it.”
While an ability to signal different semantics is a nice addition, that seems to me to be a different matter from the security issue mentioned in the proposal.
As a more conservative precaution against the type confusion attacks, how about using the profile link relation type (RFC 6906) with a value of the Activity Streams namespace (i.e. Link: <https://www.w3.org/ns/activitystreams>;rel="profile")? This still achieves the proposal’s goal of supporting RDF syntaxes other than JSON-LD, and it’s also applicable to non-ActivityPub producers.
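To make the suggestion concrete, here is a rough Python sketch of how a consumer might look for that profile link. The regex-based Link parsing is deliberately simplified (it assumes no "<" appears inside quoted parameter values) and is not a full RFC 8288 parser; the function names are hypothetical.

```python
import re

AS_NAMESPACE = "https://www.w3.org/ns/activitystreams"

# "<target>" followed by its parameters, up to the next "<".
# Simplification: assumes no "<" appears inside quoted parameter values.
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# rel parameter, quoted or unquoted; its value may list several relation types.
_REL = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)

def link_targets(header_value: str, rel: str) -> list[str]:
    """Return the targets of all links in a Link header with the given relation type."""
    targets = []
    for target, params in _LINK.findall(header_value):
        m = _REL.search(params)
        if m is None:
            continue
        value = m.group(1) if m.group(1) is not None else m.group(2)
        if rel.lower() in value.lower().split():
            targets.append(target)
    return targets

def declares_activitystreams_profile(header_value: str) -> bool:
    """True if the response carries Link: <AS namespace>; rel="profile"."""
    return AS_NAMESPACE in link_targets(header_value, "profile")
```

Because the check looks at a Link header rather than the media type, it would apply equally to a Turtle or JSON-LD response.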
One thing that’s entirely missing from it is deployment: surely you don’t mean for this to be a hard requirement for federation, right? That would be a complete disaster.
Meaning implementations need to signal their support of it.
(btw replying through email seems broken, no idea why, nothing wrong in my server logs…)
If implemented with the “An implementation MAY” behaviour as specified, then this is backwards compatible with existing implementations (unless they’re including a different Link: <>, rel=type value, which to my knowledge no implementation does).
If implemented without that behaviour, then yes, you don’t have backwards compatibility.
The idea is that if this becomes pervasively deployed then you can just remove the MAY behaviour.
If you get back a response from an implementation with a Link: <https://www.w3.org/TR/activitypub/>;rel="type" header, then the implementation supports FEP-96ff. Perhaps it could be explicit that you must implement it everywhere?
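A client-side sketch of the compatibility logic described in this reply, in Python. It assumes the caller has already extracted the rel="type" Link targets and run its existing Content-Type check. The function and parameter names are hypothetical, and modelling the FEP's "MAY" clause as a fallback to the Content-Type check is an assumption about its wording, not a quote of it.

```python
AP_TYPE = "https://www.w3.org/TR/activitypub/"

def treat_as_activitypub(rel_type_targets, content_type_is_as, allow_legacy_fallback=True):
    """Decide whether a fetched resource may be processed with ActivityPub semantics.

    rel_type_targets: targets of every Link header with rel="type" on the response.
    content_type_is_as: result of the existing Content-Type check.
    allow_legacy_fallback: models the (assumed) "MAY" behaviour; set it to False
        once rel=type is pervasively deployed.
    """
    if rel_type_targets:
        # The server signals its semantics explicitly: trust only that signal,
        # regardless of which RDF syntax the body uses.
        return AP_TYPE in rel_type_targets
    # No rel=type link at all: a peer that predates the FEP.
    return allow_legacy_fallback and content_type_is_as
```

A server that sends some other rel=type value (for example an LDP type) is not treated as ActivityPub even if its Content-Type looks right, which matches the caveat about a "different Link: <>, rel=type value".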
What does “ActivityStreams with the ActivityStreams profile” actually mean? I think anything we do should convey that this is ActivityPub. It may be dissatisfying that doing this breaks the “open world assumption”, but it seems like doing this is the easiest way for us to avoid type confusion attacks.
With regards to rel=profile vs rel=type:
rel=type is about server capabilities. LDP uses it to tell you “This is an LDP server, and this HTTP resource is an LDP resource”; here we use it to say “This is an ActivityPub server, and this HTTP resource is an ActivityPub resource”
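On the server side, the rel=type signal is just one extra response header. The minimal WSGI sketch below (stdlib only; the actor document and its URL are hypothetical placeholders) attaches it to an ActivityPub object. The point is that only handlers that deliberately serve ActivityPub resources add the header, so an uploaded file served by a static handler on the same hostname would not carry it.

```python
import json

AP_TYPE_LINK = '<https://www.w3.org/TR/activitypub/>; rel="type"'

def actor_app(environ, start_response):
    """Minimal WSGI app serving one ActivityPub object with the rel=type signal."""
    body = json.dumps({
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Person",
        "id": "https://example.com/users/alice",  # hypothetical actor URL
    }).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/activity+json"),
        # Declares "this HTTP resource is an ActivityPub resource".
        ("Link", AP_TYPE_LINK),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```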
Do you intend something like Content-Type: application/ld+json; profile="https://www.w3.org/ns/activitystreams"\r\nLink: <https://www.w3.org/ns/activitystreams>; rel="profile" by “ActivityStreams with the ActivityStreams profile”? In that case, we’d have no problem with interpreting the profile parameters as idempotent, I guess?
I’m not suggesting to replace your proposal. Instead, I’m suggesting to extend the “MAY” behavior to make it applicable to other RDF syntaxes. (You may not like extending the requirement for backward compatibility with something new, though.)
While the “Hey, this isn’t ActivityPub” semantics sounds nice for making some decisions on compatibility considerations, I believe a security precaution needs a requirement clearer than just “be careful”, since it should be fool-proof; and yet I suppose that rejecting anything that doesn’t explicitly support ActivityPub is not what everyone wants.
Content-Type and profile may not perfectly fit the purpose of expressing the server’s intention, but I suppose it would still be a reasonable compromise to make servers that present the Activity Streams media type responsible for ensuring a minimum level of integrity in the data they publish. In that case, though, we might at least want to update the security considerations of the IANA registration.
By the way, regarding security requirements in FEPs in general, the discussion in the following topic may be interesting:
(Not that I have an opinion on it. I’m linking to it merely for an informational purpose.)
How does the proposal interact with the specification profiles of ActivityPub?
The Recommendation defines two conformance classes for servers:
ActivityPub conformant Server
ActivityPub conformant Federated Server
If a server only implements one of the profiles, can the server still express the type link relation?
Since not many servers implement both profiles, maybe we don’t want to restrict the proposal to servers that support both. But in that case, could a client be sure that a server supports federation when it fetches a resource and gets the Link header, for example?
I don’t see any real reason for it to not be implemented for both.
The main purpose of the rel=type Link header is to let you know the server isn’t a confused deputy; if it implements one of the profiles and not the other, that’s irrelevant to whether it’s confused. You might find some things you want to do don’t work, but that’s OK.
To help me understand, if the referenced server implementations had been checking the content-type header validity, would this have prevented the “type confusion”? If not, why?