Dereferencing non-HTTPS URIs as `id`

trwnh · October 11, 2019, 9:01am

3.1 Object Identifiers

All Objects in [ActivityStreams] should have unique global identifiers. ActivityPub extends this requirement; all objects distributed by the ActivityPub protocol MUST have unique global identifiers, unless they are intentionally transient […] These identifiers must fall into one of the following groups:

Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).

An ID explicitly specified as the JSON null object, which implies an anonymous object (a part of its parent context)

Identifiers MUST be provided for activities posted in server to server communication, unless the activity is intentionally transient. However, for client to server communication, a server receiving an object posted to the outbox with no specified id SHOULD allocate an object ID in the actor’s namespace and attach it to the posted object.

All objects have the following properties:

id

The object’s unique global identifier (unless the object is transient, in which case the id MAY be omitted).

type

The type of the object.

One thing I’ve wondered about is point 1 under 3.1 – the id MUST be publicly dereferencable, with its authority belonging to the origin server… but it only SHOULD be https.

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isn’t HTTPS? I assume that the intention was to suggest using HTTPS over HTTP (e.g. section 3.2 assumes HTTP GET and content negotiation, as well as headers and 403/404 error codes; section 5.1/5.2 specifies HTTP POST to outbox/inbox; Section 6 uses MUST for making an HTTP POST to actor outboxes and 201 codes), but my curiosity is with other URI schemes entirely.

Perhaps one practical consideration (as there are various impractical ones, such as file://, ftp://, ftps://, sftp://, and so on) is to not use a URL but instead use a URN (such as doi, isbn, and other urn: URIs). Of course, this requires us to do a little more work to treat them as “dereferencable”, such as by including a HTTPS proxy or some other resolver service. I wonder how that might be done, and whether this is at all worth pursuing. It could be used for deduplication, for example, by assigning a network-wide URN, with the authority deferred to the lookup service/proxy used as an instrument.
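To make the "HTTPS proxy or resolver service" idea concrete, here is a minimal sketch of rewriting a URN into something an HTTP-only implementation can fetch. The resolver endpoint `https://resolver.example/resolve/` and the function name `to_fetchable_url` are hypothetical, invented for illustration; nothing in the spec defines such a service.

```python
from urllib.parse import quote, urlsplit

# Hypothetical resolver endpoint (an assumption, not a real service).
RESOLVER_BASE = "https://resolver.example/resolve/"


def to_fetchable_url(object_id: str) -> str:
    """Return a URL that a plain HTTP client could GET for this id."""
    scheme = urlsplit(object_id).scheme.lower()
    if scheme in ("http", "https"):
        # Already dereferenceable over HTTP(S); use it directly.
        return object_id
    if scheme == "urn":
        # Percent-encode the whole URN and hand it to the resolver.
        return RESOLVER_BASE + quote(object_id, safe="")
    raise ValueError(f"no known way to dereference scheme {scheme!r}")
```

Note that authority here shifts to whoever runs the resolver, which is exactly the trade-off raised above.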

lanodan · October 11, 2019, 9:26am

[2019-10-11 09:11:29+0000] Abdullah Tarawneh via SocialHub:

One thing I’ve wondered about is point 1 under 3.1 – the id MUST be publicly dereferencable, with its authority belonging to the origin server… but it only SHOULD be https.

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isn’t HTTPS? I assume that the intention was to suggest using HTTPS over HTTP (e.g. section 3.2 assumes HTTP GET and content negotiation, as well as headers and 403/404 error codes; section 5.1/5.2 specifies HTTP POST to outbox/inbox; Section 6 uses MUST for making an HTTP POST to actor outboxes and 201 codes), but my curiosity is with other URI schemes entirely.

Yes, with stuff like HTTP over Tor’s onion services you can easily consider them to be better than HTTPS (versus HTTP), and I wouldn’t be surprised if the new things like dat, ipfs, zeronet, … are quite good at this one too.

Perhaps one practical consideration (as there are various impractical ones, such as file://, ftp://, ftps://, sftp://, and so on) is to not use a URL but instead use a URN (such as doi, isbn, and other urn: URIs). Of course, this requires us to do a little more work to treat them as “dereferencable”, such as by including a HTTPS proxy or some other resolver service. I wonder how that might be done, and whether this is at all worth pursuing. It could be used for deduplication, for example, by assigning a network-wide URN, with the authority deferred to the lookup service/proxy used as an instrument.

URIs for ActivityPub’s id where there is no way to verify the author should never be supported. I saw an implementation of ActivityPub using raw UUIDs (just something like 34e2bfba-aad2-4f81-b106-f98e3b08fe80), and it basically means that anyone on the fediverse has the right to perform any activity on it, and there is a race condition if you feel like accepting only a Create for them.
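One way to picture the verification problem is a check that an object's id shares its authority (host) with the actor claiming it. This is only a sketch under that assumption; the function name `same_authority` is made up here, and real implementations may apply stricter or different rules.

```python
from urllib.parse import urlsplit


def same_authority(object_id: str, actor_id: str) -> bool:
    """True if both ids carry a host component and the hosts match."""
    obj_host = urlsplit(object_id).netloc.lower()
    actor_host = urlsplit(actor_id).netloc.lower()
    if not obj_host or not actor_host:
        # Raw UUIDs and urn: ids have no authority to compare.
        return False
    return obj_host == actor_host
```

A raw UUID or a `urn:uuid:` id has no host part at all, so a check like this has nothing to anchor on, which is the concern described above.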

bengo · October 11, 2019, 10:33am

I saw an implementation of ActivityPub using raw UUIDs (just something like 34e2bfba-aad2-4f81-b106-f98e3b08fe80 )

distbin.com uses UUID URNs for everything as a canonical ID. It also adds an as:url property to all the activities pointing back to itself. But idk it seems long-term ‘fragile’ to use DNS-backed URLs for canonical IDs.

@trwnh

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isn’t HTTPS?

DIDs fit this definition, IMO, or at least should be in this discussion.

I remember pushing against this ‘MUST be publicly dereferencable’ bit in WG discussions, and advocating for a SHOULD. It just seemed overly restrictive. There will always be some URIs that certain implementations or contexts don’t know how to dereference, and it’s fine that they’ll just throw an error at that point (‘ftp://’ URLs are a good example; who’s gonna handle those?). Dereferencability is a spectrum, not a black-and-white thing.

Ultimately I think I was happy with the added “unless they are intentionally transient” bit, as it makes me not feel so bad about blatantly ignoring the MUST.

lanodan · October 11, 2019, 11:03am

Yup, this is the one, and I treat it as just broken. They should seriously consider using their as:url/http://www.w3.org/2002/07/owl#sameAs as the actual id. DNS is a fragile thing, but it’s better than absolutely no controlled namespace.

cwebber · October 11, 2019, 4:06pm

So, does this mean that there are other possible choices for a publicly dereferencable id with proper authority, that isn’t HTTPS?

Yes, and the Golem demo shows exactly an example of this: a (very cut-down) demo of using Datashards with ActivityPub (it was called “magenc” back then) to distribute activities. A similar example could be done with IPFS, for instance (though that doesn’t provide the privacy/security properties that Datashards does).

cwebber · October 11, 2019, 4:08pm

Similarly, bearcaps are a possible non-https URI scheme that we might use.

Yes, with stuff like HTTP over Tor’s onion services you can easily consider them to be better than HTTPS (versus HTTP), and I wouldn’t be surprised if the new things like dat, ipfs, zeronet, … are quite good at this one too.

Yep, and wanting to support Tor Onion Services is a reason I explicitly pushed back against pressure to make the spec https-only, which some people wanted.

trwnh · October 12, 2019, 2:42am

I guess what I’m trying to comprehend the most would be, how would compatibility work between the HTTPS linked-data web, and the non-HTTPS documents? If we start passing around AP JSON-LD documents with non-HTTPS id then it would obviously break due to basically all implementations assuming that all they need to do is HTTP GET id.

I remember reading the RWoT paper about DIDs in ActivityPub and having the same confusion at the time about how it would work practically. The closest thing I could find was instrument in the AS2 Vocab, but that has a domain of Activity only, and is described as "Identifies one or more objects used (or to be used) in the completion of an Activity", so it would have to be redefined to work on Object as well.

rinpatch · October 12, 2019, 8:21am

I guess what I’m trying to comprehend the most would be, how would compatibility work between the HTTPS linked-data web, and the non-HTTPS documents?

Wouldn’t an array of ids work? For example:

{
  "id": ["magnet:?xt=urn:sha256:hash", "https://ap.instance/objects/whatever"]
}

Then the implementation could just pick whichever one it supports based on the IRI scheme. As far as I can tell, there isn’t anything in the AP spec that forbids that, and while it’s not supported by most implementations right now, the complexity of the change required to support this is very low.
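The "pick whichever one it supports" step could look roughly like this. It is a sketch only: the function name `pick_id` and the idea of passing in a set of supported schemes are assumptions for illustration, and it simply takes the first match in list order.

```python
from urllib.parse import urlsplit


def pick_id(ids, supported_schemes):
    """Return the first id whose scheme is supported, or None."""
    if isinstance(ids, str):
        # Tolerate the usual single-string form of id.
        ids = [ids]
    for candidate in ids:
        if urlsplit(candidate).scheme.lower() in supported_schemes:
            return candidate
    return None


ids = ["magnet:?xt=urn:sha256:hash", "https://ap.instance/objects/whatever"]
print(pick_id(ids, {"https"}))
```

An HTTPS-only implementation would skip the magnet link and use the second entry; one that understands neither scheme gets `None` and can reject the object.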

how · October 14, 2019, 5:26pm

id as an array sounds like a slippery slope. In the above example, another approach would be to use the object's url property to introduce alternatives:

"id": "https://social.example/objects/whatever",
"url": {
    "type": "Link",
    "mediaType": "application/x-bittorrent;x-scheme-handler/magnet",
    "href": "magnet:?xt=urn:sha256:hash"
}

Note that there can be more than one url representing the same object.
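A consumer of this shape would keep `id` as the canonical HTTPS identifier and treat `url` as a list of alternative locations. The helper below is a sketch under that reading; the name `alternative_hrefs` is hypothetical, and it assumes `url` may be a plain string, a Link object with an `href`, or a list of either.

```python
def alternative_hrefs(obj):
    """Collect every location listed under an object's url property."""
    urls = obj.get("url", [])
    if not isinstance(urls, list):
        # A single string or Link object is treated as a one-item list.
        urls = [urls]
    hrefs = []
    for entry in urls:
        if isinstance(entry, str):
            hrefs.append(entry)
        elif isinstance(entry, dict) and "href" in entry:
            hrefs.append(entry["href"])
    return hrefs


note = {
    "id": "https://social.example/objects/whatever",
    "url": {
        "type": "Link",
        "mediaType": "application/x-bittorrent;x-scheme-handler/magnet",
        "href": "magnet:?xt=urn:sha256:hash",
    },
}
print(alternative_hrefs(note))
```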

trwnh · October 16, 2019, 4:12am

Actually, multiple id seems interesting to me for a different thing: nomadic identity. Consider a case where an Object has multiple HTTPS URIs.

That might not be the best way to do it, though. In fact, one shortcoming that is immediately apparent to me is the question of what to do if the document retrieved from one id is different from that retrieved from another id. So it would be better to have one canonical id only, where possible. But technically, the problem of differing documents can still happen, due to something as simple as a desync.

rigelk · October 16, 2019, 12:25pm

If I recall correctly, the Zot/6 specification defines portable ids for actors only, not for Objects. It effectively differentiates the transported entity (static) from its author identity (nomadic).

how · October 16, 2019, 3:35pm

This is exactly what I meant with “slippery slope”.

@rigelk how would you translate portable ids into ActivityPub terms, following the previous example?

rigelk · October 16, 2019, 4:36pm

I’m afraid that’s beyond my (not in-depth) reading of the spec. There is no single summary of the account discovery and verification process.

spider · October 18, 2019, 3:02am

In Zot8, nomadic content is referenced with

x-zot:{{portable_id}}/{{object_id}}

This structure does not map to ActivityPub identifiers and needs to be converted to a “best guess” URI on a not-dead server at the time of conversion. Lookup of an unknown portable_id is fairly efficient and requires at most three queries, if I recall.

There were attempts to use id arrays and url arrays, but this didn’t pan out because the current set of servers serving a particular piece of content can be very dynamic, and all of those listed could be dead even if the content still exists on multiple sites. Also, as noted by others, the retrieved content may not be identical, because the content in the mirrored copies may be adjusted to reference local assets, since the original server may be unresponsive.
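For readers unfamiliar with the format, splitting such a reference into its two parts might look like the sketch below. It assumes the portable_id itself never contains a `/`, which the template above suggests but does not state; `parse_zot_reference` is a hypothetical name, and the lookup that turns the result into a "best guess" URI is not shown.

```python
PREFIX = "x-zot:"


def parse_zot_reference(ref: str):
    """Split 'x-zot:<portable_id>/<object_id>' into its two parts."""
    if not ref.startswith(PREFIX):
        raise ValueError("not an x-zot reference")
    # Assumption: the first '/' separates portable_id from object_id.
    portable_id, sep, object_id = ref[len(PREFIX):].partition("/")
    if not portable_id or not sep or not object_id:
        raise ValueError("malformed x-zot reference")
    return portable_id, object_id
```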

The current mechanism for representing nomadic identity in ActivityPub is to use the Mastodon movedTo and alsoKnownAs migration mechanism with a simple modification to use “copiedTo”. This is presumed to have a much lower barrier to entry on the ActivityPub side than Zot nomadic identity. In this case there is no single portable_id on the ActivityPub side but a collection of linked identities with distinct ids.
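A "collection of linked identities" implies some way to confirm that two actor documents really point at each other. The sketch below assumes a reciprocal rule: the source names the target in `movedTo` or `copiedTo`, and the target names the source in `alsoKnownAs`. That rule and the function names are assumptions made for illustration, not something this thread or the ActivityPub spec specifies.

```python
def as_list(value):
    """Normalize a missing, single, or list value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_confirmed_link(source: dict, target: dict) -> bool:
    """True if source points at target and target acknowledges source."""
    outgoing = as_list(source.get("movedTo")) + as_list(source.get("copiedTo"))
    forward = target.get("id") in outgoing
    backward = source.get("id") in as_list(target.get("alsoKnownAs"))
    return forward and backward
```

Requiring both directions keeps one side from unilaterally claiming another actor as its mirror.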

Cheers.

trwnh · October 20, 2019, 5:23am

Probably getting a little off-topic to discuss the specifics of various portable data schemes – kinda sorry for bringing it up. I’d really like to focus on modeling how to dereference non-HTTPS in a way that is still roughly compatible with the HTTPS Web-based network. So far the following options have been mentioned:

id as Array; pick whichever one you understand.
url as Array; pick whichever one you understand but leave id as HTTPS.
just break compat and let implementations figure out how to resolve id (if they can)
use a local proxy directly as id (technically still globally unique but represents change of authority)
use a proxy Service as instrument (and extend instrument to be applied to Object and not just Activity)

how · October 20, 2019, 4:05pm

I’m a bit concerned about extending to objects what could be done by creating an object-specific actor acting as a proxy. The instrument service seems well suited to this use case: you get a representation of the object’s metadata but must use the out-of-band service to retrieve the actual object. Best of both worlds?

spider · October 21, 2019, 10:48pm

The point was that id and url arrays appear to be portable on the surface, but may fail in practice, as they need to be maintained when the graph of servers holding that content changes. Maintenance of these arrays isn’t an unsolvable problem, but it needs to be considered. A proxy, or an instrument service acting as a proxy, brings one back to the original problem which nomadic identity was created to solve: it usually represents a single point of failure. So this leaves:

just break compat and let implementations figure out how to resolve id (if they can)

We’ve already seen a willingness on the part of at least one of the editors to drop/ignore this part of the specification to support did: and magnet: portability schemes and bear: URIs, so this is not necessarily a show-stopper.

nightpool · October 22, 2019, 12:59am

I’m not sure I understand this comment. In what sense are did and magnet not publicly dereferencable?

spider · October 22, 2019, 2:27am

In what sense are did and magnet not publicly dereferencable?

" (Publicly facing content SHOULD use HTTPS URIs)."

I thought this was a MUST, so my bad.

did and magnet are probably dereferenceable but then so are ftp: and ldap: and imap: and gopher:. Does this mean that every ActivityPub implementation must support every URI scheme that has ever been registered? If not, which schemes MUST all ActivityPub servers support? Where is this documented?

nightpool · October 22, 2019, 3:49am

The ActivityPub standard does not require any given server to support any one scheme. I don’t understand what the purpose of having it do so would be. It would just limit the flexibility of the protocol for no practical benefit (if two servers don’t support the same schemes, then they’re obviously not going to be able to talk to each other, MUST or no MUST).
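In code terms, this amounts to each implementation keeping its own table of schemes it knows how to resolve and failing cleanly on the rest. The sketch below is illustrative only: the class and exception names are hypothetical, and the handler is a stub standing in for a real fetch.

```python
from urllib.parse import urlsplit


class UnsupportedScheme(Exception):
    """Raised when no handler is registered for a URI's scheme."""


class Dereferencer:
    def __init__(self):
        self._handlers = {}

    def register(self, scheme, handler):
        # handler is any callable taking the URI and returning a document.
        self._handlers[scheme.lower()] = handler

    def dereference(self, uri):
        scheme = urlsplit(uri).scheme.lower()
        handler = self._handlers.get(scheme)
        if handler is None:
            raise UnsupportedScheme(scheme)
        return handler(uri)


resolver = Dereferencer()
# Stub handler for illustration; a real one would perform an HTTP GET.
resolver.register("https", lambda uri: {"id": uri})
print(resolver.dereference("https://social.example/objects/1"))
```

Two servers can interoperate on exactly the schemes that appear in both of their tables; anything else surfaces as an `UnsupportedScheme` error rather than a spec violation.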