ID normalization

The Activity Streams 2.0 spec says that IDs are IRIs (RFC 3987).
The ActivityPub spec is stricter and says that IDs are “Publicly dereferencable URIs” (RFC 3986).

But what does it mean in practice?

Example 1: https://嘟文.com/users/OldBig/statuses/111971511872396431

Should this ID be treated as equivalent to its ASCII (punycode) form? If so, I guess implementations are expected to normalize IDs and always work with the ASCII form (e.g. when searching through the local cache) to avoid duplicates.
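If an implementation does take the approach suggested here, a minimal sketch of a local cache keyed on the ASCII form might look like this. `cache_key` and `ObjectCache` are hypothetical names, not part of any ActivityPub library. The sketch assumes IDs have no userinfo, port, or IPv6 literal, and it uses Python's built-in `idna` codec, which implements IDNA 2003 rather than the UTS #46 processing that the WHATWG URL spec uses.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def cache_key(object_id: str) -> str:
    """Hypothetical cache key: the ID with its host in lowercase ASCII (IDNA) form."""
    parts = urlsplit(object_id)  # urlsplit already lowercases the scheme
    # Assumption: netloc is just a host name (no userinfo, port, or IPv6 literal).
    # The built-in "idna" codec is IDNA 2003; ASCII hosts pass through unchanged.
    ascii_host = parts.netloc.lower().encode("idna").decode("ascii")
    return urlunsplit((parts.scheme, ascii_host, parts.path, parts.query, parts.fragment))


class ObjectCache:
    """Hypothetical local cache of fetched objects, keyed by normalized ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def put(self, object_id: str, obj: dict) -> None:
        self._objects[cache_key(object_id)] = obj

    def get(self, object_id: str) -> Optional[dict]:
        return self._objects.get(cache_key(object_id))


cache = ObjectCache()
cache.put("https://嘟文.com/users/OldBig", {"type": "Person"})
print(cache.get(cache_key("https://嘟文.com/users/OldBig")))  # {'type': 'Person'}
```

With this scheme, the Unicode and punycode spellings of the same ID hit the same cache entry, so the object is stored only once. Whether that normalization is desirable at all is exactly what this thread is debating.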

Example 2:

Should this ID be treated as equivalent to …? At least from the WHATWG URL spec's perspective, these two forms seem to be equivalent.

See my non-captured reply here.


Regarding example one, the URL should not be converted to punycode. Doing so has no technical utility and only causes issues when the result is compared against the Unicode form.

The reason for converting is spec compliance: ActivityPub requires IDs to be URIs, and URIs can only contain ASCII characters.

Non-ASCII characters need to be either percent-encoded (when they occur in the path) or punycoded (when they occur in the domain name).
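That mapping can be sketched with only the Python standard library. `iri_to_uri` is a hypothetical helper that loosely follows the IRI-to-URI mapping in RFC 3987 §3.1: the host goes through the built-in `idna` codec (IDNA 2003, not the UTS #46 processing used by the WHATWG URL spec), and non-ASCII characters elsewhere are percent-encoded as UTF-8. It assumes no IPv6 literal hosts and leaves ASCII characters, including existing `%XX` escapes, untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _percent_encode_non_ascii(text: str) -> str:
    # Percent-encode only non-ASCII characters (as UTF-8 bytes);
    # ASCII characters, including existing %XX escapes, are kept as-is.
    return "".join(ch if ch.isascii() else quote(ch, safe="") for ch in text)


def iri_to_uri(iri: str) -> str:
    """Hypothetical IRI (RFC 3987) to ASCII-only URI (RFC 3986) conversion. Sketch only."""
    parts = urlsplit(iri)
    # Assumption: no IPv6 literal, so the last "@" ends the userinfo
    # and the first ":" after it separates host and port.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    # Domain name: punycode via the built-in IDNA 2003 codec.
    ascii_host = host.encode("idna").decode("ascii")
    netloc = _percent_encode_non_ascii(userinfo) + at + ascii_host + colon + port
    # Path, query, fragment: percent-encode non-ASCII characters.
    return urlunsplit((
        parts.scheme,
        netloc,
        _percent_encode_non_ascii(parts.path),
        _percent_encode_non_ascii(parts.query),
        _percent_encode_non_ascii(parts.fragment),
    ))


print(iri_to_uri("https://example.com/users/José"))  # https://example.com/users/Jos%C3%A9
print(iri_to_uri("https://嘟文.com/users/OldBig/statuses/111971511872396431"))
```

Because ASCII input passes through unchanged, applying the function to its own output returns the same string, which is what makes it usable as a normalization step.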