This FEP defines a best-effort protocol for migrating an entire ActivityPub server from one domain to another when the operator controls both domains and can keep the old domain online.
Appreciate any feedback, thanks!
The mapping rules outlined seem good in theory, but are not expressive enough in practice.
There are three migration paths when migrating to a new domain:

- same software and database, with identifiers unchanged apart from the domain
- same software and database, with a path prefix change (e.g. / to /forum)
- different software, with data migrated into it
It's possible that one could change software, migrate data, and all identifiers remain the same.
Realistically this never happens.
Realistically, what happens is that a reverse proxy mapping is put in place so that requests to the old path under the new domain redirect to the new URL. In this scenario, no regular expression will fully capture the mapping.
e.g. /t/274884/some-topic-slug to /topic/274880/some-topic-slug
Thanks, this breakdown makes sense.
I agree that changing software while preserving everything is a much more complicated scenario. Once the software changes, routes and identifiers usually change in ways that cannot be captured cleanly by simple mapping rules. This spec is not trying to solve that problem.
The first two cases likely cover 95 percent of real migrations. Keeping the same software and database, with either no path change or just a prefix change, is predictable and rule based. That seems like the right scope for what we are building here.
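as a sketch, the two rule-based cases could be expressed as a single rewrite function. everything here (domains, the "/forum" prefix) is a placeholder assumption, not anything from the spec:

```python
import re

# Hypothetical rule-based mapper for the first two cases: same software
# and database, with either no path change or a known prefix change.
# The domains and the "/forum" prefix are placeholder assumptions.
OLD_DOMAIN = "old.example"
NEW_DOMAIN = "new.example"
NEW_PREFIX = "/forum"  # use "" when paths are unchanged

def map_object_id(old_id: str) -> str:
    """Rewrite an object ID from the old domain to the new one."""
    m = re.match(rf"^https://{re.escape(OLD_DOMAIN)}(/.*)?$", old_id)
    if not m:
        return old_id  # not one of ours; leave it untouched
    return f"https://{NEW_DOMAIN}{NEW_PREFIX}{m.group(1) or '/'}"
```

the third case (software change with ID remapping, e.g. /t/274884/… to /topic/274880/…) cannot be expressed this way at all; it would need a per-object lookup table rather than a rule.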
I can clarify in the spec that full software migrations are out of scope.
Have you looked into the migration spec that @jonny@neuromatch.social put together a couple months back?
It was regarding content migrations between servers, but one could adopt the same for switching domains (although I think Jonny's is scoped to the user level, not at the instance level like yours)
No, I have not seen it. I was actually trying to find an existing spec before starting this one. Is it one of the FEPs?
See my reply in another thread:
Ugh, Discourse is not receiving the whole thread. Please take a look at @jonny@neuromatch.social's replies on ActivityPub.Space
@julian oh ya sorry, thread broke for me too, not trying to be shady and reply on the sly without mentioning you @skavish
I’m looking at the FEP, thanks!
That said, my goal was more modest: simply moving a server from one domain to another, essentially a rename.
Migrating actors one by one could be quite inefficient for a large number of users.
I also think we need to move to DIDs sooner or later. That would solve a lot of these messy migration issues.
how? the “messy migration issues” are caused by a fundamental change in identity. moving to DIDs alone doesn’t magically solve any issues, it just binds the identity to a different mechanism. changing DIDs would have exactly those same “messy migration issues”, as the atproto ecosystem shows with no support whatsoever for migrating between did:plc and did:web.
the solution to migration messiness is “pick one id and stick to it”. you don’t need to abandon DNS to do that – you just need to use stable DNS names and change the records. DNS->IP mappings are done with A/AAAA records, while DNS->DNS mappings are done with CNAME records.
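as a sketch, the DNS-level mapping might look like this in a zone file (all names and addresses are placeholders; note that a CNAME cannot sit at a zone apex or coexist with other records at the same name, so apex-to-apex renames need ALIAS/ANAME support or plain A/AAAA records):

```zone
; old name delegates to the new name at the DNS layer
old.example.social.  3600  IN  CNAME  new.example.social.
; the new name carries the actual address records
new.example.social.  3600  IN  A      203.0.113.10
new.example.social.  3600  IN  AAAA   2001:db8::10
```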
mastodon issue discussing this from back in 2017: Multi-tenancy with custom domains · Issue #2668 · mastodon/mastodon · GitHub
if you don’t want to deal with DNS, then you can solve the problem at the HTTP(S) level using redirects, like PURL services do. this in effect makes the PURL service the “network”, and the “final destination” of resources can change as long as the redirects are updated.
if by “same software” you mean “same URI structure”, then maybe. this could be done without a database and with static files, too. relative references could make this even easier, if there weren’t scary language around it in AS2-Core.
even so, there are some mechanisms in HTTP that can also help here, permanent redirects (301/308) among them.
as the linked mastodon github issue points out:
historic post migration […] would require more cooperation between the two servers to ensure that they have moved
this cooperation can be achieved via an index of all resources, kind of like a site map but for migration/redirections.
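such an index could be as simple as a machine-readable map from old IDs to new IDs that the new server publishes. the field names and layout below are illustrative assumptions only, not a proposed format:

```python
# Hypothetical "migration index": a sitemap-like document enumerating
# how resources moved. Field names and layout are illustrative only.
migration_index = {
    "type": "MigrationIndex",
    "from": "https://old.example",
    "to": "https://new.example",
    "moved": {
        "https://old.example/users/alice": "https://new.example/users/alice",
        "https://old.example/notes/1": "https://new.example/notes/1",
    },
}

def resolve_moved(old_id: str) -> str:
    """Follow the index if it knows the resource, else keep the old ID."""
    return migration_index["moved"].get(old_id, old_id)
```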
changing DIDs would have exactly those same "messy migration issues",
In FEP-ef61 DIDs are used to construct stable identifiers, which are not supposed to be changed. You can move your data to another server, but object IDs remain the same.
as the atproto ecosystem shows with no support whatsoever for migrating between did:plc and did:web.
This is because did:plc is not a decentralized identifier; it is just a server operated by the Bluesky company. So it is not surprising that people want to migrate to their own domain.
In the Fediverse we will not have this problem, because FEP-ef61 enables true self-sovereign identity.
even in FEP-ef61 if the key changes then your identifier changes. "not supposed to change" is a property that can be applied equally to any identity system, including DNS names and URIs. the thing that makes them change is lack of foresight, or including components likely to change in them (such as usernames or services).
the problem space here can be divided into two aspects:

1. assigning identifiers that never need to change
2. migrating references when an identifier changes anyway
if we do well enough on the 1st point, the 2nd point rarely if ever comes up.
one way to assign ids which don’t change is to make id assignment an explicit function of the network layer. when you join an irc network, you have an fqdn for “the network” and separate fqdns for “the servers” (which make up “the network” by federating messages between each other). in practice it looks like this:
i join irc.libera.chat on port 6667/6697 and get round-robin assigned to one of the servers in the network. this is mostly transparent to me and only visible in my irc client logs. i send a message to irc.libera.chat/#pleroma-dev and it gets passed around to all the other servers in the network.
identity remains stable because even though i sent the message to a single server, the identifier is minted in the namespace of the network. if i get shuffled around from cadmium to molybdenum, it doesn’t break anything. as far as i’m concerned i am talk on irc.libera.chat, not molybdenum.libera.chat – although my client is connected to the latter.
a similar thing could be done on fedi using DNS by giving actors their own fqdn where appropriate. “portability” then is a matter of maintaining the mapping between resources and some backend. to prevent availability issues when a network’s identity server goes down, you can keep track of equivalent identifiers. in practice i could connect to service1.example and interact on network.example which keeps track of every copy of my interaction. similar to how nomadic identity in zot/nomad/etc acts like a database replica/mirror, except instead of a signed GUID binding the identity to a keypair, you could bind the identity to a DNS namespace that individual servers/services agree to share as “the network”. and servers/services can join multiple such “networks” for redundancy, similar to how nostr relays work (which are themselves just rehashing irc relays).
in such a system, changing the base URI would be expensive if all the subresources had pre-baked absolute URIs, but incredibly cheap if the subresources used relative references.
Maybe this could work, I don't know. Sounds complicated.
I don't see any problem with DIDs, though. It's a good standard for stable identifiers, you can even use did:dns if you want to tie your identity to a DNS name. But I wouldn't do that precisely because domain names are NOT stable, they are rented names. If you can't afford it, you lose it. If a registrar becomes hostile, you lose it. Or maybe you just don't like the name anymore. Keys are better.
DIDs aren’t really problematic but they don’t solve nearly as many of the messy issues as people seem to think they do. They’re basically just a framework around other identity systems (the “DID method”) that constrains the range to a “DID document”. You still inherit all the properties and problems of the DID method’s underlying identity system.
DNS names can be stable over a period of decades as demonstrated by long-lived organizations. It is true that they have a social weakness more than a technical one, if you are not a long-lived organization. But you can delegate your namespace to a long-lived organization, which is how you get PURLs and the whole “permanent identifier” movement – purl.org, w3id.org, and so on will let you assign longer-lived identifiers, if you trust those organizations to last longer than yours. Similarly to how IANA is a role currently fulfilled by ICANN, purl.org is a project currently maintained by Internet Archive, and w3id.org is a project currently maintained by the Permanent Identifier Community Group and hosted by the W3C. It is conceivable that the IANA role may be reassigned to a different organization, or that purl.org gets passed on to some other non-profit, or so on.
A “network” in the sense I am describing (IRC-like, relays, etc) is basically a PURL service that individual servers can opt into.
Keys might be better in some ways, but they make different tradeoffs. We could say that keys are not stable because it is easy to lose keys, or keys could be stolen, or that keyservers or custodians could be hostile, or maybe you just don’t like the key anymore (outdated algorithm, insufficient entropy, etc). I wouldn’t claim names or keys are always better in all situations.
In the case of a server renaming itself, what is needed is a fan-out of that activity (the “Rename” or whatever it’s called – right now as:Move is used ambiguously) to anyone depending on that name, as well as some basis for authenticating the rename. Keys are only one way of providing that linked identity, and using them implies a trust algorithm for when keys change but names don’t (TOFU, BTBV) or some other transparency mechanism (append-only audit logs).
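To make the fan-out concrete, a server-level rename announcement might look something like the following. To be clear, AS2's Move is actor-scoped today, and nothing below is a specified vocabulary; it is purely illustrative of the payload that would need to be fanned out and authenticated:

```python
# Sketch of a hypothetical server-level rename announcement. AS2's Move
# is actor-scoped today; this domain-level variant is NOT specified
# anywhere — it only illustrates what would be fanned out and signed.
rename_activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Move",
    "actor": "https://old.example/actor",  # the instance actor
    "object": "https://old.example",       # old origin
    "target": "https://new.example",       # new origin
}
```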
I’ll add that LOLA Data Portability is another existing set of work that’s out there, and it’s being driven by SWICG. It’s for PROFILES, not for entire domains, but I think it would be interesting to see if this same protocol and format could be expanded for entire domains.
Basic workflow:
Does this make sense? Would it map into your idea for moving whole domains?
Would it map into your idea for moving whole domains?
It’s quite a different approach.
What you’re describing is a true migration: exporting content from one server, importing it into another, agreeing on formats, defining catalogs, handling ID remapping, forwarding, tombstones, and so on. That requires substantial coordination and standardization, especially around export/import formats and migration semantics, and would need to be clearly specified.
The proposed spec, on the other hand, is addressing a much simpler goal: effectively changing the domain of an existing server. There’s no data transfer between independent servers, no export/import process, and no content rehydration. It’s more about re-binding identity and references at the domain level rather than moving data across installations.
this still requires substantial coordination to at least understand which resources are hosted by which hosts. how do you know which resources need to be rewritten? what if the resources have embedded proofs and can’t be rewritten without invalidating the proof?
HTTP 301/308 doesn’t have this issue; maintaining an HTTP cache’s permanent redirect is much cheaper. if you maintain a DNS-level cache it’s even cheaper to CNAME the old domain to the new domain. the downside there is temporality – if the DNS name is ever reassigned you need to do conflict resolution sooner or later.
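keeping the old domain alive as a pure redirector is cheap to operate; a minimal nginx sketch (domains are placeholders):

```nginx
server {
    server_name old.example;
    # 308 is the method-preserving permanent redirect
    # (301 may downgrade POST to GET in some clients)
    return 308 https://new.example$request_uri;
}
```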
aside from that, one way to sidestep the problem a bit is to assign local identifiers to any incoming or discovered resources, then maintain local references instead of global references. but you would then need to maintain resource name mappings just like you would otherwise need to maintain host name mappings. think how nsswitch works on a linux system by forwarding requests in sequence – you might try the local resolver (resolv.conf), then you might try your /etc/hosts file, then you might use the DNS protocol…
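the nsswitch-style chain described above might be sketched like this (all the data is made up; a real implementation would fall through to actual DNS at the end):

```python
# nsswitch-style resolution chain: consult sources in order, recursing
# so aliases can chain the way CNAME lookups do. All data is made up.
HOSTS_FILE = {"john.example": "123.users.network.example"}
LOCAL_CACHE = {"123.users.network.example": "current-storage.example"}

def resolve_host(name: str) -> str:
    for source in (HOSTS_FILE, LOCAL_CACHE):
        if name in source:
            return resolve_host(source[name])
    return name  # a full implementation would fall through to real DNS
```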
Good point about proofs, I hadn’t considered that. If IDs are embedded in signatures, rewriting them can invalidate the proof. That’s a real constraint.
On redirects, my concern is long-term:
301/308 or CNAME works as long as you control the old domain forever.
I mean, redirects are a practical but partial solution if the goal is to actually change the name.
if you use a local resolver then this isn’t a requirement. the problem space is essentially “how do you resolve hosts temporally”, instead of assuming the current value applies always and forever.
20 years ago, the tag: URI scheme tried to solve this by explicitly qualifying identifiers with not just an authority, but both an authority and a date (e.g. tag:example.com,2004:some-id). The problem is that tag: URIs don’t come with a default resolution method to dereference them in a consensus way – for example, assuming you use DNS authorities, once the DNS records change, it is difficult if not impossible to obtain the “old” DNS records at some other point in time. And even if you could, they may well be out of date, because IP addresses change too.
Breaking it down, a reference is usually resolved based on a single string (URI), but URIs are decomposable by their scheme. For example, an https: URI typically decomposes into the following:

- scheme (https)
- authority (userinfo, host, and optional port; the host is resolved via DNS)
- path, query, and fragment
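Python's standard urlsplit shows this decomposition directly, and also shows what's absent – no component anywhere says *when* the reference was resolvable:

```python
from urllib.parse import urlsplit

parts = urlsplit("https://network.example:443/users/123?page=2#top")
# parts.scheme   == "https"
# parts.netloc   == "network.example:443"   (authority: host + port)
# parts.path     == "/users/123"
# parts.query    == "page=2"
# parts.fragment == "top"
# ...and no component anywhere that says *when* this was resolvable.
```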
What we are missing is a time component, because the https: URI scheme didn’t define one. To be fair, most URI schemes don’t – it’s not clear how to handle them in a consensus way, unless there’s some kind of authoritative logging involved with the default protocol. You can add a revision system to your HTTP cache (and maybe your DNS cache too), but without a way to obtain historical HTTP responses, new participants can’t participate in the network. The logical conclusion usually ends up being “everyone has to maintain their own local references and deal with the lack of history some other way” – perhaps you trust non-authoritative sources to fill you in on what happened before you joined the network? Maybe a consensus mechanism can reduce the likelihood of being lied to? Or maybe you bind all history to a “network”/“identity” layer which can have its own HTTPS base URI (a small amount of centralization here allows more decentralization at the storage layer).
So in https: you end up with something like https://network.example/identifier being minted… somehow? and then mapped via redirect to the current location, which might change later. The question is, what does that protocol look like for minting identifiers and managing the redirects? You basically end up with a PURL service and some API[1]. Or you pick a DNS name that will never change as long as the network continues to exist – for example, you get assigned 123.users.network.example which you can use as the value of a CNAME record. Let’s say john.example is CNAMEd to 123.users.network.example, but later this record is removed and instead jane.example is CNAMEd to 123.users.network.example. Whenever you make a reference, you must always canonicalize the host/authority at the application/protocol layer[2]; so for example, you never store a link like john.example/foo?bar, but instead you store 123.users.network.example/foo?bar which should not change over the lifetime of the network. This can then be trivially handled at the HTTP server config level:
server {
    # the stable network-assigned name that references are stored under
    server_name 123.users.network.example;
    location / {
        # 307: temporary, since the storage backend may move again
        return 307 https://current-storage.example$request_uri;
    }
}
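On the application side, the "always canonicalize before storing" rule might look like this (a sketch; the mapping table is an assumption, and a real system would resolve it via CNAME or a directory lookup):

```python
from urllib.parse import urlsplit, urlunsplit

# Map vanity hosts to the stable network-assigned name before storing
# any reference. The mapping table here is an illustrative assumption.
CANONICAL_HOSTS = {"john.example": "123.users.network.example"}

def canonicalize(uri: str) -> str:
    p = urlsplit(uri)
    host = CANONICAL_HOSTS.get(p.hostname, p.hostname)  # ignores ports
    return urlunsplit((p.scheme, host, p.path, p.query, p.fragment))
```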
Note that the base URI doesn’t have to use a FQDN, but doing so is useful because you can then rely on TLS for authentication instead of needing to roll your own auth.
If this looks familiar it’s because Tumblr has been doing it for years. obvious-humor.com is pointed at obvious-humor.tumblr.com, and one can change while the other doesn’t.
Arguably, Bluesky PBLLC’s did:plc method is a variant of this – the default resolution method is to fetch the DID document from the PLC directory at plc.directory, although it is feasible that other origins can be used to resolve DID documents since they are to some extent self-certifying – although those other origins are basically just a DNS cache with extra steps. The did:plc generation algorithm requires you send requests to plc.directory per the did:plc spec – although again, this could be handled by a pool of trusted servers instead of a single trusted server. ↩︎
Again, this is arguably what ATProto’s “handle” system does – the UI layer presents you with john.example but the PDS and AppView canonicalize it to did:plc:whatever internally before writing any records. The only difference is that instead of using CNAME records, they used TXT records, which means their “handle” system is scoped only to ATProto instead of to any/all other protocols using DNS. ↩︎