FEP process suggestion: don't use title-based hash for slug

This has been a long-standing problem for FEP authors who want to update the title of their DRAFT FEP based on feedback from the community.

A simple option would be to not require a slug change for a rename, and to use the title only for the initial slug generation.
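A minimal sketch of this first option. It assumes the current rule is that the slug is the first four hex characters of the SHA-256 hash of the title (check the FEP process document for the exact rule). The `Proposal` type and its methods are hypothetical. The title is hashed once, at submission, and a later rename leaves the slug alone.

```python
import hashlib
from dataclasses import dataclass


def title_hash_slug(title: str) -> str:
    """Derive a slug from a title, assuming the rule is the first four hex
    characters of the SHA-256 digest of the UTF-8 encoded title."""
    digest = hashlib.sha256(title.encode("utf-8")).hexdigest()
    return "fep-" + digest[:4]


@dataclass
class Proposal:
    # Hypothetical record type; the field names are illustrative only.
    title: str
    slug: str

    @classmethod
    def submit(cls, title: str) -> "Proposal":
        # The title is used exactly once, at initial submission.
        return cls(title=title, slug=title_hash_slug(title))

    def rename(self, new_title: str) -> None:
        # Renaming updates the title only; the slug stays fixed.
        self.title = new_title


draft = Proposal.submit("Example FEP title")
original_slug = draft.slug
draft.rename("Example FEP title, revised after feedback")
print(draft.slug == original_slug)  # True
```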

However, there are other issues. The slug namespace has 16^4 (65,536) values, yet the probability of a hash collision grows quickly as more FEPs are added (by the birthday problem, at 300 FEPs there is roughly a 50% chance of at least one collision). Maybe we should let authors pick any non-colliding slug instead. If that slug already exists when the FEP is initially submitted, it would be rejected. This requires no significant overhead for the maintainers while avoiding slug collisions and title dependencies.
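The collision figure can be checked with the standard birthday-problem calculation. This sketch assumes slugs behave like uniformly random draws from the 65,536 four-hex-digit values, which is a reasonable model for truncated hashes of distinct titles. It comes out close to 50% at 300 FEPs.

```python
def collision_probability(n: int, space: int = 16 ** 4) -> float:
    """Probability that at least two of n uniformly random slugs coincide
    (the birthday problem), for a namespace of `space` values."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th slug must avoid the k slugs already taken.
        p_distinct *= (space - k) / space
    return 1.0 - p_distinct


for n in (100, 200, 300, 400):
    print(n, round(collision_probability(n), 3))
```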

My preference is for the second option, but the first would still be an improvement.

Thoughts?


I always wondered why an auto-incrementing integer wasn’t good enough. It works for RFCs, XEPs, PEPs, and so on. I’m guessing it mostly has to do with the PR-based workflow, because two FEPs pending at the same time might lead to a merge conflict.

We could also instead use unhashed titles for a simple name-based addressing. The hashing is purely a convenience to generate shorter slugs.

An earlier issue also exists for this topic: #196, “Consider switching to sequential IDs” (fediverse/fep on Codeberg.org).

The main consideration right now is that some FEPs may have gone FINAL and so their identifiers really shouldn’t change, especially if they define terms within the namespace of that identifier.

Perhaps current FEPs can be “grandfathered” into a newer scheme that loosens restrictions on slugs, and only newer FEPs get to request/suggest their own slug. But then that makes the FEP process a sort of registry of which names map to which FEPs, and there needs to be a policy around governing those names, at which point the FEP process basically becomes a mini IANA. (One can argue this is already the case albeit with a governance policy of simply hashing a title.)

In designing such a governance policy for FEP names, we would probably want to avoid name squatting, so we would need clear guidelines on which names are acceptable and which ones aren’t. Alternatively, we could adopt specific forms of unique identifiers that are short and collision-resistant within the lifetime of the FEP process. Some algorithms that fulfill these criteria are:

I think it’s possible to integrate a library for one of these into the FEP submission process so that new FEPs are automatically assigned identifiers (via an automatic hook), if the desire for random identifiers is still there. Otherwise, we would want to hammer out a governance policy for names as described above.
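A minimal sketch of such an automatic hook, assuming it only needs the set of slugs already in use. It is a plain uniform random draw with a retry loop, not any of the specific identifier algorithms referred to above, and `assign_random_slug` is a hypothetical name.

```python
import secrets


def assign_random_slug(existing: set[str], width: int = 4) -> str:
    """Hypothetical submission hook: draw random lowercase-hex slugs until one
    is not already taken. `width` is the number of hex digits."""
    space = 16 ** width
    if len(existing) >= space:
        raise RuntimeError("slug namespace exhausted")
    while True:
        # secrets.randbelow returns a uniform integer in [0, space).
        candidate = "fep-" + format(secrets.randbelow(space), f"0{width}x")
        if candidate not in existing:
            return candidate
```

With few slugs taken the loop almost always succeeds on the first draw; it only slows down as the namespace fills up.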

My understanding has been that the maintainers felt using a counter was too much work. I feel like there was a discussion about this here (other than on the Codeberg issue tracker) a while ago, but I couldn’t find it.

I don’t know if this or the remaining comments were about my suggestions or your suggestion to use unhashed titles as identifiers. To be clear, what I’m proposing does not change the slug structure. They would still be fep-XXXX where X is a lowercase hex digit.

I’m also not suggesting that any existing FEPs change their slugs or even be allowed to do that. The point of my proposal is to make slugs more stable than they are now, since currently they can (must?) change when the title changes.

Other than the slug generation/selection, the current process stays the same. The recommended technique for creating a slug could still be to hash the initial title of the FEP. It just wouldn’t be the required technique.

I’d recommend that an FEP slug is not “official” until the FEP has been merged into the repo.
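A minimal sketch of the pre-merge check this implies. It assumes slugs keep the fep-XXXX form and that only merged FEPs reserve a slug. `check_proposed_slug` is a hypothetical helper, not part of any existing FEP tooling.

```python
import re

# Slug format as described above: "fep-" followed by four lowercase hex digits.
SLUG_PATTERN = re.compile(r"fep-[0-9a-f]{4}")


def check_proposed_slug(slug: str, merged_slugs: set[str]) -> None:
    """Hypothetical pre-merge check for an author-chosen slug.
    Raises ValueError if the slug is malformed or already taken."""
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"{slug!r} is not of the form fep-XXXX (lowercase hex)")
    # Only merged FEPs count: a slug becomes "official" at merge time,
    # so two open drafts may propose the same value and the later merge is rejected.
    if slug in merged_slugs:
        raise ValueError(f"{slug!r} is already used by a merged FEP")
```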

I’ve also mentioned this a number of times. An additional disadvantage is that insider participants in the AP dev community start to communicate in codes instead of calling FEPs by names that describe their functionality. Any outsider needs the FEP list as a lookup table to understand anything that is said.


I would prefer decimal IDs because they are easier to memorize. We can keep existing hexadecimal slugs but assign 0001 to the next FEP.

This shouldn’t require many changes to FEP automation scripts.

The most memorable are consistent names that describe what the FEP offers, with people referring to those by default and only secondarily to IDs. Otherwise you’ll get microblog chatter like this between ecosystem insiders (pragmatically leaving out the “unnecessary” FEP- prefix): “Hey John, I see you’re working on 67, but with 12 and 17 implemented it’s better to join me on 666.”

I recently pointed out a slide in Rich Hickey’s excellent “Hammock Driven Development” presentation, where he states what matters most in any design, before you do anything else, and where the fedi dev community dropped the ball: avoiding misconception.

Perhaps the ID can be de-emphasized and the FEP title made prominent, so that the title becomes the logical thing to use in communication.

Step 1: Avoid misconception

PS. @stevebate, it is a pity you created this in the Welcome category. Perhaps you or @helge can move this to the FEP category (and perhaps even cause the thread to be retroactively federated :grimacing:).

(It doesn’t appear that I have the authorization to create a thread in the FEP category.)

A numeric sequence is an improvement, but it centralizes the assignment of identifiers. One nice feature of the hash-based slug is decentralized creation of identifiers (with facilitators only needing to prevent rare duplicates).

I think that’s more a communication issue than an argument against slugs. I suppose that without slugs, developers would be forced to use titles, which might change or be misremembered in ways that cause confusion with similarly titled FEPs.

Indeed, but it is a strong consideration to take into account when making the change. For instance, in the table listing FEPs, the column order might be name | description | ID, with the name in bold and carrying the link. The FEP Markdown document itself might omit the ID from its title and put it in the metadata fields instead, or place it last, in parentheses. The preference should be documented in the FEP process explanation as a best practice.
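A small sketch of the suggested index layout, rendering a Markdown table with the short name first (bold and linked) and the ID last. The `IndexEntry` type is hypothetical, “Common Events” is the short name proposed later in this thread, and the example.org URL is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    # Hypothetical fields: a short name, a one-line description, the ID and a link.
    name: str
    description: str
    fep_id: str
    url: str


def index_row(entry: IndexEntry) -> str:
    """Render one Markdown table row: name (bold, linked) | description | ID."""
    return f"| [**{entry.name}**]({entry.url}) | {entry.description} | {entry.fep_id} |"


def index_table(entries: list[IndexEntry]) -> str:
    header = "| Name | Description | ID |\n| --- | --- | --- |"
    return "\n".join([header] + [index_row(e) for e in entries])


print(index_table([IndexEntry("Common Events",
                              "A common approach to using the Event object type",
                              "8a8e", "https://example.org/fep/8a8e")]))
```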

As for slugs, a URL scheme similar to the one Discourse uses could be followed, with URLs ending in e.g. /fep/the-fep-process/0001/ but also accepting /fep/the-fep-process and redirecting in that case (the opposite of Discourse’s scheme, where you can leave out the title part of the URL).
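A minimal sketch of that routing, assuming a registry that maps each ID to a title slug. The registry contents, the trailing-slash canonical form and the 200/301/404 choices are all assumptions. Sending every alias (no trailing slash, bare title, outdated title) to one canonical URL with a permanent redirect is one way to limit the aliasing problem raised further down.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


# Hypothetical registry mapping IDs to title slugs; the entry is a placeholder.
REGISTRY = {"0001": slugify("The FEP Process")}


def resolve(path: str) -> tuple[int, str]:
    """Return (HTTP status, canonical path) for a request path.

    /fep/<title-slug>/<id>/ is canonical and served with 200; other forms of a
    known FEP (missing slash, bare title, outdated title) get a 301 redirect
    to the canonical URL; anything else is a 404.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "fep":
        return 404, path
    if len(parts) == 3 and parts[2] in REGISTRY:
        canonical = f"/fep/{REGISTRY[parts[2]]}/{parts[2]}/"
        return (200 if path == canonical else 301), canonical
    if len(parts) == 2:
        for fep_id, title_slug in REGISTRY.items():
            if title_slug == parts[1]:
                return 301, f"/fep/{title_slug}/{fep_id}/"
    return 404, path
```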

we already have purely decimal FEPs which will conflict eventually:

  • 2100
  • 5624
  • 7888
  • 1970
  • 0837
  • 7628
  • 2677
  • 7502
  • 3264
  • 6481
  • 7458
  • 0391
  • 7952
  • 9091
  • 0499
  • 1985
  • 6606
  • 9967
  • 2277
  • 2931
  • 5711
  • 1042
  • 0151
  • 9098
  • 8967
  • 1580

the lowest of those is 0151. i think it is feasible that at least 151 more feps will be published eventually.
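The check behind this list is easy to script over the full slug list. The sketch below runs it on a small sample that mixes some of the listed decimal-looking slugs with hex slugs mentioned elsewhere in this thread. For a random four-hex-digit slug, the chance of containing only decimal digits is (10/16)^4, about 15%.

```python
DIGITS = set("0123456789")

# Fraction of random 4-hex-digit slugs that look like decimal numbers.
ALL_DECIMAL_FRACTION = (10 / 16) ** 4  # 0.152587890625


def decimal_only(slugs):
    """Hex slugs that contain no a-f digits and so look like decimal numbers."""
    return [s for s in slugs if set(s) <= DIGITS]


def first_clash(slugs):
    """The first number a counter starting at 0001 would hit that is already
    taken by an existing hex slug (None if there is no such slug)."""
    return min((int(s) for s in decimal_only(slugs)), default=None)


sample = ["e232", "0151", "8a8e", "0837", "1b12", "171b", "2100"]
print(decimal_only(sample), first_clash(sample))
```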

also, it will cause confusion wherever numeric identifiers are reused, even if leading zeros are added – “did you mean fep 0151 or fep 00151?”

unfortunately it seems like the door has closed on decimal integer ids without very messy migrations which i do not recommend pursuing.


maybe “centralization” and “decentralization” is the wrong framing here? a counter increases maintainer burden somewhat, but the namespace is already “centralized” to the FEP repo and process. what we are interested in here is reducing the maintainer or governance burden. hash-based ids are less contentious, but no less “centralized”, because they have to be qualified within the FEP space.

this is the downside of names, yes. references to feps using outdated titles will not make sense if the title changes. i’d argue this is still a problem with slugs too, though (since the slugs are hashes based on the title). decimal integer ids don’t have this problem, but they introduce the maintainer burden discussed previously (as FEP authors must first receive a FEP id, or otherwise get assigned one at submission time which is hard to predict if they need to know the id ahead of time).


this creates problems for canonicalization of URIs and can lead to proliferation of aliased URIs which aren’t immediately recognizable as “the same thing”.

We may have different definitions of some of those terms. Yes, the namespace is constrained by the 4-hex-digit slugs, but slug generation is not necessarily centralized because of that. With a numeric sequence, a small set of facilitators will have the responsibility to allocate identifiers and maintain the counter when FEPs are submitted. To me, that’s clearly a more “centralized” process than author-generated slugs. If we ever decided to further decentralize the FEP process (no central repo, federated indices, etc.), then “decentralized” identifier generation would be beneficial.

Yes, this is precisely why we need short IDs. I don’t want to type “A common approach to using the Event object type” each time I refer to FEP-8a8e.

There won’t be a conflict if we do it right.

If N already exists, then N+1 should be assigned. I already wrote the code that does it.

I agree, that’s a nice feature. Perhaps we can continue using hash-based slugs for unpublished FEPs?
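A minimal sketch of that hybrid: drafts keep their hash-based slugs, and a sequential 4-digit decimal ID is handed out only on publication, skipping any number already used by an existing all-decimal hex slug. This is not the code mentioned above. `IdAllocator` and its behavior are illustrative assumptions.

```python
class IdAllocator:
    """Hypothetical allocator: hands out 4-digit decimal IDs in sequence,
    skipping any value already taken (e.g. an existing hex slug such as
    "0151" that happens to contain only decimal digits)."""

    def __init__(self, taken: set[str], start: int = 1):
        self.taken = set(taken)
        self.next = start
        self.aliases: dict[str, str] = {}  # draft hash slug -> published ID

    def publish(self, draft_slug: str) -> str:
        # If N already exists, move on to N+1.
        while f"{self.next:04d}" in self.taken:
            self.next += 1
        if self.next > 9999:
            raise RuntimeError("4-digit decimal ID space exhausted")
        new_id = f"{self.next:04d}"
        self.taken.add(new_id)
        self.next += 1
        # The draft keeps its hash-based slug as an alias of the new ID.
        self.aliases[draft_slug] = new_id
        return new_id
```

Keeping the draft slug as an alias means links shared during review keep working after publication.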

Communicating in codes is absolutely horrible. It may be nice once you have memorized them, but for outsiders it is abracadabra, especially when they don’t have a link to the FEP codex page.

Note that I am not suggesting using the full title, but rather consistent short names assigned to each FEP. In your example that might be “Common Events”, with the title “Common Events: A common approach to using the Event object type”. Other examples are “Federated Groups” and “Delegated Groups”.


Update: adding examples.

In this reply no one but insiders will understand what is talked about:

Is this what Mitra does now for 171b/1b12 cross-compatibility?

In this toot, @silverpill, you use three practices: 1) name-drop only the code, 2) the code with a short name in parentheses, 3) just the short name, e.g. “Conversation Containers”.

I think you’re right about FEPs and cooperation. The optimal strategy is cooperate by default and defect only when the other side defects.

With quotes we were almost successful. The initial version of Mastodon’s consent-respecting quotes was based on FEP-e232, but later they decided to introduce a new property.

Now there’s FEP-521a (public keys), which they expressed interest in implementing.

Even with Mastodon, it’s worth trying.

Groups

I think Lemmy’s implementation is not bad. There is also Conversation Containers from Hubzillaverse, and a possibility of convergence, see this thread https://lemmy.ml/post/43519233

Recommendation

Instead of saying FEP-521a (public keys), turn it around and say Public Keys (FEP-521a).

  • Naming is hard. Finding good names helps shift focus to the design of the fediverse ecosystem, where good terminology forms the ubiquitous language that leads to shared understanding.
  • So that sentences in developer communication become natural language highlighting protocol capabilities and feature names.
  • So that other people following from a distance, or newcomers, can learn from the context of the conversation what it is about and whether it is worth following more closely.
  • So that other not-so-technical fedizens who get developer posts boosted to their timelines have more opportunity to participate, and get less annoyed by an overload of deep tech talk.

These messages were addressed to specific people, not intended to be read by outsiders / newcomers.

You are using the fediverse, so you are addressing that unintended audience as well. To me this is indistinguishable from ActivityPub-related discussion between federated app developers, taking place on their public medium of choice: the fediverse.