FEP-9098: Custom emojis

Link: https://codeberg.org/fediverse/fep/src/branch/main/fep/9098/fep-9098.md

Summary

This document describes how custom emojis are implemented in the ActivityPub network.

Edit: I tried to post that as separate replies, one per section, to make it easier to reply to, but the forum doesn’t let you do that. So this is going to be one big reply.

Useful FEP, thanks. I’ve recently been improving emoji support in Pachli, so this caught my eye and I have some feedback based on that experience.

General

  • The FEP doesn’t define what a “custom emoji” is, and how it might differ from a Unicode emoji. That would be helpful.

The “Emoji object” section

  • There’s no description of the content of the id property. In the example it’s a URI – is this required, or can it be any (possibly non-unique) string? The “Uniqueness” section says the domain can be extracted from the id property (if present) which suggests the format might be a URI, but it’s not definitive.
  • If I’ve understood the “Accessibility” section correctly, shouldn’t this section (and example) also include the alternateName property, marked as “RECOMMENDED”? The “Accessibility” section could then be removed.
  • For the updated property, RFC-3339 allows implementations to make a few choices about the format. Are all choices valid, or should servers agree on those choices? I wrote the “Datetime formats” page in the Mastodon documentation (see the “Interoperability” section) to describe this for Mastodon; feel free to use any of it that’s helpful.
  • Maybe show two examples here, one being minimal (required properties only) and one being maximal (all properties)?

The “Uniqueness” section

  • This starts “The primary unique identifier…”. This implies there may be other unique identifiers. Are there? If not, consider dropping the “primary” from that text to remove the ambiguity. If there are other identifiers then what are they?
  • For additional clarity, in the text “… combination of its name and the …” please consider replacing “name” with “name property”, and rendering “name” in monospace for consistency with the rest of the document.
  • In “combination of its name and the domain name”, the text doesn’t explain how they are combined. Is this simple string concatenation (e.g., in the examples given, the unique ID would be social.example.com:blobcat:, or the other way around, :blobcat:social.example.com)? Or is this something internal to the server, and the ID created in this way is never exposed externally, so the server can do it however it wants?
  • The text says “If id is not guaranteed to be globally unique, it MUST be omitted”. This implies that if id is present then it is globally unique. Given that, why is the id property (if present) not the unique identifier?
    • Also, would this requirement to be globally unique be better placed in the “Emoji object” section, which defines the id property?
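One possible reading of the “combination” question, as a minimal sketch: treat the key as a (name, domain) pair instead of a concatenated string, taking the domain from the emoji’s id when present. The function name `emoji_key` and the fallback to the enclosing object’s id are assumptions for illustration; the FEP text quoted above does not state either.

```python
from urllib.parse import urlparse


def emoji_key(emoji: dict, parent_id: str) -> tuple[str, str]:
    """Return a (name, domain) pair identifying a custom emoji.

    Assumption: when the Emoji has no `id`, the domain of the enclosing
    object's `id` (`parent_id`) is used instead.
    """
    source = emoji.get("id") or parent_id
    domain = urlparse(source).hostname  # lowercased host, or None
    if domain is None:
        raise ValueError(f"no domain in {source!r}")
    return (emoji["name"], domain)


with_id = {"id": "https://social.example.com/emojis/blobcat", "name": ":blobcat:"}
without_id = {"name": ":blobcat:"}
note_id = "https://social.example.com/notes/1"

print(emoji_key(with_id, note_id))
print(emoji_key(without_id, note_id))
```

Keeping the two parts separate sidesteps the question of concatenation order entirely, which only matters if the key is ever exposed as a single string.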

The “Compatibility” section

  • Re “The image is a square” requirement – I disagree that that should be a requirement. If a client fails to display non-square emojis correctly then that is a client bug and the client should be fixed. Non-square emojis definitely exist in the wild, and seem to be more common in non-Western cultures. I think it would be better if this section explicitly called out rectangular emojis as existing so that client developers are more aware of this (I just fixed emoji rendering bugs in Pachli related to this, so it’s top of mind for me).
    • misskey.io (the largest Misskey server according to fedidb.com) contains literally hundreds of rectangular emojis (Misskey.io, in the “Letters / Japanese” sections)
    • The Pleroma PR mentioned earlier in the FEP contains a rectangular emoji.
    • 깡통다요 (@candayo) | hotomoe is an example post with a rectangular emoji.
    • Note: All of the above conflicts with Icon: Activity Vocabulary which says (for icon) “The image should have an aspect ratio of one (horizontal) to one (vertical)”. Maybe this FEP should note that image would be the better property to use, but icon is used for historical reasons?
    • Note: This is more confusing because the description for icon is “Image object describing emoji image”. Does “Image” here mean Image: Activity Vocabulary?

The “Microsyntax” section

  • This talks about the name, summary, and content properties. Maybe show three different examples, one for each?
  • In “The corresponding Emoji objects are added to the tag array of the object.” – what requirements are there about the order the emoji appear in tag and the order they appear in the content? Do they have to be the same, or can it be any order (i.e., tag is treated more like a set than an array here)?
  • The text “Custom emojis can be inserted into…” reads as though it’s optional. Could a custom emoji be inserted as a literal img reference in the content? If not, this section should be stronger, something like “Custom emojis MUST be inserted into…”
  • What should a server do if it receives an object where the content references an emoji shortcode but the matching emoji is not included in tag?
  • How are shortcodes escaped? Suppose I want to write something like “The blobcat emoji is written using the shortcode :blobcat:” and have that appear as the literal string :blobcat: and not the emoji, how do I do that? :blobcat: ?

The “Rendering” section

  • This is very web client specific. Non-web clients exist. Maybe call this “Web rendering” or “Browser rendering” or something like that?
  • Re the text “Emoji names, descriptions, URLs, and other strings that are used in replacements…”
    • What’s an emoji description? It’s not one of the listed properties, do you mean alternateName here?
    • More generally, if this text refers to specific properties it should use those property names and mark them up as monospace.
  • In “Reserved HTML characters …”, the reserved HTML characters aren’t defined or linked to.
  • What are the recommendations for client behaviour if the content references a shortcode (unescaped, per my earlier question) but the emoji is missing from the object or the user has turned off image display in their client? I suspect the answer is “Display the alternateName (if present), falling back to name”, but a specific recommendation here would be nice.
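A minimal sketch of the fallback suggested in the last point, combined with escaping of reserved HTML characters. The function name `render_emoji`, the `show_images` flag, and the assumption that the image URL lives at `icon.url` are illustrative, not taken from the FEP.

```python
from html import escape


def render_emoji(emoji: dict, show_images: bool = True) -> str:
    """Render an Emoji as an <img>, or as escaped text when it cannot be shown.

    Assumption: the text fallback prefers `alternateName` and then `name`.
    """
    label = emoji.get("alternateName") or emoji["name"]
    url = (emoji.get("icon") or {}).get("url")
    if not show_images or not url:
        return escape(label)
    # Escape both values so they cannot break out of the attribute.
    return f'<img src="{escape(url, quote=True)}" alt="{escape(label, quote=True)}">'


blobcat = {
    "name": ":blobcat:",
    "alternateName": "A <happy> cat",
    "icon": {"url": "https://social.example.com/b.png?a=1&b=2"},
}
print(render_emoji(blobcat))
print(render_emoji(blobcat, show_images=False))
```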

The “Examples” section

  • There isn’t one, but I think it might be helpful to show a couple of complete Note examples with attached emoji. There’s one already in the Microsyntax section, but (as mentioned earlier) I think minimal and maximal examples would be helpful. For example, show a Note with an id containing an Emoji without an id, to clearly show what the emoji’s final unique identifier would be.
  • Some other useful examples might be:
    • Note that references an emoji that’s missing
    • Note with an escaped emoji reference
    • Note with multiple emojis, where the tag property lists them out of order (if that’s legal)
    • Something that’s not a note that contains an emoji. For example, an Actor and the summary property.
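To illustrate the “missing emoji” case, here is a sketch of a Note whose content references two shortcodes but whose tag lists only one. The Note is built as a Python dict; the exact property layout (an `Emoji` type, `icon` as an `Image` with a `url`), the shortcode pattern, and the helper `missing_shortcodes` are assumptions for illustration.

```python
import re

# Assumed shortcode syntax: alphanumerics and underscores between colons.
SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")

note = {
    "type": "Note",
    "id": "https://social.example.com/notes/1",
    "content": "<p>Hello :blobcat: and :missing:</p>",
    "tag": [
        {
            "type": "Emoji",
            "name": ":blobcat:",
            "icon": {
                "type": "Image",
                "url": "https://social.example.com/media/blobcat.png",
            },
        }
    ],
}


def missing_shortcodes(obj: dict) -> list[str]:
    """Return shortcodes used in `content` that have no Emoji in `tag`."""
    tagged = {
        t["name"].strip(":")
        for t in obj.get("tag", [])
        if t.get("type") == "Emoji"
    }
    found = SHORTCODE.findall(obj.get("content", ""))
    return [name for name in found if name not in tagged]


print(missing_shortcodes(note))
```

Because the lookup goes through a set of tagged names, the order of entries in tag has no effect on the result.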

RTL text

How do emoji shortcodes work in RTL text?

I suspect it’s (accidentally?) not a problem because of the recommendation to only use alphanumerics for emoji names, but it might be worth calling out explicitly that in RTL text (e.g., Hebrew) the emoji shortcode is still written LTR.

So it’s:

טקסט RTL עם :blobcat: מוטמע בטקסט

(“Some RTL text with a :blobcat: embedded in the text”) and not

טקסט RTL עם :tacbolb: מוטמע בטקסט

(note: this site seems to render that wrong – the Hebrew text above should be right justified and run RTL instead of LTR).
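A quick check of that suspicion, as a sketch: strings are stored in logical order, so a matcher sees :blobcat: as written regardless of how the surrounding text is displayed. The pattern is an assumed shortcode syntax (alphanumerics and underscores), not one quoted from the FEP.

```python
import re

SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")

# The Hebrew example from above, in logical (storage) order.
text = "טקסט RTL עם :blobcat: מוטמע בטקסט"

print(SHORTCODE.findall(text))
```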

Something else that occurred to me re the microsyntax that should be clarified. Does there need to be a word boundary character on the outside of the colons that delimit the shortcode?

In other words, does this render two emojis or no emojis?

:blobcat::blobcat:

?

If there does need to be at least one word boundary character between them, what are the word boundary characters?

Added the definition to the “Summary” section.

ActivityPub object identifiers are URIs: https://www.w3.org/TR/activitypub/#obj-id

I didn’t want to make this property RECOMMENDED because it is not widely supported (AFAIK, only Fedibird uses it). Once it gets more adoption, I will add it to the main list of properties.

I replaced “timestamp” with “date and time string”, which is less ambiguous. All date-and-time formats specified in RFC-3339 are allowed, but if there are known interoperability issues, we can mention them in the “Compatibility” section.
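To show what “all formats are allowed” means for a consumer, here is a minimal sketch of a tolerant parser. It accepts the variations RFC 3339 permits (upper- or lower-case T and Z, a space as separator, numeric offsets, fractional seconds of any length). The function name `parse_rfc3339` is hypothetical, and truncating fractions to microseconds is a choice made for this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt ](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 date-time into a timezone-aware datetime."""
    m = RFC3339.fullmatch(value)
    if m is None:
        raise ValueError(f"not an RFC 3339 date-time: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction, offset = m.group(7), m.group(8)
    # Keep at most microsecond precision; extra digits are truncated.
    micro = int((fraction or "0")[:6].ljust(6, "0"))
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        tz = timezone(sign * delta)
    # Leap seconds (second == 60) are rejected by datetime with ValueError.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


a = parse_rfc3339("2024-01-02T03:04:05Z")
b = parse_rfc3339("2024-01-02t03:04:05.000z")
c = parse_rfc3339("2024-01-02 04:04:05+01:00")
print(a == b == c)
```

All three inputs denote the same instant, which is the property a receiving server would want to preserve when comparing updated values.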

Maybe later.

There are two identifiers: id and the combination of name and domain. Their uniqueness differs between implementations, and in my experience the latter is more reliable.

In that sentence, “name” refers to the emoji’s name in the everyday sense, not its name property (which is actually a shortcode).

It is supposed to be used internally, yes. I store emoji names and domain names in different database columns, with a two-column unique constraint.
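A minimal sketch of what such a two-column constraint might look like, using SQLite from Python. The table name, column names, and the `save_emoji` helper are hypothetical and do not reflect any particular implementation’s schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emoji (
        name TEXT NOT NULL,
        domain TEXT NOT NULL,
        image_url TEXT NOT NULL,
        UNIQUE (name, domain)
    )
    """
)


def save_emoji(name: str, domain: str, image_url: str) -> None:
    """Insert an emoji, replacing any existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO emoji (name, domain, image_url) VALUES (?, ?, ?)",
        (name, domain, image_url),
    )


save_emoji("blobcat", "social.example.com", "https://social.example.com/old.png")
save_emoji("blobcat", "social.example.com", "https://social.example.com/new.png")
save_emoji("blobcat", "other.example", "https://other.example/blobcat.png")

rows = conn.execute(
    "SELECT name, domain, image_url FROM emoji ORDER BY domain"
).fetchall()
print(rows)
```

The same shortcode from two domains yields two rows, while a repeat from the same domain overwrites the earlier one.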

ActivityPub requires id to be globally unique, but some implementations may produce different emojis with the same id.

The statements in this section are recommendations (SHOULD).

I added another recommendation to the “Rendering” section, which hopefully makes the situation clearer: “The aspect ratio of the image SHOULD be preserved.”
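For a client that lays out emojis at a fixed line height, preserving the aspect ratio could be as simple as the following sketch. The function name `display_size` and the idea of fixing the height and scaling the width are assumptions for illustration.

```python
def display_size(width: int, height: int, target_height: int) -> tuple[int, int]:
    """Scale an image to `target_height`, keeping its aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scaled_width = max(1, round(width * target_height / height))
    return (scaled_width, target_height)


print(display_size(128, 128, 20))  # square emoji
print(display_size(300, 100, 20))  # rectangular emoji, three times as wide
```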

I added a note to the “Emoji object” section.

Yes.

Maybe later.

The order is not important for custom emojis, but might be important for other tags.

I haven’t seen them being inserted as img tags. But “MUST be inserted” doesn’t feel appropriate here. Custom emojis can be inserted or not – this is not a requirement.

The server should ignore the shortcode. Are there other options?

Shortcodes shouldn’t be replaced inside <code> and <pre> elements (an allowlist might be a better choice, though). I added this to the “Rendering” section.
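A sketch of how a renderer might skip those elements, using Python’s standard `html.parser`. The class name `EmojiReplacer`, the `emoji` CSS class, and the shortcode pattern are assumptions; this sketch also drops comments and doctype declarations, which a real implementation would need to handle.

```python
import re
from html import escape
from html.parser import HTMLParser

SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")
SKIP = {"code", "pre"}


class EmojiReplacer(HTMLParser):
    """Re-emit HTML, replacing shortcodes in text outside <code>/<pre>."""

    def __init__(self, emojis: dict[str, str]):
        # Keep character references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.emojis = emojis  # shortcode name -> image URL
        self.out: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = SHORTCODE.sub(self._img, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def _img(self, m):
        url = self.emojis.get(m.group(1))
        if url is None:
            return m.group(0)  # unknown shortcode: leave the text alone
        alt = escape(m.group(0), quote=True)
        return f'<img class="emoji" src="{escape(url, quote=True)}" alt="{alt}">'


def replace_shortcodes(html_text: str, emojis: dict[str, str]) -> str:
    parser = EmojiReplacer(emojis)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


emojis = {"blobcat": "https://social.example.com/blobcat.png"}
print(replace_shortcodes("<p>:blobcat: <code>:blobcat:</code></p>", emojis))
```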

&colon; and &#58; may also be used.
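A small sketch of why this works, assuming the renderer matches shortcodes against the raw markup before character references are decoded: the escaped form contains no literal colons for the matcher to find, yet decodes to the intended text. The helper `escape_colons` and the pattern are hypothetical.

```python
import html
import re

SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")


def escape_colons(text: str) -> str:
    """Replace literal colons with a numeric character reference."""
    return text.replace(":", "&#58;")


escaped = escape_colons("written as :blobcat:")
print(escaped)
print(SHORTCODE.findall(escaped))                # no shortcode is detected
print(html.unescape(escaped))                    # decodes to the literal text
print(html.unescape("&colon;blobcat&colon;"))    # named reference, same result
```

A renderer that decodes entities first and matches afterwards would still replace the shortcode, so this only helps if implementations agree on the order of those two steps.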

Could you provide examples of non-web clients? How do they display HTML?

Yes, alternateName is a possible source for the description.

It doesn’t refer to specific properties of Emoji.

I added the list of characters to the “Rendering” section.

I think this should be treated as implementation details.

Maybe later.

I don’t know.

My software requires whitespace around a shortcode, but I don’t know what others do.
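The two behaviours can be compared with a pair of regular expressions. This is a sketch only: both patterns are assumptions, and “whitespace or start/end of string” is one possible definition of a boundary.

```python
import re

# Matches a shortcode anywhere in the text.
ANYWHERE = re.compile(r":([A-Za-z0-9_]+):")
# Matches only when surrounded by whitespace or the start/end of the string.
WHITESPACE_DELIMITED = re.compile(r"(?<!\S):([A-Za-z0-9_]+):(?!\S)")

adjacent = ":blobcat::blobcat:"
spaced = ":blobcat: :blobcat:"

print(ANYWHERE.findall(adjacent))
print(WHITESPACE_DELIMITED.findall(adjacent))
print(WHITESPACE_DELIMITED.findall(spaced))
```

Under the first rule the adjacent form renders two emojis; under the second it renders none, and only the spaced form renders two.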