Link: https://codeberg.org/fediverse/fep/src/branch/main/fep/9098/fep-9098.md
Summary
This document describes how custom emojis are implemented in ActivityPub network.
Edit: I tried to post that as separate replies, one per section, to make it easier to reply to, but the forum doesn’t let you do that. So this is going to be one big reply.
Useful FEP, thanks. I’ve recently been improving emoji support in Pachli, so this caught my eye and I have some feedback based on that experience.
- […] `id` property. In the example it’s a URI – is this required, or can it be any (possibly non-unique) string? The “Uniqueness” section says the domain can be extracted from the `id` property (if present) which suggests the format might be a URI, but it’s not definitive.
- […] `alternateName` property, marked as “RECOMMENDED”? The “Accessibility” section could then be removed.
- […] `updated` property, RFC-3339 allows implementations to make a few choices about the format. Are all choices valid, or should servers agree on those choices? I wrote Datetime formats - Mastodon documentation (see the “Interoperability” section) to describe this for Mastodon, feel free to use any of that that’s helpful.
- […] `social.example.com:blobcat:` (or the other way around, `:blobcat:social.example.com`)? Or is this something internal to the server, and the ID created in this way is never exposed externally, so the server can do it however it wants?
- […] `id` is not guaranteed to be globally unique, it MUST be omitted”. This implies that if `id` is present then it is globally unique. Given that, why is the `id` property (if present) not the unique identifier?
- […] `id` property?
- […] `icon`) “The image should have an aspect ratio of one (horizontal) to one (vertical)”. Maybe this FEP should note that `image` would be the better property to use, but `icon` is used for historical reasons?
- […] `icon` is “`Image` object describing emoji image”. Does “Image” here mean Image: Activity Vocabulary?
- […] `name`, `summary`, and `content` properties. Maybe show three different examples, one for each?
- […] `Emoji` objects are added to the `tag` array of the object.” – what requirements are there about the order the emoji appear in `tag` and the order they appear in the content? Do they have to be the same, or can it be any order (i.e., `tag` is treated more like a set than an array here)?
- […] `img` reference in the content? If not, this section should be stronger, something like “Custom emojis MUST be inserted into…”
- […] `tag`?
- […] `:blobcat:` and not the emoji, how do I do that? `:blobcat:`?
- […] `alternateName` here?
- […] `alternateName` (if present), falling back to `name`”, but a specific recommendation here would be nice.
- […] `Note` examples with attached emoji. There’s one already in the Microsyntax section, but (as earlier too) I think minimal and maximal examples would be helpful. For example, show a `Note` with an `id` containing an `Emoji` without an `id`, to clearly show what the emoji’s final unique identifier would be.
- […] `Note` that references an emoji that’s missing
- […] `Note` with an escaped emoji reference
- […] `Note` with multiple emojis, where the `tag` property lists them out of order (if that’s legal)
- […] `Actor` and the `summary` property.

How do emoji shortcodes work in RTL text?
I suspect it’s (accidentally?) not a problem because of the recommendation to only use alphanumerics for emoji names, but it might be worth calling out explicitly that in RTL text (e.g., Hebrew) the emoji shortcode is still written LTR.
So it’s:
טקסט RTL עם :blobcat: מוטמע בטקסט
(“Some RTL text with a :blobcat: embedded in the text”) and not
טקסט RTL עם :tacbolb: מוטמע בטקסט
(note: this site seems to render that wrong – the Hebrew text above should be right justified and run RTL instead of LTR).
Something else that occurred to me re the microsyntax that should be clarified. Does there need to be a word boundary character on the outside of the colons that delimit the shortcode?
In other words, does this render two emojis or no emojis? `:blobcat::blobcat:`?
If there does need to be at least one word boundary character between them, what are the word boundary characters?
Added the definition to “Summary” section.
ActivityPub object identifiers are URIs: https://www.w3.org/TR/activitypub/#obj-id
I didn’t want to make this property RECOMMENDED because it is not widely supported (AFAIK, only Fedibird uses it). Once it gets more adoption, I will add it to the main list of properties.
I replaced “timestamp” with “date and time string”, which is less ambiguous. All date-and-time formats specified in RFC-3339 are allowed, but if there are known interoperability issues, we can mention them in the “Compatibility” section.
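Since all RFC 3339 date-and-time forms are allowed, a receiver has to cope with a few variants of the same instant. The following is a minimal sketch of such a parser using only Python's standard library; the function name `parse_updated` is hypothetical and the sample values in the usage note are made up, not taken from the FEP.

```python
from datetime import datetime, timezone


def parse_updated(value: str) -> datetime:
    """Parse an RFC 3339 date-time string into an aware UTC datetime.

    Sketch only: on Python versions before 3.11, fromisoformat() accepts
    fractional seconds only with exactly 3 or 6 digits.
    """
    # RFC 3339 allows lowercase "t" and "z"; upper-casing normalises both
    # and leaves digits and punctuation untouched.
    text = value.strip().upper()
    # Older Python versions do not understand a trailing "Z".
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    # RFC 3339 requires an explicit offset, so reject naive values.
    if parsed.tzinfo is None:
        raise ValueError(f"missing UTC offset in {value!r}")
    return parsed.astimezone(timezone.utc)
```

With this sketch, `2024-05-01T12:30:00Z` and `2024-05-01t12:30:00z` parse to the same instant, and a value with a numeric offset such as `+02:00` is converted to UTC.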
Maybe later.
There are two identifiers: `id` and the combination of name and domain. Their uniqueness differs between implementations, and in my experience the latter is more reliable.
In that sentence, “name” is the emoji name, not its `name` (which is actually a shortcode).
It is supposed to be used internally, yes. I store emoji names and domain names in different database columns, with a two-column unique constraint.
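A minimal sketch of that kind of two-column constraint, using SQLite from Python's standard library. The table name, column names and the `save_emoji` helper are hypothetical and are not taken from any particular implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emoji (
        name      TEXT NOT NULL,  -- emoji name, without the colons
        domain    TEXT NOT NULL,  -- hostname of the originating server
        image_url TEXT NOT NULL,
        object_id TEXT,           -- ActivityPub id, may be absent
        UNIQUE (name, domain)     -- the internal identifier
    )
    """
)


def save_emoji(name, domain, image_url, object_id=None):
    """Insert an emoji, replacing any row with the same (name, domain)."""
    conn.execute(
        "INSERT OR REPLACE INTO emoji (name, domain, image_url, object_id) "
        "VALUES (?, ?, ?, ?)",
        (name, domain, image_url, object_id),
    )
```

The same emoji name can then exist once per domain, and a second save for the same pair replaces the earlier row instead of creating a duplicate.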
ActivityPub requires `id` to be globally unique, but some implementations may produce different emojis with the same `id`.
The statements in this section are recommendations (SHOULD).
I added another recommendation to the “Rendering” section, which hopefully makes the situation clearer: “The aspect ratio of the image SHOULD be preserved.”
I added a note to “Emoji object” section.
Yes.
Maybe later.
The order is not important for custom emojis, but might be important for other tags.
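One way to make the order of `tag` irrelevant is to index the `Emoji` entries by name before scanning the content. The sketch below assumes the commonly seen object shape (`type`, `name` with colons, `icon.url`); the function names and sample URLs are hypothetical, and the simple regular expression deliberately ignores any boundary rules.

```python
import re

# Simplified shortcode pattern for illustration only.
SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")


def emoji_lookup(tags):
    """Map emoji name -> image URL for every Emoji in a `tag` array."""
    lookup = {}
    for tag in tags:
        if tag.get("type") != "Emoji":
            continue  # mentions, hashtags and other tags are not emojis
        name = tag.get("name", "").strip(":")
        url = tag.get("icon", {}).get("url")
        if name and url:
            lookup[name] = url
    return lookup


def emojis_in_content(content, tags):
    """Return (name, url) pairs in content order, skipping unknown shortcodes."""
    lookup = emoji_lookup(tags)
    return [
        (m.group(1), lookup[m.group(1)])
        for m in SHORTCODE.finditer(content)
        if m.group(1) in lookup
    ]
```

Because the lookup is keyed by name, a `tag` array that lists the emojis in a different order from the content gives the same result, and a shortcode with no matching `Emoji` is simply left alone.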
I haven’t seen them being inserted as `img` tags. But “MUST be inserted” doesn’t feel appropriate here. Custom emojis can be inserted or not – this is not a requirement.
The server should ignore the shortcode. Are there other options?
Shortcodes shouldn’t be replaced inside `<code>` and `<pre>` elements (an allowlist might be a better choice, though). I added this to the “Rendering” section.
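A rough sketch of how a renderer could skip `<code>` and `<pre>` while replacing shortcodes elsewhere, using `html.parser` from Python's standard library. The class name, the `render` helper and the exact `img` markup are assumptions made for illustration; comments and doctype declarations are dropped by this sketch, and it uses a denylist rather than the allowlist mentioned above.

```python
import re
from html import escape
from html.parser import HTMLParser

SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")
SKIPPED_ELEMENTS = {"code", "pre"}


class EmojiRenderer(HTMLParser):
    def __init__(self, emojis):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.emojis = emojis  # emoji name -> image URL
        self.skip_depth = 0   # > 0 while inside <code> or <pre>
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self.skip_depth > 0:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = SHORTCODE.sub(self._replace, data)
        self.out.append(data)

    def _replace(self, match):
        url = self.emojis.get(match.group(1))
        if url is None:
            return match.group(0)  # unknown shortcode stays as text
        return f'<img src="{escape(url, quote=True)}" alt="{match.group(0)}">'


def render(html, emojis):
    parser = EmojiRenderer(emojis)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, rendering `<p>Hi :blobcat: <code>:blobcat:</code></p>` with a known `blobcat` emoji replaces only the first shortcode and leaves the one inside `<code>` untouched.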
: and : may also be used.
Could you provide examples of non-web clients? How do they display HTML?
Yes, `alternateName` is a possible source for the description.
It doesn’t refer to specific properties of `Emoji`.
I added the list of characters to the “Rendering” section.
I think this should be treated as implementation details.
Maybe later.
I don’t know.
My software requires a whitespace around a shortcode, but I don’t know what others do.
@silverpill Thanks for the update, but I think it still leaves questions unanswered, in particular because some of the answers are included in this thread rather than in updates to the FEP.
Rather than go back and forth over text like this I thought it might be easier if I sent some concrete suggested changes in the form of a PR which you can look over and decide if you want to accept some or all of them.
If I didn't add an answer to the FEP, that's because I think it shouldn't be added. Either because it is a minor implementation detail or because I don't know yet what the best practice is.
It would have been better to continue discussion here. I am not sure if it makes sense to repeat my answers on Codeberg, maybe I will just cherry-pick some of your suggestions later.
In any case, thank you for the feedback.
> If I didn’t add an answer to the FEP, that’s because I think it shouldn’t be added. Either because it is a minor implementation detail or because I don’t know yet what the best practice is.
Respectfully, I disagree with this stance when it comes to writing documentation intended to improve interoperability.
I don’t think there are “minor implementation details” here. If two implementations do not agree on something like “How does `:blobcat::blobcat:` render?” then users are going to have a bad time. And ultimately we’re doing this so users have a good time.
Note: I’ve just discovered there’s more nuance to this. This post content: “Here’s a :blobcat:. Here are two more :blobcat::blobcat:” renders all three custom emoji on (at least) Mastodon and Pleroma. However, the post content “Here are two emojis: :blobcat::blobcat:” doesn’t render either of them. This is tracked as “Inconsistent custom emoji shortcode behaviour when lacking whitespace” (mastodon/mastodon issue #7364 on GitHub).
Similarly, it’s also OK in documents like this to explicitly call out where best practices are either uncertain or evolving. That allows the reader to know where the gaps are (instead of trying to infer them from context) and provides an opportunity for others to do the work to flesh out those gaps in subsequent updates to the FEP.
That’s what I’ve done here – rather than expecting you to do the work to figure out some of this stuff, I went and figured out the answers. I could then tell you those answers, and expect you to update the FEP. But it’s much more efficient to write those down and send you specific proposals to update the FEP that include the information.
I don’t expect you to accept the PR as is. But obviously I do think that information like:
are all important to cover, both to inform current implementations and to guide future improvements.
Anyway, that’s my philosophy on this sort of stuff. Hope it’s helpful.
@nikclayton I reviewed your PR and copied several suggestions to the FEP: https://codeberg.org/fediverse/fep/pulls/661.
I also left several comments on your PR (regarding RTL, non-web clients and repeated emojis).
Some suggestions were not copied because I need to do additional research first. In particular, I don't want to include anything related to `alternateName` until I implement it myself.
Detailed examples will be added later.
I came across a good example of the interoperability concerns today when people boosted :petthex_javasparrow:しゅいろ:petthex_javasparrow: (@syuilo) | Misskey.io into my timeline.
Here’s what that looks like on the original server:
Note the emojis in the name (above the `@syuilo`) and the wide array of emojis at the bottom, many of which are decidedly non-square.
Here’s how that renders in Mastodon:
The emojis at the bottom are missing (they’re emoji reactions, not supported by Mastodon, so that’s understandable). But note that the emojis from the display name are missing too, and this renders as `:petthex_javasparrow:`.
That sounds like it could be a bug in Mastodon’s emoji code.
Maybe. It’s not (directly) the display code – I don’t know what the original message looked like when it was received over AP, but when Mastodon serves the status using the Mastodon API it has an empty `emojis` property, so I infer that Mastodon didn’t detect any emojis on the incoming message.
Okay, so Mastodon is fetching the custom emoji, however, the regex for custom emojis within the display name (and other fields) is not finding the emojis due to a regexp issue.
:petthex_javasparrow:しゅいろ:petthex_javasparrow:
Fails to find the emojis, but:
:petthex_javasparrow: しゅいろ :petthex_javasparrow:
Finds the emojis
This appears to be an issue with the boundary matching on the regular expression that finds custom emojis in text. I’m not sure what the FEP says with regards to using custom emojis and whitespace between the emoji shortcode and the surrounding text.
Edit: This is almost certainly a duplicate of an issue that is called out in the FEP: “Inconsistent custom emoji shortcode behaviour when lacking whitespace” (mastodon/mastodon issue #7364 on GitHub).
Here’s a bit more on this custom emoji extraction logic and why it appears to not extract multiple emojis, but sometimes seems to work: “Inconsistent custom emoji shortcode behaviour when lacking whitespace” (mastodon/mastodon issue #7364 on GitHub).
The FEP currently says:
> Within a repeated run of emojis (e.g., `:blobcat: :blobcat: :blobcat:`) each shortcode is separated by at least one character that is not in the case-insensitive set `a-z0-9:`
However, Mastodon uses `[:alnum:]:`, which also includes Unicode alphanumeric characters, if I am understanding the documentation correctly:
https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-POSIX+Bracket+Expressions
That would explain why shortcodes separated by "しゅいろ" are not rendered.
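The difference between the two boundary rules can be shown with a small scanner. This is only a sketch: it is not Mastodon's actual regular expression, the function names are hypothetical, and Python's `str.isalnum()` is used as an approximation of Ruby's Unicode-aware `[:alnum:]`.

```python
import re

# Zero-width lookahead so that overlapping candidates are all examined.
CANDIDATE = re.compile(r"(?=:([A-Za-z0-9_]+):)")


def ascii_blocker(ch):
    """Characters in the case-insensitive set a-z0-9: block a shortcode."""
    return ch == ":" or (ch.isascii() and ch.isalnum())


def unicode_blocker(ch):
    """Any Unicode alphanumeric character or a colon blocks a shortcode."""
    return ch == ":" or ch.isalnum()


def find_shortcodes(text, is_blocker):
    """Return shortcode names whose neighbours on both sides are not blockers."""
    found = []
    for match in CANDIDATE.finditer(text):
        start = match.start()
        end = start + len(match.group(1)) + 2  # include both colons
        before = text[start - 1] if start > 0 else ""
        after = text[end] if end < len(text) else ""
        if (before and is_blocker(before)) or (after and is_blocker(after)):
            continue
        found.append(match.group(1))
    return found
```

Under the ASCII-only rule, both shortcodes in `:petthex_javasparrow:しゅいろ:petthex_javasparrow:` are found, because the Japanese characters are not in the set. Under the Unicode-aware rule neither is found, because `し` and `ろ` count as alphanumeric. Both rules reject `:blobcat::blobcat:` and accept `:blobcat: :blobcat: :blobcat:`.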
@silverpill @thisismissem @feps
Your last sentence is rendered in the Mastodon web UI as:
> That would explain why shortcodes separated by "しゅいろ" are not rendered.
Updating the FEP: https://codeberg.org/fediverse/fep/pulls/663
The new text is:
> Shortcode is placed between two characters that are not unicode alphanumerics, colons or line endings
@smallcircles It appears as intended on your screenshot (these characters are from @syuilo display name, which we discussed earlier).
Ah, thank you, was unaware of that and thought I saw some encoding issue.