One aspect of the various types and elements that go into the `tag` property is to serve as a “microsyntax” and provide rich metadata for substrings of the natural language properties: `name`, `summary`, and particularly `content`. For example, a `Mention` is often associated with (and defined by) the `@mention` microsyntax. The `Hashtag` is invoked via the `#hashtag` microsyntax. And most notably, the `Emoji` (`http://joinmastodon.org/ns#Emoji`) is used to search-and-replace the `:emoji:` microsyntax within `content` (and often `summary`, and sometimes `name`).
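To make the three cases concrete, here is a rough sketch of what such an object might look like, written as a Python dict. The actor, domain, URLs, and the `blobcat` shortcode are all made up for illustration, the `content` is simplified to near-plain text, and the `@context` leaves out the extension terms a real document would need to define `Hashtag` and `Emoji`. The `tags_by_type` helper is hypothetical and only groups the entries for inspection.

```python
# Hypothetical example object; every name and URL here is invented.
note = {
    # Simplified: a real document would also declare the extension terms.
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "<p>@alice check out #cats :blobcat:</p>",
    "tag": [
        {
            "type": "Mention",
            "href": "https://example.com/users/alice",
            "name": "@alice",
        },
        {
            "type": "Hashtag",
            "href": "https://example.com/tags/cats",
            "name": "#cats",
        },
        {
            "type": "Emoji",
            "name": ":blobcat:",
            "icon": {
                "type": "Image",
                "url": "https://example.com/emoji/blobcat.png",
            },
        },
    ],
}


def tags_by_type(obj):
    """Group the entries of `tag` by their `type` (hypothetical helper)."""
    tags = obj.get("tag", [])
    if isinstance(tags, dict):  # a single tag may appear without an array
        tags = [tags]
    grouped = {}
    for entry in tags:
        grouped.setdefault(entry.get("type"), []).append(entry)
    return grouped
```

Each `name` here happens to match a substring of `content` exactly, which is the relationship the microsyntax implies.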
Could we generalize this pattern? And if we do… does it need any other affordances?
What makes a `tag` “work” often does not depend on parsing the microsyntax. This is in line with the AS2-Core spec, which says that implementations should not be required to process microsyntaxes.
- An example of this is that a Mention will generate a notification (and be used for delivery in Mastodon) even without being present in the content at all; this can be visually misleading when replying to a message that appears to have no mentions, but actually contains several invisible mentions present only in the `tag` array.
- Similarly, a Hashtag will cause the post to be inserted into tag feeds even if not present in the content at all; such behavior has led to confusion when an invisible hashtag is not rendered and the post appears in a timeline seemingly erroneously.
- Only the Emoji actually requires parsing the microsyntax, and this is only because inline images are disallowed and stripped/removed by Mastodon’s HTML sanitizer.
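The difference between the three cases can be sketched roughly as follows. This is not Mastodon's implementation; the function names are hypothetical, and the emoji replacement is a naive string substitution that a real consumer would restrict to text nodes rather than raw HTML. The point is only that mentions and hashtags can be handled from the `tag` array alone, while emoji need to look inside the `content`.

```python
from html import escape


def _as_list(value):
    """Normalize a missing, single, or array-valued property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def mention_targets(obj):
    """Collect mention targets from `tag` only; `content` is never read."""
    return [
        t["href"]
        for t in _as_list(obj.get("tag"))
        if t.get("type") == "Mention" and "href" in t
    ]


def hashtag_names(obj):
    """Collect hashtag names from `tag` only; `content` is never read."""
    return [
        t["name"].lstrip("#").lower()
        for t in _as_list(obj.get("tag"))
        if t.get("type") == "Hashtag" and "name" in t
    ]


def render_emoji(content, obj):
    """Search-and-replace each :shortcode: in `content` with an <img>.

    Assumption: naive substring replacement on the HTML string. This would
    also touch attribute values, so it is a sketch rather than a safe renderer.
    """
    seen = set()
    for t in _as_list(obj.get("tag")):
        if t.get("type") != "Emoji":
            continue
        shortcode = t.get("name")
        icon = t.get("icon")
        url = icon.get("url") if isinstance(icon, dict) else None
        if not shortcode or not url or shortcode in seen:
            continue
        seen.add(shortcode)  # avoid re-replacing inside the generated alt text
        img = '<img class="emoji" alt="{0}" title="{0}" src="{1}">'.format(
            escape(shortcode, quote=True), escape(url, quote=True)
        )
        content = content.replace(shortcode, img)
    return content
```

With a post whose `content` mentions nobody, `mention_targets` and `hashtag_names` still return whatever is in `tag`, which is exactly the “invisible” behavior described above.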
Indeed, often the microsyntax comes “pre-processed” due to the `content` being HTML by default. A consumer wishing to properly associate the `tag` as a rich entity with some substring of the `content` will need to “un-process” the `content` by similarly stripping it or plaintextifying it.
It would be nice to standardize this procedure so that it can be deterministically reproduced.
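As a starting point for discussion, one possible “un-processing” procedure might look like the sketch below. It is an assumption rather than any existing standard: it strips markup with Python's standard `html.parser`, treats `<br>` as a newline and `</p>` as a paragraph break, and then looks for exact occurrences of each tag's `name` in the resulting text. The names `plaintextify` and `locate_tags` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, turning <br> and </p> into line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def plaintextify(html):
    """Reduce HTML `content` to plain text (assumed rules, see above)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def locate_tags(obj):
    """Return (text, spans); each span is (start, end, tag type)."""
    text = plaintextify(obj.get("content", ""))
    tags = obj.get("tag", [])
    if isinstance(tags, dict):
        tags = [tags]
    spans = []
    for t in tags:
        name = t.get("name")
        if not name:
            continue
        start = text.find(name)
        while start != -1:
            spans.append((start, start + len(name), t.get("type")))
            start = text.find(name, start + len(name))
    return text, sorted(spans, key=lambda s: s[:2])
```

Exact matching on `name` is the weakest assumption here: a tag whose `name` differs from the rendered text (for example, if the content shows a shortened handle) would simply yield no span, and a standardized procedure would need to say what happens in that case.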