Using tags for microsyntax

One role of the various types and elements that go into the tag property is to serve as a “microsyntax” and provide rich metadata for substrings of the natural-language properties: name, summary, and particularly content. For example, a Mention is often associated with (and defined by) the @mention microsyntax. The Hashtag is invoked via the #hashtag microsyntax. And most notably, the Emoji (http://joinmastodon.org/ns#Emoji) is used to search-and-replace the :emoji: microsyntax within content (and often summary, and sometimes name).
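
To make the three cases concrete, here is a rough sketch of a Note carrying all three tag types, plus a naive search-and-replace for the :emoji: case. Everything in it is illustrative: the example.com URLs, the `note` object and the `render_emoji` helper are made up, and the shortcode pattern is an assumption rather than anything a spec defines.

```python
import html
import re

# Hypothetical Note; all names and URLs are invented for illustration.
note = {
    "type": "Note",
    "content": "<p>Hello @alice, enjoy some #cats :blobcat:</p>",
    "tag": [
        {"type": "Mention", "name": "@alice",
         "href": "https://example.com/users/alice"},
        {"type": "Hashtag", "name": "#cats",
         "href": "https://example.com/tags/cats"},
        {"type": "Emoji", "name": ":blobcat:",
         "icon": {"type": "Image",
                  "url": "https://example.com/emoji/blobcat.png"}},
    ],
}


def render_emoji(text, tags):
    """Replace each :shortcode: named by an Emoji tag with an <img> element."""
    icons = {t["name"]: t["icon"]["url"]
             for t in tags if t.get("type") == "Emoji"}

    def swap(match):
        code = match.group(0)
        url = icons.get(code)
        if url is None:
            return code  # no matching Emoji tag: leave the text untouched
        return '<img class="emoji" alt="{}" src="{}">'.format(
            html.escape(code), html.escape(url))

    # Assumed shortcode shape: letters, digits and underscores between colons.
    # A single pass means already-inserted <img> markup is never re-scanned.
    return re.sub(r":[A-Za-z0-9_]+:", swap, text)


rendered = render_emoji(note["content"], note["tag"])
```

Note that this runs over the raw HTML string, so it would also rewrite a shortcode that happens to sit inside an attribute value; a more careful version would only touch text nodes.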

Could we generalize this pattern? And if we do… does it need any other affordances?

What makes a tag “work” often does not depend on parsing the microsyntax at all. This is in line with the AS2-Core spec, which says that processing microsyntaxes should not be required.

  • An example of this is that a Mention will generate a notification (and be used for delivery in Mastodon) even without being present in the content at all; this can be visually misleading when replying to a message that appears to have no mentions, but actually contains several invisible mentions present only in the tag array.
  • Similarly, a Hashtag will cause the post to be inserted into tag feeds even if not present in the content at all; such behavior has led to confusion when an invisible hashtag is not rendered and the post appears in a timeline seemingly erroneously.
  • Only the Emoji actually requires parsing the microsyntax, and this is only because inline images are disallowed and stripped by Mastodon’s HTML sanitizer.
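
As a sketch of how a client might surface the “invisible” cases described above, the following flags Mention and Hashtag tags that have no visible counterpart in the content. The `invisible_tags` helper and the sample `reply` are hypothetical, and the heuristic (the tag name appears in the visible text, or the tag href appears as a link target) is just one possible assumption about what “present in the content” means. It also assumes tag is always a list.

```python
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collects the visible text and the link targets of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)

    def handle_data(self, data):
        self.text.append(data)


def invisible_tags(obj):
    """Return Mention/Hashtag tags that have no visible trace in the content."""
    collector = _Collector()
    collector.feed(obj.get("content", ""))
    collector.close()
    text = "".join(collector.text).lower()

    hidden = []
    for tag in obj.get("tag", []):  # assumption: tag is a list of objects
        if tag.get("type") not in ("Mention", "Hashtag"):
            continue
        name = tag.get("name", "").lower()
        href = tag.get("href")
        if (name and name in text) or (href and href in collector.hrefs):
            continue
        hidden.append(tag)
    return hidden


# Hypothetical reply: the mention is linked in the content, the hashtag is not.
reply = {
    "type": "Note",
    "content": '<p>Thanks <a href="https://example.org/users/bob">@bob</a>!</p>',
    "tag": [
        {"type": "Mention", "name": "@bob@example.org",
         "href": "https://example.org/users/bob"},
        {"type": "Hashtag", "name": "#cats",
         "href": "https://example.com/tags/cats"},
    ],
}

hidden = invisible_tags(reply)
```

Here the mention counts as visible because its href is linked, even though the tag name is the full handle while the content only shows the short form; the hashtag is reported as hidden and could be rendered separately by the client.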

Indeed, the microsyntax often arrives “pre-processed” because the content is HTML by default. A consumer wishing to properly associate the tag as a rich entity with some substring of the content will need to “un-process” the content by similarly stripping the markup or plaintextifying it.
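
One possible shape for that “un-processing” step, as a minimal sketch: drop all markup, keep only the text nodes, then look up each tag name in the result. The `plaintextify` and `locate_tags` names are hypothetical, the sample markup only approximates what a server might emit, and case-insensitive substring matching on name is an assumption, not a standardized procedure.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plaintextify(content_html):
    """Strip all markup and return the concatenated text."""
    parser = _TextOnly()
    parser.feed(content_html)
    parser.close()
    return "".join(parser.parts)


def locate_tags(content_html, tags):
    """Return the plain text and (type, start, end) spans for each tag name."""
    text = plaintextify(content_html)
    spans = []
    for tag in tags:
        name = tag.get("name", "")
        if not name:
            continue
        # Assumption: names match case-insensitively, as hashtags usually do.
        for match in re.finditer(re.escape(name), text, re.IGNORECASE):
            spans.append((tag["type"], match.start(), match.end()))
    return text, sorted(spans, key=lambda span: span[1])


# Hypothetical content with the microsyntax already turned into links.
content = (
    '<p>Hi <span class="h-card"><a href="https://example.com/users/alice" '
    'class="u-url mention">@<span>alice</span></a></span>, see '
    '<a href="https://example.com/tags/cats" class="mention hashtag" '
    'rel="tag">#<span>Cats</span></a></p>'
)
tags = [
    {"type": "Mention", "name": "@alice"},
    {"type": "Hashtag", "name": "#cats"},
]

text, spans = locate_tags(content, tags)
```

Plain substring matching has obvious gaps (a tag named #cat would also match inside #cats, and block elements are joined without separators), which is part of why a standardized, deterministic procedure would help.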

It would be nice to standardize this procedure so that it can be deterministically reproduced.
