Right now, content defaults to text/html, but different clients (including server-clients like Mastodon) support different subsets of it. For example, some clients, like SocialHome and Friendica, support embedded images, while others do not.
Publishers would like to be able to specify fallback behaviors by explicitly federating “limited” representations, like text/plain, while consumers want to be able to gracefully handle many different systems without compromising the focus of their clients. Friendica has started federating a contentMap property with text/html to provide a more “fully featured” HTML representation.
My contention is that the best way forward is to make content as fully featured as possible, and explicitly define fallback behaviors for clients that don’t want to support those features. This would include both suggested HTML representations for things that the HTML spec doesn’t handle well natively, like block-level images, and suggested plain-text and image fallback representations.
Does that make sense? What do people think about this idea?
re: prior art: this sounds a lot like how email messages can be sent as text/plain and text/html. however, i’m concerned about using contentMap for this. the intention for contentMap seems to have been to allow for multiple language representations, and i’m not sure how that would interact with doing a contentMap by mimetype instead of by language.
therefore, i’d like to raise the following points:
semantically, mediaType is the property that is intended to be used for this, but this assumes content instead of contentMap. how might it be adapted to apply to objects with a map?
what realistic limits or best practices should apply to the keys of the map? the activity vocab spec uses MAY for language-tagged keys. afaik, the MAY indicates that implementations should be prepared to interoperate with contentMap containing language keys, but not anything else? the defined range is xsd:string or rdf:langString, so…
what is the recommendation for choosing which of the keys to use as content out of the map? if the keys were languages, then it would be simple, but a simple hashmap isn’t expressive enough to express both mimetype and language unless some standardized behavior were introduced to the spec in a revision.
to what extent should we prioritize or emphasize plaintext or html? note that in the email network, html messages have grown to be very annoying and mainly used for invasive ads and tracking, even leading several people to declare that html emails were a mistake and that emails should have remained plaintext. is a similar advisory prudent for activitypub? or is sanitization enough? and if sanitizing is enough, then surely only one html representation is needed per language? or are we going to see text/html and text/html-noblock?
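to make the key-selection point concrete, here is a minimal python sketch (3.9+), assuming the map is keyed by language tags. `pick_content` and the sample data are hypothetical and not taken from any spec or implementation; the matching rule (exact tag, then primary subtag) is only one plausible choice.

```python
from typing import Optional


def pick_content(content_map: dict[str, str], preferred_langs: list[str]) -> Optional[str]:
    """Return the first value whose key matches a preferred language tag."""
    for lang in preferred_langs:
        # try the full tag first, e.g. "en-GB"
        if lang in content_map:
            return content_map[lang]
        # then fall back to the primary subtag, e.g. "en"
        primary = lang.split("-")[0]
        if primary in content_map:
            return content_map[primary]
    return None


# hypothetical map keyed by language: selection is straightforward
by_language = {
    "en": "<p>I <em>really</em> like strawberries!</p>",
    "fr": "<p>J'aime <em>vraiment</em> les fraises !</p>",
}
english = pick_content(by_language, ["en-GB", "de"])

# hypothetical map keyed by media type: a language-aware consumer finds nothing
by_media_type = {
    "text/html": "<p>I <em>really</em> like strawberries!</p>",
    "text/plain": "I really like strawberries!",
}
missing = pick_content(by_media_type, ["en"])
```

a consumer written against language keys has no way to tell that the second map is keyed by something else, and a single flat map has no room for both dimensions at once.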
While conversation to date has focused on, yeah, basically re-inventing multipart/alternative, I don’t think that’s a good or sustainable way forward. Like you said: are we going to see text/html as well as text/html; noblock? I think it’s a lot more tractable to attack the problem from the other side, and define better types of sanitization and fallback behavior that clients like Mastodon can implement.
I also don’t like the idea of using contentMap for this, but I think trying to define fallback behaviors will never realistically work - we cannot predict what clients will want to do, and any set of fallback behaviors would ultimately end up privileging a very specific subset of what it should be possible to develop, especially from the direction of trying to represent “recommended” HTML representations.
I think my preference would be to extend source instead, as sourceMap. That is a less serious semantic violation of intent than using contentMap for media types instead of languages, and it would have the added benefit of letting clients do a limited amount of type negotiation for things like editing posts, where maybe they don’t support the actual original source (like the org mode example in the spec document). But it makes the relationship between source and contentMap (for languages) even less clear, and it could complicate updating multiple source formats from a single “real” source format: the internal content for some formats could lag behind others if the client or server doesn’t support one of the sources stored.
End of the day, I don’t think there’s any ideal options here, but I don’t think there’s any reasonable way to handle specifying fallback behaviors for full-featured HTML in content - we’d need some additional way to know which subset of HTML the thing is targeting, and even knowing that you’d implicitly also need to know what type of app it is – the structure of the HTML is always going to reflect the type of application that the HTML was generated for/by which may not apply in an app that may have an easier time coping with the result if alternative representations were provided.
The only other option I can see is using additional parameters with HTTP content negotiation to allow requesting json-ld documents with specific types in the content fields, which may also work and would avoid exploding the json representation with a bunch of types, but we’d need to pick a scheme for representing the types there, and make sure that there’s a way for clients to detect that they got back a different type than they requested if the server falls back to HTML.
sourceMap sounds like it could make sense for this, yeah. but that does raise the question of how source currently interacts with contentMap? from AP Example 8 it looks like the source object contains both content and mediaType, and this is really what i’d like to see ideally, i think – if we could declare a mediaType next to each value of the contentMap, that’d be best. maybe we could use arrays? i.e., let source be an array:
{
  "@context": ["https://www.w3.org/ns/activitystreams",
               {"@language": "en"}],
  "type": "Note",
  "id": "http://postparty.example/p/2415",
  "content": "<p>I <em>really</em> like strawberries!</p>",
  "source": [
    {"content": "I *really* like strawberries!",
     "mediaType": "text/markdown"},
    {"content": "I [i]really[/i] like strawberries!",
     "mediaType": "text/bbcode"}
  ]
}
it just seems kind of redundant as currently stated… (sidenote fwiw i’d also like to see a dedicated lang property, as we only really have hreflang for links)
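for illustration, a consumer of the array form above could pick a source roughly like this. this is only a sketch: an array-valued source is the proposal in this post, not something the current spec describes, and `pick_source` plus the preference list are hypothetical names.

```python
from typing import Optional


def pick_source(obj: dict, supported: list[str]) -> Optional[dict]:
    """Return the first source entry whose mediaType the client supports.

    `supported` is ordered by client preference, most preferred first.
    """
    sources = obj.get("source", [])
    # tolerate the single-object form as well as the proposed array form
    if isinstance(sources, dict):
        sources = [sources]
    for media_type in supported:
        for src in sources:
            if src.get("mediaType") == media_type:
                return src
    return None


note = {
    "type": "Note",
    "content": "<p>I <em>really</em> like strawberries!</p>",
    "source": [
        {"content": "I *really* like strawberries!", "mediaType": "text/markdown"},
        {"content": "I [i]really[/i] like strawberries!", "mediaType": "text/bbcode"},
    ],
}

# this client prefers org mode, then bbcode, then markdown
chosen = pick_source(note, ["text/x-org", "text/bbcode", "text/markdown"])
```

if nothing matches, the function returns None and the client would presumably fall back to content (or refuse to offer editing).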
really, i think the bigger issue here is that there isn’t a defined mimetype for “text/html but without block elements”. you’d have to define a parameter maybe? “text/html;inlineOnly=true”? whatever parameter name it is, it needs to be standardized.
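as a rough sketch of what consuming such a parameter could look like, here is a python example using the standard library's header parser. the `inlineOnly` parameter is just the made-up name from above, not a registered or standardized one, and `parse_media_type` is a hypothetical helper.

```python
from email.message import Message


def parse_media_type(value: str) -> tuple[str, bool]:
    """Split a media type string into (type/subtype, inline_only flag)."""
    msg = Message()
    msg["Content-Type"] = value
    # parameter names are matched case-insensitively by get_param
    inline_only = msg.get_param("inlineOnly", failobj="false")
    return msg.get_content_type(), str(inline_only).lower() == "true"


restricted = parse_media_type("text/html;inlineOnly=true")
plain_html = parse_media_type("text/html")
```

a client that does not know the parameter would still see text/html, which is why the name and its semantics would need to be agreed on before anyone relies on it.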
Sure, I’m not saying we should specify that only the HTML representations we’ve agreed upon are valid. I’m just thinking of a “best practice” document for very common types of formatting, especially in cases where they’re not always well-represented by the HTML spec. (For example, Tumblr has font colors as a native formatting option. Should federating clients use inline styles, which are hard to sanitize, or the deprecated <font> element to represent these?)
I think an extensible spec that would provide us the framework for coming to a consensus on these types of options would be really useful right now.
End of the day, I don’t think there’s any ideal options here, but I don’t think there’s any reasonable way to handle specifying fallback behaviors for full-featured HTML in content - we’d need some additional way to know which subset of HTML the thing is targeting
Maybe we’re getting tripped up over the usage of the word “specify”? I’m picturing a spec document that talks about these things, not a way to specify these things within activitystreams itself.
I have two concerns with sourceMap:
It means that source no longer actually represents the canonical, user-authored representation of the post, and that we will either have to sacrifice lossless editing or say that you can only provide “alternative” content for formats you can losslessly convert between (defeating the whole point of providing a fallback!)
It doesn’t work when there are more than two systems involved. You can federate a “Mastodon-compatible” representation, but what happens now that Pleroma has support for limited rich-text rendering, but without as much flexibility as Friendica and SocialHome?
I think it’s a lot more reasonable to say, like, “if you want to embed an inline image, you should use this format; if you don’t want to embed inline images, here’s a guideline for fallback behavior and a reference implementation.” This serves two purposes: as well as providing guidelines on fallback behavior, it allows us to explore the list of rich text formats that even need fallback behavior. Again, I don’t believe any spec can cover everything someone might want to do, but a lightweight, extensible spec can help us give guidance on the right ways of federating at least some types of rich text content.
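To give a feel for what such a reference implementation might look like, here is a minimal Python sketch of one possible fallback: replacing each image with a link labelled by its alt text. The rule itself, the `ImageFallback` class, and the `image_fallback` function are all assumptions made up for illustration, not an agreed guideline. It is also not a sanitizer: it passes other markup through untouched, drops comments, and does not validate URLs.

```python
from html import escape
from html.parser import HTMLParser


class ImageFallback(HTMLParser):
    """Rewrite <img> tags as plain links; pass other markup through."""

    def __init__(self) -> None:
        # keep character references as written instead of decoding them
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            label = attributes.get("alt") or src
            self.out.append(f'<a href="{escape(src)}">{escape(label)}</a>')
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags: images still become links, others are kept as-is
        if tag == "img":
            self.handle_starttag(tag, attrs)
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def image_fallback(html_text: str) -> str:
    """Return html_text with every image replaced by a link to it."""
    parser = ImageFallback()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


original = (
    '<p>Look:</p><img src="https://example.org/a.png" alt="A strawberry">'
    "<p>Nice &amp; red</p>"
)
converted = image_fallback(original)
```

A real guideline would still have to decide things this sketch glosses over, such as what to do with images that have no alt text or that sit inside a link already.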