Making the case for richer HTML in ActivityPub

Tonight I set aside some time to listen to @johnonolan@mastodon.xyz on @mike@flipboard.social's DotSocial podcast.

A lot a lot a lot of what John says mirrors the very same potential that many ActivityPub devs see as well. There are far too many points in that podcast that made me nod my head in agreement (and wish I was a third guest too!), but there was one that was incredibly timely:

Mike: ... you've been thinking about actually embedding the whole article in the ActivityPub post, which is a mind-blowing thing... it's not a link to something else... the whole article is in the post.
John: Yes, this is something that makes perfect sense but is somehow completely new, which is weird...
Mike: You can have formatted text... images? video?
John: ActivityPub is fairly agnostic, you could in theory shove almost anything into it. The question is what is the client on the other side prepared to receive? Do they have some way to display it?
John: If we get platforms in the ActivityPub network to start innovating with content types, it might cause those things to be adopted and it might drive the standard and what it is possible to display.

Emphasis mine.

John, Mike, this is almost word-for-word exactly what the Forum and Threaded Discussions working group has been working towards! The main problem is we need buy-in from implementers to push this forward.

We can do this, we can send richer HTML across the protocol in such a way that all those things you two mentioned — in-line images, embedded videos, tables, etc. — can all show up as intended by the sender.

We've got commitment from (but not limited to) representatives from NodeBB, Discourse, and WordPress, and having Ghost and Flipboard sign on would help push this forward just that much more.

Let's do this, let me get you caught up with the state of the protocol re: the Article object type. Let's chat (but publicly, since I can't receive DMs here on NodeBB).

It really seems like @Mike McCue's perception of the Fediverse is Flipboard, Mastodon and nothing else.

Posting long-form articles in the Fediverse with almost the full HTML feature set from text formatting to tables to any number of embedded in-line images is completely, utterly inconceivable to many Mastodon users. But this is nothing that is only just being introduced right now. It has been done in the Fediverse since 2010, almost six years before Mastodon was launched, when Rochko was still a school kid, and Friendica which introduced this was still Mistpark.

However, Mastodon users don’t notice this happening. They can’t. Mastodon flat-out refuses to support full HTML formatting, in-line images etc. simply because that’s not what a Twitter clone should do. Thus, Article-type objects are reduced to links to the original which Mastodon users don’t even perceive as still within the Fediverse because it doesn’t directly show up in their timelines.

And Note-type objects are butchered by Mastodon’s “sanitiser” almost beyond recognition. Even images are still fully being stripped out, which is why e.g. Pleroma, Akkoma, Friendica, Hubzilla and (streams) resort to converting embedded in-line images into file attachments. And then Mastodon only keeps a maximum of four of these and throws the rest away.
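
To make the attachment workaround concrete, here is a minimal Python sketch of pulling in-line images out of a post's HTML and re-expressing them as attachments. It is not the actual code of Pleroma, Akkoma, Friendica, Hubzilla or (streams); the function name, the `Image` attachment shape and the `MAX_KEPT` constant are assumptions for illustration, with the limit of four taken from the behaviour described above.

```python
from html.parser import HTMLParser

MAX_KEPT = 4  # assumption: the four-attachment limit described in the post above


class InlineImageExtractor(HTMLParser):
    """Collects <img> tags and rebuilds the remaining HTML without them."""

    def __init__(self):
        # Keep entities as written so the rebuilt HTML stays escaped.
        super().__init__(convert_charrefs=False)
        self.images = []
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if attrs.get("src"):
                # Hypothetical attachment shape; real implementations differ.
                self.images.append(
                    {"type": "Image", "url": attrs["src"], "name": attrs.get("alt") or ""}
                )
            return
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> or <br/>.
        if tag == "img":
            self.handle_starttag(tag, attrs)
        else:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def images_to_attachments(html):
    """Return (html_without_images, attachments) for one post body."""
    parser = InlineImageExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts), parser.images
```

A receiver that behaves as described above would then effectively keep only `attachments[:MAX_KEPT]`, so a post with six in-line images arrives with two of them missing and none of them in their original position.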

This wouldn’t matter much if it weren’t for Mastodon having a market share of 65% which feels more like 95%. It’s absolutely possible for Mastodon users to have been around since late October, 2022 and still “know” that the Fediverse is only Mastodon.

@jupiter_rowland I’m curious if you actually posted twice or if your post appearing twice is a bug in one of our implementations.

I agree with you Julian. I think we’ll need to tackle this piece by piece, but I’m on-board with the vision. I know this is what the users of Discourse want (they’ve been telling me!).

I wonder whether the most extensible way to do this is by sending markdown in addition to the HTML we already send. Markdown

  • has canonical (word of the day) markup for media objects (e.g. we’ll all send images as ![image](https://imageurl.com))
  • is more secure than HTML
  • is (mostly?) already supported by the platforms who’d want to federate with each other for this richer content.

If we put an additional markdown representation which included the richer content we’d also avoid the HTML being rejected or subject to parsing we don’t control on the grounds it contained unsupported tags or overly-complex structures.
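
As a rough sketch of what "markdown in addition to the HTML" could look like on the wire: ActivityPub defines a `source` property carrying `content` and `mediaType`, so one option is to put the markdown there and keep the HTML in `content`. Whether `source` is the right home for a richer representation is an assumption here, and the function names are made up for illustration.

```python
def build_object(html, markdown):
    """Return an object with HTML in `content` and markdown in `source`."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Article",
        "content": html,
        # Assumption: the markdown representation travels in `source`.
        "source": {"content": markdown, "mediaType": "text/markdown"},
    }


def pick_body(obj, understands_markdown):
    """Hypothetical receiver logic: prefer markdown if supported, else HTML."""
    source = obj.get("source") or {}
    if understands_markdown and source.get("mediaType") == "text/markdown":
        return "markdown", source["content"]
    return "html", obj.get("content", "")
```

A receiver that does not understand markdown simply ignores `source` and falls back to `content`, so nothing changes for existing implementations.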

@julian @mike @johnonolan @mike As other commenters have noted, embedding whole articles is not really new, they are just not rendered properly by Mastodon. However, I don't think anyone has tried to embed an interactive application:

https://webxdc.org/docs/get_started.html

I think it’s a bug on the Discourse side.

@angus@socialhub.activitypub.rocks said in Making the case for richer HTML in ActivityPub:

I wonder whether the most extensible way to do this is by sending markdown in addition to the HTML we already send.

NodeBB already sends markdown through, although I'm not sure whether sending different markdown from the HTML in content is the right approach (something something two wrongs don't make a right).

@johnonolan@mastodon.xyz is right about one thing, which is that if a critical mass expanded support, then other implementers would likely follow suit. It's always been this way, but usually it's Mastodon that has gained, and everybody else who has had to adapt.

I see two paths forward:

  1. Compromise (@mikedev@fediversity.site suggested this years ago) — as:Note contains a limited subset of HTML (and images are attached), as:Article contains the richer set.
    • Downsides: it's a technically contrived distinction as opposed to one based on the kind of content relative to others (e.g. content is an article because it contains an image, vs content is an article because it is considered a standalone work)
  2. Ignore — send as:Article if we want to, and send the richer HTML set because we can.
    • Downside: End users don't care, they just see that their post content doesn't make it through to Mastodon and complain
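
To illustrate option 1, here is a small Python sketch of a sender picking the object type from the markup it is about to send. The `NOTE_TAGS` allowlist is purely hypothetical, since nobody in this thread has defined the "limited subset", and `choose_type` is an invented name.

```python
from html.parser import HTMLParser

# Hypothetical "limited subset" for as:Note; the thread does not define one.
NOTE_TAGS = {
    "p", "br", "a", "span", "em", "strong", "b", "i",
    "blockquote", "code", "pre", "ul", "ol", "li",
}


class TagCollector(HTMLParser):
    """Records the name of every element that appears in the HTML."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img/>.
        self.tags.add(tag)


def choose_type(html):
    """Return "Note" if only allowlisted tags are used, else "Article"."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return "Note" if collector.tags <= NOTE_TAGS else "Article"
```

Under this rule a one-line reply that happens to contain an in-line image or a table becomes an as:Article, which is exactly the contrived distinction listed as the downside of option 1.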

As much as I dislike the "best viewed in" phrase, Mastodon would ideally need to show an info label instructing users to go to the original site, or render richer HTML outside of their feed (like in a modal).

@jupiter_rowland For the record my perception of the fediverse is that it is the open web of all people, content, apps, services and websites that have federated or bridged with ActivityPub.

@julian @johnonolan I’d love to learn more about this!

@julian @johnonolan @mike the origin of ActivityPub is Atom, which had the name/summary/content split, and we preserved that through the various Activity Streams iterations, from Atom to JSON to the current JSON plus LD.
Mastodon undermines this by ignoring post names and treating summary as a Content Warning.
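
For anyone not familiar with the split, a minimal Python sketch of an object using all three properties, plus one way a client that cannot show the full body might still honour `name` and `summary`. The sample values, the placeholder URL, the plain-string `url` and the `render_preview` helper are assumptions for illustration, not any implementation's actual behaviour.

```python
# Sample object; all values are invented for illustration.
article = {
    "type": "Article",
    "name": "Making the case for richer HTML in ActivityPub",
    "summary": "Why richer HTML should travel across the protocol.",
    "content": "<p>Full article body goes here.</p>",
    "url": "https://example.com/articles/1",  # placeholder URL
}


def render_preview(obj):
    """Hypothetical fallback: title, then summary, then a link to the original."""
    lines = []
    for key in ("name", "summary", "url"):
        value = obj.get(key)
        if value:
            lines.append(value)
    return "\n".join(lines)
```

Here `name` acts as the title and `summary` as a short abstract, rather than `name` being dropped and `summary` being shown as a Content Warning.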

@julian @johnonolan @mike it still baffles me that the content encoding standards aren’t the same as email. The spec is already there, why not use it?

@julian @johnonolan @mike
Would be remiss not to mention that Hometown, which is a "light" Mastodon fork, supports the Article post type and has no issues rendering long-form content.
https://github.com/hometown-fork/hometown?tab=readme-ov-file#reading-more-content-types