FEP-b2b8: Long-form Text

At first I was opposed to how Mastodon handles Articles, but if you define an article the traditional way (news article, blog post, etc.), then it makes perfect sense. If a platform cannot display the HTML in the article properly, it SHOULD link to it instead of trying to display it.

And the current trend of marking things that are clearly not Articles as Articles, and then stuffing the body of the post into the Summary so it shows up on Mastodon, is… a workaround that should not have happened. A Summary should not contain HTML and should remain short. After all, it is a summary, not the body of a post. (And some platforms strip all of your fancy HTML from the summary field anyway.)

All of this happens because there is no type for Conversation, and even if there were, some platforms would not recognize it.

I think the only solution is to use the same model email does: a message carries both a plain-text version and an HTML version, and the client picks which one to display. In our case, we would send:

  1. A Link to the resource (URL that resolves to a web page).
  2. A Summary of the resource (short, no formatting, may contain content warnings).
  3. A Note or Preview: A Simplified View (a version of the content with limited formatting).
  4. An Article: A Fully Formatted View (a version that contains HTML, which will be sanitized upon display).
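
To make the four-part model concrete, here is a minimal Python sketch that maps each piece onto an existing ActivityStreams property: `url` for the link, `summary` for the summary, `preview` (holding a Note) for the simplified view, and `content` for the full HTML. This mapping is an assumption, not something agreed on in this thread, and the function name and example URLs are hypothetical.

```python
import json


def build_long_form_object(object_id, page_url, summary, preview_html, full_html):
    """Assemble an Article carrying all four representations (hypothetical layout)."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": object_id,
        "type": "Article",
        # 1. Link: a URL that resolves to a human-readable web page.
        "url": {"type": "Link", "href": page_url, "mediaType": "text/html"},
        # 2. Summary: short, no formatting, may carry a content warning.
        "summary": summary,
        # 3. Simplified view with limited formatting, embedded as a Note.
        "preview": {"type": "Note", "content": preview_html},
        # 4. Fully formatted view; receivers sanitize it before display.
        "content": full_html,
    }


article = build_long_form_object(
    "https://blog.example/ap/posts/42",
    "https://blog.example/hello-world",
    "An introduction to the blog",
    "<p>Hello, world. This is the short version.</p>",
    "<h2>Hello</h2><p>This is the <em>full</em> version.</p>",
)
print(json.dumps(article, indent=2))
```

Reusing `preview` and `url` keeps the object readable by software that only knows the core vocabulary: a receiver that ignores `preview` still finds `summary` and `url`.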

If a platform does not support articles, or wishes to link to articles instead of displaying them, it can use the content of the Summary or Note plus the Link to display something meaningful. Platforms that support articles can use the Summary or Note content in some views (like the inbox or recent posts) and the Article content on the single post page with its comments, if any.
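
On the receiving side, the display rule in the previous paragraph could look roughly like this. It is a sketch assuming short text lives in `preview` (falling back to `summary`), full HTML in `content`, and the web page in `url`; `pick_representation` and the `"list"`/`"single"` view names are hypothetical.

```python
def pick_representation(obj, view, supports_articles):
    """Return what to render: the full Article, or short text plus a link.

    view is "list" (inbox, recent posts) or "single" (post page with comments).
    """
    link = obj.get("url")
    href = link.get("href") if isinstance(link, dict) else link
    # Only article-capable servers show the full body, and only on the post page.
    if supports_articles and view == "single" and obj.get("content"):
        return {"body": obj["content"], "link": None}
    preview = obj.get("preview")
    short = preview.get("content") if isinstance(preview, dict) else None
    # Everyone else gets the simplified view (or the summary) plus the link.
    return {"body": short or obj.get("summary", ""), "link": href}


example = {
    "type": "Article",
    "url": {"type": "Link", "href": "https://blog.example/hello-world"},
    "summary": "An introduction to the blog",
    "preview": {"type": "Note", "content": "<p>Hello, world (short version).</p>"},
    "content": "<h2>Hello</h2><p>The full post.</p>",
}

# A microblog-only server shows the preview and links out.
print(pick_representation(example, "list", supports_articles=False))
# A server that supports articles shows the full body on the post page.
print(pick_representation(example, "single", supports_articles=True))
```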

Feel free to rename Note and Article in the example above in future specifications. I am just using terms people are familiar with.

But we need to somehow get away from everything being a note, and the misuse of article for things that are clearly not articles.


@julian What is the purpose of preview if Mastodon can already render summary?

@silverpill@mitra.social I missed your reply.

I'm not here to decide what's right or wrong, just going with consensus. In any case, a dedicated preview would allow implementors to opt in to an alternative representation that better respects the constraints imposed by Mastodon and other microblog-focused software, such as the lack of support for inline images and the reliance on attachment.

summary gets you part of the way there, but Mastodon would still strip out the inline images, and I don't want to add image assets to Article in attachment because I want to promote the support for inline images for non-Notes.


@silverpill@mitra.social specifically, though, the idea of providing a rel="alternate" would be more appropriate than using preview. (cc @trwnh@mastodon.social)

What that ends up looking like is to be determined, but I am optimistic.
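
Since the shape of such an alternate is still to be determined, the following is only a guess at how a receiver might look for one: it scans `url` for Link objects whose `rel` includes `alternate` and whose `mediaType` it can handle. `rel`, `href`, and `mediaType` are existing ActivityStreams Link properties; the helper name, the example URLs, and the idea of an `application/activity+json` alternate holding a simplified Note are assumptions.

```python
def find_alternates(obj, media_type):
    """Return hrefs of url entries marked rel="alternate" with the given mediaType."""
    urls = obj.get("url", [])
    if not isinstance(urls, list):
        urls = [urls]
    found = []
    for link in urls:
        if not isinstance(link, dict):
            continue  # a bare string URL carries no rel or mediaType
        rel = link.get("rel", [])
        rels = rel if isinstance(rel, list) else [rel]  # rel may be a string or a list
        if "alternate" in rels and link.get("mediaType") == media_type:
            found.append(link["href"])
    return found


example = {
    "type": "Article",
    "url": [
        {"type": "Link", "href": "https://forum.example/topic/7", "mediaType": "text/html"},
        {
            # Hypothetical: a simplified, microblog-friendly rendition of the same post.
            "type": "Link",
            "rel": "alternate",
            "href": "https://forum.example/ap/topic/7/note",
            "mediaType": "application/activity+json",
        },
    ],
}
print(find_alternates(example, "application/activity+json"))
```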

@wistex@socialhub.activitypub.rocks why a link when you can set url?

@julian

summary gets you part of the way there, but Mastodon would still strip out the inline images, and I don't want to add image assets to Article in attachment because I want to promote the support for inline images for non-Notes.

If Mastodon can display summary, why are inline images a concern? A summary with a link should be enough for previewing.

@silverpill@mitra.social for me, summary is a stop-gap until a proper alternative representation is agreed-upon. content in Article is unusable at the moment due to aforementioned restrictions (even more so than content in Note).

Don't worry @scott@authorship.studio, stuffing the whole post body into summary isn't a "trend" you should be worried about.

One of the problems with using summary for a preview is that some platforms, like Hubzilla, strip out all of the HTML. And by doing that, is it really a preview anymore?

@julian By the way, I was not criticizing you personally. The entire ActivityPub situation is messed up where we have to do things that make no sense so they are compatible with certain systems.

And if putting the whole post in summary was not intended, then there is a bug, because we are receiving the body of the post as both the summary and the body fields.

A URL would be better, but in the case of a blog post, most people would expect it to take them to the original blog post: what WordPress calls a permalink.

It depends on what is being stripped. Does it completely remove all HTML content, including text paragraphs, or just sanitize it?

ActivityStreams vocabulary defines summary as "A natural language summarization of the object encoded as HTML":

https://www.w3.org/TR/activitystreams-vocabulary/#dfn-summary

I think that means images and other media shouldn't appear in summary, because they are not natural language.
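
The difference between stripping and sanitizing is easy to show. A minimal sketch using Python's standard `html.parser`: with no allowed tags it reduces a summary to plain text (roughly what Hubzilla appears to do, judging from the replies in this thread), and with a small allowlist it keeps basic inline formatting. Either way, media such as `<img>` never survive, in line with summary being natural language. The helper names and allowlist are illustrative assumptions; a production sanitizer needs far more care (attributes, links, nesting).

```python
from html import escape
from html.parser import HTMLParser

# Elements whose text content is never shown.
DROP_WITH_CONTENT = {"script", "style"}
# Block elements that become a line break in plain-text mode.
BLOCK_TAGS = {"p", "div", "li"}


class SummaryCleaner(HTMLParser):
    """Keep text plus an allowlist of attribute-free tags; drop everything else."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed_tags
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes (href, src, class) are discarded
        elif tag == "br":
            self.out.append("\n")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> and similar like a start tag with no matching end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in self.allowed:
            self.out.append(f"</{tag}>")
        elif tag in BLOCK_TAGS:
            self.out.append("\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Re-escape text only when the output is still HTML.
        self.out.append(escape(data, quote=False) if self.allowed else data)


def clean_summary(html, allowed_tags=frozenset()):
    """Plain text when allowed_tags is empty, limited HTML otherwise."""
    cleaner = SummaryCleaner(set(allowed_tags))
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out).strip()


raw = '<p>Intro &amp; <em>overview</em><img src="cover.png"></p><script>x()</script>'
print(clean_summary(raw))              # plain text
print(clean_summary(raw, {"p", "em"}))  # limited HTML
```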

There is a subtle distinction between using a URL that serves as a unique identifier and a URL that directs you to the source rendered as a web page.

When I said link, I was referring to something that resolves to a web page, typically showing the blog post or forum thread or social media post. Something you can put in the UI that users click on.
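
One way to honor that distinction when rendering a clickable link: look in `url` for something that resolves to a web page, and only fall back to `id` when nothing better exists. A sketch, assuming that a bare string in `url` is a web page and that a Link without `mediaType` is HTML; `human_link` is a hypothetical helper.

```python
def human_link(obj):
    """Return the first url entry usable as a web page link, else the object's id."""
    urls = obj.get("url")
    if urls is None:
        urls = []
    elif not isinstance(urls, list):
        urls = [urls]
    for entry in urls:
        if isinstance(entry, str):
            return entry  # assumption: a bare string URL points at a web page
        # Assumption: a Link without mediaType is treated as HTML.
        if entry.get("mediaType", "text/html") == "text/html" and "href" in entry:
            return entry["href"]
    # Last resort: the identifier, which may or may not render as a page.
    return obj.get("id")


print(human_link({"id": "https://blog.example/ap/42", "url": "https://blog.example/hello-world"}))
```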


As far as I can tell as a user, Hubzilla currently treats summary as plain text.

And I don’t think Hubzilla expected HTML, since we currently have this weird situation where it converts all incoming text to BBCode, but then does not parse the BBCode for the summary field. HTML in the summary appears to be an unexpected use case for Hubzilla.

I am not sure what Mario Vavti (the lead developer) plans to do about the situation. I saw a post where he was asking for clarification on why we are receiving HTML in the summary from some platforms.

This is what we receive in the summary field and how it is displayed. You have to click on “View Article” to see the body of the post (which in this case is the formatted version).


FYI, since this was more tangential, I brought the topic up for discussion at Social coding commons:

This FEP suggests providing both HTML content and an original-format source for an Article.

When federating an Article object to other servers, should implementations include both the source (e.g. Markdown text with its mediaType) and the HTML content in the ActivityPub object?

If yes, what is the expected handling on the receiving side? Are servers or clients just meant to use the HTML for display and treat the source as metadata (with no need to verify or regenerate the HTML from it)?

In short, how is the receiving end supposed to use source, if at all?

The only possible use I see for source is if I want to quote some part of the original text in its original source format, providing the receiving software supports this original format as well.

And if there is no use, then why do we need to send it?
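
For reference, an Article carrying both forms might look like this. `source` with `content` and `mediaType` is defined by ActivityPub, `mediaType` on the object describes `content`, and `text/markdown` is the registered media type for Markdown; the helper and example values are hypothetical.

```python
import json


def article_with_source(object_id, html, markdown):
    """Build an Article that carries rendered HTML plus the Markdown it came from."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": object_id,
        "type": "Article",
        "content": html,  # what most receivers display (after sanitizing)
        "mediaType": "text/html",  # describes the value of content
        "source": {
            "content": markdown,  # the author's original text
            "mediaType": "text/markdown",
        },
    }


article = article_with_source(
    "https://forum.example/ap/post/9",
    "<p>Hello <strong>world</strong></p>",
    "Hello **world**",
)
print(json.dumps(article, indent=2))
```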


As an implementor ingesting and sending markdown in source, I use the markdown exclusively when converting the ActivityStreams object to a NodeBB post.

The reasoning is simple: markdown is cleaner. You can't sneak in random things like errant classes or custom tags, because my markdown parser will ignore them.

Theoretically, you could also use the source markdown to edit your local representation of remote content... I haven't added that, but you could.
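
The approach described above (use the Markdown from `source` when present, otherwise fall back to the HTML `content`) could be sketched like this. This is not NodeBB's actual code; the function name and the set of accepted media types are assumptions. Note that Markdown can itself contain raw HTML, so the receiver's Markdown parser still needs raw HTML disabled, or its output sanitized.

```python
# Assumption: which media types this receiver treats as Markdown.
MARKDOWN_TYPES = {"text/markdown", "text/x-markdown"}


def body_for_import(obj):
    """Return (format, text): prefer Markdown source, else fall back to HTML content."""
    source = obj.get("source")
    if (
        isinstance(source, dict)
        and source.get("mediaType") in MARKDOWN_TYPES
        and source.get("content")
    ):
        return "markdown", source["content"]
    return "html", obj.get("content", "")


print(body_for_import({
    "content": "<p>Hello <strong>world</strong></p>",
    "source": {"content": "Hello **world**", "mediaType": "text/markdown"},
}))
```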

Just to clarify: do you send the content only as markdown in source.content?
Or do you also generate HTML from the markdown and send it out in content?

I send the rendered HTML out in content as well.

Got it, thanks for the clarification.

Do you guys fill out preview and/or summary as well?
If you fill out the summary, do you use any special algorithm to extract it from the content? I mean something more sophisticated than simply cutting the first 500 or so characters.
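
As one possible answer, here is a small heuristic that goes a step beyond a fixed cut: return the first paragraph if it fits, otherwise cut at the last sentence end inside the limit, and only then fall back to a word boundary with an ellipsis. It is a sketch that assumes plain-text input (HTML already stripped) and paragraphs separated by blank lines; the 500-character default comes from the question, and the half-limit threshold is arbitrary.

```python
import re


def extract_summary(text, limit=500):
    """Cut plain text at a paragraph, sentence, or word boundary within `limit` chars."""
    text = text.strip()
    if len(text) <= limit:
        return text
    # 1. Prefer the whole first paragraph when it fits.
    first_para = text.split("\n\n", 1)[0].strip()
    if len(first_para) <= limit:
        return first_para
    window = text[:limit]
    # 2. Last sentence end (., !, ?) followed by whitespace inside the window,
    #    but only if it keeps at least half of the allowed length.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s)", window)]
    if ends and ends[-1] >= limit // 2:
        return window[: ends[-1]]
    # 3. Fall back to the last word boundary and mark the cut.
    cut = window.rsplit(" ", 1)[0]
    return cut.rstrip(",;: ") + "…"


print(extract_summary("First sentence here. " * 30))
```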