Resolving the Note vs. Article distinction

trwnh · November 1, 2019, 7:56pm

Background

Note: Represents a short written work typically less than a single paragraph in length.
Article: Represents any kind of multi-paragraph written work.

Example 48 (Article):

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Article",
  "name": "What a Crazy Day I Had",
  "content": "<div>... you will never believe ...</div>",
  "attributedTo": "http://sally.example.org"
}

Example 53 (Note):

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "name": "A Word of Warning",
  "content": "Looks like it is going to rain today. Bring an umbrella!"
}

Semantically, the difference is never explicitly defined (how do you define a “paragraph”?), so the current fediverse has more or less assumed that an Article should be viewed natively on the remote website, while a Note can be displayed as an inline status. Thus, Note is used to represent a status update, and a lot of the network just defaults to Note. The distinction is assumed to be one of formatting, but once again this is not an explicit definition (how do you define “formatting”?).

Disambiguation

Going purely from the Activity Vocabulary descriptions and examples, I would possibly assume one or both of the following:

Note SHOULD be plain text, Article SHOULD use HTML (or should these be a MUST?)
Note SHOULD NOT use newlines (but are technically allowed to do so)
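The two assumed rules above could be expressed as a simple lint check. This is a hypothetical sketch: `check_note_rules` is not defined by any spec, and "uses HTML" is approximated with a crude regular expression for tag-like text rather than a real HTML parser.

```python
import re

# Crude "looks like HTML" test: any opening or closing tag such as <p> or </em>.
TAG_PATTERN = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*\b[^>]*>")


def check_note_rules(obj: dict) -> list[str]:
    """Warn when a Note breaks the assumed (non-normative) rules:
    Note SHOULD be plain text and SHOULD NOT contain newlines."""
    warnings: list[str] = []
    if obj.get("type") != "Note":
        return warnings  # the assumed rules only constrain Notes
    content = obj.get("content", "")
    if TAG_PATTERN.search(content):
        warnings.append("Note content appears to contain HTML")
    if "\n" in content:
        warnings.append("Note content contains newlines")
    return warnings


# Content taken from ActivityPub Example 8.
example_8 = {"type": "Note", "content": "<p>I <em>really</em> like strawberries!</p>"}
print(check_note_rules(example_8))
```

Applied to the spec's own examples, Example 8 is flagged for HTML, and Example 4 (two `<p>` elements split across lines) would trip both checks.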

However, there is ActivityPub 3.3, Example 8:

{
  "@context": ["https://www.w3.org/ns/activitystreams",
               {"@language": "en"}],
  "type": "Note",
  "id": "http://postparty.example/p/2415",
  "content": "<p>I <em>really</em> like strawberries!</p>",
  "source": {
    "content": "I *really* like strawberries!",
    "mediaType": "text/markdown"}
}

This example Note uses HTML for its content, in order to demonstrate the source property.

Also, ActivityPub Example 4:

{"@context": "https://www.w3.org/ns/activitystreams",
 "type": "Create",
 "id": "https://chatty.example/ben/p/51086",
 "to": ["https://social.example/alyssa/"],
 "actor": "https://chatty.example/ben/",
 "object": {"type": "Note",
            "id": "https://chatty.example/ben/p/51085",
            "attributedTo": "https://chatty.example/ben/",
            "to": ["https://social.example/alyssa/"],
            "inReplyTo": "https://social.example/alyssa/posts/49e2d03d-b53a-4c4c-a95c-94a6abf45a19",
            "content": "<p>Argh, yeah, sorry, I'll get it back to you tomorrow.</p>
                        <p>I was reviewing the section on register machines,
                           since it's been a while since I wrote one.</p>"}}

This example Note uses two <p> elements, representing two short paragraphs (once again not “less than a single paragraph”).

So even the specs themselves are inconsistent on any distinction.

How much does this actually matter?

Arguably not much, since implementations often convert Note and Article into their own internal schema for statuses anyway. Still, it would ideally be beneficial to settle on a clearer distinction for how these types should be assigned going forward.

lanodan · November 2, 2019, 2:28pm

The distinction I make between Article and Note isn’t directly related to its content but to how it’s supposed to be presented and used. Articles are more for blogs, where you have about one post per day, so articles should be easy to find again through a list of articles/tags and maybe some search features. Notes are more like microblogging, where you can easily have hundreds in a day, and they aren’t that easy to find again even with good keywords in full-text search.

Also, I find that formatting is actually very useful for notes, because it allows you to express more (or the same) with less (like a word list vs. a paragraph).

This question comes up a lot in the fediverse because they are the two most commonly used object types, but one could also ask about the actor distinctions between Organization and Group, or Application and Service. And so far I’ve only seen Document used in the wild for images with a textual description (as if Image couldn’t have one in the first place), but Document has no inherent meaning either.

Side note: Pleroma keeps the distinction between Articles and Notes internally. There are no real differences in the Mastodon API, though there could be a query filter.

trwnh · November 3, 2019, 6:23pm

For prior art, I can think of semantic HTML’s <article> being a section of HTML that can be reproduced elsewhere in its entirety.

W3Schools:

The <article> tag specifies independent, self-contained content.

An article should make sense on its own and it should be possible to distribute it independently from the rest of the site.

Potential sources for the element:
Forum post
Blog post
News story
Comment

MDN:

The HTML <article> element represents a self-contained composition in a document, page, application, or site, which is intended to be independently distributable or reusable (e.g., in syndication). Examples include: a forum post, a magazine or newspaper article, or a blog entry.

Maybe this is a distinction worth making: an Article should roughly map to an <article>, whereas a Note is just arbitrary text?

Those distinctions (Organization vs. Group, Application vs. Service) at least seem clearer to me. There’s still ambiguity, but much less:

Org = “Acme Inc”
Group = “Persons working for Acme Inc”
Application = “ChessBot”
Service = “Acme Mailing List / Relay”

And those arguably matter even less, because the relevant bit is that it’s an Actor. You could use Profile, or really anything else – the only functional thing you should care about is whether it has inbox/outbox (and maybe the other optional collections). So an Actor is defined by those properties, but Note/Article are not really defined by anything except really vague semantic advisory about what a “paragraph” is.

Practically speaking, though… if there indeed isn’t a big difference between Article and Note, then I’m kind of worried about what side effects one can expect when federating out content and having to choose a type. The only tangible thing I’ve seen in existing implementations that opt for Article rather than Note is that they support the use of URL slugs.

spider · November 26, 2019, 5:34am

A growing number of projects support rich markup and use Article for any post containing HTML constructs that Mastodon would purify away, losing “significant” information or presentation if the post were downgraded to a Note. Using an Article is the only way these posts can be transmitted with their original context and layout intact.

lanodan · November 26, 2019, 6:30am

In reply to Spider Jones via SocialHub [2019-11-26 05:44:11+0000]:

No.

Mastodon’s behaviour with Articles is to transform them into statuses with the title and a link; with Notes, it ignores the title when converting to a status, IIRC.
Meanwhile, Pleroma has good support for both Notes and Articles, and basically every fediverse implementation except vanilla Mastodon (a.k.a. tootsuite/mastodon) allows richer text than links and paragraphs in Notes.

spider · November 26, 2019, 7:18am

Correct. A link to the article on a system that will display it is a very poor solution but it is still better than quietly stripping information with no indication or warning. We carry a lot of content that is quotes and codeblocks and formatted articles with inline graphics. The rendering as a Mastodon Note is so highly distorted from the original as to be useless for viewing these posts and understanding the context from that platform. Turning it into an Article is the only workable solution we have.

nightpool · November 26, 2019, 11:25pm

I think using Article is a perfectly fine solution for when the desired semantics are “I think it’s better in a short-form microblogging context to display a link to this thing rather than the full content itself”.

Considering that some clients (especially native clients) often won’t support increasingly rich text formatting, I think using Article instead of Note for content where you think a plain-text fallback is unacceptable is a totally fine decision to make. (But you shouldn’t ignore the fact that people are likely to write long-form clients that do support Articles but don’t support every possible type of rich text formatting, the way RSS readers restrict allowed formatting for readability.)

trwnh · November 27, 2019, 12:39am

I can’t remember where this was said to me, but I think someone (maybe nightpool?) suggested that Note could be used for message passing à la Discord. That would be a third distinction, and perhaps a better one than length/newlines or plain text (which, as explained previously, are inconsistent). In that case, we could say Note vs. Article is a matter of formality. IMO this could complement the Article-to-<article> suggestion above.

Taken together:

Note = for informal message-passing
Article = for publication or syndication

The only thing unresolved with this distinction would be the blurry line that is microblogging. Microblogs are often both informal and intended for republishing in feeds. It’d be pretty clear-cut to use Note for a messaging application and Article for a blogging platform, but… perhaps this implies top-level microblogging posts should be an Article and replies should be Note? Or that they should all be Article regardless of the fact that they are informal? There’s still room for discussion there.

thebaer · November 27, 2019, 11:48pm

I’m experimenting with support for publishing both Notes and Articles in WriteFreely right now… In thinking about where to draw the line between the two, it might help to play around with an implementation like this.

darius · November 28, 2019, 12:03am

I think this is right on the money. My main use case for Article is that it’s a way to say that “this is a publication” – I’ve often talked about how this might actually help solve the “quote-tweet” design problem in Mastodon (and presumably other software). If an Article is something that is formal and published then it’s also something that can reasonably be commented upon. Articles can be quoted-commented, and Notes cannot. So I can dunk on a New York Times article, but not on a random thing someone posted as a note.

Edited to add: for composing posts on a microblogging service, I wonder if a “people can quote-comment on this post” checkbox or whatnot could be, at its most basic, a switch that changes your post’s type from Note to Article under the hood.

trwnh · November 28, 2019, 5:44pm

quoting and commenting is kind of out-of-scope and it’s up to each platform to decide how they want to handle it, tbh. from a data view, it doesn’t really matter because you can put whatever you want in content, and inReplyTo is just metadata (like tag). You can have an Article inReplyTo another Article, just as you can have a Note inReplyTo another Note (or a Note inReplyTo an Article, or even vice-versa, etc etc.)

the thing is, though, that it’s really seeming like if we use “syndication” as the distinction, then that would imply a lot of things that are currently Note might be better conceived of as Article.

spider · November 29, 2019, 5:42am

I respectfully disagree. As long as there are platforms which aggressively filter HTML content, reducing it essentially to text/plain, we will require a workaround for content which is informal yet structured and rendered using HTML by design, even though Note is specified as text/html by default. Currently, the only mechanism available is to convert to Article. I’m not even discussing “troublesome” tags like iframe, just simple lists, bold, italic, quotes, code blocks, and other inline media (rather than having them stripped out and placed at the end). Our project uses these constructs frequently in very informal conversations which otherwise perfectly fit the description of a conversational microblog. If this ability is taken away, I can easily see a split in the fediverse: an expressive fediverse which only uses Article, and a text/plain fediverse which only uses Note.
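To make the loss concrete, here is a sketch of an aggressive allowlist sanitizer built on Python’s standard `html.parser`. The allowlist (`p`, `br`, `a`, `span`, and only `href` on links) is an assumption chosen to mimic the filtering described above; it is not Mastodon’s actual sanitizer configuration and is not hardened for security use. Disallowed tags are dropped while their text is kept, so lists, bold text, and code markup silently flatten into running text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical aggressive allowlist: only paragraphs, line breaks, links and spans survive.
ALLOWED_TAGS = {"p", "br", "a", "span"}
ALLOWED_ATTRS = {"a": {"href"}}


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, but its text still reaches handle_data
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, set()):
                safe_value = escape(value or "", quote=True)
                kept.append(f' {name}="{safe_value}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag != "br":  # <br> has no closing tag
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize("<p>Steps:</p><ul><li><b>one</b></li><li><code>two()</code></li></ul>"))
```

The list items and code span above come out as the run-together text `onetwo()`, with no indication to the reader that structure was removed.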

kevinmccurley · December 24, 2022, 9:36am

I’m surprised to come across this as the last entry here in late 2022. Maybe I’m missing something, but the Object types defined in the ActivityPub vocabulary seem incredibly vague. The fact that Note can have content with HTML in it is in itself quite weird. Full HTML is a really bad idea unless the sender of the object can be fully trusted. Mastodon seems to “sanitize” HTML, and in the process it may destroy the semantics of the content itself. Mastodon doesn’t really have any choice, since HTML is a poor language to express untrusted documents in.
Putting the safety issue aside, the semantics of the different Object types are still quite unclear. If different implementations expect to have their produced content be understandable, these semantics need to be ironed out more completely. I’m surprised that the ActivityPub spec did not provide a mechanism for extension of these crude basic types.
It feels like this work was abandoned as “Schemas are hard. Let’s go blobbing”. Without better semantics on the types of messages passed between systems, there is no real hope of interoperability on the fediverse.

trwnh · December 24, 2022, 11:54am

Well, not much has changed in the past 3 years…

This is true, as most of them don’t actually carry any semantics with them. Most properties are applicable from Object anyway.

Well, the sanitizer exists to ensure that it isn’t “full” HTML, but only some usable subset. Mastodon’s issue is that it sanitizes too much: “safe” tags are removed because Mastodon isn’t concerned only with safety, but mostly with display. Mastodon generally allows inline tags that don’t alter formatting. I don’t think this is a problem with HTML itself; “some HTML” is better than “no HTML”. It would also be equally feasible to instead send out text/plain and expect implementations to recreate the links and formatting from metadata such as the tag array (Mention, Hashtag, and so on).
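The text/plain alternative could look roughly like this: the receiver escapes the plain content and turns each `tag` entry back into a link. This is a minimal sketch assuming every tag’s `name` appears verbatim in the text and that wrapping the result in a single `<p>` is acceptable. `linkify_from_tags` is a hypothetical helper, and while `Mention` is part of the Activity Vocabulary, `Hashtag` is a common fediverse extension rather than a core AS2 type.

```python
import re
from html import escape


def linkify_from_tags(text: str, tags: list[dict]) -> str:
    """Rebuild simple HTML from plain-text content plus an AS2-style tag array.

    Assumes each tag's `name` (e.g. "@alice@social.example" or "#cats")
    appears verbatim in the text and that `href` is the link target.
    """
    links = {t["name"]: t["href"] for t in tags if t.get("name") and t.get("href")}
    if not links:
        return "<p>" + escape(text) + "</p>"
    # Longest names first so "@alice@social.example" is not cut short by a bare
    # "@alice" tag; the lookahead stops "#cat" from matching inside "#category".
    names = sorted(links, key=len, reverse=True)
    pattern = re.compile("(?:" + "|".join(map(re.escape, names)) + r")(?!\w)")
    pieces: list[str] = []
    last = 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # plain text before the tag
        name = match.group(0)
        pieces.append(f'<a href="{escape(links[name])}">{escape(name)}</a>')
        last = match.end()
    pieces.append(escape(text[last:]))
    return "<p>" + "".join(pieces) + "</p>"


tags = [
    {"type": "Mention", "href": "https://social.example/alice", "name": "@alice@social.example"},
    {"type": "Hashtag", "href": "https://chatty.example/tags/cats", "name": "#cats"},
]
print(linkify_from_tags("Hi @alice@social.example, look at #cats & dogs", tags))
```

This recovers links but not emphasis, lists, or code blocks, which is exactly the information the tag array does not carry.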

It’s better to think of types as interfaces rather than as classes. In ActivityStreams/Pub, what matters is which properties an Object has. The actual type is nothing more than a hint. For example, you would validate an actor not by whether type in [Person, Group, Organization, Application, Service], but rather by whether it has both inbox and outbox. You wouldn’t identify an activity by checking that it’s one of the activity types, but instead by checking for the actor property. This is precisely to allow for extensions.
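A minimal sketch of that property-based (duck-typed) check, assuming objects arrive as already-parsed JSON dictionaries and ignoring JSON-LD expansion and compaction. The `classify` helper and the `ChessBot` type are hypothetical; the point is that an unknown type is still recognized as an actor once it has `inbox` and `outbox`.

```python
def classify(obj: dict) -> str:
    """Classify by properties (the interface), not by the declared type (the class)."""
    if "inbox" in obj and "outbox" in obj:
        return "actor"     # anything with an inbox and an outbox can act as an actor
    if "actor" in obj:
        return "activity"  # anything performed by an actor is treated as an activity
    return "object"


# A hypothetical extension type that no hard-coded list of actor types would include.
bot = {
    "type": "ChessBot",
    "id": "https://chess.example/bot",
    "inbox": "https://chess.example/bot/inbox",
    "outbox": "https://chess.example/bot/outbox",
}
create = {"type": "Create", "actor": "https://chatty.example/ben/", "object": {"type": "Note"}}
print(classify(bot), classify(create), classify(create["object"]))
```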

Anyway, the “practical” way to decide in 2022 is, do you want to have your content viewed in full by microblogging implementations such as Mastodon? If so, use Note. Using Article may be more semantically correct, but Mastodon will not show Article inline – it will transform it into a status containing only name/summary and url/id.
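The fallback described above can be approximated as follows. This is a hypothetical sketch of the behavior as discussed in this thread, not Mastodon’s actual code: Notes pass their `content` through (a real implementation would sanitize it first), while Articles are reduced to `name` (or `summary`) plus a link taken from `url`, falling back to `id`. The title is escaped as plain text for simplicity even though AS2 `summary` may contain HTML, and the `id` added to Example 48 below is made up.

```python
from html import escape


def _first_href(value):
    # AS2 allows `url` to be a string, a Link object, or a list of either.
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get("href")
    return value


def to_microblog_status(obj: dict) -> str:
    """Hypothetical microblog rendering: Notes inline, Articles as title plus link."""
    if obj.get("type") == "Article":
        title = obj.get("name") or obj.get("summary") or ""
        link = _first_href(obj.get("url")) or obj.get("id") or ""
        return f'<p>{escape(title)}</p><p><a href="{escape(link)}">{escape(link)}</a></p>'
    # Notes (and anything else) keep their content; sanitize before display in practice.
    return obj.get("content", "")


article = {
    "type": "Article",
    "id": "https://sally.example.org/articles/crazy-day",
    "name": "What a Crazy Day I Had",
    "content": "<div>... you will never believe ...</div>",
}
print(to_microblog_status(article))
```

Under this model the Article’s body never reaches the timeline, which is the trade-off weighed in the next paragraph.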

If you find yourself not caring about Mastodon compatibility for whatever reason, then I suppose you can use whichever – as discussed above, the most suitable semantic distinction seems to be that Article is a formal publication (i.e. perhaps expected to be syndicated) whereas Note is not (i.e. just a blob of text).

snarfed · January 17, 2023, 11:23pm

As just another data point, the IndieWeb community went over this debate a handful of times too. We agree with most of the ideas here in spirit, and in practice, we settled on a simple, explicit heuristic: articles have titles, notes don’t.

(In AS2, the obvious corollary is the name field.)

In the IndieWeb, this generally means that blog posts are articles, microblogs are notes. Concretely, we distinguish by parsing microformats2 out of HTML and applying the post type discovery algorithm, specifically the last few steps:

…

If the post has no “name” property or has a “name” property with an empty string value (or no value)
Then it is a note post.

Take the first non-empty value of the “name” property

Trim all leading/trailing whitespace

Collapse all sequences of internal whitespace to a single space (0x20) character each

Do the same with the content

If this processed “name” property value is NOT a prefix of the processed content,
Then it is an article post.

It is a note post.
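The steps quoted above translate fairly directly into code. This sketch covers only the note/article decision, assumes the mf2 `name` values and the post’s text content have already been extracted by a microformats2 parser, and uses a hypothetical helper name, `note_or_article`.

```python
import re


def _normalize(value: str) -> str:
    # Trim leading/trailing whitespace, then collapse internal runs to one space (0x20).
    return re.sub(r"\s+", " ", value.strip())


def note_or_article(names: list[str], content: str) -> str:
    """Final steps of post type discovery, as quoted above.

    `names` holds the values of the mf2 "name" property (possibly empty);
    `content` is the post's text content.
    """
    first_name = next((n for n in names if n), None)  # first non-empty "name" value
    if first_name is None:
        return "note"  # no name, or only empty values
    if not _normalize(content).startswith(_normalize(first_name)):
        return "article"  # a real title that is not just the start of the content
    return "note"


print(note_or_article(["What a Crazy Day I Had"], "... you will never believe ..."))
```

The prefix check matters because mf2 parsers imply a `name` from the post’s text when none is marked up explicitly, so a name that merely repeats the start of the content does not count as a title. Mapping the result onto AS2 is then a matter of emitting `Article` or `Note`.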