FEP-proposal C2S: websocket endpoint

Hello all!

I’m not ready to write this up as a FEP yet, because I’ve not thought through everything. However, I think it is a good idea for something that is currently missing in ActivityPub.

Proposal

Basically, it boils down to defining a new endpoint of the actor called websocket. See here for other endpoints. This endpoint SHOULD only be visible when the actor object is queried by the actor. The actor is identified through the Client2Server authentication/authorization mechanism.

The websocket endpoint MUST provide an RFC 6455 compliant endpoint, with the following properties:

  1. An event of type inbox is sent whenever a new item is added to the Actor’s inbox. This event contains the new inbox item as data.
  2. When an event of type outbox is received by the server, it is treated similarly to a POST to the outbox endpoint of the Actor. See 6.2 Create Activity and following.
  3. When an event of type proxy is received whose data is an object URI, the server replies with an event of type proxied whose data is the requested object.
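
A minimal sketch of how a server might dispatch the three event types listed above. The JSON envelope with `type` and `data` keys and the callbacks `post_to_outbox` and `fetch_object` are assumptions made for illustration; the proposal does not define a wire format.

```python
import json
from typing import Callable, Optional


def handle_client_event(
    raw: str,
    post_to_outbox: Callable[[dict], None],
    fetch_object: Callable[[str], dict],
) -> Optional[str]:
    """Dispatch one message received on the websocket; return the reply, if any."""
    message = json.loads(raw)
    kind, data = message.get("type"), message.get("data")
    if kind == "outbox":
        # Treated like a POST to the actor's outbox endpoint.
        post_to_outbox(data)
        return None
    if kind == "proxy":
        # data is the URI of the requested object; reply with the object itself.
        return json.dumps({"type": "proxied", "data": fetch_object(data)})
    raise ValueError(f"unsupported event type: {kind!r}")


def inbox_event(activity: dict) -> str:
    """Build the event the server pushes when an item is added to the inbox."""
    return json.dumps({"type": "inbox", "data": activity})
```

Passing the outbox and fetch logic in as callbacks keeps the sketch independent of any particular server framework.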

Edit 1:

  1. An event of type store that allows the client to store a blob on the server. The server returns the URL of the stored blob.
  2. An event of type fetch used to retrieve blobs, both from the server and from the wider internet.

Use Case

The use case is to avoid polling when using Client To Server.

Feedback?

or SSE, or both: Server-Sent Events, WebSockets, and HTTP or WebSub

i can see the point of streaming inbox events over websocket, but what is the point of the other ones? it seems to be unnecessarily replicating standard activitypub flows for no discernible benefit. for example, i don’t see a reason to post to outbox via websocket instead of via a regular http call. the only thing you would ever need to poll is the inbox, no? and websockets are the equivalent of push notifications, right?

I personally only have ever worked with websockets. Is there any advantage to using a new technology?

Tolkien wrote a book with a phrase that said something like

One connection to rule them all

I think that pretty much describes why I want to put activities in the outbox through the websocket.

On the technical side of things: putting things in the outbox through a websocket is probably cleaner to implement. One can send the new id of the object and activity through the socket. And once the request is done being processed, i.e. sent to all followers, one can send a “done sending”. This is not easily possible with HTTP.

The delivery notification is interesting, but I agree with the other comments that posting to the outbox is already covered.

You may want some kind of incremental delivery status notification since delivery to some recipients of an activity may complete quickly and others may take days or never complete if the target server has permanently gone offline.
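
To make the idea of incremental delivery status concrete, here is a rough sketch of a per-activity tracker that emits one status event per recipient as deliveries settle. The class names, the `delivery` event type, and the event fields are all hypothetical; nothing in ActivityPub or this proposal defines them.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeliveryState(Enum):
    PENDING = "pending"
    DELIVERED = "delivered"
    FAILED = "failed"


@dataclass
class DeliveryTracker:
    """Tracks delivery of one activity to each recipient inbox."""

    activity_id: str
    states: dict = field(default_factory=dict)

    def add_recipient(self, inbox: str) -> None:
        self.states[inbox] = DeliveryState.PENDING

    def update(self, inbox: str, state: DeliveryState) -> dict:
        """Record a state change and return the status event to push to the client."""
        self.states[inbox] = state
        return {
            "type": "delivery",
            "activity": self.activity_id,
            "inbox": inbox,
            "state": state.value,
            # Number of recipients that have not settled yet.
            "pending": sum(s is DeliveryState.PENDING for s in self.states.values()),
        }
```

A client could treat `pending == 0` as the “done sending” signal, while a recipient that stays pending for days simply never produces a final event.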

I think it makes more sense to use SSE or WebSockets to handle notifications for inbox items, because it adds a new and unique capability to ActivityPub that doesn’t exist elsewhere. I agree with the rest of the community that it doesn’t make sense to use websockets to reimplement features that are already required to be transmitted over HTTPS, like outbox POST events. This just doubles the amount of work required to implement an ActivityPub server for no clear benefit. Taking your “one connection to rule them all” approach would fragment the ecosystem: some clients would support only websockets, some would support only POST, and users would have no way of knowing which is which.

I’m curious why you think this isn’t possible with HTTP. It would consume exactly the same amount of resources as a websocket connection: you keep the POST request open until the state of the async job is settled, and then you return the response. Most web performance advice would tell you not to do this because it would tie up a connection slot, but that’s the exact same reason why websockets are so hard to scale anyway, so if you’re already committed to paying the cost of implementing WebSockets, there’s no reason not to wait asynchronously before resolving your POST request.
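
A rough asyncio sketch of that idea: the handler only resolves once the deliveries have settled or a timeout is reached. The `deliver` coroutine is a stand-in for a real HTTP delivery, and the shape of the response is an assumption, not something the ActivityPub spec prescribes.

```python
import asyncio


async def deliver(inbox: str, activity: dict) -> str:
    """Stand-in for an HTTP POST of the activity to a remote inbox (hypothetical)."""
    await asyncio.sleep(0.01)
    return inbox


async def handle_outbox_post(activity: dict, inboxes: list[str], timeout: float) -> dict:
    """Resolve the request only once deliveries have settled or the timeout hits."""
    if not inboxes:
        return {"status": 201, "delivered": [], "unsettled": 0}
    tasks = [asyncio.create_task(deliver(inbox, activity)) for inbox in inboxes]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        # The sketch cancels leftovers; a real server would keep retrying in the background.
        task.cancel()
    return {
        "status": 201,
        "delivered": sorted(task.result() for task in done),
        "unsettled": len(pending),
    }
```

The timeout matters because, as noted earlier in the thread, some deliveries may take days or never complete.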

Otherwise, I’m not exactly sure of the details but you could also imagine offering this as an optional upgrade to the outbox POST request by looking at the Connection header and determining whether the client is offering to make an HTTP connection upgrade and then serving the “extended status info” only over websockets with incremental updates. I’m not sure how good browser support is for this use case, but my understanding is that the underlying protocols easily support it.

I think I’m being sold on using Server-Sent Events. In addition to “one-way communication is enough”, I also want to mention that it is clearly specified. There are fields I can assign a value to. Also it should be familiar to a lot of Fediverse developers as it is used in Mastodon.

My current idea on how to use the fields is:

  1. type is the type of object the data is appended to. Suggested types inbox, outbox, and meta (for status events on the server, e.g. 5 minute warning for server reboot)
  2. data contains the JSON string of the activity being added. For the inbox, this is the activity as added to the inbox.
  3. id specifies an id used for recovery of the connection. If possible this id should also be compatible with inbox fetches.

Ad 1. I’m still undecided if requiring anything except type: inbox is a good idea. I’m almost tempted to leave it up to the server and specify inboxStream to contain all new inbox elements, and then outboxStream and stream.
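
As a sketch of how the three fields could be serialized: in the `text/event-stream` wire format, the field that sets the event type is named `event`, so the type described above would travel as `event: inbox`. The function name `format_sse` is made up for this example.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = [f"event: {event}"]
    # json.dumps emits no raw newlines by default, so a single data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

Note that the id is passed as a string, which matches how the browser exposes it to the client.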

Inbox endpoint compatibility

If the last event received over SSE was:

type: inbox
data: {"type": "Create", ...}
id: 1243243

it is possible to fetch the following entries from the inbox using inbox_url?min_id=1243243.
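
On the client side, building that catch-up request could look roughly like this. The `min_id` parameter is the one suggested above; the helper name `resume_url` and the example inbox URL are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def resume_url(inbox_url: str, last_event_id: str) -> str:
    """Add min_id=<last_event_id> to the inbox URL, keeping other query parameters."""
    parts = urlsplit(inbox_url)
    # Drop any stale min_id before appending the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "min_id"]
    query.append(("min_id", last_event_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `resume_url("https://example.com/users/alice/inbox", "1243243")` yields the inbox URL with `?min_id=1243243` appended.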

Implementation

Some stuff is still missing like adding the endpoint to the actor object and defining a context.

So now to the hard part: writing the FEP document…

I’ve realized that a new endpoint for this is avoidable, if one instead uses

GET /collection; accept: text/event-stream

and performs content negotiation. So a GET with accept “application/activity+json” will give you the OrderedCollection stuff and a GET with accept “text/event-stream” will get you Server Sent Events.

The main advantage of this is that it works without introducing any awkwardness with additional headers if one wants updates to a non-inbox collection…
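
A deliberately simplified sketch of that content negotiation on the server side. It ignores q-values and wildcards, which a real implementation would need to honor, and the function name `negotiate` is hypothetical.

```python
def negotiate(accept_header: str) -> str:
    """Pick the representation of a collection from the Accept header.

    Simplified: q-values and wildcards are ignored, and anything that does not
    explicitly ask for an event stream gets the ActivityStreams JSON document.
    """
    offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if "text/event-stream" in offered:
        return "text/event-stream"
    return "application/activity+json"
```

The same handler can then serve either the OrderedCollection or the event stream for any collection URL, not just the inbox.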

@melvincarvalho wrt your mention of websockets on the SWICG mailing list, a heads-up to this topic :slight_smile:

I have a skeleton of the resulting fep: fep/fep-xxxx.md at b054e967b4d4ef08ba551008eaef881a3570c6aa - fep - Codeberg.org.

Also note bovine supports it: Event Source in BovineClient Tutorial.

I would like to mention a related discussion started by @stevebate on the SWICG mailing list, relating to @melvincarvalho’s microfed.org project and veering into the WebSocket topic:

Update: Oops, I mentioned the wrong person… @bobwyman sent the SWICG mail :smile:

Not sure what their goal is. I have now expanded my draft to hopefully make clear what my goal with the serverSentEvents endpoint is:

serverSentEvents is a new standardized endpoint for ActivityPub actors. It is meant to be used by ActivityPub Clients, i.e. software using the Client 2 Server protocol to communicate with an ActivityPub Server. Its key advantage over web hooks discussed in the next section is that clients that cannot receive HTTP requests, e.g. a web browser, can still be notified. Using this one can avoid polling with all its negative consequences.

Just to clarify, I’m not Bob Wyman. :wink: I can’t participate in the SWICG mailing list because of the W3C legal documents I would be required to sign (that require employer consent).

It seems like this problem could be solved much more easily by using HTTP long polling.

FWIW Server Sent Events are a specific form of long-polling with existing browser support

I would suggest that the easiest way to support moving between SSE and polls/collection fetches is to make the id of the event either

  • The ID of the item being returned, or
  • The URL of the next page (this one is kinda unsatisfying, I know, because that page would be empty at the time the page is sent)

If you go for the second option, you can in addition say that if the Last-Event-ID header is missing, the server falls back to starting from the collection page indicated by the URL, if one is present. (This is useful because the DOM EventSource annoyingly doesn’t allow the user to set the Last-Event-ID internal variable.)
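
One way to read that fallback rule, sketched as a server-side helper. It assumes the client passes the page URL in a hypothetical `page` query parameter, which is only one possible interpretation of “the URL”; the function name is also made up.

```python
from typing import Optional


def resume_point(headers: dict, query: dict) -> Optional[str]:
    """Decide where the event stream should resume.

    Prefers the Last-Event-ID header; otherwise falls back to the collection
    page URL from the (hypothetical) `page` query parameter. Returns None when
    neither is present, meaning the stream starts with new items only.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    last_event_id = normalized.get("last-event-id")
    if last_event_id:
        return last_event_id
    return query.get("page")
```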

First observation, id is a string in an Event as defined in the HTML Standard. This means I need to fix my implementation and submit a pull request to quart.

Second, I like the idea of using the ids of ActivityPub elements as the event ids, but I’m not sure of the consequences.

While ServerSentEvents may appear somewhat easier to implement than WebSockets, it seems to me that other differences between the two should also be considered:

  • WebSockets are bi-directional, Server-Sent-Events are uni-directional. While this may not seem to be a compelling difference, since most messages are from server-to-client, the use of WebSockets for client-to-server messages would reduce server resource requirements (i.e. connection setup/tear-down, authentication/authorization, etc.)
  • ServerSentEvents require that applications send “keep-alive” messages to maintain open connections. (The HTML Living Standard suggests that a keep-alive every 15 seconds might be needed…) On the other hand, WebSocket Ping-and-Pong frames are automatically generated by WebSockets as needed and “User agents must not use pings or unsolicited pongs to aid the server; it is assumed that servers will solicit pongs whenever appropriate for the server’s needs.”
  • WebSockets can support many client-server connections, but ServerSentEvents are more limited since many browsers limit the number of concurrent HTTP connections to any one server. (In some cases, the limit is as low as six.)
  • WebSockets support both binary and UTF-8 messages, while ServerSentEvents only support UTF-8. An ability to do binary transfers might be useful for STORE events and others.
  • I am aware of many demonstrations that servers can support massive numbers of concurrent WebSockets with reasonable resource use, however, I haven’t seen much good discussion of ServerSentEvents scaling. If anyone has data on that, please provide links.
  • More browsers seem to support WebSockets than ServerSentEvents, but polyfills can be used with ServerSentEvents to backport support while polyfills don’t work with WebSockets
  • Unlike ServerSentEvents, WebSockets clients can detect dropped connections, but, unlike with ServerSentEvents, WebSockets won’t automatically do reconnection. Thus, the client must manually reconnect dropped connections.
  • Nostr relies on WebSockets. There are, I think, many very interesting opportunities to mix Nostr protocols with ActivityPub. Doing so would be easier for a client that already implements WebSockets.
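
For the keep-alive point in the list above, a minimal asyncio sketch of what the server side could look like for ServerSentEvents: whenever the stream has been idle for the interval, a comment line (a line starting with `:`) is sent, which clients ignore. The queue-based design, the `None` sentinel, and the function name are assumptions made for this example.

```python
import asyncio
from typing import AsyncIterator

KEEP_ALIVE = ": keep-alive\n\n"  # SSE comment line; ignored by clients


async def stream_with_keep_alive(
    queue: asyncio.Queue, interval: float = 15.0
) -> AsyncIterator[str]:
    """Yield queued SSE chunks, emitting a comment line whenever the stream is idle."""
    while True:
        try:
            chunk = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # Nothing to send for `interval` seconds: keep the connection warm.
            yield KEEP_ALIVE
            continue
        if chunk is None:  # sentinel: close the stream
            return
        yield chunk
```

This is a handful of lines per connection, so the keep-alive requirement is more of an implementation detail than a strong argument either way.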

So, there is good and bad with both. Given the differences, some may still prefer ServerSentEvents to WebSockets, however, it looks to me like WebSockets offers a potential for greater efficiency and also allows for more flexibility in the design of future extensions to the existing protocol. What am I missing?