FEP-9091: Export Actor Service Endpoint

codenamedmitri · July 11, 2024, 7:34pm

This is a discussion thread for the proposed FEP-9091: Export Actor Service Endpoint.
Please use this thread to discuss the proposed FEP and any potential problems
or improvements that can be addressed.

Summary

This FEP defines an API endpoint used to initiate the “Export Actor” operation.
The output and semantics of the result of the export operation are out of scope, and left
to subsequent FEPs.
The endpoint only specifies how to start the operation, and by extension, how to tell if
a given Actor’s server supports this operation.

stevebate · July 13, 2024, 8:26am

What’s the rationale for the empty POST instead of a GET?

I’m wondering if some servers will want to perform this operation asynchronously. For example, it’s not an API, but Mastodon is async for account exports and the user must return to the UI later to get the results. Some API clients may prefer a webhook or an endpoint to poll for generated export artifact(s).

codenamedmitri · July 13, 2024, 6:18pm

That’s exactly it, yeah – it’s a nod towards async. I had a long argument with @bengo about whether it should be a POST or a GET (he was on the GET side). What I think ended up tipping it in favor of a POST is that it’s often going to be an expensive operation – the server has to go retrieve all that data and attachments, and bundle them into a tarball.

So because of that, because we’re trying to denote that it’s an expensive async operation, the FEP started out as having TWO different response modes – a “sync” and an “async” mode.
The sync mode is exactly what it is now – you POST a request, you get a direct response with the assembled tarball.
The async mode used the HTTP 202 Accepted mechanism, where the server acknowledges that it kicked off the batch job of bundling the tarball, and gives the client a location to poll (until it’s completed).
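
To make the two response modes concrete, here is a minimal client-side sketch. It is not taken from the FEP: the `Response` type, the `start_export` name, the injected `post`/`get` callables, and the convention that a finished poll answers 200 with the archive are all assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response (not defined by the FEP)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def start_export(
    post: Callable[[str], Response],
    get: Callable[[str], Response],
    endpoint: str,
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> bytes:
    """POST to the export endpoint and return the export bytes.

    Sync mode: the server answers 200 with the archive in the body.
    Async mode: the server answers 202 with a Location header to poll.
    """
    response = post(endpoint)
    if response.status == 200:
        return response.body
    if response.status != 202:
        raise RuntimeError(f"unexpected status {response.status}")

    status_url = response.headers["Location"]
    for _ in range(max_polls):
        polled = get(status_url)
        if polled.status == 200:   # assumption: 200 means the job is done
            return polled.body
        if polled.status != 202:   # assumption: 202 means still running
            raise RuntimeError(f"unexpected status {polled.status}")
        time.sleep(poll_interval)
    raise TimeoutError("export did not complete in time")
```

Passing the transport in as callables keeps the sketch independent of any particular HTTP library; a real client would wrap whatever library it already uses.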

But as I was sitting there writing the async mode part, I was like, you know what, this is too complicated, no implementer is going to want to implement this batch/async/polling mechanism, client side or server side. After all, this is why open HTTP connections and chunked transfer encoding exist in the first place. So the async mode was dropped.

But I still think it should be a POST operation instead of a GET, even if the tarball is returned directly. And part of it is - pre-fetching. Some browsers and extensions pre-fetch all regular GET links on a page (for performance boosting reasons). And we don’t want to encourage the kicking off of an export process accidentally (while somebody is browsing their Settings or Profile backend page).
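
A rough sketch of the pre-fetching point, assuming a hypothetical server-side dispatch function: only an explicit POST starts the export, so a browser pre-fetching GET links cannot trigger it. The function name and the `application/x-tar` content type are assumptions here, since the FEP leaves the output format out of scope.

```python
from http import HTTPStatus
from typing import Dict, Tuple


def handle_export_request(method: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request to a hypothetical export endpoint."""
    if method == "POST":
        # Only an explicit POST kicks off the (expensive) export.
        # The content type is an assumption, not something the FEP specifies.
        return HTTPStatus.OK.value, {"Content-Type": "application/x-tar"}
    # GET, HEAD, and anything else never trigger the export;
    # the Allow header tells the client which method to use.
    return HTTPStatus.METHOD_NOT_ALLOWED.value, {"Allow": "POST"}
```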

stevebate · July 13, 2024, 7:06pm

I agree with @bengo on this one. Using a POST for a synchronous get operation seems to violate HTTP semantics. My understanding is that HTML5 has hints to tell browsers which urls are beneficial to prefetch.

For async operation, I can see an interpretation where an implicit request is being POSTed and polling operations are querying the status of that request. After the async export creation is finished, the client could GET the artifact given a URL provided in the completed request.

On the server side, I’d guess the async operation is going to be the common implementation since it may be expensive to gather account status data and build the export artifact.

codenamedmitri · July 17, 2024, 12:36am

“Using a POST for a synchronous get operation seems to violate HTTP semantics.”

See, I think I disagree with that assessment. POSTs are usually synchronous; what sets them apart is that they’re non-idempotent.

stevebate · July 17, 2024, 4:16pm

The synchronous aspect wasn’t the primary point. My thinking is that if the operation is retrieving a resource (versus posting a resource), GET seems to be the correct HTTP verb.

codenamedmitri · July 19, 2024, 12:37am

And I totally agree with you – if it was simply retrieving an existing resource.
But the POST operation here is /creating/ a resource; it didn’t exist before the post.

stevebate · July 19, 2024, 4:59am

It’s an internal implementation decision whether the export resource is maintained continuously and incrementally (or periodically) and the GET returns a previously existing/cached export artifact rather than it being generated on demand. Logically, it’s a GET request from a client perspective either way.

It’s not posting the resource that’s being created. In other words, the POST is not posting the export artifact. As I said earlier, the POST could post/create an export request resource that could be polled for completion status. At completion it would contain a URI that can be used to GET the export resource.

Note this supports async operation from a client perspective. Async behaviors (if any) in the export resource creation are a server implementation detail. It seems to me that the FEP is conflating those topics.
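
One way to picture the "export request resource" idea is the in-memory sketch below. Everything in it is hypothetical (the class names, the `/export-requests/…` and `/exports/…` paths, the status strings); it only illustrates that the POST creates a request resource, while GETs read its status and, once complete, the URI of the artifact.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExportRequest:
    """Hypothetical 'export request' resource created by the POST."""
    id: int
    status: str = "pending"             # "pending" or "completed"
    artifact_url: Optional[str] = None  # set only once completed


class ExportService:
    """In-memory stand-in for a server; all paths here are made up."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._requests: Dict[int, ExportRequest] = {}

    def post_export(self) -> str:
        """POST: create the request resource and return its URL to poll."""
        request = ExportRequest(id=next(self._ids))
        self._requests[request.id] = request
        return f"/export-requests/{request.id}"

    def get_request(self, url: str) -> ExportRequest:
        """GET: read the request's current status without changing it."""
        return self._requests[int(url.rsplit("/", 1)[1])]

    def finish(self, url: str) -> None:
        """Simulate the server's background job completing."""
        request = self.get_request(url)
        request.status = "completed"
        request.artifact_url = f"/exports/{request.id}.tar"
```

In this model the non-idempotent step is creating the request; fetching the finished artifact from `artifact_url` is an ordinary GET.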

stevebate · August 15, 2024, 9:20am

This page describes an HTTP Asynchronous Request-Reply pattern: “Asynchronous Request-Reply pattern” (Azure Architecture Center, Microsoft Learn). The pattern they describe uses a POST with a 202 response to initiate the async operation. This is effectively creating a request for an asynchronous action to be performed. The POST response provides a URL endpoint to GET the status and, eventually, results of the async request.

codenamedmitri · August 17, 2024, 7:39pm

Hehe, GMTA – that page was exactly what we started off using, in the initial iteration of the FEP. But we decided that it was way too complicated for implementers, for not enough benefit. So we scrapped it, and went to a simpler single POST.