Report errors in server processing

Consider the following: two ActivityPub servers, Klein and Moebius, are trying to federate, but some things don’t work as expected. Currently, the debugging process requires members of both development teams to communicate, as only the Moebius developers can tell why the ActivityPub object from Klein cannot be processed. They can often tell just by looking at the server logs, as the Moebius developers have implemented fantastic logging.

In order to reduce their workload, the Moebius developers now propose the following:

  • When submitting an ActivityPub object via POST, the Klein developers should include the following header:
POST: https://Moebius.band/inbox
X-Report-ActivityPub-Processing-Error: https://Klein.bottle/errorEndpoint
  • If processing an object with this header results in an error, a report is sent to this endpoint.

This means that the Klein developers can debug processing errors on the Moebius side (due to bad input) without the help of the Moebius developers.
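A minimal sketch of the receiving (Moebius) side in Python, assuming the header name from the proposal above. The report fields `objectId` and `error` are made up for illustration, since no report format has been agreed on, and actually delivering the report is left out.

```python
import json
from typing import Optional, Tuple

REPORT_HEADER = "X-Report-ActivityPub-Processing-Error"


def build_error_report(
    headers: dict, activity: dict, error: Exception
) -> Optional[Tuple[str, str]]:
    """Return (endpoint, JSON body) for a failed activity, or None if no report was requested."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    endpoint = normalized.get(REPORT_HEADER.lower())
    if not endpoint:
        return None
    body = json.dumps({
        "objectId": activity.get("id"),  # hypothetical field name
        "error": str(error),             # hypothetical field name
    })
    return endpoint, body
```

The function only decides where a report would go and what it would contain; whether to send it at all is a separate policy question.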


Story time is over. Now for some other considerations:

  • Is there any useful format for this type of error report? I could cook up something, but I would probably reinvent the wheel.
  • Has something like this been considered before?
  • This type of feature should probably only be enabled for certain users, e.g. the account corresponding to allyssa_p_hacker.

I’m not aware of anything like this, and I think there are good reasons not to implement it. Logs can contain sensitive information or internals that you probably shouldn’t share. If you really wanted to do this, you should vet and review the reporting URLs somehow to ensure they are run by a trusted entity. Presumably the reporting URL would not necessarily be the other instance, if we are talking about developers of fediverse software, who are not (always) the same as server administrators.
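One way the vetting could look, as a rough Python sketch: the receiving server only honours reporting URLs whose host is on an allowlist. The `TRUSTED_REPORT_HOSTS` set and the function name are hypothetical, and how hosts get onto the list is exactly the open question raised above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist maintained by the Moebius administrators.
TRUSTED_REPORT_HOSTS = {"klein.bottle"}


def is_trusted_report_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme and hostname for us.
    host = parts.hostname or ""
    return parts.scheme == "https" and host in TRUSTED_REPORT_HOSTS
```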

Normally I’d say that the error reporting should be done by just returning an appropriate HTTP response. But an issue with that is that many implementations use queues to process incoming activities and just return status code 202. This suggestion could “fix” that issue.

If this were to be done, I would expect just (English) text, since it will have to be read by a human anyway, although it might get problematic if you are receiving a lot of the same kind of report. You could conceivably filter for reports whose text contains a specific phrase, though.

you could draw a comparison to smtp where delivery errors might result in a receipt from the mailer daemon that your message was bounced.

the challenge is in knowing who sent the activity in the first place (and therefore to which inbox you should deliver your error). typically this is determined by having a valid http signature, so i guess http signatures would have to be processed synchronously and their errors returned as part of the response. this may already be the case?

after the activity is accepted for further processing, if that processing later fails, then you can send your error report to the inbox of the actor who owns the key that signed the message (Signature.keyId.owner.inbox in current usage). this error report’s payload is open to interpretation – you could use a Reject as proposed in Signaling side effects asynchronously by generalizing Accept/Reject
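A sketch of that lookup in Python, under a few assumptions: the `keys` and `actors` dictionaries stand in for documents the server has already fetched, the function names are invented, and putting the reason in `content` is only one possible choice.

```python
import re
from typing import Optional, Tuple

KEY_ID_PATTERN = re.compile(r'keyId="([^"]+)"')


def key_id_from_signature(signature_header: str) -> Optional[str]:
    """Extract the keyId parameter from an HTTP Signature header value."""
    match = KEY_ID_PATTERN.search(signature_header)
    return match.group(1) if match else None


def build_reject(
    signature_header: str,
    activity: dict,
    reason: str,
    keys: dict,    # keyId -> key document with an "owner" (assumed already fetched)
    actors: dict,  # actor id -> actor document with an "inbox" (assumed already fetched)
    local_actor: str,
) -> Optional[Tuple[str, dict]]:
    """Return (inbox, Reject activity) for the signer, or None if the signer is unknown."""
    key = keys.get(key_id_from_signature(signature_header))
    if key is None:
        return None
    owner = actors.get(key.get("owner"))
    if owner is None or "inbox" not in owner:
        return None
    reject = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Reject",
        "actor": local_actor,
        "object": activity.get("id"),
        "content": reason,  # assumption: the thread has not settled where the reason goes
    }
    return owner["inbox"], reject
```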


typically this is determined by having a valid http signature, so i guess http signatures would have to be processed synchronously and their errors returned as part of the response. this may already be the case?

Since verifying a signature may require fetching the signer’s key, signatures are verified asynchronously, i.e. when processing queue items, in Misskey and Foundkey.

It may be possible to verify signatures synchronously at least when we already have the key, although I’m not too fond of the idea because of potential duplicated code.

it may also be possible to just not provide errors in case of http signature failures, if async processing is considered to be a requirement. in case of a valid signature but invalid activity, you could still send the error report to the Signature.keyId.owner.inbox since you are certain that they sent it. in case of an invalid signature, you basically have the same case as currently – no feedback on the error.

If you use HTTP Signatures for authentication and authorization of HTTP GET requests, you need to process HTTP Signatures synchronously anyway. So this is an issue that needs to be solved if one wants to build an instance that is somewhat “secure”. This is not the thread to define what “secure” means.


I’m currently trying to write up a nlnet proposal that basically combines

The end result will hopefully be something that tells everybody what they implemented incorrectly.


I’m wondering, is there a recommended way to represent rejection reason? Should we use the content property?

If you use HTTP Signatures for authentication and authorization of HTTP Get requests, you need to process HTTP Signatures synchronously anyway.

Some things like the syntax and clock skew are checked right away, and will return a 401 status if they fail the checks. There is no content to the response as of now though.
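As an illustration of such cheap synchronous checks (not Foundkey’s actual code), a Python sketch could look like the following. The five-minute `MAX_CLOCK_SKEW` is an assumed value, the function name is invented, and header names are assumed to be lowercased already.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_CLOCK_SKEW = timedelta(minutes=5)  # assumed value; pick per deployment


def precheck_status(headers: dict, now: datetime) -> int:
    """Return 401 if the cheap synchronous checks fail, else 202 (queued for processing)."""
    # Both headers must be present before anything is queued.
    if "signature" not in headers or "date" not in headers:
        return 401
    try:
        sent_at = parsedate_to_datetime(headers["date"])
    except (TypeError, ValueError):
        # Unparseable Date header (the exception type depends on the Python version).
        return 401
    if sent_at.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC.
        sent_at = sent_at.replace(tzinfo=timezone.utc)
    if abs(now - sent_at) > MAX_CLOCK_SKEW:
        return 401
    return 202
```

Verifying the signature itself is deliberately not part of this sketch, since that is the step that may need a key fetch.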

If that’s not what you were referring to, I’m not sure why you think verifying signatures needs to be done synchronously. Everything used to verify the signature is saved along with the queue task, except of course public keys that are not yet known and need to be fetched. And fetching them is an asynchronous operation.

Key word in my statement is GET.

I think the behavior I describe is enabled in Mastodon by setting AUTHORIZED_FETCH=true.

I see, please excuse my confusion since your initial post was talking specifically about “submitting an ActivityPub object via POST”, so I’m not sure what GET requests have to do with this.

Foundkey does not currently have an equivalent of requiring authorized fetch on incoming requests, though it is capable of producing respective outgoing requests.

After a little consideration I think it is a separate issue: if you want to use HTTP signatures for GET requests, i.e. only allow specific actors to request a resource (or deny specific actors in the case of a blocklist), you know the actor and their public key in advance of the request. This means that there will be no need to fetch the actor’s public key, thus it would not be an asynchronous operation.

FYI in Lemmy HTTP signatures are directly verified in the inbox HTTP handler, so that errors can be returned immediately. This includes fetching the actor if necessary.

The Moebius server may also simply publish the error report via a well-known endpoint.

For example:

POST: https://moebius.band/inbox
X-ActivityPub-Request-Id: 123xyz

GET: https://moebius.band/.well-known/activitypub
{"errorReports": "https://moebius.band/errors"}

GET: https://moebius.band/errors?request_id=123xyz
{"errorMessage": "missing property 'attributedTo'"}

https://www.rfc-editor.org/rfc/rfc7807
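For comparison, the error report from the example above could be expressed as an RFC 7807 problem details object. This Python sketch uses only the members the RFC defines (`type`, `title`, `status`, `detail`, `instance`); the function name and the choice of status 400 are assumptions for illustration.

```python
import json
from typing import Optional

PROBLEM_MEDIA_TYPE = "application/problem+json"  # media type registered by RFC 7807


def problem_details(
    status: int, title: str, detail: str, instance: Optional[str] = None
) -> str:
    """Serialize an RFC 7807 problem details object."""
    problem = {
        "type": "about:blank",  # RFC 7807 default when no specific problem type is defined
        "title": title,         # with "about:blank", should match the HTTP status phrase
        "status": status,
        "detail": detail,
    }
    if instance is not None:
        problem["instance"] = instance
    return json.dumps(problem)


report = problem_details(
    400,
    "Bad Request",
    "missing property 'attributedTo'",
    "https://moebius.band/errors?request_id=123xyz",
)
```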

I like the header idea, and it has some advantages (it can be processed without ANY dependency on the JSON payload, etc.) but it also has some downsides (how can you determine whether the sender is authorized to send errors to a given endpoint?). I would recommend keeping error reports that reflect business logic (required properties, permission errors, etc.) closer to the domain of ActivityPub and using a Reject activity instead, as @trwnh proposed.

The current norm among implementors (regardless of whether they already have the key or not) is to process HTTP signatures synchronously and AP payloads asynchronously. I think this provides a clean and natural separation for which type of errors are reported using which framework: signature errors with a status code and immediate message, which helps developers implementing it for the first time, and AP payload errors using Reject, because they’re more likely to be usage-based errors that should be reported to a user (like insufficient permissions to reply to a post) rather than developer compatibility errors.

Alternatively I could see trying to draw some sort of separation between errors that should be reported to a user (using Reject) and errors that should be reported to a developer (using a header), but that sounds like a very difficult distinction for an implementor to maintain and, anyway, users should have visibility into when they’re unable to communicate with a given remote server because of an incompatibility so that they can raise the issue with their software’s developers.


Following the discussion in fediverse-ideas issue #55, I’d like to further expand on the idea expressed in my previous post:

POST: https://moebius.band/inbox
X-ActivityPub-Request-Id: 123xyz

The well-known endpoint is not actually needed. Instead, if the activity is processed asynchronously, the recipient should return a response with status 202 Accepted and an X-ActivityPub-Response-Id header. The value of this header is the ID of a special Acknowledge activity (I think the difference between the standard Accept activity and developer reports is big enough to justify a new type):

X-ActivityPub-Response-Id: https://moebius.band/acks/123xyz

Here’s an example of Acknowledge activity:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/fep/xxxx/Acknowledge"
  ],
  "type": "Acknowledge",
  "actor": "https://moebius.band/actor",
  "id": "https://moebius.band/acks/123xyz",
  "content": "200 OK"
}

The value of the content property is the HTTP status that would have been returned if the server had processed the activity synchronously.

Additional properties can be added if a more detailed report is required, for example result.
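On the sending side, a developer tool could read the status back out of such an activity. A small Python sketch, assuming the Acknowledge document has already been fetched from the URL in X-ActivityPub-Response-Id and parsed into a dict; the function name is hypothetical.

```python
import re
from typing import Optional

STATUS_PATTERN = re.compile(r"([0-9]{3})(?: |$)")


def ack_status_code(ack: dict) -> Optional[int]:
    """Return the HTTP status code carried in an Acknowledge activity, if any."""
    if ack.get("type") != "Acknowledge":
        return None
    content = ack.get("content")
    if not isinstance(content, str):
        return None
    # Expect something like "200 OK": three digits, then a space or end of string.
    match = STATUS_PATTERN.match(content)
    return int(match.group(1)) if match else None
```

Anything that does not look like a status line yields `None`, so the caller can treat it as "no usable report" rather than guessing.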

(Edit: this reply is about Acknowledgments for Deletes, relating to Provide a mechanism for acknowledgements of Delete activities · Issue #406 · w3c/activitypub · GitHub)

So importantly here, we’re not just acknowledging that we’ve received the Delete activity, but also that we’ve done our best to respect that and delete the related data from our systems.

e.g., if software were to randomly spot check and discover that you said you deleted something when you really didn’t (somehow), then I’d say that’d be a privacy policy violation and potentially grounds for defederation if not rectified immediately.


This to me is the key point.

It is more than “I have received it,” it needs to encompass a way to provide context around what was done.

For things like deletes, that may be done on a periodic basis (a weekly delete cycle). It may not be done at all due to litigation holds (in which case it MUST NOT report that it was done, but absolutely SHOULD report that it has been received, and it MAY end up with a changed state later when the litigation hold is lifted).

For things like Flag, it could very well require manual review. There’s a strong air gap between “200 OK” and “I have taken meaningful action.” There needs to be flexibility in a way to communicate that as well.

Fundamentally you cannot with a distributed system tell the difference between “something has failed,” “something is not supported,” and “something is slow in happening but will happen,” but how you respond as a sending server may very well be different between these three scenarios.

Yes, 200 OK is a bare minimum, it should probably be put in summary or even name, and not in content (which may contain a more detailed textual description of what’s happening or happened).

Context can be provided by additional attributes: context, result, instrument and other interesting but under-utilized properties from ActivityStreams vocabulary.
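Putting the last two posts together, an Acknowledge for a Delete that has been received but not yet carried out might be built as in the Python sketch below. The `Acknowledge` type and its context URL are the placeholders from the earlier example; using `object` to point at the acknowledged activity, and the function name, are assumptions rather than anything agreed in this thread.

```python
from typing import Optional


def build_acknowledge(
    ack_id: str,
    actor: str,
    activity_id: str,
    status: str,
    detail: Optional[str] = None,
) -> dict:
    """Build an Acknowledge with the status in summary and a free-text description in content."""
    ack = {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://w3id.org/fep/xxxx/Acknowledge",  # placeholder from the example above
        ],
        "type": "Acknowledge",
        "id": ack_id,
        "actor": actor,
        "object": activity_id,  # assumption: which activity is being acknowledged
        "summary": status,      # the bare-minimum status, e.g. "202 Accepted"
    }
    if detail is not None:
        ack["content"] = detail  # more detailed description of what happened
    return ack


pending_delete = build_acknowledge(
    "https://moebius.band/acks/123xyz",
    "https://moebius.band/actor",
    "https://klein.bottle/activities/delete-1",  # hypothetical activity ID
    "202 Accepted",
    "Delete received; not yet carried out.",
)
```

A later state change (for example once a hold is lifted) could be reported by serving an updated document at the same `id`, though the thread does not settle how updates would be signalled.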