FEP-8b32: Object Integrity Proofs

Update: #700 - FEP-8b32: Update proposal - fediverse/fep - Codeberg.org

  • The recommendation of same-origin check was removed in favor of same-owner check.
  • Data integrity context was changed to v2 in examples and test vectors: https://w3id.org/security/data-integrity/v2.
  • Added “Privacy considerations” section discussing the possibility of exposing private data.
  • New implementation: Gush.

I’ve implemented this in squidcity (along with its partner, fep-521a), and pushed it live a few weeks ago with no problems – tho that’s probably because very few servers have implemented it yet. But I had at least one successful verification! :slight_smile:

Some notes:

In my testing, base58 was awkward to implement and seemed to provide no meaningful benefit over base64 at these data sizes. I don’t want to pile on, but I agree with the others that an updated spec should recommend base64 and require both encodings to be understood.

I know there was debate about dropping HTTP signatures once this FEP becomes ubiquitous, but: HTTP signatures and proofs don’t completely overlap, so I don’t think HTTP signatures have become redundant. The ability of a proof to “stay attached” across boosts and quotes as it bounces around the network solves a real problem (yay!). But HTTP signatures, especially now that we have “server actors”, are asserting the identity of the server itself, and can be useful if you want a closed allow-list network. It acts as a kind of sealed envelope before you even get to the enclosed event.

Speaking of which, I noticed that the long-suffering HTTP signature RFC was finished. Is there a FEP describing how to use them (and especially: migrate to them)?

Your implementation of FEP-521a appears to be correct :+1: (I was able to fetch lobsters actor you mentioned in readme; can’t verify FEP-8b32 because my Follow activity is being rejected with FST_ERR_CTP_INVALID_MEDIA_TYPE).

The upstream specification needs to be changed first - base58 is required by eddsa-jcs-2022 cryptosuite (section 3.3.1).

I am in favor of base58, anyway, because it is easier to read and because it is already used in many places - integrity proofs, multikeys, did:key, etc.

Not yet, but there’s a detailed description in FEDERATION.md of tootik project: https://github.com/dimkr/tootik/blob/d6fecfefd80a445b27f589250bb19ebcd95acee2/FEDERATION.md#http-signatures

Thanks for checking it out! It looks like the follow requests were getting dropped by fastify because the content-type was “application/ld+json” instead of json or activity+json. I’ve just added that to the list of accepted content types.

I am trying to verify the proof on this object: https://bots.grilledcheese.social/ap/post/17u8fu93k5tq51kg4g15. It has @context, but it doesn’t match the top-level @context.

eddsa-jcs-2022 has the following requirement:

If proofOptions.@context exists:
Check that the securedDocument.@context starts with all values contained in the proofOptions.@context in the same order. Otherwise, set verified to false and skip to the last step.

Your object has ["https://www.w3.org/ns/activitystreams", "https://w3id.org/security/v1"] and its proof has ["https://www.w3.org/ns/activitystreams", "https://w3id.org/security/data-integrity/v2"].

(my implementation doesn’t actually check the order, it simply compares contexts)
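For reference, the prefix check the spec asks for is small. A minimal sketch in Python, assuming both contexts have already been parsed from JSON (the helper names `as_list` and `context_starts_with` are made up for illustration, not taken from any implementation):

```python
def as_list(context):
    """Normalize a JSON-LD @context value (string, object, or array) to a list."""
    if context is None:
        return []
    return context if isinstance(context, list) else [context]


def context_starts_with(secured_context, proof_context):
    """True if secured_context begins with every proof_context entry, in order."""
    secured = as_list(secured_context)
    proof = as_list(proof_context)
    # A slice shorter than `proof` can never compare equal, so a proof context
    # longer than the document context is rejected as well.
    return secured[:len(proof)] == proof


AS2 = "https://www.w3.org/ns/activitystreams"
SEC_V1 = "https://w3id.org/security/v1"
DI_V2 = "https://w3id.org/security/data-integrity/v2"

# The object discussed above: the second entries differ, so the check fails.
print(context_starts_with([AS2, SEC_V1], [AS2, DI_V2]))  # False
# Extra contexts after the shared prefix are allowed.
print(context_starts_with([AS2, DI_V2, SEC_V1], [AS2, DI_V2]))  # True
```

An exact-equality comparison is stricter than this: it rejects the second case, where the document declares additional contexts after the ones the proof lists.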

I see, so "https://w3id.org/security/data-integrity/v2" should replace "https://w3id.org/security/v1" in the standard context block at the top? We should probably call that out explicitly in the next revision of the FEP. (I think most servers write the boilerplate template once and then ignore it.)

Looking at https://bots.grilledcheese.social/ap/post/17u8fu93k5tq51kg4g15 I see no reason that https://w3id.org/security/v1 should be included at all. You’re not using any properties on the Note from that context. Every property on that document is from https://www.w3.org/ns/activitystreams except for proof, which comes from https://w3id.org/security/data-integrity/v2. Within the proof sub-document, you are not using any properties from https://www.w3.org/ns/activitystreams either. So I think the document should actually look like this:

{
  "@context": [
    "https://w3id.org/security/data-integrity/v2",
    "https://www.w3.org/ns/activitystreams"
  ],
  "proof": {
    "@context": "https://w3id.org/security/data-integrity/v2",
    "//": "the proof"
  },
  "//": "the rest of the document"
}

It can be simplified even further by taking out the @context on the proof, since this is all one document and you already imported it at the top-level (with no need for any overrides):

{
  "@context": [
    "https://w3id.org/security/data-integrity/v2",
    "https://www.w3.org/ns/activitystreams"
  ],
  "proof": {
    "//": "the proof"
  },
  "//": "the rest of the document"
}

The https://w3id.org/security/v1 context only matters when using terms from that context, such as publicKey, publicKeyPem, owner, signature, signatureValue, and so on. (A lot of these properties have been deprecated or removed from the Security Vocabulary, but are still used in current fedi implementations like Mastodon et al.)


Regarding the requirement from eddsa-jcs-2022, the order is important because later context declarations override earlier declarations.

For example, if the document declares ["https://w3id.org/security/data-integrity/v2", "https://www.w3.org/ns/activitystreams"] then it is possible that terms defined in https://w3id.org/security/data-integrity/v2 may be redefined in https://www.w3.org/ns/activitystreams, which would override the earlier term definitions and change the semantics of the proof. If the proof re-declares https://w3id.org/security/data-integrity/v2 then it ensures that the terms used within the proof are not semantically confused – any potential re-definitions within https://www.w3.org/ns/activitystreams would be overridden back to whatever is defined in https://w3id.org/security/data-integrity/v2.

A bit of a contrived example: suppose that the activitystreams context defined created to be a boolean of whether the current object was created or not. AS2 processors are required to inject the normative activitystreams context if it is missing. Doing so would cause the proof options created to map to whatever is defined in the activitystreams context (our hypothetical boolean here) instead of what is defined in the data-integrity/v2 context (dc:created, the timestamp of when the proof was created). Rather than requiring eddsa-jcs-2022 implementers to understand JSON-LD and re-inject the data-integrity/v2 context into the proof options, the algorithm simply asks eddsa-jcs-2022 verifiers to check that no terms were overridden like this. The order is important for ensuring this check. (In practice the data-integrity/v2 context is @protected, so terms defined there cannot be redefined/overridden later. JSON-LD processors will catch this and throw an error, but JSON processors will not.)

The @context should be copied to proof. This is stated more clearly in the proof generation algorithm:

If unsecuredDocument.@context is present, set proof.@context to unsecuredDocument.@context.

https://www.w3.org/TR/vc-di-eddsa/#create-proof-eddsa-jcs-2022

I know, I was trying to say that my implementation checks if contexts match exactly, instead of doing what the specification prescribes:

Check that the securedDocument.@context starts with all values contained in the proofOptions.@context in the same order.

Okay, so it probably came from publicKey which I still use in actor records for cavage-draft signature compatibility. (I use the same minimal context block for all AP records to avoid overhead.)

I’ve added the new data-integrity url to the top context, and now copy that into the proof. Hopefully that will satisfy the verifiers.

@robey Did you change the proof.@context? I saw an activity from your test actor a few days ago, and it still had wrong context.

Ope, thanks, I missed the most important code block! I think it’s fixed now: https://bots.grilledcheese.social/ap/post/17wtczg3emnjg6tap36b

Update: https://codeberg.org/fediverse/fep/pulls/839/files

I added two new requirements, including the requirement of forward compatibility.

I also added a note about verification method rotation to “Privacy considerations” section, to address concerns raised by @tesaguri in FEP-ef61 thread:

I am looking at implementing this into Mastodon, and I’m unable to verify the test vectors in https://codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.feature although I’m able to verify these: https://www.w3.org/TR/vc-di-eddsa/#representation-eddsa-jcs-2022

Additionally, I’m not sure how one is expected to handle proofs of embedded objects: indeed, if you consider the whole document as JSON-LD, the JSON-LD API does not give you much to unambiguously access a precise attribute other than expanding, compacting, or framing the document, but all these operations may change the JSON representation and thus break the signature.

E.g., in https://codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.md#signed-activity-with-embedded-signed-object, how would you “make sure” the object is indeed a <https://www.w3.org/ns/activitystreams#object> while keeping the object’s JSON representation intact? Off the top of my head, I see no way to do that safely.

This might be due to the presence of floating-point numbers in the fep-8b32.feature document. Some JCS libraries don’t handle those correctly. Which one do you use?
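To illustrate the kind of mismatch involved, here is a small Python sketch. The `naive_canonicalize` helper is hypothetical and deliberately not a JCS implementation: it only sorts keys and drops whitespace, which is a common shortcut. JCS (RFC 8785) additionally requires numbers to be serialized the way ECMAScript does.

```python
import json


def naive_canonicalize(value):
    """Sorted keys and compact separators only. This is NOT full JCS."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Python keeps the trailing ".0"; JCS expects {"n":1}
print(naive_canonicalize({"n": 1.0}))   # {"n":1.0}
# Python pads the exponent; JCS expects {"n":1e-7}
print(naive_canonicalize({"n": 1e-7}))  # {"n":1e-07}
```

Any difference like this changes the bytes that get hashed, so the signature no longer verifies even though the key and the document are fine.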

A similar concern was raised by @helge in another thread: Use cases of fep-8b32: Object Integrity Proofs - #10 by helge. Several solutions were proposed - embeddedObjects, re-defining object as @json, etc.

Later, I opened an issue in w3c/vc-di-eddsa bug tracker - eddsa-jcs-2022 and nested documents · Issue #81 · w3c/vc-di-eddsa · GitHub. They said it is fine to embed a signed object and I didn’t ask any more questions.

If I understand the question correctly, this problem also arises in client-to-server context. An object published by a client can be parsed differently by the originating and receiving servers due to JSON-LD/JSON differences. This could cause security issues, and I have lately come to the conclusion that JSON-LD and JSON cannot coexist in the same network.

This sounds a lot like JSON-LD @included:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://activity.example/",
  "actor": "https://actor.example/",
  "type": "Announce",
  "object": "https://object.example/",
  "@included": [
    {
      "id": "https://actor.example/",
      "name": "Someone",
      "type": "Person"
    },
    {
      "id": "https://object.example/",
      "type": "Article",
      "name": "Top 10 AS2 examples -- number 3 will surprise you!"
    }
  ]
}

If you ignore all the JSON-LD stuff, the “plain JSON” document is just this:

{
  "id": "https://activity.example/",
  "actor": "https://actor.example/",
  "type": "Announce",
  "object": "https://object.example/",
}

If you flatten the first example or convert it to N-Quads, the @included block goes away. The @included block is just a framing tool so that the “plain JSON” can match a specific schema (say one where actor and object are required to be JSON strings). It’s broadly similar to JSON:API’s concept of “included”, as JSON-LD 1.1 points out… except it keys off of @id only (instead of both type and id).

A canonicalization scheme could define that included blocks are stripped before hashing and signing, and that therefore the information included in an included block is not trustworthy on its own without its own signatures. I guess you could profile JCS if you wanted to depend on the JSON serialization, or profile RDFC if you wanted to depend on the N-Quads dataset (minus the included statements).
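As a rough sketch of what such a profile could do before canonicalizing, assuming the JSON serialization route: the `strip_included` helper below is purely hypothetical and is not part of JCS, RDFC, or any FEP.

```python
def strip_included(node):
    """Return a copy of a parsed JSON value with every "@included" key removed."""
    if isinstance(node, dict):
        return {
            key: strip_included(value)
            for key, value in node.items()
            if key != "@included"
        }
    if isinstance(node, list):
        return [strip_included(item) for item in node]
    return node


activity = {
    "id": "https://activity.example/",
    "type": "Announce",
    "object": "https://object.example/",
    "@included": [{"id": "https://object.example/", "type": "Article"}],
}
print(strip_included(activity))
```

The stripped copy would be what gets canonicalized and hashed; the original document, with its included block, is what goes over the wire.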

The way you “make sure” that object is specifically <https://www.w3.org/ns/activitystreams#object> is via one of the following mechanisms:

  • JSON-LD @context expansion. A JSON-LD processor loads the declared contexts (modulo other document loader safety concerns) and normalizes everything to its full IRI reference.
  • IANA media type signaled via HTTP Content-Type header is application/activity+json or equivalent. Per definition, that media type includes the semantics that the object key is defined as <https://www.w3.org/ns/activitystreams#object>. For JSON-LD compatibility, this is also achieved via “context injection”, i.e. when you encounter this specific IANA media type, you can convert it to an application/ld+json document by tacking on "https://www.w3.org/ns/activitystreams" at the end of the @context array in case it is missing.
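
The “context injection” step in the second bullet could look roughly like this in Python. This is only a sketch: `inject_as2_context` is a made-up helper name, and it assumes the document is a parsed JSON object.

```python
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def inject_as2_context(document):
    """Return a copy of the document whose @context array includes the AS2 context."""
    context = document.get("@context")
    if context is None:
        contexts = []
    elif isinstance(context, list):
        contexts = list(context)
    else:
        contexts = [context]
    if AS2_CONTEXT not in contexts:
        # Missing, so tack it on at the end of the array.
        contexts.append(AS2_CONTEXT)
    return {**document, "@context": contexts}


plain = {"id": "https://domain.example/foo", "type": "Activity"}
print(inject_as2_context(plain)["@context"])
```

Note that the result is a different JSON document from the one that was received, so a JCS-based proof has to be verified against the original bytes, not against the injected copy.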

The latter is the closed-world / centralized variant of the former. They clearly already co-exist in the same network. The issue arises when you use terms that aren’t defined by application/activity+json or its equivalent JSON-LD @context, but this isn’t JSON-LD’s fault – the issue’s real cause is that people can and will disagree on what terms mean, as with any other matters of language. You can’t assume everyone always agrees with everything you do or say 100%. The point of having keys and terms be expandable to IRI references is that http(s): IRIs conveniently include an authority component, so you can work around the lack of authority in the data model and disambiguate two different definitions of the same term. Having a way to detect that there is a conflict does not create the conflict; the conflict is still there even if you don’t have a way of detecting it.

tl;dr: it depends on canonicalization. Embedding an object from one document into another document should be done with the recognition that documents and objects are not the same thing, and that embedded information about an object is necessarily scoped to the current document in which that information is presented.

For example:

GET /foo HTTP/1.1
Host: domain.example
Accept: application/activity+json

HTTP/1.1 200 OK
Content-Type: application/activity+json

{
  "id": "https://domain.example/foo",
  "type": "Activity",
  "object": {
    "id": "https://domain.example/bar",
    "type": "Object"
  }
}

Here, the statement that /bar is an Object is a statement being made within /foo. You can GET /bar and obtain different information (e.g. that /bar is a Tombstone). That doesn’t make the statement in /foo incorrect; it could be that statements in /foo and statements in /bar were made at different times.

When signing or verifying objects, it’s crucial to consider what you’re signing. As the author of /foo I can sign my own statement about /bar within /foo, and this is different than a signed statement about /bar from the author of /bar within /bar. If you don’t distinguish or qualify statements by their source[1], you will get confused.

I can’t get too much into this further, since anything else depends heavily on what your trust model is. But for the purposes of signatures, you have to look to the current document (and possibly infer additional information from HTTP headers).


  1. This is what quads are supposed to be in RDF – they contextualize triples by the graph they came from. But you don’t need to use quads, you just need to use some kind of contextualizing thing, like an HTTP resource; the statements in that resource can still be modeled as triples, while you reason about them as quads in order to explicitly consider who said something, or when they said it, or whether or how much you trust that statement.

@Claire I added intermediary outputs to fep-8b32.feature: canonicalized document, canonicalized proof config and the combined hash:

https://codeberg.org/fediverse/fep/src/commit/90444d8d5e880f2511e09efd8c142e4f0e13e4f6/fep/8b32/fep-8b32.feature#L31-L39

(PR: #871 - FEP-8b32: Better test vectors - fediverse/fep - Codeberg.org)
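
For anyone comparing their own intermediate values against those outputs, here is a rough Python sketch of how the three pieces relate on the signing side, as I read the eddsa-jcs-2022 algorithms. The `canonicalize` function is a stand-in and an assumption: it is only equivalent to JCS for documents without floats or non-ASCII keys, so use a real RFC 8785 library in practice. The `hash_data` name is also made up.

```python
import hashlib
import json


def canonicalize(value):
    """Stand-in for JCS (RFC 8785); NOT a complete implementation."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def hash_data(unsecured_document, proof_options):
    """Combine the proof config hash and the document hash (64 bytes in total)."""
    proof_config = dict(proof_options)
    # The proof generation step quoted earlier in the thread: copy the
    # document's @context into the proof when it is present.
    if "@context" in unsecured_document:
        proof_config["@context"] = unsecured_document["@context"]
    proof_config_hash = hashlib.sha256(canonicalize(proof_config)).digest()
    document_hash = hashlib.sha256(canonicalize(unsecured_document)).digest()
    # Proof config hash first, then the document hash.
    return proof_config_hash + document_hash


document = {
    "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://w3id.org/security/data-integrity/v2",
    ],
    "id": "https://domain.example/foo",
    "type": "Note",
}
options = {"type": "DataIntegrityProof", "cryptosuite": "eddsa-jcs-2022"}
print(hash_data(document, options).hex())
```

The resulting 64 bytes are what gets signed with Ed25519. That step is left out here because it needs a third-party library.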

That was it, thanks! We use the `json-canonicalization` gem that we already had an indirect dependency on.

The next non-patch version of Mastodon should support incoming top-level Object Integrity Proofs using `eddsa-jcs-2022`. We are also looking at supporting `mldsa44-jcs-2024`, defined in Quantum-Resistant Cryptosuites v1.0. It’s very similar, though the proof config handling of `@context` is different, and it mandates Base64 instead of Base58-btc. This is fine for us, as our Multibase implementation supports decoding both anyway.
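
To show what “decoding both” amounts to, here is a minimal Multibase decoder sketch in Python (not Mastodon’s actual code; the function names are made up). It handles the `z` prefix (base58-btc) and the `u` prefix (base64url without padding). Which Base64 variant the cryptosuite mandates is not stated above, so treating it as `u` is an assumption.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_decode(text):
    """Decode a base58-btc string to bytes."""
    number = 0
    for char in text:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading "1" encodes one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def multibase_decode(value):
    """Decode a Multibase string with a 'z' or 'u' prefix."""
    prefix, payload = value[:1], value[1:]
    if prefix == "z":
        return base58btc_decode(payload)
    if prefix == "u":
        padding = "=" * (-len(payload) % 4)
        return base64.urlsafe_b64decode(payload + padding)
    raise ValueError(f"unsupported multibase prefix: {prefix!r}")


print(multibase_decode("z2g"))       # b'a'
print(multibase_decode("uaGVsbG8"))  # b'hello'
```

Since the prefix character selects the encoding, accepting both adds one branch to the decoder and nothing else.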

For embedded objects, I still have to read up on the recent suggestions, but I’m afraid this is going to be quite complex even though this would be valuable to us (for instance, to bundle quoted objects with a short-lived proof that they have been accepted, and avoid the costly and failure prone initial round-trip).

Talking about short-lived proof, is there any recommendation as to what to do with `created`, or support for `expires`?