Content-addressing and signatures

pukkamustard · June 11, 2020, 11:18am

As part of the openEngiadina project, we have been researching data models and data storage, and we are happy to announce initial results and a demo.

There are three main parts:

How to make RDF content-addressable: essentially two tricks, namely how to group RDF statements and how to encode that grouping in a canonical form.
How to sign content-addressed RDF: once RDF is content-addressed, it can be signed by simply signing its identifier (which is the hash). We introduce a small vocabulary for this, based on the OpenBSD signify tool.
A secure way of doing content-addressing (ERIS): naive content-addressing (just using the hash of the content) has some downsides. We present a scheme, heavily influenced by Datashards, for securely storing immutable content.
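To make the first point concrete, here is a minimal Python sketch of content-addressing a group of RDF statements. It assumes, hypothetically, that the canonical form is simply the deduplicated, sorted N-Triples lines (with no blank nodes) and that the identifier is a BLAKE2b-256 hash wrapped in a made-up `urn:blake2b:` URN. The grouping and canonical form actually used in openEngiadina may differ.

```python
import base64
import hashlib


def canonical_form(ntriples_lines):
    """Hypothetical canonical form: deduplicated, sorted N-Triples lines.

    Assumes the statements contain no blank nodes (those would need relabelling).
    """
    unique = sorted(set(line.strip() for line in ntriples_lines if line.strip()))
    return ("\n".join(unique) + "\n").encode("utf-8")


def content_id(ntriples_lines):
    """Identifier = BLAKE2b-256 of the canonical form (hypothetical URN scheme)."""
    digest = hashlib.blake2b(canonical_form(ntriples_lines), digest_size=32).digest()
    b32 = base64.b32encode(digest).decode("ascii").rstrip("=")
    return "urn:blake2b:" + b32


group = [
    '<urn:example:note> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/ns/activitystreams#Note> .',
    '<urn:example:note> <https://www.w3.org/ns/activitystreams#content> "Hello" .',
]

# The order of statements does not affect the identifier.
print(content_id(group) == content_id(list(reversed(group))))
```

Because the identifier depends only on the canonical bytes, signing that identifier covers the whole group of statements.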

Altogether, we hope this can serve as a robust and implementable foundation for offline-first and decentralized applications, and perhaps pave the way towards a decentralized ActivityPub.

We have a demo that runs in the browser to show how this all works. The demo is capable of encoding any ActivityPub content (any JSON-LD) and I invite you to try it out.

The demo also shows how all this works for content-addressed vocabularies.

There is also a JavaScript implementation you can use, as well as a Guile implementation.

Relation to Datashards

ERIS (the scheme for content-addressing) is very much influenced by immutable Datashards.

The research started as an attempt to reimplement Datashards but grew into an exploration of some other ideas. The main differences are:

Different crypto primitives (BLAKE2b + ChaCha20 instead of SHA256 + AES)
Blocks are combined in a tree (instead of a chain)
Adds a verification capability (allows content to be cached without being able to read content)
No mechanism for mutable content

The idea is to converge in the future; discussion and work towards that have already started.

Does this make signing JSON-LD easier?

Eeh, kind of…

A design goal is implementability. However, it starts at the RDF level.

Once you have your content as RDF triples, the implementation is fairly straightforward (and is optimized to be so). However (and unfortunately), when content is encoded as JSON-LD you still need to go through the expansion madness. The demo uses the JavaScript JSON-LD library to do this.

Compared to Linked Data Proofs (previously LD-Signatures), I believe what we propose is simpler and easier to implement (though also less general and more opinionated).

Next steps

We intend to implement this in an Elixir ActivityPub server and get some hands-on experience.

I’d be very happy to receive feedback, comments and questions. If anybody is interested in experimenting and implementing, I would be thrilled.

how · June 11, 2020, 12:09pm

Great work @pukkamustard!

I’m curious about your design choices, and especially about the reasons for the differences from Datashards, in particular:

Also the discussion about verification capability is super interesting! Thank you!

pukkamustard · June 11, 2020, 12:35pm

I think there are two ways of combining blocks together:

with a tree (a Merkle tree) - as in ERIS
and with a list (a Hash list) - as in the original Datashards write-up

The (IMHO) biggest advantage of a tree is that it allows random access to the content: you can efficiently decode a subtree to access a specific part of the content. Which part of the tree needs to be decoded is also known in advance, as the structure of the tree is very regular.

With a list you would have to iterate through the list to access a certain block.
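A small Python sketch of the random-access point: in a plain Merkle tree with a fixed arity, the path from the root to block i is just i written in base arity, so a reader can descend directly to the block it wants and check each node on the way. This is a generic Merkle tree, not ERIS itself (ERIS's internal nodes are additionally encrypted), and the arity of 4 is a hypothetical choice.

```python
import hashlib

ARITY = 4  # hypothetical arity; ERIS's actual value may differ


def h(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def build_tree(blocks, arity=ARITY):
    """Return all levels: levels[0] holds leaf hashes, levels[-1] == [root]."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(b"".join(level[i:i + arity])) for i in range(0, len(level), arity)]
        levels.append(level)
    return levels


def path_to_block(index, depth, arity=ARITY):
    """Child positions from the root down to leaf `index` (base-`arity` digits)."""
    path = []
    for _ in range(depth):
        path.append(index % arity)
        index //= arity
    return list(reversed(path))


def walk(levels, index, arity=ARITY):
    """Descend from the root to leaf `index`, checking each node against its children."""
    depth = len(levels) - 1
    node = 0
    for lvl, child in zip(range(depth, 0, -1), path_to_block(index, depth, arity)):
        children = levels[lvl - 1][node * arity:(node + 1) * arity]
        if h(b"".join(children)) != levels[lvl][node]:
            raise ValueError("node does not match its children")
        node = node * arity + child
    return levels[0][node]


blocks = [bytes([i]) * 32 for i in range(16)]
levels = build_tree(blocks)
print(path_to_block(6, len(levels) - 1))  # [1, 2]: only one node per level is touched
```

With a hash list, reaching block 6 means following the chain through the preceding blocks; with the tree, only one node per level is decoded.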

I believe the reason for using a hash list in Datashards was to “start simple”. I personally think that a tree is almost easier to implement…that’s probably why I went for the tree.

cjs · June 11, 2020, 8:28pm

Nice work!

Does each subtree use the same chunking as Datashards, where one decryptable subtree could be spread across multiple content chunks, or does one decodable subtree equal one fetchable content chunk?

pukkamustard · June 12, 2020, 5:23am

Thank you!

Interesting question. Let me rephrase it slightly: if you encode content A and content AB (the concatenation of A and B) individually, is the tree that encodes A contained in the tree that encodes AB?

The answer to that is no. The tree is encrypted with a key (the verification key) that is derived from the entire content that is to be encoded. So the tree that encodes A is different to the subtree of AB that encodes A, as it is encrypted with a different key.
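A minimal Python sketch of this effect, under stated assumptions: the verification key is taken to be a BLAKE2b-256 hash of the entire content (the exact ERIS derivation may differ), and because Python's standard library has no ChaCha20, a toy keyed-BLAKE2b keystream stands in for it. The node contents and the all-zero nonce are hypothetical.

```python
import hashlib


def verification_key(content: bytes) -> bytes:
    # Hypothetical derivation: hash of the entire content.
    return hashlib.blake2b(content, digest_size=32).digest()


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher standing in for ChaCha20 (not in the stdlib):
    # keystream = keyed BLAKE2b over nonce || counter. Not for real use.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.blake2b(nonce + counter.to_bytes(8, "little"), key=key).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


A = b"content A " * 10
B = b"content B " * 10

node = b"internal node listing the blocks of A"  # hypothetical node contents
nonce = bytes(12)  # same position in both trees

enc_in_A = toy_encrypt(verification_key(A), nonce, node)
enc_in_AB = toy_encrypt(verification_key(A + B), nonce, node)
print(enc_in_A == enc_in_AB)  # False: different keys give different ciphertext
```

The same plaintext node encrypts to different bytes in the two trees, so the tree for A cannot be found inside the tree for AB.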

Another scenario: if you encode the content AA, would the same subtree appear twice? Again, no. As one single key (the verification key) is used to encrypt all internal nodes of the tree, a unique nonce must be used for every node (otherwise the scheme would be open to key-reuse attacks). ERIS uses a nonce generated from the position of the node in the tree (Nonce from position). The two subtrees that both encode A are different, as they are encrypted with different nonces (though with the same key).
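The position-based nonce can be sketched in Python as follows. The byte layout (4-byte level followed by an 8-byte index, giving the 12-byte IETF ChaCha20 nonce size) is a hypothetical choice, not necessarily ERIS's encoding, and a toy keyed-BLAKE2b keystream again stands in for ChaCha20. The last lines show why a repeated nonce under one key is dangerous for a stream cipher.

```python
import hashlib


def position_nonce(level: int, index: int) -> bytes:
    # Hypothetical layout: 4-byte level || 8-byte index = 12 bytes.
    # Distinct (level, index) positions always give distinct nonces.
    return level.to_bytes(4, "little") + index.to_bytes(8, "little")


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher standing in for ChaCha20; not for real use.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.blake2b(nonce + counter.to_bytes(8, "little"), key=key).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


key = hashlib.blake2b(b"AA", digest_size=32).digest()  # one key for the whole tree
subtree_node = b"internal node encoding A"  # identical plaintext in both halves

left = toy_encrypt(key, position_nonce(1, 0), subtree_node)
right = toy_encrypt(key, position_nonce(1, 1), subtree_node)
print(left == right)  # False: same key, different positions

# With a repeated nonce, ciphertext XOR equals plaintext XOR, leaking information.
p1, p2 = b"secret one", b"secret two"
n = position_nonce(0, 0)
c1, c2 = toy_encrypt(key, n, p1), toy_encrypt(key, n, p2)
print(bytes(a ^ b for a, b in zip(c1, c2)) == bytes(a ^ b for a, b in zip(p1, p2)))  # True
```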

Does that answer your question?