Fediverse: one giant decentralized database

realaravinth · February 24, 2022, 6:58am

Greetings!

ActivityPub provides semantics to implement a gossip protocol. As long
as a program is able to speak ActivityPub, it is able to access and interact with a
global decentralized database^[1]. To folks with limited
resources (such as me), this is very powerful. One can self-host all their
networked software needs at home with modest hardware while still being
able to communicate with the wider internet as and when needed, without
leaving the comforts of their instance. The alternative, with
non-federated software, would impose high demands and hardware
requirements as the host will have to accommodate everyone that they
wish to interact with.

Currently, to join the Fediverse, developers are required to read the
spec and manually implement ActivityPub for their program. A generic program
that exposes an interface similar to a traditional database (like
Postgres)^[2] can facilitate easy onboarding. Such a database
will handle all the basic ActivityPub requirements, like WebFinger for discovery
and an inbox and outbox for actors to facilitate interaction with foreign
actors.
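
As a rough illustration of the discovery step, the sketch below builds a WebFinger lookup URL for a handle and picks the actor URL out of a WebFinger response. The handle, host and sample response are made up for illustration, and no network request is made; a real program would fetch the URL and then fetch the actor document to find its inbox and outbox.

```python
from typing import Optional
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a handle like 'alice@example.org'."""
    user, _, host = handle.lstrip("@").partition("@")
    if not user or not host:
        raise ValueError(f"not a user@host handle: {handle!r}")
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"


def actor_url_from_jrd(jrd: dict) -> Optional[str]:
    """Return the actor URL advertised by a WebFinger (JRD) response, if any."""
    for link in jrd.get("links", []):
        # Assumption: the server advertises the actor with rel="self" and the
        # activity+json media type, as common Fediverse servers do.
        if link.get("rel") == "self" and link.get("type") == "application/activity+json":
            return link.get("href")
    return None


# Hypothetical response for alice@example.org, for illustration only.
SAMPLE_JRD = {
    "subject": "acct:alice@example.org",
    "links": [
        {"rel": "http://webfinger.net/rel/profile-page",
         "type": "text/html",
         "href": "https://example.org/@alice"},
        {"rel": "self",
         "type": "application/activity+json",
         "href": "https://example.org/users/alice"},
    ],
}
```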

Generic third-party programs will also help with uniform behavior across
the Fediverse. ActivityPub will enter a different league where it is no
longer app-specific (Mastodon’s or Lemmy’s implementation) but will be
discussed in the same way SQL is: MySQL and Postgres flavours.

[1]: “Database” may not be the correct term, as consistency guarantees are
out of ActivityPub’s scope, but IMHO the small web is prone to failure, as uptime is
traded for freedom. I’m not sure if any consistency measures can be
implemented for it beyond simple, periodic retries.

[2]: This is overly simplified. Such a program will also have to
implement pub/sub to facilitate app-specific effects for incoming
activities, etc., but the idea is to make ActivityPub available via a third-party
dependency, like a database.

weex · February 24, 2022, 7:17am

This is a neat idea and I’m not sure it’s ever been brought up here, so I’d like to explore it.

If the fediverse is a database, it seems to be sharded but in a long tail distribution where each instance has admins whose resources are on the line.

This localized provision of resources is kind of neat because it leaves the trade-off between human (admin effort) and technical resources (storage and bandwidth) to each instance. Instead of a tragedy of the commons as with many fully-anonymous systems we start here from more of a private property model.

I actually wonder what an MVP AP-enabled database host would look like from an API perspective.

realaravinth · February 24, 2022, 7:18am

Tangential but interesting:

Litestream (no affiliation) uses SQLite’s WAL to replicate (“continuously stream”) a database onto object storage like AWS’ S3.

I imagine Activities are similar to how a WAL works. Mutations, before being committed to the DB, should be described with Activities, in the same way a WAL records a mutation in its log before committing it to the DB.
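
To make the analogy concrete, here is a minimal sketch of a store that appends each mutation to a log as an Activity before applying it, and can rebuild its state by replaying that log. The class and method names are made up, and it assumes every activity embeds its object with an "id" (real Delete activities often carry only the object's id).

```python
import json
from dataclasses import dataclass, field

SUPPORTED = ("Create", "Update", "Delete")


@dataclass
class ActivityLoggedStore:
    """Toy store: the activity log plays the role of the WAL."""
    log: list = field(default_factory=list)      # append-only, serialized activities
    objects: dict = field(default_factory=dict)  # materialized state, keyed by object id

    def commit(self, activity: dict) -> None:
        if activity.get("type") not in SUPPORTED:
            raise ValueError(f"unsupported activity type: {activity.get('type')!r}")
        # 1. Describe the mutation in the log first ...
        self.log.append(json.dumps(activity, sort_keys=True))
        # 2. ... and only then apply it to the state.
        self._apply(activity)

    def _apply(self, activity: dict) -> None:
        obj = activity["object"]
        if activity["type"] == "Delete":
            self.objects.pop(obj["id"], None)
        else:  # Create or Update
            self.objects[obj["id"]] = obj

    @classmethod
    def replay(cls, log: list) -> "ActivityLoggedStore":
        """Rebuild state from a log alone, e.g. on a replica or after a crash."""
        store = cls()
        for line in log:
            store.log.append(line)
            store._apply(json.loads(line))
        return store
```

Replaying the same log on another machine yields the same objects, which is roughly what Litestream does with SQLite's WAL frames.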

realaravinth · February 24, 2022, 7:41am

MVP interfaces

These are some of the features that I think will make sense for the MVP DB^[1]:

1. Creating and mutating objects with visibility/privacy specifiers:

The database should create relevant activities and notify subscribers based on provided visibility configuration.

2. Pub/sub hooks to process events (incoming activities):

An app might wish to notify the user when an incoming activity arrives. Notifications (push, email, etc.) are out of scope for the database, but such effects can be accommodated with pub/sub interfaces.

3. Decentralized fetching:

If a user is interested in seeing a foreign actor’s most recent toots, their instance will first have to resolve the foreign actor, find its outbox and then fetch its toots. I think at least a part of this process can be implemented in the database, so that a query against a foreign actor’s outbox resolves the actor automatically.

4. Relationships and access control:

In order to support visibility/privacy restrictions, dependent programs must be able to describe relationships and the various access control rules associated with them. And since relationships and access control rules are already prevalent across existing ActivityPub implementations, it would make sense to include them as part of the DB.

[1]: I’m intentionally being vague about the type of interface. The interface could be an HTTP API or a DSL that is more convenient for AP, or it could start off as the former and evolve into the latter.
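
A minimal in-memory sketch of how items 1, 2 and 4 might fit together, purely to make the idea tangible. Every name here is hypothetical, nothing is persisted or sent over the network, and it assumes an actor's followers collection lives at "<actor id>/followers" (real actors advertise that URL in their actor document).

```python
from collections import defaultdict

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


class ToyFediDB:
    def __init__(self):
        self.objects = {}                   # object id -> Create activity
        self.followers = defaultdict(set)   # actor id -> ids of its followers
        self.hooks = defaultdict(list)      # activity type -> handlers
        self.outgoing = []                  # (recipient, activity) pairs awaiting delivery

    def follow(self, follower, actor):
        """Item 4: record a relationship that access control can use."""
        self.followers[actor].add(follower)

    def create(self, actor, obj, to):
        """Item 1: wrap the object in a Create activity and queue notifications."""
        activity = {"type": "Create", "actor": actor, "object": obj, "to": list(to)}
        self.objects[obj["id"]] = activity
        for recipient in self._expand(actor, to):
            self.outgoing.append((recipient, activity))
        return activity

    def _expand(self, actor, to):
        """Turn addressing targets into concrete recipient ids."""
        recipients = set()
        for target in to:
            if target == f"{actor}/followers":
                recipients |= self.followers[actor]
            elif target != PUBLIC:
                recipients.add(target)
        return sorted(recipients)

    def can_read(self, reader, object_id):
        """Item 4: visibility check derived from the addressing."""
        activity = self.objects[object_id]
        if PUBLIC in activity["to"] or reader == activity["actor"]:
            return True
        return reader in self._expand(activity["actor"], activity["to"])

    def on(self, activity_type, handler):
        """Item 2: subscribe to incoming activities of one type."""
        self.hooks[activity_type].append(handler)

    def receive(self, activity):
        """Item 2: dispatch an incoming activity to its subscribers."""
        for handler in self.hooks[activity["type"]]:
            handler(activity)
```

An app would then call something like db.on("Like", send_push_notification) and keep the notification logic outside the database.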

aschrijver · February 24, 2022, 7:51am

Decentralized DB is one way to look at it, but it also leaves some important aspects out of the equation. DB implies that it is about information storage that you can aggregate from many sources and use for your own application. That’s true, but there need not be storage at remote locations; there can be any kind of processing logic. That brings it closer to a heterogeneous service-oriented architecture.

But the most useful way to look at what ActivityPub provides is:

As @cwebber describes it to Sean Tilley (Medium article via Scribe.rip), at the start of their Spritely project:

Sean: “Why work on a new protocol?” [referring to ActivityPub]

Christine: "Well, as I mentioned before, I was very worried about a “fractured federation”, the fact that we had many federated social networks but they couldn’t talk to each other. I hoped that we could help alleviate that through a standards process.

As for why not OStatus, it’s worth noting that ActivityPub’s core design also comes from Evan Prodromou’s work on the Pump API, and Evan was largely responsible for OStatus. Email-like addressing (this better enables private communication, which OStatus didn’t really support), a clear but extensible vocabulary, and a closer conceptual connection to the actor model (is that bit too academic?) I think are all solid reasons for moving to ActivityPub.

One thing that I think is a bit underexplored currently is ActivityPub’s client to server API, which is very very similar to its server to server (federation) API. If you’re starting from scratch it’s likely easy to implement one and get the other at little extra cost. If we see more applications integrate this, one cool thing is that you could borrow any application’s frontend or mobile clients for another application’s backend. That could be really powerful."

[…]

Sean: “What features of ActivityPub are you most excited about?”

Christine: “I mean, I’m most excited about seeing interoperability actually happen. Aside from that, I’ll say that I think the extensibility model is quite good, but most nerdily I think that ActivityPub being an implementation of the actor model and mostly being self aware of that fact is good.” […]

“So the actor model approach matters. I did several experiments or previous revisions (XUDD, 8sync) that got me to where I understood enough to feel confident that this is the right route to go down.”

I am very interested in the actor model myself, though I first encountered it in the context of application architecture (i.e. Akka, when that project first started, and later Vert.x).

Note that years later the Client-to-Server aspects of ActivityPub still haven’t progressed much.

Also note that @cwebber with Spritely intends to move toward capabilities that allow implementation of a distributed multiplayer game. Besides that objective being a lot of fun, it serves to prove that AP++ (with the extra Spritely magic sprinkled in) can handle any kind of application type.

There’s much to explore, and many uses aren’t within reach yet. For instance Loïc of forgefriends found that for federating forges AP was best used as a notification mechanism, because of unreliable delivery of messages, not as a code forge state transfer mechanism:

[…] “Some think AP should be the only protocol used. How to maintain DB state? Then I figured AP is for notifications. You need 2 protocols, one to maintain state and the other to maintain notifications. And it doesn’t matter if these notifications get lost. AP offers no message delivery guarantee. Something for reliable state management is required. This is not how people talk about AP saying “But it conveys data”. What is you lose a message, say a Patch, do you have a reply mechanism. Can you determine the sequence of messages? You don’t have that in AP. You have a means to convey data, but no way to convey a consistent set of data. This is a difficult problem, but you don’t need to solve this problem. Some people may try, but when federating forges we do not need to solve that problem.”

See full text: https://forum.forgefriends.org/t/forgefriends-monthly-update-february-21st-2022-5pm-6pm-utc-1/629/6

(Bit of a cross-post with @realaravinth … I reacted to the first message in thread)

realaravinth · February 24, 2022, 8:18am

“Database” is a poor choice of word to describe the program.

And it’s interesting that you mentioned the actor model. The actor model allows for concurrency through horizontal scaling, as long as all the state that an actor requires is contained within it. As ActivityPub is based on the actor model, a generic database-like program can work both in resource-constrained environments and to power large instances via horizontal scaling.
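
One way to picture this, as a sketch only: if each actor's state is self-contained, incoming activities can be routed to workers by a stable hash of the target actor's id, so a single worker (small instance) and many workers (large instance) run the same code. The "target" field and the function names are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def worker_for(actor_id: str, workers: int) -> int:
    """Map an actor id to a worker index that is stable across processes."""
    if workers < 1:
        raise ValueError("need at least one worker")
    digest = hashlib.sha256(actor_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % workers


def route(activities: list, workers: int) -> dict:
    """Group activities by the worker that owns the target actor's state."""
    queues = defaultdict(list)
    for activity in activities:
        # Assumption: each activity names the local actor it is addressed to.
        queues[worker_for(activity["target"], workers)].append(activity)
    return dict(queues)
```

Because all activities for one actor land on the same worker, that worker can process them in order without coordinating with the others.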

Additionally, since actor data is mostly self-contained, the program can also offer dependent programs seamless migrations between instances.^[1]

This is one aspect of the program that needs work.

ThreadDB from textile.io (not affiliated) provides a MongoDB-like interface on top of IPFS. The IPFS protocol has provisions for consistency and p2p data transfer but doesn’t provide convenient options for storing structured data (the building blocks are present, though). ThreadDB only concerns itself with schema and interface and builds on top of IPFS.

We’ll have to come up with something similar, outside of ActivityPub’s scope, if this idea is going to be pursued.

weex · February 24, 2022, 4:20pm

This is really powerful and underappreciated about ActivityPub. Wherever the benefits of AP are being communicated, this should be higher on the list. If it’s already there, maybe some examples that illustrate what this choice makes possible would be helpful.

I’ve seen this pop up in many repos. Since many instance operators are non-technical, it’s hard to gather data on reliability. There’s also justified pushback on collecting the kind of analytics that would increase reliability. This might be a good place to propose a standard for reliability: a new guaranteed mode of delivery that would only report delivery back to the user once it had verified that the contents had been stored on the other side. In Mastodon terms, I’d see a checkmark on my toots as a sort of aggregated read receipt. Especially at the beginning, it would probably help to solve a lot of low-hanging message loss issues.
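
A rough sketch of what the sending side of such a mode could look like. Note that ActivityPub itself defines no storage acknowledgement; here the hypothetical send callable stands in for "POST to the remote inbox and return True only if the remote confirms it stored the activity".

```python
import time
from typing import Callable


def deliver_with_ack(
    send: Callable[[dict], bool],
    activity: dict,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry delivery with exponential backoff until the remote acknowledges storage.

    Returns True only once `send` reports an acknowledgement, so the caller
    can show the checkmark; returns False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            if send(activity):
                return True
        except ConnectionError:
            pass  # remote unreachable; treat like a missing acknowledgement
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

The sleep parameter is injectable so the retry schedule can be exercised in tests without actually waiting.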