Event Sourcing the ActivityPub Server

berkes · September 30, 2020, 9:55am

I’ve been investigating what I consider a very good match: Event Sourcing and ActivityPub.

Event Sourcing (Fowler has a more in-depth explanation) is mentioned on this forum only once, and some DuckDuckGo research turns up nothing on ActivityPub combined with Event Sourcing.

I think it works well because it would allow a generic Inbox/Outbox “framework” that stores all the items placed in the inbox or the outbox in an event queue and event storage.
From there on, implementors would build Projectors, Reactors (or whatever you call the “things that listen to incoming events and process them”) based on their domain.

There would be some internal events, processors and projectors, to store your in- and outbox in a way that the server can serve them to clients and to manage the delivery, the federation.

From there, the implementer would be free: a NotifyChatGroup reactor would notify your favorite chatgroup, a PublishComment reactor would publish an incoming comment on your blog, a UserTimelineProjector would store the incoming statuses in a database where the user/app/API can read the timeline from, and so on. This is why I think such a basis would be extremely useful for developers.
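A minimal sketch of that split, in Python, could look like the following. All names here (Event, EventStore, the "ActivityAddedToInbox" event type and the handler classes) are hypothetical and not taken from any existing framework; the sketch only shows inbox items being appended to a log, with a projector and a reactor subscribed to it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str      # e.g. "ActivityAddedToInbox" (hypothetical name)
    actor_id: str  # the local actor whose inbox or outbox received the item
    payload: dict  # the raw ActivityStreams activity


@dataclass
class EventStore:
    """Append-only log that hands every new event to its subscribers."""
    log: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self.subscribers.append(handler)

    def append(self, event: Event) -> None:
        self.log.append(event)
        for handler in self.subscribers:
            handler(event)


class UserTimelineProjector:
    """Projector: maintains a per-actor read model of incoming objects."""

    def __init__(self) -> None:
        self.timelines: dict = {}

    def __call__(self, event: Event) -> None:
        if event.type == "ActivityAddedToInbox" and event.payload.get("type") == "Create":
            self.timelines.setdefault(event.actor_id, []).append(event.payload["object"])


class NotifyChatGroup:
    """Reactor: causes a side effect. Here it only records the message."""

    def __init__(self) -> None:
        self.sent: list = []

    def __call__(self, event: Event) -> None:
        if event.type == "ActivityAddedToInbox":
            self.sent.append(f"{event.payload.get('type')} received for {event.actor_id}")


store = EventStore()
timeline = UserTimelineProjector()
chat = NotifyChatGroup()
store.subscribe(timeline)
store.subscribe(chat)

store.append(Event("ActivityAddedToInbox", "alice",
                   {"type": "Create", "object": {"type": "Note", "content": "Hello"}}))
```

The framework part would be the store and the inbox/outbox plumbing; the two handler classes are the kind of thing an implementer would write for their own domain.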

There are “event sourcing” frameworks (often labeled CQRS, ES etc.) for almost all languages.

I can see a lot of pros, but also some cons:
Privacy could be a concern: when your server keeps a log of all your “ActivityPub activity”, rather than “just the current state” as with most MVC implementations (like Mastodon), you are building up valuable data.
GDPR compliance is a common challenge in ES setups: ES dictates “no events are ever modified or deleted”, yet GDPR dictates you delete all data for a user when requested.

So, did anyone investigate AP and ES combinations already? Is there an eventsourced implementation or PoC of AP out there, that I missed? Do you think this might work at all?

pukkamustard · September 30, 2020, 10:24am

I completely agree. I think ActivityPub can be seen as an Event Sourcing system with Activities as Events.

We are experimenting with the idea in the openEngiadina project. For example, a user’s Inbox is not a relation but a query on all Activities: Files · develop · openEngiadina / cpub · GitLab

Again, I agree. The system becomes an append-only log of Activities. We are working towards fixing that as well by allowing “garbage-collection”. More on that soon…

I highly recommend the talk in the ActivityPub Conference by @cjd, which I feel is related:

pukkamustard · September 30, 2020, 10:26am

Did you stumble upon this: https://dustycloud.org/misc/2019-05-03-convo-with-tmarble.txt

A nice exchange between @tmarble and @cwebber about Event Sourcing/CQRS and OCAP, which I found very enlightening (and incredibly foresightful of @cwebber to preserve!).

cjd · September 30, 2020, 10:41am

I do most everything using this event sourcing model, because I keep messing up state and being able to rebuild is quite nice.
I’ll comment that for the purposes of privacy, even without the perfect database/language of the future, there’s a quick&dirty method which would solve a lot of problems: attach an expiration date to every piece of content in ActivityPub, plus a URL which can be hit to check if the content creator / data subject continues to consent to that content existing (in which case the URL returns a new expiration date). This changes deletion from being an “act” to being an “omission”: if a server goes down, the default is for all of the data to go away rather than sitting around forever.
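As a rough illustration of that idea, here is a small Python sketch. The `consent_url` field, the `sweep` function and the shape of the consent check are all assumptions made for the example; nothing like this is defined by ActivityPub. The consent check is passed in as a function so the sketch runs without network access.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class StoredContent:
    content_id: str
    body: str
    expires_at: datetime
    consent_url: str  # hypothetical field, not part of ActivityPub


# A consent check returns a new expiration date, or None when consent is
# withdrawn or the origin server cannot be reached.
ConsentCheck = Callable[[str], Optional[datetime]]


def sweep(items: list, now: datetime, check_consent: ConsentCheck) -> list:
    """Keep unexpired items; try to renew expired ones, otherwise drop them."""
    kept = []
    for item in items:
        if item.expires_at > now:
            kept.append(item)
            continue
        renewed = check_consent(item.consent_url)
        if renewed is not None and renewed > now:
            item.expires_at = renewed
            kept.append(item)
        # No renewal: the item is simply not kept. That omission is the deletion.
    return kept


now = datetime(2020, 9, 30, tzinfo=timezone.utc)


def fake_check(url: str) -> Optional[datetime]:
    # Stand-in for an HTTP request; only "a.example" still answers with consent.
    return now + timedelta(days=30) if "a.example" in url else None


items = [
    StoredContent("1", "still wanted", now - timedelta(days=1), "https://a.example/consent/1"),
    StoredContent("2", "origin is gone", now - timedelta(days=1), "https://b.example/consent/2"),
    StoredContent("3", "not yet expired", now + timedelta(days=1), "https://b.example/consent/3"),
]
remaining = sweep(items, now, fake_check)
```

Item 2 disappears without anyone having sent a delete: its origin no longer confirms consent, so it is not renewed.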

berkes · September 30, 2020, 11:02am

First: thanks for the reply and the pointer to @cjd’s talk. Watching it now.

Out of curiosity: would it not make sense to store the inbox events, but then use a Projector to store the “inbox” in an easily retrievable storage? The simplest implementation, for example, would be a per-actor JSON file actors/1337/inbox.json, which is already the properly formed OrderedCollection JSON for that actor’s inbox. I probably miss some context though, and my Elixir is poor enough that I cannot extract that context from the code.

pukkamustard · October 1, 2020, 6:45am

I only have limited knowledge of Event-Sourcing/CQRS lingo - I don’t exactly know what a Projector is.

But I think that is pretty much what happens - at different times. The “Projector” is the query that transforms the events into an ActivityStreams collection?

Instead of doing the transformation to the actors/1337/inbox.json file when activities are received, this file is generated when it is queried.
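The two timings can be put side by side in a short Python sketch. This is not how cpub does it; the log format, the function names and the example.org URL are assumptions for illustration. The same query function builds the OrderedCollection either when the inbox is requested or right after each activity is received.

```python
import json
import tempfile
from pathlib import Path


def inbox_collection(log: list, actor_id: str) -> dict:
    """Query the activity log and shape the result as an OrderedCollection."""
    items = [entry["activity"] for entry in log if entry["inbox_of"] == actor_id]
    items.reverse()  # newest first
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"https://example.org/actors/{actor_id}/inbox",  # hypothetical URL
        "type": "OrderedCollection",
        "totalItems": len(items),
        "orderedItems": items,
    }


def project_to_file(log: list, actor_id: str, root: Path) -> Path:
    """Write-time variant: regenerate actors/<id>/inbox.json after each append."""
    path = root / "actors" / actor_id / "inbox.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(inbox_collection(log, actor_id)))
    return path


log = []  # append-only, oldest first
root = Path(tempfile.mkdtemp())
for n in (1, 2):
    log.append({"inbox_of": "1337",
                "activity": {"type": "Like", "id": f"https://remote.example/likes/{n}"}})
    inbox_file = project_to_file(log, "1337", root)

at_query_time = inbox_collection(log, "1337")        # generated when queried
at_write_time = json.loads(inbox_file.read_text())   # generated when received
```

Both routes yield the same document; the trade-off is between doing the work on every read or on every write.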

aschrijver · October 1, 2020, 8:40am

The common name is ‘projection’ and I guess a Projector is what creates them (e.g. for a specific domain aggregate). You will get a valid instance of the aggregate by hydrating it with all the Events that occurred either from the time of its creation, or - e.g. in case the number of events is too large for this - going from a Snapshot of the aggregate’s state at a certain moment in time, and applying Events from there.
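To make hydration and snapshots concrete, here is a small Python sketch. The Person aggregate, the event dictionaries and the PersonUnfollowed event name are made up for the example; the point is only that replaying everything and replaying from a snapshot end in the same state.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: str
    followers: frozenset = frozenset()
    version: int = 0  # number of events applied so far


def apply(state: Person, event: dict) -> Person:
    """Return the next state; the previous state is left untouched."""
    if event["type"] == "PersonFollowed":
        followers = state.followers | {event["follower"]}
    elif event["type"] == "PersonUnfollowed":
        followers = state.followers - {event["follower"]}
    else:
        followers = state.followers
    return replace(state, followers=followers, version=state.version + 1)


def hydrate(person_id: str, events: list, snapshot: Optional[Person] = None) -> Person:
    """Rebuild the aggregate from its events, optionally starting at a snapshot."""
    state = Person(person_id) if snapshot is None else snapshot
    for event in events[state.version:]:
        state = apply(state, event)
    return state


events = [
    {"type": "PersonFollowed", "follower": "bob"},
    {"type": "PersonFollowed", "follower": "carol"},
    {"type": "PersonUnfollowed", "follower": "bob"},
    {"type": "PersonFollowed", "follower": "dave"},
]

from_scratch = hydrate("alice", events)
snapshot = hydrate("alice", events[:2])            # state after the first two events
from_snapshot = hydrate("alice", events, snapshot)  # only events 3 and 4 are replayed
```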

Didn’t give this much thought, but there’s no 1-to-1 mapping conceptually to the event sourcing paradigm with regards to using ActivityStreams Objects + Activities as the events themselves. In ES an event only needs to contain the state changes, and some metadata (like an aggregateId). Many things in AP sent over the wire contain much more than just this state, and may contain nested objects/activities.

The match is still a very good one. I would start from a CQRS/ES architecture where incoming messages on the C2S/S2S APIs trigger commands that are executed, e.g. FollowPerson, FavoriteToot. A successful follow request then triggers a PersonFollowed event on the Person actor (an aggregate root in DDD terminology).

But I think @berkes has given this some thought already when mentioning ‘internal events’. There needs to be some translation from e.g. an incoming ‘Like’ activity to a ‘Liked’ event applied to a specific ‘Note’ or ‘Actor’ or whatever aggregate root, which is subsequently persisted in the event store. One incoming AP message may trigger multiple events. Besides executing separate commands, the events may trigger sagas (workflows) that invoke other commands in turn.
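Such a translation layer could be sketched in Python roughly as follows. The event names other than Liked and PersonFollowed (NoteReceived, ReplyAdded) and the dictionary shapes are invented for the example; the sketch shows a nested object being reduced to an aggregate id, and one incoming activity producing more than one event.

```python
def translate(activity: dict) -> list:
    """Map one incoming ActivityStreams activity to zero or more domain events."""
    kind = activity.get("type")
    actor = activity.get("actor")
    obj = activity.get("object")
    # The object may be a nested object or a bare IRI string.
    object_id = obj.get("id") if isinstance(obj, dict) else obj

    if kind == "Like":
        return [{"type": "Liked", "aggregate_id": object_id, "by": actor}]
    if kind == "Follow":
        return [{"type": "PersonFollowed", "aggregate_id": object_id, "follower": actor}]
    if kind == "Create" and isinstance(obj, dict):
        events = [{"type": "NoteReceived", "aggregate_id": object_id, "by": actor}]
        if obj.get("inReplyTo"):
            # A reply also changes the note it replies to: a second event.
            events.append({"type": "ReplyAdded",
                           "aggregate_id": obj["inReplyTo"],
                           "reply": object_id})
        return events
    return []  # activities this domain does not care about


like = {"type": "Like",
        "actor": "https://remote.example/users/bob",
        "object": "https://example.org/notes/1"}
reply = {"type": "Create",
         "actor": "https://remote.example/users/bob",
         "object": {"id": "https://remote.example/notes/9",
                    "type": "Note",
                    "inReplyTo": "https://example.org/notes/1"}}

like_events = translate(like)
reply_events = translate(reply)
```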

Note that CQRS and ES are different concepts and can be implemented independently of each other. CQRS means separating the ‘reads’ from the ‘writes’, typically by having Command classes (writes) and Query classes (reads). Without Event Sourcing, executing a command might lead to persisting data in a normalized relational DB model, and - when querying - consulting denormalized views that are optimized for quickly loading specific UI layouts. But that last bit is not required either.

With ES in the mix, you could still do with just one DB. With CQRS/ES you get things like: CreateUserCommand (a use case / feature) ➜ UserCreatedEvent. When fully separating write-side and read-side and having 2 databases things get most interesting, but also most complex. When storing events in a single table, or a specialized eventstore, you can now reproduce the state of the system in any moment of time, do time-travel, etc. Plus no data gets deleted, whereas in a relational CRUD system with every update you lose history.
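A tiny Python sketch of the command-to-event step and of time travel, with everything in one in-memory “database”. The class and method names besides CreateUserCommand and UserCreatedEvent are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CreateUserCommand:
    username: str


@dataclass(frozen=True)
class UserCreatedEvent:
    sequence: int
    username: str


class UserService:
    def __init__(self) -> None:
        self.events: list = []  # write side: append-only, nothing is updated

    def handle(self, command: CreateUserCommand) -> UserCreatedEvent:
        """Command side: validate against current state, then record an event."""
        if command.username in self.usernames():
            raise ValueError(f"username taken: {command.username}")
        event = UserCreatedEvent(len(self.events) + 1, command.username)
        self.events.append(event)
        return event

    def usernames(self, up_to: Optional[int] = None) -> list:
        """Query side: replay events, optionally only up to a sequence number."""
        return [e.username for e in self.events
                if up_to is None or e.sequence <= up_to]


service = UserService()
service.handle(CreateUserCommand("alice"))
service.handle(CreateUserCommand("bob"))

current = service.usernames()           # state now
earlier = service.usernames(up_to=1)    # state as it was after the first event
```

Because the log is never rewritten, the `up_to` argument is all it takes to ask what the system looked like at an earlier point.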

Though it has a lot of advantages, the cons - apart from deletion being harder - are also added complexity due to eventual consistency, where the state of the read side lags behind that of the write side. It can be harder to trace what is happening in your system e.g. when this creates timing-related issues.
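The lag can be shown with a few lines of Python. The read model below is hypothetical and is caught up by hand, where a real system would use a background worker; until it catches up, a query returns a stale answer.

```python
class LikeCountReadModel:
    """Read side that is updated separately from the write side."""

    def __init__(self) -> None:
        self.position = 0  # how many events of the log have been applied
        self.counts: dict = {}

    def catch_up(self, log: list, limit: int) -> None:
        """Apply at most `limit` unprocessed events, as a worker might per run."""
        end = min(len(log), self.position + limit)
        for event in log[self.position:end]:
            if event["type"] == "Liked":
                note = event["aggregate_id"]
                self.counts[note] = self.counts.get(note, 0) + 1
        self.position = end

    def lag(self, log: list) -> int:
        return len(log) - self.position


# Write side: three likes have already been stored.
log = [{"type": "Liked", "aggregate_id": "note-1"} for _ in range(3)]
read_model = LikeCountReadModel()

read_model.catch_up(log, limit=2)
stale_count = read_model.counts["note-1"]  # read side has seen only part of the log
stale_lag = read_model.lag(log)

read_model.catch_up(log, limit=10)
fresh_count = read_model.counts["note-1"]
```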

Btw, I am no expert either, but very interested in a DDD / CQRS and maybe ES and maybe Actor Model (yes, all the buzzwords) based architecture for a fediverse application. Note that behavior-driven design (BDD) is very well suited for testing, and you can have executable tests based on plaintext feature descriptions (very nice to get non-technical people in the loop).

I am looking to implement with @cjs go-fed due to the solid AS/AP foundation it delivers, and how it supports modularity and extendability where AP extensions are defined in a JSON-LD-formatted OWL2-subset vocabulary definition. (I am no Go programmer yet, so that’s a challenge).

Finally, what’s also interesting if you go DDD/CQRS/ES is to apply Clean Architecture in your project structure, i.e. browsing the code repo should immediately make clear which file contains what. It communicates the architecture. I just finished a follow-up to a discussion I have about this. See Clean architecture folder structure on github.

@berkes what kind of app do you have in mind? And what language / frameworks do you want to use?

berkes · October 2, 2020, 9:20am

Thanks for the elaborate reply!

Indeed, and good that you explicitly bring this up. Two things are important, IMHO:

AP is not event-sourced (nor is it MVC or Reactive or anything), so an important part of such software would be to translate between ActivityPub-isms and Events: e.g. POSTing an activity to your outbox would result in an “internal” ActivityAddedToOutbox. And a federated POST into your Inbox would result in an ActivityAddedToInbox. Those are not in your use case’s or implementation’s domain; they are merely a mechanism to hook AP up to an event-sourced system.
From there, the Domain should use Domain language and not ActivityPub-isms. So, if, in e.g. your “ActivityPub-enabled office collaboration suite” someone shares a document, a DocumentShared event is emitted. One of the handlers of that event would then wrap the data and emit that as an ActivityAddedToOutbox. In the other direction, a notification_service would probably listen to incoming ActivityAddedToInbox events and emit a PushNotificationSent or NotificationEmailSent event; or DocumentThumbnailCreated; whatever your domain needs.
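A rough Python sketch of both directions, to show where the boundary sits. The handler names, the event fields and the choice of Announce as the activity type are assumptions for the example; handlers return follow-up events, and a small dispatcher feeds those back into the log.

```python
from collections import deque


def wrap_as_activity(event: dict) -> list:
    """Domain -> ActivityPub: wrap a DocumentShared event in an activity."""
    activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",  # assumption: which activity type fits is a design choice
        "actor": event["shared_by"],
        "object": event["document_url"],
        "to": event["shared_with"],
    }
    return [{"type": "ActivityAddedToOutbox", "activity": activity}]


def notify_by_email(event: dict) -> list:
    """ActivityPub -> domain: a real reactor would send the email before emitting."""
    return [{"type": "NotificationEmailSent", "about": event["activity"]["type"]}]


HANDLERS = {
    "DocumentShared": [wrap_as_activity],
    "ActivityAddedToInbox": [notify_by_email],
}


def dispatch(initial: dict) -> list:
    """Record the event, run its handlers, and process whatever they emit."""
    log = []
    queue = deque([initial])
    while queue:
        event = queue.popleft()
        log.append(event)
        for handler in HANDLERS.get(event["type"], []):
            queue.extend(handler(event))
    return log


outgoing = dispatch({"type": "DocumentShared",
                     "shared_by": "https://example.org/users/alice",
                     "document_url": "https://example.org/docs/42",
                     "shared_with": ["https://remote.example/users/bob"]})
incoming = dispatch({"type": "ActivityAddedToInbox",
                     "activity": {"type": "Like", "object": "https://example.org/docs/42"}})
```

Only the two handlers know about ActivityStreams; the rest of the domain sees DocumentShared and NotificationEmailSent.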

@berkes what kind of app do you have in mind? And what language / frameworks do you want to use?

I’m building https://flockingbird.social. Also discussed on this forum. When I say “building”: I’m not yet writing code (unfortunately) but exploring, interviewing, wireframing and whatnot.

WRT languages and frameworks, I’ve narrowed it down to either Ruby or Rust. Ruby, because I am fluent in it and because that allows me to steal/use a lot from Mastodon. Rust, because I know that too, and it would make the core (and possibly the entire software) a lot easier to distribute and run. Dropping it on a Raspberry Pi is nearly impossible with a Ruby suite (even more so if, like Mastodon, you need sidekiq, redis, postgres, nodejs, elasticsearch and whatnot to run it). But dropping a binary on your Pi, and then running it, is perfectly doable. I do lean towards Ruby, with event_sourcery (framework I’m familiar with) or Sequent (framework I’ve not yet used) because of my familiarity, for the MVC and PoC.

Go (go-fed) and .NET (kroeg) are, unfortunately, not an option for me: I can read and hack some Go, but not architect a full product in it. Same with .NET.

aschrijver · October 2, 2020, 9:28am

Yes, can imagine that’s a challenge, as it will be for me too. @cjs plans to work further on apcore which has all the basics in place for an AP server, and I intend to (maybe) build from this and first create a ‘Groundwork’ project that allows for pluggable modules (DDD, CQRS, maybe ES). I’ve described a bit more about this in my comment to Go-Fed: Past, Present, and Future - #2 by aschrijver.

aschrijver · May 17, 2021, 10:07pm

For anyone interested in the topics addressed in this thread, you should take a look at the Proof of Concept that Flockingbird has been building, which contains aspects of a DDD/CQRS/ES architecture:

Regarding Behavior Driven Design (BDD), the folks at TrustBloc ORB have some nice examples in their ActivityPub app. Here is an excerpt of a BDD test for ActivityPub itself:

  Scenario: Get service public key
    When an HTTP GET is sent to "https://orb.domain1.com/services/orb/keys/main-key"
    Then the JSON path "id" of the response equals "https://orb.domain1.com/services/orb/keys/main-key"
    Then the JSON path "owner" of the response equals "https://orb.domain1.com/services/orb"
    Then the JSON path "publicKeyPem" of the response is not empty

    When an HTTP GET is sent to "https://orb.domain2.com/services/orb/keys/main-key"
    Then the JSON path "id" of the response equals "https://orb.domain2.com/services/orb/keys/main-key"
    Then the JSON path "owner" of the response equals "https://orb.domain2.com/services/orb"
    Then the JSON path "publicKeyPem" of the response is not empty

For the record, these tests are part of the codebase, and are directly executed. And still readable to non-technical users (though AP federation in this case is not the best example for that).
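To give an idea of how such plain-text steps become executable, here is a toy step runner in Python. It is not how the ORB project wires its tests; the `step` decorator, the `run` function and the canned response standing in for the HTTP call are all assumptions for illustration.

```python
import re

STEPS = []  # (compiled pattern, function) pairs


def step(pattern: str):
    """Register a function as the implementation of a plain-text step."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'an HTTP GET is sent to "([^"]+)"')
def http_get(ctx, url):
    ctx["response"] = ctx["fake_server"][url]  # canned response, no network


@step(r'the JSON path "([^"]+)" of the response equals "([^"]+)"')
def json_equals(ctx, path, expected):
    assert ctx["response"][path] == expected


@step(r'the JSON path "([^"]+)" of the response is not empty')
def json_not_empty(ctx, path):
    assert ctx["response"][path]


def run(scenario: str, ctx: dict) -> None:
    """Match each line against the registered steps and execute it."""
    for line in scenario.splitlines():
        line = line.strip()
        if not line or line.startswith("Scenario:"):
            continue
        text = re.sub(r"^(Given|When|Then|And)\s+", "", line)
        for pattern, func in STEPS:
            match = pattern.fullmatch(text)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {text}")


KEY_URL = "https://orb.domain1.com/services/orb/keys/main-key"
SCENARIO = f'''
  Scenario: Get service public key
    When an HTTP GET is sent to "{KEY_URL}"
    Then the JSON path "id" of the response equals "{KEY_URL}"
    Then the JSON path "owner" of the response equals "https://orb.domain1.com/services/orb"
    Then the JSON path "publicKeyPem" of the response is not empty
'''

ctx = {"fake_server": {KEY_URL: {"id": KEY_URL,
                                 "owner": "https://orb.domain1.com/services/orb",
                                 "publicKeyPem": "(placeholder key)"}}}
run(SCENARIO, ctx)  # raises if any step fails or is unknown
```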