How do I store activities and objects in an SQL database?

There are two ways of doing that I can think about:

  1. Have a table for each activity type and store them there.
  2. Have a one big activities table and store activities as JSON.

Both of those options seem to have some serious disadvantages, the former one makes it difficult to get all activities from one actor in the chronological order and is not very scalable, the latter one doesn’t allow me to know what actor created the activity and other data without parsing JSON first.

Is there a better way than using those two option? Or is there a way to “fix” disadvantages that I mentioned?

In Friendica we only store the processed data for a longer period. We store the raw JSON only for some days for caching purposes.

What period, what do you mean?

I guess we store the raw data for a month.

There are plenty of implementations that you could use as a template, but I guess the bigger question is, what do you plan to do with the data? You could certainly store the json and a selection of parsed fields (with indexes) to be able to retrieve the data you need.

2 Likes

I don’t store the activities themselves, and I treat ActivityPub like an API. I synthesize the objects and activities as needed from my data. This does mean that I can’t return the original JSON later should that be needed, but that’s fine for me since I don’t implement c2s and don’t plan to.

Here’s my database schema if you’re curious:

In my eyes an SQL Database is the wrong way.
But if you rally want to do it, you should store triples!
So your table(s) has the columns subject, predicate(property), object(value)

search for: Startpage Suchergebnisse

BTW: Friendica is “slightly” more complicated, due to the fact that we support multiple protocols: https://github.com/friendica/friendica/blob/a44a9e73c3b70523f38ec960b153673d96de425d/database.sql

Again, that depends on what you’re trying to achieve. In my opinion, it doesn’t matter much what happens inside your server, as long as the right bits come out of it for any given input. It’s much more convenient for me to treat ActivityPub like an API and let my presentation (UI) dictate the storage. Though I’m thinking about adding a separate table that would map ActivityPub object IDs to their types so I wouldn’t have to query several tables until I find the thing I’m looking for. That would allow retrieving any object in 2 queries and getting a definitive “not found” answer in one.

If you do want to store the original JSON, you could use a NoSQL database like MongoDB or ElasticSearch instead.

@protheory8 Do any of the above answers solve your question?

I did an experiment using a document database (CouchDB). The ease of storing a very flexible schema like ActivityPub, combined with high scalability made for a powerful combination. Otherwise you need to pre-determine all the crazy ways ActivityPub allows you to specify things (e.g., images) and convert every format to match your database schema when the message is received. If (when) you get a message that doesn’t conform, you have a real problem. By always working in the original JSON, you can eliminate that problem.