Searching the fediverse

Hi all,

I’m interested in participating in the Fediverse and I’m wondering how people are thinking about search at the moment.

I’ve looked at the docs for Mastodon, Lemmy, and Pixelfed and nobody seems to describe how search works for these federated systems! Does anyone know how these platforms or others implement search, and whether there is any support for searching beyond the local instance?

Is anyone working on a protocol for federated search? It seems like there are a bunch of possibilities beyond a ‘giant central index’ which could be really interesting. For example, if there is a common protocol for search, a server could forward requests on to some other servers. Or there could be a search platform which, instead of indexing, just fires off a request to the search endpoints of a bunch of servers.
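To make the "fire off a request to a bunch of servers" idea concrete, here is a minimal Python sketch. No such search protocol exists yet as far as I know, so the actual network call is left as a pluggable function: `search_one`, `fake_search` and the `*.example` hosts are all placeholders I made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable

# Hypothetical signature: given a server hostname and a query string,
# return that server's matching objects as a list of dicts.
SearchFn = Callable[[str, str], list[dict]]


def fan_out_search(servers: Iterable[str], query: str, search_one: SearchFn,
                   max_workers: int = 8) -> dict:
    """Send one query to many servers and pool whatever comes back."""
    results: list[dict] = []
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(search_one, host, query): host for host in servers}
        for future in as_completed(futures):
            host = futures[future]
            try:
                for item in future.result():
                    # Record which server answered, so results stay attributable.
                    results.append({**item, "source": host})
            except Exception:
                # One unreachable or misbehaving server should not sink the search.
                failed.append(host)
    return {"results": results, "failed": failed}


def fake_search(host: str, query: str) -> list[dict]:
    # Stand-in for an HTTP call to a server's (not yet standardised) search endpoint.
    if host == "down.example":
        raise ConnectionError("server unreachable")
    return [{"id": f"https://{host}/notes/1", "content": f"post about {query}"}]


if __name__ == "__main__":
    print(fan_out_search(["a.example", "b.example", "down.example"], "jazz-funk", fake_search))
```

The thread pool caps how many servers are contacted at once, and a failing server is only recorded, so the remaining results still come back.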

What’s the current thinking around search on/in the Fediverse?

Thanks 🙂

Nothing exists that I know of. But it sounds like an interesting idea. You should build it!

For text search, we just run a Postgres query with ILIKE '%query%' against the local database. There is no federated search, except that you can fetch remote objects by searching something like

Thanks for your replies.

I’m wondering if all this needs is a standardised GET /search endpoint, which could accept either free text or some Boolean expression. Supporting instances would then be responsible for determining which local objects match the query and returning the collection of results.

Since the Activity vocabulary is specified as JSON, it seems reasonable to me to want to support some kind of ‘JSON query language’, so that someone could, for example, search for Question type activities with particular options, or only oneOf Questions. But I’m not sure if there is a standard for this or what the popular solutions are.
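As a rough illustration of the kind of structured matching I mean, here is a small Python sketch. The query format (`type`, `has`, `option_names`) is entirely invented for this example and is not an existing standard. It also assumes `oneOf` and `anyOf` are plain lists of objects that each carry a `name`.

```python
def matches(obj: dict, query: dict) -> bool:
    """Return True if obj satisfies every condition in a (hypothetical) query dict.

    Supported conditions, all invented for this sketch:
      {"type": "Question"}        -> obj["type"] must equal "Question"
      {"has": ["oneOf"]}          -> obj must contain each listed property
      {"option_names": ["Yes"]}   -> each name must appear among the options
    """
    if "type" in query and obj.get("type") != query["type"]:
        return False
    for prop in query.get("has", []):
        if prop not in obj:
            return False
    wanted = query.get("option_names", [])
    if wanted:
        # A Question lists its options under oneOf (single choice)
        # or anyOf (multiple choice); assumed here to be lists.
        options = obj.get("oneOf", []) + obj.get("anyOf", [])
        names = {opt.get("name") for opt in options}
        if not set(wanted) <= names:
            return False
    return True


poll = {
    "type": "Question",
    "name": "Best jazz-funk band?",
    "oneOf": [
        {"type": "Note", "name": "Incognito"},
        {"type": "Note", "name": "Brand New Heavies"},
    ],
}
note = {"type": "Note", "content": "hello"}

# Only oneOf Questions that offer "Incognito" as an option.
query = {"type": "Question", "has": ["oneOf"], "option_names": ["Incognito"]}
found = [o for o in (poll, note) if matches(o, query)]
```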

With free text search, it would be entirely up to the instance to decide how that was implemented, what fields are searched, etc. So that might be a really easy way to get an MVP spec off the ground.
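A minimal sketch of what an instance might do behind such an endpoint, in Python. This is only one possible reading: the function name, the choice of searched fields and the decision to return nothing for an empty query are my assumptions. The response is shaped as an ActivityStreams OrderedCollection.

```python
def search(objects: list[dict], q: str,
           fields: tuple = ("name", "summary", "content")) -> dict:
    """Case-insensitive substring search, returned as an OrderedCollection."""
    needle = q.strip().casefold()
    hits = []
    if needle:  # assumption: an empty query matches nothing rather than everything
        hits = [
            obj for obj in objects
            if any(needle in str(obj.get(field, "")).casefold() for field in fields)
        ]
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "OrderedCollection",
        "totalItems": len(hits),
        "orderedItems": hits,
    }


local_objects = [
    {"type": "Note", "content": "Listening to Incognito tonight"},
    {"type": "Note", "content": "Gardening tips"},
    {"type": "Article", "name": "A history of jazz-funk", "content": "..."},
]

response = search(local_objects, "incognito")
```

Which fields are searched, and how matches are ranked, would stay an implementation detail of each instance.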

I remember @schmittlauch was working on federated tag search as part of a university assignment. Any news on this?

Unfortunately not. After the theoretical work on an architecture, I started implementing a prototype, in the meantime iterating on the DHT used and specifying a preliminary protocol format. See Hash2Pub.

Unfortunately, that work never got to a usable state. While I do still plan to resume that work if necessary at some point, it is not a high priority for me so far. So feel free to ask me about details if you’re working on this yourself.

Regarding full text search, I’m sure that an architecture for that would make significantly different design decisions. There is work on combining multiple queries already on their path back, but these always looked challenging from a security point of view (risk of censorship or faked replies).

Thank you for the update @schmittlauch! I’m sorry you could not complete that software.

Maybe then this is something to discuss with the people from searx.

Oh, hey, great to hear from you! I really enjoyed your presentation at APConf, in 2019, on this!

I actually did some hacking on searx a while ago for a non-Fediverse purpose, but I think the same approach could be taken here. If searx had an engine or class for each type of Fediverse site (Mastodon/Pixelfed/…) then it would be possible to make a searx instance which could directly query a list of different sites and pool all the results together.
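Roughly what I have in mind, as a Python sketch. To be clear, this is not the searx engine API: the parser functions and the response shapes for "platform A" and "platform B" are made up purely to show the pooling step.

```python
from typing import Callable

# Each "engine" knows one platform's response shape and maps it to a common
# result format. Both response shapes here are invented for illustration.
def parse_platform_a(raw: dict) -> list[dict]:
    return [{"url": s["url"], "text": s["content"]} for s in raw.get("statuses", [])]


def parse_platform_b(raw: dict) -> list[dict]:
    return [{"url": p["link"], "text": p["caption"]} for p in raw.get("posts", [])]


Parser = Callable[[dict], list[dict]]


def pool_results(responses: list[tuple[Parser, dict]]) -> list[dict]:
    """Run each raw response through its parser and merge, dropping duplicate URLs."""
    seen = set()
    pooled = []
    for parse, raw in responses:
        for result in parse(raw):
            if result["url"] not in seen:
                seen.add(result["url"])
                pooled.append(result)
    return pooled


raw_a = {"statuses": [{"url": "https://a.example/1", "content": "Incognito live"}]}
raw_b = {"posts": [
    {"link": "https://b.example/9", "caption": "jazz-funk night"},
    {"link": "https://a.example/1", "caption": "boosted copy of the same post"},
]}

pooled = pool_results([(parse_platform_a, raw_a), (parse_platform_b, raw_b)])
```

Deduplicating by URL matters because the same post can show up on several instances through federation.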

Would it not be easier to have a single searx engine for ActivityPub-compliant servers and work out implementation differences from there?

My thinking wrt search has changed over time. As a new fedizen my thinking was “Yeah, federated search so you can find everything on the fedi”. I am no longer of that opinion.

Personal social networks

I’ve come to love an aspect of the fediverse that wasn’t immediately obvious to me: that on the fedi I was building my own “personal social network” through the social graph of followers I gradually handcrafted. And for me, the fact that this social graph is relatively small means that it is ‘human scale’, manageable.

Why would I need to search in the microblogging behaviour of millions of fedizens? Mass Social Media is done by the walled garden platforms. I am doing social networking, which is what humans have been doing since the dawn of time and are now extending online.

So, I am happy and content that search on my instance is ‘restricted’ to only the social interactions I have had with others. The people I chose to engage with in the past.

I do not have FOMO about missing some discussion somewhere, and if I feel my social graph doesn’t cover a personal interest of mine, then I should put in the effort to improve my graph.

Personal knowledge networks

If you are working in a particular field or have certain interests, there is a need to delve into the broad body of knowledge that exists. I distinguish the knowledge network from the personal social network. The question is: should Microblogging be indexed for knowledge gathering? I personally think not.

For knowledge gathering I have the existing Web, and the search engines operating there. Now, beyond Microblogging there are other types of federated apps and there’s more to come. Blogs, wikis, open science tools, etc.

These I would like to be indexed and searchable. But that need not be part of my Microblogging apps and clients.

For me Info Overload is an everyday thing already. Adding full text search to the entire fedi doesn’t solve it. It only increases it.

@how Yes I agree, I would like to develop a specification for ActivityPub search which can be consistent across implementations / site types. But failing that I think per-type engines could fill in the gap for now?

@aschrijver Thanks for sharing your thinking and experiences. I agree that curating our networks and mitigating information overload are positive processes and outcomes, and I like how the Fediverse can enable these. My angle on this comes mostly from the discovery stage: how do I find out who to connect with, based on my interests, if I can’t find those people easily at first? For example, I am big into jazz-funk, but this is not the most popular genre of music. If I could search timelines or profiles for hashtags or relevant keywords (“who’s been posting about Incognito?”) then that would help me to discover new people and grow my social graph.

I fear going for per-type engines would send the wrong message and end up bringing more work for you instead of encouraging ActivityPub implementors to focus on compliance. If instead you focus on a generic search engine for specification-compliant entries, it will work out of the box for any new compliant implementation instead of just one or two major implementations. It may not be as useful from the start because it would limit search to a few general cases, but then, as more people work their software towards compliance, results would fall into place.

One challenge to search is the data storage needed for the index. The more you index, the more storage you need. And one challenge to federated search, in particular, is that you can’t do it in real time en masse because that would require querying hundreds or thousands of federated servers all at the same time. And querying thousands of servers at the same time would require some serious resources, both on the server making the query, but also on the servers answering the query.

So you would have situations where small community servers and instances don’t have enough resources to realistically index a large enough portion of the fediverse for their own members, and would crash if too many other servers started querying them at once.
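To put a rough shape on that load, here is a back-of-envelope calculation in Python. All the numbers are hypothetical and only there to show how the request volume multiplies. It assumes naive real-time fan-out, where every local search is forwarded to every known server.

```python
def outbound_requests_per_second(local_searches_per_second: float,
                                 remote_servers: int) -> float:
    """Requests a server must send if each local search goes to every remote server."""
    return local_searches_per_second * remote_servers


def inbound_requests_per_second(querying_servers: int,
                                searches_per_server_per_second: float) -> float:
    """Requests a server must answer if every other server forwards its searches to it."""
    return querying_servers * searches_per_server_per_second


# Hypothetical figures: 1,000 federated servers, each seeing one search
# every two seconds (0.5 searches per second).
servers = 1000
rate = 0.5

outbound = outbound_requests_per_second(rate, servers)
inbound = inbound_requests_per_second(servers, rate)
print(f"outbound: {outbound:.0f} req/s, inbound: {inbound:.0f} req/s")
```

Even at that modest search rate, every server in this scenario both sends and answers hundreds of requests per second, which is the part a small community instance cannot absorb.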

Unless we can overcome these issues, search will most likely be performed by intermediaries who can afford to index the thousands of fediverse servers that want to be indexed.