Searching the fediverse

wcerfgba · November 13, 2022, 4:24pm

Hi all,

I’m interested in participating in the Fediverse and I’m wondering how people are thinking about search at the moment.

I’ve looked at the docs for Mastodon, Lemmy, and Pixelfed, and nobody seems to describe how search works for these federated systems! Does anyone know how these platforms or others implement search, and whether there is any support for searching beyond the local instance?

Is anyone working on a protocol for federated search? It seems like there are a bunch of possibilities beyond ‘giant central index’ which could be really interesting. For example, if there is a common protocol for search, a server could forward requests on to some other servers. Or there could be a search platform which, instead of indexing, just fires off requests to the search endpoints of a bunch of servers.

What’s the current thinking around search on/in the Fediverse?

Thanks

mk3 · November 13, 2022, 6:33pm

Nothing exists that I know of. But it sounds like an interesting idea. You should build it!

nutomic · November 14, 2022, 11:46am

For text search, we just do a postgres query with ILIKE %query% on the local database. There is no federated search, except that you can fetch remote objects by searching something like @user@example.org.
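
To illustrate, here is a minimal sketch of that kind of local substring search. It uses Python's built-in sqlite3 as a stand-in for Postgres (SQLite's LIKE is case-insensitive for ASCII, which is roughly what ILIKE gives you). The `post` table and the `search_posts` and `escape_like` names are made up for the example; this is not the actual schema or code of any platform.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_posts(conn: sqlite3.Connection, term: str) -> list:
    # In Postgres this would be roughly: WHERE body ILIKE '%' || $1 || '%'
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT body FROM post WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [body for (body,) in rows]


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO post (body) VALUES (?)",
    [("Jazz-funk night",), ("Federated search ideas",), ("100% funk",)],
)
results = search_posts(conn, "FUNK")
```

A pattern with a leading wildcard generally cannot use an ordinary B-tree index, so this approach tends to scan the whole table. That is fine for a small instance but is one reason it does not stretch to searching beyond the local database.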

wcerfgba · November 14, 2022, 4:43pm

Thanks for your replies.

I’m wondering if all this needs is a standardised GET /search endpoint, which could accept either free text or some Boolean expression. Supporting instances would then be responsible for determining which local objects match the query and returning the collection of results.

Since the Activity vocabulary is specified as JSON, it is reasonable to me to want to support some kind of ‘JSON query language’, so that someone could for example, search for Question type activities with particular options, or only oneOf Questions. But I’m not sure if there is a standard for this or what the popular solutions are.

With free text search, it would be entirely up to the instance to decide how that was implemented, what fields are searched, etc. So that might be a really easy way to get an MVP spec off the ground.
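
A rough sketch of what I mean, under a lot of assumptions: the `q` and `type` query parameters, the `handle_search` function, and the choice to search only `name` and `content` are all hypothetical, since no such endpoint is specified anywhere yet. The response is shaped as an ActivityStreams `OrderedCollection`.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical local store of ActivityStreams objects.
LOCAL_OBJECTS = [
    {"id": "https://example.org/o/1", "type": "Note",
     "content": "Incognito live tonight"},
    {"id": "https://example.org/o/2", "type": "Question",
     "name": "Best jazz-funk band?",
     "oneOf": [{"type": "Note", "name": "Incognito"},
               {"type": "Note", "name": "Other"}]},
    {"id": "https://example.org/o/3", "type": "Question",
     "name": "Which instruments?",
     "anyOf": [{"type": "Note", "name": "Bass"}]},
]


def handle_search(url: str) -> dict:
    """Answer GET /search?q=<free text>&type=<ActivityStreams type>."""
    params = parse_qs(urlparse(url).query)
    text = params.get("q", [""])[0].lower()
    obj_type = params.get("type", [None])[0]

    hits = []
    for obj in LOCAL_OBJECTS:
        # Optional structured filter on the object type.
        if obj_type and obj.get("type") != obj_type:
            continue
        # Which fields are searched is the instance's own decision;
        # this sketch only looks at name and content.
        haystack = " ".join(
            str(obj.get(field, "")) for field in ("name", "content")
        ).lower()
        if text in haystack:
            hits.append(obj)

    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "OrderedCollection",
        "totalItems": len(hits),
        "orderedItems": hits,
    }


response = handle_search("https://example.org/search?q=jazz&type=Question")
```

A richer query language (for example, matching only `oneOf` Questions) could be layered on later without changing the response format.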

how · November 28, 2022, 3:03pm

I remember @schmittlauch was working on federated tag search as part of a university assignment. Any news on this?

schmittlauch · November 28, 2022, 4:13pm

Unfortunately not. After the theoretical work on an architecture, I had started implementing a prototype, in the meantime evolving the DHT used and specifying a preliminary protocol format. See Hash2Pub.

Unfortunately, that work never got to a usable state. While I do still plan to resume that work at some point if necessary, it is not a high priority for me so far. So feel free to ask me about details if you’re working on this yourself.

Regarding full text search, I’m sure that an architecture for that would make significantly different design decisions. There is work on combining multiple queries already on their path back, but these always looked challenging from a security point of view (risk of censorship or faked replies).

how · November 28, 2022, 4:44pm

Thank you for the update @schmittlauch! I’m sorry you could not complete that software.

Maybe then this is something to discuss with the people from searx.

codenamedmitri · November 28, 2022, 8:11pm

Oh, hey, great to hear from you! I really enjoyed your presentation at APConf, in 2019, on this!

wcerfgba · December 5, 2022, 12:57pm

I actually did some hacking on searx a while ago for a non-Fediverse purpose, but I think the same approach could be taken here. If searx had an engine or class for each type of Fediverse site (Mastodon/Pixelfed/…) then it would be possible to make a searx instance which could directly query a list of different sites and pool all the results together.
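
Roughly what I have in mind, as a sketch rather than real searx code: `pooled_search` and the `fetch` callback are hypothetical names, and `fake_fetch` stands in for whatever per-site engine would actually talk to each server. The query goes out to every host in parallel, hosts that fail are skipped, and results are de-duplicated by `id`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def pooled_search(
    hosts: Iterable[str],
    query: str,
    fetch: Callable[[str, str], List[dict]],
    max_workers: int = 8,
) -> List[dict]:
    """Send one query to many hosts and merge the results."""

    def safe_fetch(host: str) -> List[dict]:
        try:
            return fetch(host, query)
        except Exception:
            # An unreachable or misbehaving host should not sink the search.
            return []

    seen = set()
    pooled = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the hosts were given.
        for results in pool.map(safe_fetch, hosts):
            for item in results:
                if item["id"] not in seen:
                    seen.add(item["id"])
                    pooled.append(item)
    return pooled


# Hypothetical stand-in for real network calls.
FAKE_INDEX = {
    "a.example": [{"id": "https://a.example/1", "content": "jazz-funk"}],
    "b.example": [
        {"id": "https://a.example/1", "content": "jazz-funk"},  # federated copy
        {"id": "https://b.example/7", "content": "funk"},
    ],
}


def fake_fetch(host: str, query: str) -> List[dict]:
    if host not in FAKE_INDEX:
        raise ConnectionError(host)
    return [obj for obj in FAKE_INDEX[host] if query in obj["content"]]


pooled = pooled_search(["a.example", "down.example", "b.example"], "funk", fake_fetch)
```

De-duplication matters here because the same object can be known to several servers once it has federated.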

how · December 9, 2022, 1:50pm

Would it not be easier to have a single searx engine for ActivityPub-compliant servers and work implementation differences from there?

aschrijver · December 11, 2022, 6:38am

My thinking wrt search has changed over time. My thinking as a new fedizen was “Yeah, federated search so you can find everything on the fedi”. I am no longer of that opinion.

Personal social networks

I’ve come to love an aspect of the fediverse that wasn’t immediately obvious to me: that on the fedi I was involved in building my own “personal social network” through the social graph of followers I gradually handcrafted. And for me, the fact that this social graph is relatively small means that it is ‘human scale’, manageable.

Why would I need to search in the microblogging behaviour of millions of fedizens? Mass Social Media is done by the walled garden platforms. I am doing social networking, which is what humans have been doing since the dawn of time and are now extending online.

So, I am happy and content that search on my instance is ‘restricted’ to only the social interactions I have had with others. The people I chose to engage with in the past.

I do not have FOMO to miss some discussion somewhere, and if I feel my social graph doesn’t cover a personal interest of mine, then I should put in the effort to improve my graph.

Personal knowledge networks

If you are working in a particular field or have certain interests, there is a need to delve into the broad body of knowledge that exists. I distinguish the knowledge network from the personal social network. The question is: should Microblogging be indexed for knowledge gathering? I personally think not.

For knowledge gathering I have the existing Web, and the search engines operating there. Now, beyond Microblogging there are other types of federated apps and there’s more to come. Blogs, wikis, open science tools, etc.

These I would like to be indexed and searchable. But that need not be part of my Microblogging apps and clients.

For me Info Overload is an everyday thing already. Adding full text search to the entire fedi doesn’t solve it. It only increases it.

wcerfgba · December 12, 2022, 10:11am

@how Yes I agree, I would like to develop a specification for ActivityPub search which can be consistent across implementations / site types. But failing that I think per-type engines could fill in the gap for now?

@aschrijver Thanks for sharing your thinking and experiences. I agree that curating our networks and mitigating information overload are positive processes and outcomes, and I like how the Fediverse can enable these. My angle on this comes mostly from a discovery stage: how do I find out who to connect with, based on my interests, if I can’t find those people easily at first? For example, I am big into jazz-funk, but this is not the most popular genre of music. If I could search timelines or profiles for hashtags or relevant keywords (“who’s been posting about Incognito?”) then that would help me to discover new people and grow my social graph.

how · December 26, 2022, 11:01pm

I fear going for per-type engines would send the wrong message and end up bringing more work for you instead of encouraging ActivityPub implementors to focus on compliance. If instead you focus on a generic search engine for specification-compliant entries, it will work out of the box for any new compliant implementation instead of just one or two major implementations. It may not be as useful from the start because it would limit search to a few general cases, but then, as more people work their software towards compliance, results would fall into place.
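
As a sketch of what the generic approach could look like: a single function that pulls searchable text out of any ActivityStreams object using only common properties, with no per-implementation branches. The function name `searchable_text` is made up, and the `Hashtag` tag type is an assumption on my part: it is a widely used extension rather than part of the core vocabulary.

```python
import re
from html import unescape


def searchable_text(obj: dict) -> str:
    """Collect indexable text from common ActivityStreams properties."""
    parts = []
    for field in ("name", "summary", "content"):
        value = obj.get(field)
        if isinstance(value, str):
            # content is often HTML; strip tags crudely, then decode entities.
            parts.append(unescape(re.sub(r"<[^>]+>", " ", value)))

    tags = obj.get("tag", [])
    if isinstance(tags, dict):
        # A single tag may be given as an object rather than a list.
        tags = [tags]
    for tag in tags:
        if (
            isinstance(tag, dict)
            and tag.get("type") == "Hashtag"
            and isinstance(tag.get("name"), str)
        ):
            parts.append(tag["name"])

    # Collapse runs of whitespace left behind by tag stripping.
    return " ".join(" ".join(parts).split())


note = {
    "type": "Note",
    "content": "<p>Listening to <b>Incognito</b> &amp; friends</p>",
    "tag": [
        {"type": "Hashtag", "name": "#jazzfunk"},
        {"type": "Mention", "name": "@someone@example.org"},
    ],
}
text = searchable_text(note)
```

Anything an implementation exposes through these shared properties becomes searchable without the engine knowing which software produced it.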

WisTex · January 11, 2023, 11:32am

One challenge to search is the data storage needed for the index. The more you index, the more storage you need. And one challenge to federated search, in particular, is that you can’t do it in real time en masse because that would require querying hundreds or thousands of federated servers all at the same time. And querying thousands of servers at the same time would require some serious resources, both on the server making the query, but also on the servers answering the query.

So you would have situations where small community servers and instances don’t have enough resources to realistically index a large enough portion of the fediverse for their own members, and would crash if too many other servers started querying them at once.

Unless we can overcome these issues, then most likely search will be performed by intermediaries who can afford to index the thousands of fediverse servers that want to be indexed.

duyp · January 10, 2024, 7:36pm

I’m afraid I’m still of the opinion that a federated search is a necessity in the long run.

I’m thinking when everyone’s hopped on to the fediverse and there are no old web posts on anything anymore.

Because (I believe) that’s where the value of whatever you’re searching for arises from - people’s opinions. Opinions matter more than the facts (for better or for worse), at least in terms of gaining knowledge. Otherwise apart from wikipedia there should be no other site left. We wanna know what others are saying about the things we’re interested in.

E.g. how’s the new iPhone? I’m not gonna just trust an article found on the old web. I’m gonna ask people who’ve used it. Better yet, search for those answers as they’ve most likely already been given.

And questions like “how do you solve this issue in the new iPhone?” will only be asked and answered reliably on social networking sites.

Now replace iPhone with something no one in your current graph knows about.

With your way, I would first need to find all the fediverse instances relating to tech or iPhones, and then search? Sounds tedious for a simple search. And there’s no guarantee I’ll find the correct instances/people.

Right now I just search google with “reddit” keyword if I want opinions of redditors, which is frankly most of my searches, especially for stuff I’m not sure where to find authoritative knowledge from.

You gotta give people just an easy solution, not harder.

For now. If the ultimate goal is for everyone to be on the fediverse, none of those options would exist.

As merely an option I’m sure it can’t hurt much?

how · September 16, 2024, 2:19pm

NGI Search program to search the Fediverse for small servers

@mastodon is working on a new NGI Search grant to provide discovery to small Fediverse servers. Congratulations on your grant, and let the community game begin!

FAQ

Is this for Mastodon only?

On the contrary: This project aims to specify how discovery providers can work for all services based on ActivityPub.

angus · February 4, 2025, 10:38am

This has come up for the Discourse plugin.

Who’s the best person to talk to about this? @thisismissem thoughts?