Cross server/federation search implementation

Hi,

I wonder how a cross-server or cross-federation search could be implemented. Let’s take an example: imagine a reddit-like federated app. When the user searches for a community (in a text search input, for instance), it would be great if the results included both the communities on the current server and the communities on the other federated servers.

As for the implementation, I see 2 possible approaches:

  1. A centralised implementation. In this case, in addition to the reddit-like federated servers, there would be something like a “common registry”. If we assume that each community is represented by an ActivityPub actor, then the registry would just “record” the association between the community name and the actor URL. However, this solution comes with some drawbacks. The first one is that a centralised solution is the opposite of a federated one (centralisation is specifically what we want to avoid). What’s more, server administrators who don’t want to join the registry could always create alternative registries, which would lead to registry fragmentation. If registries are fragmented, they become useless.

  2. A decentralised solution. I imagine this solution as something like a web search engine (Google, for instance) but whose goal is to find ActivityPub actors. The search engine just crawls the federated servers and records the actor URLs. This sounds like a good solution, but I don’t know how it can be implemented. How can the search engine find the federated servers? How can the search engine find the actors on the servers? We can imagine a lot of solutions, for instance having a sitemap-like file which lists the actors on a server, etc. But the best thing would be to have a specification that describes exactly what to do!
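
To make the two options a bit more concrete, here is a rough Python sketch under some assumptions. `ActorIndex` is a hypothetical name for the name-to-actor-URL mapping that either a registry or a crawler would have to maintain. `actor_url_from_webfinger` shows how an actor URL can be read from a WebFinger (RFC 7033) response, which is how servers such as Mastodon already resolve a known handle. Note that WebFinger only resolves a handle you already know; it does not list the actors on a server, so it does not answer the discovery question by itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Media types commonly used for the ActivityPub "self" link in WebFinger.
ACTOR_TYPES = (
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
)


@dataclass
class ActorIndex:
    """Hypothetical index: community name -> ActivityPub actor URL."""

    entries: Dict[str, str] = field(default_factory=dict)

    def record(self, name: str, actor_url: str) -> None:
        # Names are stored lowercased so that search is case-insensitive.
        self.entries[name.lower()] = actor_url

    def search(self, query: str) -> List[Tuple[str, str]]:
        # Naive substring match, sorted by name; a real index would rank results.
        q = query.lower()
        return sorted((n, u) for n, u in self.entries.items() if q in n)


def actor_url_from_webfinger(jrd: dict) -> Optional[str]:
    """Return the actor URL from an already-parsed WebFinger JSON document."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") in ACTOR_TYPES:
            return link.get("href")
    return None
```

A crawler would call `record` for each actor it discovers, and the search input would call `search`. How the crawler gets the list of servers and actors in the first place is exactly the open question.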

Hence my question: are there already any existing implementations, trials, research or specifications on this subject? Which kind of implementation would you consider the best?

Thanks for reading.

[2019-11-27 10:09:42+0000] ayorosmage via SocialHub:

I wonder how a cross-server or cross-federation search could be implemented. Let’s take an example: imagine a reddit-like federated app. When the user searches for a community (in a text search input, for instance), it would be great if the results included both the communities on the current server and the communities on the other federated servers.

As for the implementation, I see 2 possible approaches:

  1. A centralised implementation. In this case, in addition to the reddit-like federated servers, there would be something like a “common registry”. If we assume that each community is represented by an ActivityPub actor, then the registry would just “record” the association between the community name and the actor URL. However, this solution comes with some drawbacks. The first one is that a centralised solution is the opposite of a federated one (centralisation is specifically what we want to avoid). What’s more, server administrators who don’t want to join the registry could always create alternative registries, which would lead to registry fragmentation. If registries are fragmented, they become useless.

A centralised implementation already exists: it’s your preferred search engine(s) (if you have one). One option could be to make servers a bit friendlier towards search engines, but AFAIK Mastodon and Pleroma (even with the JS-only frontend) are friendly enough.

  2. A decentralised solution. I imagine this solution as something like a web search engine (Google, for instance) but whose goal is to find ActivityPub actors. The search engine just crawls the federated servers and records the actor URLs. This sounds like a good solution, but I don’t know how it can be implemented. How can the search engine find the federated servers? How can the search engine find the actors on the servers? We can imagine a lot of solutions, for instance having a sitemap-like file which lists the actors on a server, etc. But the best thing would be to have a specification that describes exactly what to do!

Hence my question: are there already any existing implementations, trials, research or specifications on this subject? Which kind of implementation would you consider the best?

This idea strongly reminds me of a proof of concept (I don’t have the URL at hand, but it was basically an aggregation of the search of multiple instances) built against Pleroma’s search as it works right now. The problem was that a public search basically allows anyone to grab everything someone has posted publicly, including deleted posts (because you can grab a post without identifying yourself).

So I think this is completely the wrong way to address search across the fediverse at a large enough scale. I also think that if we manage to implement the decentralised hashtag federation presented at Prague’s APConf, it could be much better than search, especially as hashtags are opt-in.
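
For context, the multi-instance aggregation mentioned above boils down to a fan-out and merge, roughly as in the Python sketch below. This is not the actual proof of concept; `aggregate_search` and the `search_instance` callable are hypothetical names, and the real HTTP call to each instance's public search endpoint is deliberately left to the caller.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: (instance host, query) -> list of result dicts,
# each expected to carry a "url" key identifying the post or actor.
SearchFn = Callable[[str, str], List[dict]]


def aggregate_search(
    instances: Iterable[str], query: str, search_instance: SearchFn
) -> List[dict]:
    """Query every instance and merge the results, deduplicated by URL."""
    seen = set()
    merged: List[dict] = []
    for host in instances:
        try:
            results = search_instance(host, query)
        except Exception:
            # An unreachable or failing instance should not break the search.
            continue
        for item in results:
            url = item.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append(item)
    return merged
```

Because federated instances hold copies of each other's public posts, the same URL can come back from several hosts, hence the deduplication. This also illustrates the concern raised above: anything an instance exposes through unauthenticated search ends up in the merged results.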


Thank you for your answer.

I watched the video from Prague’s APConf (which can be found here: https://redaktor.me/apconf/). Very interesting, btw.
A fully decentralised implementation doesn’t seem to be coming any time soon. What’s more, the presentation is about hashtags. A fuzzy search solution would be even harder.