Fediverse crawlers and security

From https://socialhub.network/t/fediverse-crawlers-and-security/637

I run a project called fediverse.space, a visualisation of Mastodon/Pleroma instances on the fediverse (with more server types to come).

Fediverse crawlers often rely on endpoints such as /api/v1/instance/peers in the Mastodon API. This endpoint lists all the other instances that are known to the server. A few weeks ago, tens of thousands of fake instances started appearing on these lists, all on random subdomains of a single host.
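As a rough illustration of treating the peers list as untrusted input, here is a minimal Python sketch that parses a response body from /api/v1/instance/peers and keeps only entries that look like hostnames. The function name parse_peers, the max_peers cap and the hostname pattern are my own assumptions for illustration, not part of any crawler or of the Mastodon API. The pattern is deliberately loose (it also accepts dotted IP addresses, for example).

```python
import json
import re
from typing import List

# Loose hostname check (an assumption, not a full RFC validator):
# at most 253 characters, dot-separated labels of a-z, 0-9 and hyphens.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME_RE = re.compile(r"^(?=.{1,253}$)(?:" + _LABEL + r"\.)+" + _LABEL + r"$")


def parse_peers(raw_body: str, max_peers: int = 50_000) -> List[str]:
    """Return plausible hostnames from a peers response body.

    Anything that is not a JSON list of strings is ignored, duplicates are
    dropped, and the result is capped at max_peers (a hypothetical limit).
    """
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    peers: List[str] = []
    seen = set()
    for item in data:
        if not isinstance(item, str):
            continue
        host = item.strip().lower()
        if host in seen or not _HOSTNAME_RE.match(host):
            continue
        seen.add(host)
        peers.append(host)
        if len(peers) >= max_peers:
            break
    return peers
```

Capping the list size means a single server cannot flood the crawl queue on its own, although it does nothing against many servers each reporting a few fake peers.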

The instance details endpoint (/api/v1/instance) of every one of these fake instances returned fake data and an XSS attempt. Some crawlers temporarily struggled with the sudden spike in instances, but it didn’t cause any major outages, and no directory that I know of was susceptible to the XSS attack.
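To show why a directory that escapes its output is not affected by this kind of payload, here is a small Python sketch that renders two fields from an instance details response into HTML. The function names and the HTML layout are hypothetical. The title and description keys are fields of the /api/v1/instance response, and everything in them is escaped here, including markup a server might have put there legitimately.

```python
import html


def _safe_text(instance: dict, key: str) -> str:
    """Return the escaped string value for key, or "" if missing or not a string."""
    value = instance.get(key)
    if not isinstance(value, str):
        return ""
    # html.escape replaces &, <, > and (by default) both quote characters.
    return html.escape(value)


def render_instance_card(instance: dict) -> str:
    """Build a small HTML snippet from untrusted instance details."""
    title = _safe_text(instance, "title")
    description = _safe_text(instance, "description")
    return f'<div class="instance"><h2>{title}</h2><p>{description}</p></div>'
```

With this approach a title such as `<script>alert('x')</script>` ends up on the page as visible text rather than as an executable script tag.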

This thread is mostly a note to anyone else interested in creating such a project – remember that all data in API endpoints should be considered untrusted. As the Fediverse grows, we’ll probably see more and more attempted attacks like this one. So two pieces of advice:

  • sanitize all data from API endpoints
  • perhaps block domains with a very large number of subdomains (but whitelist hosting providers like masto.host, where each subdomain is a legitimate instance).
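The second piece of advice could be sketched roughly as follows in Python. The threshold of 100, the names find_suspicious_parents and ALLOWED_PARENTS, and the rule of treating the last two labels as the parent domain are all assumptions made for this example. A real crawler would probably want the Public Suffix List instead, since the two-label rule is wrong for suffixes like co.uk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical allowlist of hosting providers that hand out subdomains.
ALLOWED_PARENTS = frozenset({"masto.host"})


def find_suspicious_parents(
    hosts: Iterable[str],
    max_subdomains: int = 100,
    allowlist: Iterable[str] = ALLOWED_PARENTS,
) -> Set[str]:
    """Return parent domains that have more than max_subdomains distinct subdomains.

    The parent is naively taken to be the last two labels of the hostname.
    """
    allowed = set(allowlist)
    subdomains: Dict[str, Set[str]] = defaultdict(set)
    for host in hosts:
        normalized = host.strip().lower().rstrip(".")
        labels = normalized.split(".")
        if len(labels) < 3:
            continue  # no subdomain part, nothing to count
        parent = ".".join(labels[-2:])
        subdomains[parent].add(normalized)
    return {
        parent
        for parent, subs in subdomains.items()
        if len(subs) > max_subdomains and parent not in allowed
    }
```

The returned set can then be used to skip or quarantine every host under those parents before the crawler requests their instance details.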