Ok, another wall of text from me. Sorry! Strap in…
Mastodon’s defaults on fetching and caching seem wacky to me. The admin of the server I’m on says the default is to download all text and image posts, then delete them all after a week.
Including posts that @mention me. Even ones that are part of the context for ongoing conversations. Even ones I haven’t seen yet. Even the Direct posts that @mention me.
Why?
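My dream defaults are quite different: fetch post metadata and text up front, but download media only when someone on the server first views it.
Yes, that means a bit more latency the first time an image is viewed. One obvious benefit is never downloading images for posts nobody ever sees, which reduces storage costs for the receiving server and bandwidth use at both ends.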
A less obvious benefit is that a server never ends up storing dodgy media (potentially illegal stuff like CSAM) unless someone using that server has seen it. If it is viewed, and therefore downloaded, either the person who viewed it will report it for purging (and further action as appropriate), or mods can hold them responsible for not reporting it.
That’s exactly what Mastodon is doing by downloading images before anyone wants to look at them, potentially wasting piles of storage. That is then addressed by aggressively pruning the text data people on the server will want to see, even though text takes up a fraction of the per-post space that image data does. To me, this seems wacky: ‘cheery Londoners dancing on roofs with brooms’ level wacky.
What you’re optimising for with DOFV (Download On First View) is precisely the images people do want to look at, as evidenced by the fact that someone has. With a whole lot less media flying around the network unnecessarily, the overall latency of everything would likely improve.
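To make the comparison concrete, here’s a minimal sketch, in Python, of one server’s media cache under the two policies: eager prefetching versus DOFV. `MediaCache`, `simulate`, the `remote.example` URLs and the workload numbers are all hypothetical, made up for illustration; this is not Mastodon’s actual code, and `fetch` just stands in for an HTTP request to the origin server. It also doesn’t model the extra latency the first viewer pays under DOFV.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class MediaCache:
    """Hypothetical media cache for one receiving server (not Mastodon's code)."""

    # Stand-in for an HTTP GET to the server hosting the original media.
    fetch: Callable[[str], bytes]
    store: Dict[str, bytes] = field(default_factory=dict)
    remote_fetches: int = 0

    def _download(self, url: str) -> bytes:
        # Every call here costs bandwidth at both ends and storage locally.
        self.remote_fetches += 1
        data = self.fetch(url)
        self.store[url] = data
        return data

    def prefetch(self, urls: Iterable[str]) -> None:
        """Eager policy: download every attachment as soon as its post arrives."""
        for url in urls:
            if url not in self.store:
                self._download(url)

    def view(self, url: str) -> bytes:
        """DOFV policy: download on the first view, serve the local copy after that."""
        if url in self.store:
            return self.store[url]
        return self._download(url)


def simulate(incoming: List[str], views: List[str], eager: bool) -> Tuple[int, int]:
    """Return (remote fetches, attachments stored) for one policy."""
    cache = MediaCache(fetch=lambda url: b"\x00" * 1024)  # fake 1 KiB image
    if eager:
        cache.prefetch(incoming)
    for url in views:
        cache.view(url)
    return cache.remote_fetches, len(cache.store)


# Hypothetical workload: 100 attachments arrive, but only 10 are ever viewed,
# each of those by three different people on the server.
incoming = [f"https://remote.example/media/{i}.jpg" for i in range(100)]
views = incoming[:10] * 3

print("eager:", simulate(incoming, views, eager=True))
print("DOFV: ", simulate(incoming, views, eager=False))
```

With this made-up workload, eager prefetching makes 100 remote fetches and stores 100 attachments, while DOFV makes 10 fetches and stores 10. Repeat viewers are served from the local copy either way, so the only difference is the 90 attachments nobody looked at.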
This may surprise you, but not everyone looks at every post made by everyone their account follows. I follow anyone who posts stuff I like. I look at very little of it. But if I do a search in my app, I’m more likely to find good stuff (or for specific searches, any stuff at all). So is anyone else using the same server.
At least we were, before Mastodon started stuffing it all down the memory hole after a week. Our admin pushed out auto-pruning to a month after I queried what was going on. But we’re still losing our conversation history on a daily basis.
In which case wouldn’t there be latency for the first person viewing them anyway?
How many were never viewed at all? Wasting significant resources in a totally avoidable way (see above). Now multiply that by every server in the network. Now imagine the network being 10 times bigger in 5 years, and 100 times bigger in 10 (just as a hypothetical).
The waste of storage and bandwidth scales up with growth of the network. Convince me this is good design.
Ok, firstly, this seems like a purely theoretical risk. Most of us who use social media walk around with a computer in our pockets. How sensitive is data about the times it’s in our hands instead? In XMPP networks, telegraphing presence is a feature, not a bug.
Also, as you say, this is really only an issue for single-account servers. If this is part of their threat model, they can just change the default to suit their needs.
But in general, people using single-account servers are even more likely to follow more stuff than they’ll ever look at, to increase the scope of discovery (see the linked Fediverse Ideas issue about my dream defaults). So unless they have a generous budget for storage, or don’t pay for it (e.g. a server in the closet), a DOFV default for anything beyond post metadata and text still makes more sense to me.
Secondly, I think we need to watch out for prioritising a theoretical risk over solving a clear and present danger: making running a small-to-medium fediverse server so expensive that it becomes unaffordable for most communities and organisations to even consider hosting their own. That’s a big part of how the DataFarmers ate the web in the first place.
This is a totally different situation, involving interaction between the fediverse and the document web. Also, AFAIK the Mastodon policy on link previews wasn’t DOFV; it was DOEV (Download On Every Viewing), thus slashdotting any non-fediverse site linked in a post that went viral across the verse.
The solution they went with was, as you say, to change the style of DOEV…
So Mastodon traffic is still caning the webservers hosting people’s blogs etc., avoidably consuming their resources, just slowly enough not to DDoS them off the web. Moving to DOFV would actually solve the problem.
I’m not convinced that Mastodon is optimised at all. See: