Would Media captioning make a good FEP?

When media such as images and video are uploaded and shared without captions, it excludes the visually impaired from knowing what the content is.

To address this, some user interfaces highlight such media so that people can be more easily made aware of this deficiency in what they may share. People can then reply to ask for a captioned repost, or decide whether they want to share or otherwise engage with such content.

The question is, would formalization of captioning make sense as a FEP? This area seems to be one that could use another look.

Some references on the current state of documentation:
the ActivityPub specification, which links to
SocialCG/ActivityPub/MediaUpload on the W3C Wiki

I am not sure I understand this correctly.
The question is whether it adds anything new (people can already reply to ask for a captioned repost, and I hope they do so …)

At yesterday’s meeting we thought about creating topic-based “Best Practices” spaces here;
as with Groups, there could be one for media captioning.

What is frustrating here is that basically all of my professional documentaries, films and photos are already captioned in the files themselves.
In XMP and DC/IPTC/NewsML …
In the other window I am working on a metadata parser (util.ts, but EXIF and IPTC parsers also exist in other repos) because I think it should be the task of any posting UI to highlight the need for captions.

In redaktor, when you upload media you are asked whether you want to use the default metadata as a template.
In the end this should include

for All [e.g. Page, Place, Event]

  • meta tags and DC
  • og and twitter
  • JSON-LD schema.org by means of schema:mainEntityOfPage
  • mf2

for Image + XMP/IPTC/EXIF
for Video + XMP
for Audio + ID3

The parser can be tested simply with util.js: add any URL at the end,
cd to the directory and run
node ./util

Example output for the last URL used, a SPIEGEL news article, is in /example.json.
About the attachment property: for now only as: elements or videos are copied over, and this page has none.
Any additional elements are in the ld property (for now), so any implementation can choose.


You make some good points. Replying to ask for a captioned repost is something I’ve thought of, but I know it doesn’t scale, so I tend not to.

If I can summarize from your post, an accessibility FEP could specify a standard for:

  • What kind of metadata might be extracted to pre-fill captions
  • How those captions are presented
  • What to do if captions aren’t available and, if the answer is to return an error code, what the error should say

Since you mentioned automated interaction with metadata, I just thought of this which might be appropriate for a privacy-focused FEP:

  • What kind of metadata should be stripped (e.g. location or overly granular time information)

Yes, I would like to have tags like “protocol”, “experimental” and “best practice” to know directly what the FEP is about.

  • What kind of metadata should be stripped (e.g. location or overly granular time information)

Well, I think the advantage is that we can let the user decide.
Let me explain:

In redaktor, all metadata is stripped by default unless the user has opted in.
All requests in redaktor go through a proxy server.
For content type image/* (by default only jpg, png, webp, avif, gif, heif) it uses sharp under the hood [the proxy can also resize etc.], and the withMetadata option is only active if the requesting user wants it.
In the posting window, the user can then choose which metadata to use.

Personally I do not post many photos on Mastodon because of the square format, but this would make publishing my stories much easier.
Most important might be the message before users post, like “Your alt text is empty. You fail.” :slight_smile:
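
A compose-time check like this could be sketched as follows. This is only an illustration, not redaktor's code: it assumes the alt text travels in the `name` property of each attachment (as Mastodon does), and `findUncaptioned` and `composeWarning` are hypothetical helper names.

```typescript
// Minimal shape of an ActivityStreams attachment; only the fields used here.
interface Attachment {
  type: string;        // e.g. "Image", "Video", "Audio" or "Document"
  mediaType?: string;  // e.g. "image/jpeg"
  url: string;
  name?: string;       // ASSUMPTION: the alt text / description lives here
}

// Returns the media attachments that have no (non-blank) description.
function findUncaptioned(attachments: Attachment[]): Attachment[] {
  return attachments.filter((a) => {
    const isMedia =
      /^(image|video|audio)\//.test(a.mediaType ?? "") ||
      ["Image", "Video", "Audio"].includes(a.type);
    const hasText = (a.name ?? "").trim().length > 0;
    return isMedia && !hasText;
  });
}

// Builds the message a posting UI could show before publishing,
// or null when every media attachment is described.
function composeWarning(attachments: Attachment[]): string | null {
  const missing = findUncaptioned(attachments);
  if (missing.length === 0) return null;
  return `${missing.length} of ${attachments.length} attachments have no description.`;
}
```

Whether the UI blocks the post or only warns is a UX decision the sketch leaves open.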

Please also note that the redaktor client uses the natural-language features of ActivityPub for i18n support. But i18n is probably out of scope.

(Currently, in the other window, I am doing a light version of the proxy as a “building block” middleware, e.g. for the express servers by @datatitian https://github.com/immers-space/activitypub-express and @darius https://github.com/immers-space/activitypub-express)

Something else comes to my mind.
We need to specify what “caption” means.

There are basically two kinds of captions:

  1. Speakable: The alt attribute of e.g. an <img> or a text below depending on the “alt decision tree”
  2. Visible: A caption below the image as often seen in news media

Both have totally different meanings.
Let’s assume the Actor is a journalist or media organisation.
Then 2. is essential.
Let’s assume the Actor is an artist posting an abstract piece.
This actor probably wants to avoid 2., otherwise they would have become a writer …

We could somehow define what should be 1. and what 2. …

A quick tip regarding captions in mastodon

We’ve merged this CSS into two themes that are available by default in Ecko and are looking at making it something like a checkbox instead of a special theme. I believe there’s a compose-time warning in Glitch as well. I suppose a FEP could be a UX recommendation at compose and display time.

Rather than dividing the cases by kind, I would go back to the intent of captioning, which is to map the content into linear text so that it can be consumed by screen readers or, in the case of transcripts for the hearing impaired, by reading.

Many videos have captions but many do not. Nearly every image on the web lacks a caption, and in my experience maybe 5% of podcasts have transcripts.

Much of today’s information is a mystery to the blind and hearing impaired. The fediverse being newer provides an opportunity to make accessibility a core part of our systems. It’s clearer to me now that this is a valid problem and a FEP can’t hurt, so I’ll start working on a draft.

So a lot of code has been written in the meantime which could help here.
A main problem when developing all the @redaktor widgets was support for the hearing impaired, because the major browsers do not support WebVTT for the audio element, which is sad.
So, for the as:Audio type widget, we overlay a video element that does WebVTT if there are any tracks.
There is a topic with (early stage) screenshots, “Seeking opinions on time-based content”, but now, for time-based “extra content”, the easiest way is to supply WebVTT metadata as ActivityPub markup to add time-based ActivityPub content.
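
For readers who have not worked with WebVTT: a track is just a text file with a `WEBVTT` header followed by timed cues. A minimal sketch of generating one is below. The `Cue` shape and the function names are made up for illustration, and the sketch does not show how redaktor embeds ActivityPub markup in the cues.

```typescript
// A single timed cue; start and end are in seconds. Hypothetical shape.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Serializes cues into WebVTT text: header, then one block per cue,
// with blocks separated by blank lines.
function toWebVTT(cues: Cue[]): string {
  const blocks = cues.map(
    (c) => `${formatTimestamp(c.start)} --> ${formatTimestamp(c.end)}\n${c.text}`,
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```

The sketch does no escaping, so cue text containing blank lines or the string `-->` would need handling before real use.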

Then I wrote the image proxy and put it in a gist, but I have just sent it to @datatitian for a quick review because it could be a simple plugin router for https://github.com/immers-space/activitypub-express and I want to prevent any bugs before …

Will let you know about progress …

I forgot to mention about the proxy (I can send the gist link via DM): it currently handles all namespaces and vocabularies in https://developer.adobe.com/xmp/docs/XMPNamespaces/
and converts them to the ActivityStreams vocabulary.
You can simply chain multiple image operations from sharp and /withMetadata.
Then, via content negotiation on the Accept header, it outputs either the image with metadata embedded, plain JSON, or ld+json.
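
The content negotiation step could look roughly like this. It is a sketch, not the code from the gist: `negotiate` is a hypothetical name, and falling back to the image when nothing matches is an assumption.

```typescript
type Representation = "image" | "json" | "ld+json";

// Picks the output representation from an Accept header.
// Entries are ranked by their q-value (default 1); ties keep header order.
function negotiate(accept: string): Representation {
  const ranked = accept
    .split(",")
    .map((part) => {
      const [type = "", ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { type: type.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .sort((a, b) => b.q - a.q);

  for (const { type, q } of ranked) {
    if (q <= 0) continue; // q=0 means "not acceptable"
    if (type === "application/ld+json") return "ld+json";
    if (type === "application/json") return "json";
    if (type.startsWith("image/")) return "image";
  }
  // ASSUMPTION: anything else (including */*) gets the image itself.
  return "image";
}
```

For example, a request with `Accept: application/ld+json` would get the converted metadata, while a browser `<img>` request would still receive the image.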

Privacy is important to me.
The thing should never be used by bots automatically creating ActivityPub objects.
It should be used by the Actor as an opt-in template for posting.
In any case the posting interface should proactively notify the Actor about missing captions or the minimal a11y needs and further guidelines.
In general, metadata should fall into 3 categories:

Critical Metadata – what should always be deleted …

  • EXIF Maker Notes
    They often contain encrypted, very private and forensic data, such as the temperature of the camera or encrypted email/account data. “Encrypted” is relative here: the aforementioned examples (temperature/email) can be read by a beginner after 2 days of playing around, and other hacks can be found via search engines :wink:

Metadata which was not produced by the author

  • EXIF
  • TIFF
    In my code, users can opt in, but the data is filtered and lands in the instrument property of ActivityStreams.

Metadata which was produced by the author

  • IPTC
  • DC
  • photoshop
    [which is just a namespace to bridge IPTC/DC and photoshop or other apps (e.g. iView) specific values]
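
The three categories above can be sketched as a small filter. This is an illustration under assumptions, not my proxy code: the `prefix:Field` key format, the `exif:MakerNote` key name, the category labels, and treating unknown namespaces as critical are all choices made for the example.

```typescript
type Category = "critical" | "device" | "author";
type OptIn = Exclude<Category, "critical">; // critical data can never be opted in

// ASSUMPTION: keys look like "prefix:Field"; prefixes follow the lists above.
const PREFIX_CATEGORY = new Map<string, Category>([
  ["exif", "device"],         // not produced by the author
  ["tiff", "device"],
  ["Iptc4xmpCore", "author"], // produced by the author
  ["dc", "author"],
  ["photoshop", "author"],
]);

function categorize(key: string): Category {
  if (key === "exif:MakerNote") return "critical"; // hypothetical key name
  const prefix = key.split(":")[0] ?? "";
  // Unknown namespaces are treated as critical, i.e. always dropped.
  return PREFIX_CATEGORY.get(prefix) ?? "critical";
}

// Keeps only fields whose category the user opted in to.
// With no opt-in (the default) everything is stripped.
function filterMetadata(
  meta: Record<string, string>,
  optIn: ReadonlySet<OptIn> = new Set<OptIn>(),
): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [key, value] of Object.entries(meta)) {
    const category = categorize(key);
    if (category !== "critical" && optIn.has(category)) kept[key] = value;
  }
  return kept;
}
```

Mapping the kept device fields into the instrument property is not shown here.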

@weex and all

There is a super nice workshop by EBU Tech, Wikidata and IPTC about what I describe above.
In 6 days, on March 10 …

The invitation by the European Broadcasting Union reads:
“Wikidata has become one of the largest collections of open data on the web.

Join our workshop (10 Mar), held with IPTC, on how broadcasters and media organizations can use Wikidata to tag, enrich and enhance their content! #opendata with wikidata and IPTC”

The European Broadcasting Union is the place where all the public broadcasters of Europe work together …

I am adding this important perspective here

Please note: If we specify any ActivityPub “flows” and some software does not support it, we could simply send the icon which Erik mentions and use it as a fallback.

To follow up on that toot a bit: the point made was that for accessibility there must be captions, but people can be unable (e.g. because of different disabilities) to write them, as Erik describes:

Disabled people know that access needs can clash. I benefit from described images, but I know some people struggle to write them because of their own disabilities.

And he mentioned workarounds that currently exist, like adding a custom emoji so people can indicate that they would like someone else to caption their image. For this you can also mention the Gup.pe group @imagecaptionspls, and when both the emoji and the group mention are missing, someone can reply with a mention of the @PleaseCaption bot.

(I always caption my images, and also always forget these account names, but that’s why we need a more thorough built-in handling of captions)