Representing images

koehn · April 20, 2020, 12:25am

I have C2S and S2S apps in development, and I’m working on a way to allow users to attach images to documents they send. The images will be stored in S3-compatible storage, so a URL will suffice for accessing them in AP. I’m planning on representing them using the Attachment structure. To help with performance and minimize bandwidth, I’d like to serve and show a preview of the uploaded images where appropriate. I would like my server to interoperate well with other servers, and the AP specification is so wide open that it’s hard to know what kinds of messages my client should expect to have to display.

My question is about structuring images. For now, I’ve been thinking about a structure like this one (in JavaScript, but close enough to JSON for you to get the idea):

{
  type: 'Note',
  content: 'Hello, world!',
  attributedTo: 'https://me.example.com/actor',
  to: 'https://me.example.com/actor/followers',
  attachment: {
    type: 'Image',
    url: {
      type: "Link",
      href: 'https://images.example.com/image.jpeg',
      width: 5120,
      height: 2880,
      mediaType: 'image/jpeg',
      preview: {
        type: "Link",
        href: 'https://images.example.com/image-preview.jpeg',
        width: 512,
        height: 288,
        mediaType: 'image/jpeg'
      }
    }
  }
}

But I recognize that there are myriad other ways to represent the same data. The attachment could simply be a Link with a mediaType. It needn’t be an Image but could simply be a Document, as Mastodon does. The url needn’t be a single object, but an array of multiple links with different width and height from which the client should select. There needn’t be a URL at all, and the image could be encoded into the content field, which is actually the closest to the definition from the attachment spec:

Identifies a resource attached or related to an object that potentially requires special handling. The intent is to provide a model that is at least semantically similar to attachments in email.

I don’t want to develop my application to handle every possible representation it might encounter; I’d rather use my finite time for developing features that will actually be used. At the same time, I’d like to strike a reasonable balance of interoperability. The specification is so wide open as to be nearly useless in deciding what formats one should expect to support.
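To get a feel for what tolerating a handful of the shapes listed above would cost, here is a minimal sketch that flattens them into one list of candidates. The function names and the output shape are my own assumptions, not anything from the spec; it ignores nested preview links and images embedded in content.

```javascript
// Sketch: flatten several attachment shapes into a list of
// { href, mediaType, width, height } candidates. All names are hypothetical.

function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

function linkToCandidate(link, fallbackMediaType) {
  // A bare string is just a URL; an object is expected to be a Link.
  if (typeof link === 'string') {
    return { href: link, mediaType: fallbackMediaType ?? null, width: null, height: null };
  }
  if (link && typeof link === 'object' && typeof link.href === 'string') {
    return {
      href: link.href,
      // Fall back to the enclosing object's mediaType, as the
      // Mastodon/Pleroma style puts it there instead of on a Link.
      mediaType: link.mediaType ?? fallbackMediaType ?? null,
      width: Number.isFinite(link.width) ? link.width : null,
      height: Number.isFinite(link.height) ? link.height : null,
    };
  }
  return null; // unknown shape: skip rather than guess
}

function normalizeAttachments(object) {
  const candidates = [];
  for (const attachment of asArray(object.attachment)) {
    if (typeof attachment === 'string') {
      candidates.push(linkToCandidate(attachment));
    } else if (attachment && attachment.type === 'Link') {
      // The attachment itself is a Link with a mediaType.
      candidates.push(linkToCandidate(attachment));
    } else if (attachment && typeof attachment === 'object') {
      // Image, Document, etc.: url may be a string, a Link, or an array.
      for (const url of asArray(attachment.url)) {
        candidates.push(linkToCandidate(url, attachment.mediaType));
      }
    }
  }
  return candidates.filter((c) => c !== null);
}
```

Anything that doesn’t fit one of these shapes is silently dropped, which seems like a reasonable default for a client that only wants to display what it understands.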

lanodan · April 20, 2020, 8:52am

For example, an image with a description uploaded to Pleroma will look like this (it will look the same in AP C2S, because we transform every object):

{
  "mediaType": "image/jpeg",
  "name": "screenshot_mpv:Doctor Who (2005) - S07E06 - The Bells of St John (1080p BluRay x265 Panda)@00:00:42.000.jpg",
  "type": "Document",
  "url": "https://queer.hacktivis.me/media/6992382f-4bd0-4fc7-8a8d-362da615b19a/screenshot_mpv%3ADoctor%20Who%20%282005%29%20-%20S07E06%20-%20The%20Bells%20of%20St%20John%20%281080p%20BluRay%20x265%20Panda%29%4000%3A00%3A42.000.jpg"
}

And for Mastodon:

{
  "type": "Document",
  "mediaType": "image/png",
  "url": "https://files.mastodon.social/media_attachments/files/027/702/471/original/a770e4155dae9cad.png",
  "name": "a new flat and modern logo showing a blue-light blue sengi",
  "blurhash": "Uq0|dqf}Zeg5f_e.etf+d;e?f,e,etfkg1f7"
}

(blurhash is Mastodon’s own thing for getting blurry thumbnails, for example as a removable overlay on sensitive activities)

But the different ways of representing objects in ActivityPub don’t matter so much; most implementations tend to copy what others are doing and otherwise use logic similar to each other’s (I think yours is similar to videos in PeerTube).

I don’t want to develop my application to handle every possible representation it might encounter; I’d rather use my finite time for developing features that will actually be used. At the same point, I’d like to strike a reasonable balance of interoperability. The specification is so wide open as to be nearly useless in deciding what formats one should expect to support.

One thing you could do is look at what others have; for example, the relevant part in Pleroma is:

Code, under our copyright (AGPL-3.0-only): https://git.pleroma.social/pleroma/pleroma/-/blob/918a8094fc175ed71ccb7305d606fb0b221163f6/lib/pleroma/web/activity_pub/transmogrifier.ex#L214
Our fixtures are in test/fixtures, I guess you could grep -r attachment test/fixtures after cloning the repo.

And for the AP client, I think supporting a subset of ActivityPub, like Pleroma does, could be a good idea to avoid having to write so much code (you should also validate external data, and transforming it makes that easier). We should probably document it at some point to help AP C2S clients, so feel free to ask for it.
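As a rough illustration of the validate-and-transform idea, here is a minimal sketch that reduces a remote attachment to a small internal shape and rejects everything else. This is not Pleroma’s code; the function name, the returned fields, and the choice to accept only plain URL strings and image/* media types are assumptions made for the example.

```javascript
// Sketch: reduce a remote attachment to a small, validated internal shape.
// Returns null when the data does not fit the subset this client supports.
// The shape and the function name are hypothetical.
function toInternalImage(attachment) {
  if (!attachment || typeof attachment !== 'object') return null;
  // Subset: only plain URL strings, as in the Document examples above.
  if (typeof attachment.url !== 'string') return null;

  let parsed;
  try {
    parsed = new URL(attachment.url);
  } catch {
    return null; // not an absolute URL
  }
  // Never hand javascript:, data:, file: and friends to the UI.
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') return null;

  const mediaType = typeof attachment.mediaType === 'string' ? attachment.mediaType : '';
  if (!mediaType.startsWith('image/')) return null;

  return {
    url: parsed.href,
    mediaType,
    // "name" carries the image description in both examples above.
    description: typeof attachment.name === 'string' ? attachment.name : '',
  };
}
```

The rest of the client then only ever sees the internal shape, so display code doesn’t need to re-check remote data.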

One thing which might be interesting to you for the AP C2S side of things is the uploadMedia endpoint, which we added to Pleroma together with AndStatus (a client): ActivityPub C2S: How to upload images?

grishka · April 20, 2020, 6:00pm

I experimented with exposing multiple sizes and image formats in my AP objects, but ultimately settled on having a local LRU media cache for consistency.

JSON-LD allows most values to be arrays. I used that for profile pictures. So basically what you want is this:

"url": [
  {
    "type": "Link",
    "href": "https://example.com/image_orig.jpg",
    "width": 5120,
    "height": 2880,
    "mediaType": "image/jpeg"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_orig.webp",
    "width": 5120,
    "height": 2880,
    "mediaType": "image/webp"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_preview.jpg",
    "width": 512,
    "height": 288,
    "mediaType": "image/jpeg"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_preview.webp",
    "width": 512,
    "height": 288,
    "mediaType": "image/webp"
  },
  // ... more sizes and formats
]

This is flexible enough to support as many sizes and formats as you wish, as opposed to just one “preview”.
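On the receiving side, a client then has to pick one entry from such an array. A minimal sketch of one possible policy follows: take the smallest variant that is at least as wide as the display area, in the most preferred format the client can decode. The helper name and its parameters are hypothetical.

```javascript
// Hypothetical helper: pick one link from a "url" array.
// supportedTypes is ordered by preference, e.g. ['image/webp', 'image/jpeg'].
function pickVariant(links, targetWidth, supportedTypes) {
  const usable = links.filter(
    (l) => l && typeof l.href === 'string' && supportedTypes.includes(l.mediaType)
  );
  if (usable.length === 0) return null;

  // Prefer the smallest image that is at least targetWidth wide;
  // if none is wide enough, fall back to the widest one available.
  const wideEnough = usable.filter((l) => l.width >= targetWidth);
  const pool = wideEnough.length > 0 ? wideEnough : usable;
  const bestWidth = wideEnough.length > 0
    ? Math.min(...pool.map((l) => l.width))
    : Math.max(...pool.map((l) => l.width ?? 0));

  // Among links of that width, take the most preferred media type.
  return pool
    .filter((l) => (l.width ?? 0) === bestWidth)
    .sort((a, b) => supportedTypes.indexOf(a.mediaType) - supportedTypes.indexOf(b.mediaType))[0];
}
```

With the four links above, a 400 px wide slot and a client that prefers webp would end up with image_preview.webp, while a client that only decodes jpeg would get image_preview.jpg.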

Sebastian · May 12, 2021, 7:16am

Why the heck "type": "Document" and not "type": "Image" –
the first lesson I learned when studying journalism was “be as specific as possible” –
is that different for protocols?

@koehn About “subsets”, the Conformance Section makes it very clear what is ActivityPub conformant (“the entirety”) and what is not. So, we need to decide for ourselves if we “copy” short-message-box-logics or if we are ActivityPub conformant.

For me it is frustrating enough that, because of the examples above, I now need to go all the way up the chain
Image → Document and check the mediaType to know that it is an ActivityPub "type": "Image" …

Sebastian · May 12, 2021, 7:45am

Just want to mention too, that koehn’s example is perfect.
“Be liberal in what you accept and strict in what you send” –

So, I wonder if Mastodon has ever read the specification for ActivityStreams Core where it reads
Care should be taken to not unduly overlap with or duplicate the existing Object types.

An Image is an Image is an Image …

This is what I wrote here, in @redaktor it is a srcset

It is also a bad decision to let mediaType decide the ActivityPub type, like Mastodon does!
In the real world one ActivityPub "type": "Image" can have multiple mediaType values, and this is super useful if you let the browser decide in the srcset, e.g. webp for modern browsers or jpeg for old-school ones …
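To make the srcset idea concrete, here is a minimal sketch that turns an array of Link objects (each with href, width and mediaType) into one srcset string per media type, which could then feed the source elements of a picture element. The function name and the returned shape are assumptions for the example, not how @redaktor does it, and it assumes the hrefs are already percent-encoded so they contain no spaces or commas.

```javascript
// Sketch: group Link objects by mediaType and build a srcset string for each.
// Hypothetical helper; links without an href or a numeric width are skipped.
function buildSrcsets(links) {
  const byType = new Map();
  for (const link of links) {
    if (!link || typeof link.href !== 'string' || !Number.isFinite(link.width)) continue;
    if (!byType.has(link.mediaType)) byType.set(link.mediaType, []);
    byType.get(link.mediaType).push(link);
  }

  const result = {};
  for (const [mediaType, group] of byType) {
    // Width descriptors ("512w") let the browser choose the right size.
    result[mediaType] = group
      .sort((a, b) => a.width - b.width)
      .map((l) => `${l.href} ${l.width}w`)
      .join(', ');
  }
  return result;
}
```

Each key of the result maps to the type attribute of one source element, so the browser picks the format first and the size second.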

@koehn please note that the posted examples from Mastodon and Pleroma are imho invalid –
the specification clearly says for mediaType

When used on a Link, identifies the MIME media type of the referenced resource.

When used on an Object, identifies the MIME media type of the value of the content property. If not specified, the content property is assumed to contain text/html content.

Document is an Object but the examples try to describe the mediaType of url –
so: Your posted example is “correct” …