Representing images

I have C2S and S2S apps in development, and I’m working on a way to allow users to attach images to documents they send. The images will be stored in S3-compatible storage, so a URL will suffice for accessing them in AP. I’m planning on representing them using the Attachment structure. To help with performance and minimize bandwidth, I’d like to serve and show a preview of the uploaded images where appropriate. I would like my server to interoperate well with other servers, but the AP specification is so open-ended that it’s hard to know what kinds of messages my client should expect to have to display.

My question is about structuring images. For now, I’ve been thinking about a structure like this one (in JavaScript, but close enough to JSON for you to get the idea):

{
  type: 'Note',
  content: 'Hello, world!',
  attributedTo: 'https://me.example.com/actor',
  to: 'https://me.example.com/actor/followers',
  attachment: {
    type: 'Image',
    url: {
      type: "Link",
      href: 'https://images.example.com/image.jpeg',
      width: 5120,
      height: 2880,
      mediaType: 'image/jpeg',
      preview: {
        type: "Link",
        href: 'https://images.example.com/image-preview.jpeg',
        width: 512,
        height: 288,
        mediaType: 'image/jpeg'
      }
    }
  }
}

But I recognize that there are myriad other ways to represent the same data. The attachment could simply be a Link with a mediaType. It needn’t be an Image but could simply be a Document, as Mastodon does. The url needn’t be a single object; it could be an array of links with different widths and heights from which the client should select. There needn’t be a URL at all, and the image could be encoded into the content field, which is actually the closest to the definition from the attachment spec:

Identifies a resource attached or related to an object that potentially requires special handling. The intent is to provide a model that is at least semantically similar to attachments in email.

I don’t want to develop my application to handle every possible representation it might encounter; I’d rather use my finite time for developing features that will actually be used. At the same time, I’d like to strike a reasonable balance of interoperability. The specification is so wide open as to be nearly useless in deciding what formats one should expect to support.

For example, an image with a description uploaded to Pleroma will look like this (it will look the same in AP C2S, because we transform every object):

{
  "mediaType": "image/jpeg",
  "name": "screenshot_mpv:Doctor Who (2005) - S07E06 - The Bells of St John (1080p BluRay x265 Panda)@00:00:42.000.jpg",
  "type": "Document",
  "url": "https://queer.hacktivis.me/media/6992382f-4bd0-4fc7-8a8d-362da615b19a/screenshot_mpv%3ADoctor%20Who%20%282005%29%20-%20S07E06%20-%20The%20Bells%20of%20St%20John%20%281080p%20BluRay%20x265%20Panda%29%4000%3A00%3A42.000.jpg"
}

And for Mastodon:

{
  "type": "Document",
  "mediaType": "image/png",
  "url": "https://files.mastodon.social/media_attachments/files/027/702/471/original/a770e4155dae9cad.png",
  "name": "a new flat and modern logo showing a blue-light blue sengi",
  "blurhash": "Uq0|dqf}Zeg5f_e.etf+d;e?f,e,etfkg1f7"
}

(blurhash is Mastodon’s own extension for generating blurry thumbnails, used for example as a removable overlay on sensitive activities)

But the different ways of representing objects in ActivityPub don’t matter so much: most implementations tend to copy what others are doing, and otherwise use similar logic to each other (I think yours is similar to how PeerTube represents videos).

I don’t want to develop my application to handle every possible representation it might encounter; I’d rather use my finite time for developing features that will actually be used. At the same time, I’d like to strike a reasonable balance of interoperability. The specification is so wide open as to be nearly useless in deciding what formats one should expect to support.

One thing you could do is look at what other implementations do; for example, the relevant part in Pleroma is:

As for the AP client, I think supporting only a subset of ActivityPub, as Pleroma does, could be a good way to avoid writing so much code (you should also validate external data, and transforming it into that subset makes validation easier). We should probably document that subset at some point to help AP C2S clients, so feel free to ask for it.
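To make the "transform into a subset" idea concrete, here is a minimal sketch of a normalizer that folds the attachment shapes mentioned in this thread (a bare URL string, a Link, or an Image/Document whose url is a string, a Link, or an array of them) into one flat structure. The function names and the output shape are my own assumptions for illustration; this is not Pleroma's actual transformation, and it deliberately ignores preview and inline content.

```javascript
// Hypothetical helpers: names and output shape are illustrative assumptions,
// not taken from Pleroma, Mastodon, or the ActivityPub specification.

// JSON-LD lets most values be either a single item or an array.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Only accept http(s) URLs from remote servers; everything else is dropped.
function isHttpUrl(href) {
  try {
    const parsed = new URL(href);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch (err) {
    return false;
  }
}

// Turn a string or Link object into { href, mediaType, width, height },
// borrowing missing fields from the parent attachment object.
function normalizeLink(link, parent) {
  if (typeof link === 'string') {
    return { href: link, mediaType: parent.mediaType, width: parent.width, height: parent.height };
  }
  if (link && typeof link === 'object' && typeof link.href === 'string') {
    return {
      href: link.href,
      mediaType: link.mediaType ?? parent.mediaType,
      width: link.width ?? parent.width,
      height: link.height ?? parent.height,
    };
  }
  return null; // unknown shape: drop it rather than guess
}

// Returns an array of { name, links } records, one per usable attachment.
function normalizeAttachments(object) {
  const result = [];
  for (const att of toArray(object.attachment)) {
    const parent = att && typeof att === 'object' ? att : {};
    // The attachment may itself be a link (string or has href),
    // or an Image/Document that points to its data via url.
    const candidates =
      typeof att === 'string' || typeof parent.href === 'string' ? [att] : toArray(parent.url);
    const links = candidates
      .map((candidate) => normalizeLink(candidate, parent))
      .filter((link) => link !== null && isHttpUrl(link.href));
    if (links.length > 0) {
      result.push({ name: typeof parent.name === 'string' ? parent.name : null, links });
    }
  }
  return result;
}
```

With something like this in front of the client, the display code only ever sees a list of links per attachment, whichever server produced the original object.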

One thing that might interest you on the AP C2S side is the uploadMedia endpoint, which we added to Pleroma with AndStatus (a client); see the thread “ActivityPub C2S: How to upload images?”

I experimented with exposing multiple sizes and image formats in my AP objects, but ultimately settled on having a local LRU media cache for consistency.
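For readers unfamiliar with the idea, a local LRU cache can be sketched in a few lines. This is only an illustration under my own assumptions (a hypothetical LruCache class keyed by remote URL, bounded by entry count rather than bytes, kept in memory); a real media cache would more likely live on disk and be bounded by size.

```javascript
// Hypothetical in-memory LRU cache. A Map iterates in insertion order,
// so the first key is always the least recently used one.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}
```

The key would typically be the remote image URL and the value whatever the server stores locally for it (a file path, a resized copy, and so on).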

JSON-LD allows most values to be arrays. I used that for profile pictures. So basically what you want is this:

"url": [
  {
    "type": "Link",
    "href": "https://example.com/image_orig.jpg",
    "width": 5120,
    "height": 2880,
    "mediaType": "image/jpeg"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_orig.webp",
    "width": 5120,
    "height": 2880,
    "mediaType": "image/webp"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_preview.jpg",
    "width": 512,
    "height": 288,
    "mediaType": "image/jpeg"
  },
  {
    "type": "Link",
    "href": "https://example.com/image_preview.webp",
    "width": 512,
    "height": 288,
    "mediaType": "image/webp"
  },
  // ... more sizes and formats
]

This is flexible enough to support as many sizes and formats as you wish, as opposed to just one “preview”.
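On the receiving side, a client then has to choose one entry from such an array. A rough sketch of one possible selection rule follows: keep only the media types the client can display, prefer the narrowest image that is at least as wide as the display slot, and fall back to the widest available otherwise. The pickLink function, its parameters, and the tie-breaking by type preference are assumptions of mine, not something the specification prescribes.

```javascript
// Hypothetical selection helper; the ranking rule is an illustrative choice.
// url:           a Link object or an array of Link objects
// targetWidth:   width in pixels of the slot the image will be shown in
// acceptedTypes: media types the client can display, most preferred first
function pickLink(url, targetWidth, acceptedTypes) {
  const links = (Array.isArray(url) ? url : [url])
    .filter((link) => link && typeof link === 'object' && typeof link.href === 'string')
    .filter((link) => acceptedTypes.includes(link.mediaType));
  if (links.length === 0) return null;

  const wideEnough = links.filter(
    (link) => typeof link.width === 'number' && link.width >= targetWidth
  );
  const useWideEnough = wideEnough.length > 0;
  const pool = useWideEnough ? wideEnough : links;

  const sorted = [...pool].sort((a, b) => {
    // Narrowest sufficient image first, or widest available as a fallback.
    const byWidth = useWideEnough
      ? a.width - b.width
      : (b.width ?? 0) - (a.width ?? 0);
    if (byWidth !== 0) return byWidth;
    // Same width: prefer the media type listed earlier in acceptedTypes.
    return acceptedTypes.indexOf(a.mediaType) - acceptedTypes.indexOf(b.mediaType);
  });
  return sorted[0];
}

// The four links from the example above.
const exampleUrls = [
  { type: 'Link', href: 'https://example.com/image_orig.jpg', width: 5120, height: 2880, mediaType: 'image/jpeg' },
  { type: 'Link', href: 'https://example.com/image_orig.webp', width: 5120, height: 2880, mediaType: 'image/webp' },
  { type: 'Link', href: 'https://example.com/image_preview.jpg', width: 512, height: 288, mediaType: 'image/jpeg' },
  { type: 'Link', href: 'https://example.com/image_preview.webp', width: 512, height: 288, mediaType: 'image/webp' },
];

// A 400px-wide slot on a client that prefers WebP picks the WebP preview.
const chosen = pickLink(exampleUrls, 400, ['image/webp', 'image/jpeg']);
```

A client that only handles JPEG would pass ['image/jpeg'] and get the JPEG preview for the same slot; if nothing matches, the function returns null and the client can show a placeholder.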