Partially embedded objects

silverpill · August 12, 2024, 11:06am

Sometimes an object is not embedded fully and only a few of its properties are included. One example of this practice can be found in FEP-400e, which requires objects on a wall to have a target property that contains an “abbreviated collection object”.

The mechanism of partial embeddings can be useful, but it is currently underspecified. How can a consuming implementation tell whether the object is full or abbreviated? I think it can be done using a new property, partial.

Example:

{
  "id": "https://social.example/announce",
  "type": "Announce",
  "object": {
    "id": "https://social.example/like",
    "type": "Like",
    "partial": true
  }
}

This FEP-1b12 activity is easy to parse because the type of object is known. At the same time, there’s no ambiguity regarding full / partial embedding. The recipient of this activity would know that object must be fetched.
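
A minimal sketch of how a consumer might act on such a flag, in Python. The partial property is only the proposal from this post; `resolve_object` and the `fetch` callback are hypothetical names used for illustration, not part of any existing library.

```python
from typing import Any, Callable, Union

JsonObject = dict[str, Any]


def resolve_object(
    value: Union[str, JsonObject],
    fetch: Callable[[str], JsonObject],
) -> JsonObject:
    """Return a full representation of an embedded or referenced object.

    `fetch` is a caller-supplied (hypothetical) function that retrieves
    an object from its origin by ID.
    """
    if isinstance(value, str):
        # A bare ID is a reference and always has to be fetched.
        return fetch(value)
    if value.get("partial") is True:
        # Proposed marker: the embedded properties are only hints.
        return fetch(value["id"])
    # No marker: treat the embedded object as a full representation.
    return value
```

With the Announce above, `resolve_object(announce["object"], fetch)` would call `fetch("https://social.example/like")`, while an embedded object without the flag would be returned unchanged.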

tesaguri · August 12, 2024, 11:37pm

Well, if you go strictly by the open-world assumption, every object is considered partially embedded by default. The obvious example of this is IRI string values for "@type": "@id" terms, which are mere syntax sugar for { "@id": "https://example.com/foo" }. But yes, that’s not quite a practical interpretation.

I think the problem is applicable to any JSON-LD use cases. So I wonder if there is a prior art for this by the JSON-LD people.

helge · December 25, 2024, 6:37pm

Question: Does any implementation currently use this? How well does it interoperate?

silverpill · December 25, 2024, 7:23pm

I don’t think anybody uses this.
It probably doesn’t interoperate well, and servers may save a partial representation to a local cache leading to all kinds of issues.

I shall write a FEP about this sometime.

silverpill · December 29, 2024, 7:32pm

Another use case for partials: Undo(Follow). With a regular Undo you need to include the exact Follow activity ID. However, due to network issues the remote server may end up with a different activity ID in its cache, and may be unable to process the Undo properly. There are two common answers to this problem (1, 2):

Store IDs of all received follow activities, not only the last one.
If Follow is embedded, ignore its id and try to process the Undo anyway.

Neither of them is ideal. But we can use a partial representation to solve the problem:

{
  "id": "https://bob.example/activities/undo",
  "type": "Undo",
  "actor": "https://bob.example/actors/bob",
  "object": {
    "type": "Follow",
    "object": "https://alice.example/actors/alice",
    "partial": true
  }
}

Here the sender explicitly tells the recipient: “Undo any/all Follow activities with a given object”.
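
A rough sketch of how a recipient could process such an Undo, assuming it keeps every received Follow in a small in-memory table. `FollowStore` and its methods are hypothetical, and the sketch assumes the inner Follow's actor is the actor of the Undo, since the example does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class FollowStore:
    """Hypothetical in-memory record of received Follow activities."""

    # Maps Follow activity ID -> (actor, object) pair.
    follows: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add(self, activity_id: str, actor: str, obj: str) -> None:
        self.follows[activity_id] = (actor, obj)

    def undo_by_shape(self, undo: dict) -> list[str]:
        """Remove every stored Follow that matches a partial Undo.object."""
        inner = undo["object"]
        if inner.get("type") != "Follow" or inner.get("partial") is not True:
            return []
        # Assumption: the follower is the actor of the Undo itself.
        wanted = (undo["actor"], inner["object"])
        removed = [aid for aid, pair in self.follows.items() if pair == wanted]
        for aid in removed:
            del self.follows[aid]
        return removed
```

If the store holds two Follow activities from Bob to Alice under different IDs, a single call to `undo_by_shape` removes both, whichever ID the sender originally used.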

trwnh · December 29, 2024, 8:52pm

echoing what tesaguri said above in that all representations are partial – so i don’t see the point of partial: true.

instead, what needs to be resolved is the nature of that node’s identity. broadly, you can identify a thing based on its immutable attributes, or you can give it a name/label/reference/etc that remains consistent even when its attributes change. i think this is similar to what DDD refers to as “intrinsic and extrinsic identity”, or what you might otherwise think of as “values vs entities”.

for example:

literal values do not need identifiers, because they can be identified by the literal value itself.
on the other hand, a natural person does not necessarily change identities when their age changes. someone who is 20 years old is still the “same as” themselves when they become 21 years old. they can also change other aspects of themselves without altering their fundamental identity.

but it is possible for complex structures to exist, which are themselves composite values. for example, we might say a Person is a struct/type/class/etc that has attributes like age, weight, eye color, and so on. we then have to determine which of these attributes might uniquely identify the instance. for example, we might say that it is possible to uniquely identify a USCitizen by the tuple of (date_of_birth, social_security_number). in this example, we assert that no two instances of USCitizen will ever share both the same DOB and same SSN.

so when it comes to a Follow activity, the fundamental question being asked is: does some external identity exist?

if you send two Follow activities and then Undo only one of them, you need to establish (by some protocol) how to uniquely identify a Follow. so for example, you can say that a Follow can be uniquely identified by the tuple (actor, type, object). if you say this, then you don’t actually need an id! but if for some reason you don’t think there is an identity equivalence here (like in the case where multiple Follow might be allowed), then you do need an id.

it stands to reason that if you think a Follow is uniquely identified by (actor, type, object), then the logically consistent behavior is to not store or use any id in processing such activities. it’s completely superfluous, unneeded and unused. you shouldn’t be parsing it or checking it in any way.

if you do include an id in any way, then it should indicate that this specific instance of a Follow is unique.

to illustrate: are these Follows unique? that is, are they the same Follow or different Follows?

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "@graph": [
    {"actor": "https://alice.example", "type": "Follow", "object": "https://bob.example"},
    {"actor": "https://bob.example", "type": "Accept", "object": {"actor": "https://alice.example", "type": "Follow", "object": "https://bob.example"}},
    {"actor": "https://alice.example", "type": "Undo", "object": {"actor": "https://alice.example", "type": "Follow", "object": "https://bob.example"}}
  ]
}

well, in an RDF graph sense, they are different – the lack of an identifier just means that they will be assigned a blank node identifier. the first item in the graph can be referred to as _:b0, while the Accept is _:b1 and its inner Follow is _:b2, and the Undo is _:b3 while its inner Follow is _:b4. of course _:b0 and _:b2 and _:b4 are not string matches of each other, so they are different identifiers and therefore different things.

but in the sense of the Follow protocol (either de jure per AP or de facto as implemented in the fediverse), you could argue that they are the same. after all, they have the same side effects, and are idempotent. if received in the order above, the resulting state would be that Alice is no longer following (or requesting to follow) Bob. this is because we have declared the protocol-level identity of a Follow activity to be equivalent to the tuple (actor, type, object) as described above.
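
to make the protocol-level reading concrete, here is a small sketch in Python that replays id-less activities against a table keyed by (actor, object); the type part of the tuple is implicit because only Follows are tracked. `apply_activity` and the "pending"/"accepted" labels are made up for illustration, and authorization checks are left out.

```python
def apply_activity(state: dict[tuple[str, str], str], activity: dict) -> None:
    """Apply one activity to a follow table keyed by (follower, target)."""
    kind = activity["type"]
    if kind == "Follow":
        key = (activity["actor"], activity["object"])
        # setdefault keeps any existing state, so a repeated Follow is a no-op.
        state.setdefault(key, "pending")
    elif kind in ("Accept", "Undo"):
        inner = activity["object"]
        if not isinstance(inner, dict) or inner.get("type") != "Follow":
            return
        # The embedded Follow is identified by its shape, not by an id.
        key = (inner["actor"], inner["object"])
        if kind == "Accept":
            if key in state:
                state[key] = "accepted"
        else:
            # Undo: removing a missing key is harmless, so Undo is idempotent.
            state.pop(key, None)
```

replaying the three activities from the graph in order leaves the table empty, and sending the Undo a second time changes nothing.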

so again, the question comes down to this: how does your protocol identify a Follow activity? because it doesn’t make sense to say that a Follow activity is both unique and not unique. either it is unique (and does not need an id) or it is not unique (and it does need an id).

ditto for Like, Announce, etc. – the AP spec doesn’t strictly declare these to be uniquely identifiable by shape. just as there are systems which might interpret them in the singular (only one Like per actor, only one Announce per actor), there are equally valid systems which might interpret them in the multiplicative (multiple Likes per actor, multiple Announces per actor).

it’s along these lines that i think things such as the liked collection on actors don’t make sense as currently described. at the very least, the side effects for an Undo Like should not blindly remove an object from the liked collection, but rather, the server should check to see if the final remaining Like is being undone, or in other words, that there are no Like activities remaining. however, there is no good way to determine this, so it would be easier to deprecate the liked collection and instead define a collection that contains any/all Like activities that haven’t been Undone yet. the only alternative is to decree across the entire network that an actor can only Like an object once… and on whose authority are you going to decree this? that’s something you can never guarantee.
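
a sketch of the "check for the final remaining Like" behavior, assuming the server keeps its own tally of Likes that have not been undone (which is exactly the bookkeeping that is hard to do reliably across servers). `LikeLedger` is a hypothetical name, not anything defined by AP.

```python
from collections import Counter


class LikeLedger:
    """Hypothetical per-actor tally of Like activities not yet undone."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def like(self, obj: str) -> None:
        self._counts[obj] += 1

    def undo_like(self, obj: str) -> None:
        remaining = self._counts[obj] - 1
        if remaining > 0:
            self._counts[obj] = remaining
        else:
            # Last remaining Like (or none at all): drop the entry.
            self._counts.pop(obj, None)

    def liked(self) -> set[str]:
        """Objects that still have at least one Like left."""
        return set(self._counts)
```

in this model an object only leaves `liked()` when the last remaining Like for it has been undone.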

in other words:

this should probably always be done… assuming consensus that Follows are uniquely identified by (actor, type, object). and a minimal Undo with no embedded Follow should never be sent. but you should be prepared to handle this case if it does happen (by storing any IDs you do encounter).

helge · December 30, 2024, 8:02am

I would not call the usages here partial embeddings, but references. So I would suggest using something like:

{
  "id": "https://social.example/announce",
  "type": "Announce",
  "object": {
    "type": "Reference",
    "referredId": "https://social.example/like",
    "referredType": "Like"
  }
}

This also allows me to claim that the rule

type => how to parse

remains true. With partial one would be back to having to distinguish two cases on how to parse everything (horrible).

Unfortunately, even with reference the above rule would still be wrong for stuff like Accept and Undo, but at least it’s enough to investigate the object and its referredType. That’s why I still have a preference for Unfollow and RejectReply.
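
A sketch of what "type => how to parse" could look like in code, assuming the Reference shape suggested above. The `Reference` and `Embedded` classes and `parse_object` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Reference:
    referred_id: str
    referred_type: str


@dataclass
class Embedded:
    type: str
    properties: dict[str, Any]


def parse_object(value: dict[str, Any]) -> Union[Reference, Embedded]:
    """Pick the parsing rule from `type` alone, with no second flag to check."""
    if value["type"] == "Reference":
        return Reference(value["referredId"], value["referredType"])
    return Embedded(value["type"], value)
```

For the Announce above, `parse_object(announce["object"])` yields a `Reference` whose `referred_type` is "Like", which is the only extra field that has to be inspected.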

helge · December 30, 2024, 9:04am

See also

for other people working on an alternative to Accept activity of a Follow request.

stevebate · December 30, 2024, 9:06am

I think the most reasonable approach (or maybe even a “best practice”) to implementing the side-effects for an Undo/Follow (and similar activities) is to ignore the object’s id, if any. Major AP implementations, like Mastodon, don’t store activities and they don’t use the id in the Undo (and don’t require it, AFAICT). AP doesn’t require that the original Follow (or Like, etc.) activity had an id either. It could be “transient”, although the required side-effects of processing it are persistent.

I don’t see any examples of Undo in the AP specification. The AS2 Vocab specification shows examples without an Undo object id in the document. It also states:

The Undo activity type is defined to provide the specific ability to undo or cancel out a prior activity. The appropriate interpretation for the following, then, is that Sally liked John’s note at one point but has explicitly redacted that like later on.

Based on that, it seems the appropriate interpretation of an AP Undo/Follow is that “Actor1 followed Actor2 at one point but has explicitly redacted that follow.” This can be interpreted (and side-effects performed) purely in terms of the following relationship, if any, between Actor1 and Actor2, and independently of activity identifiers.

stevebate · December 30, 2024, 9:16am

What do you mean by “parsing”? Typically, a JSON parser will do the parsing to a tree data structure. A JSON parser can parse any of these representations equally well. Are you using the word “parsing” to mean how the code interprets that parsed tree structure and implements side-effects? Or are you thinking of some kind of internal graph->object or graph->table mapping or ???.

If “parse” means “performing side-effect behavior”, then I think your proposed rule is overly simplified since the required side effects will sometimes be determined by multiple activity types (e.g., Undo/Follow, Undo/Like, Undo/Block, Undo/Accept/Follow…).

I agree. The semantics of a partial attribute are muddled, at best, given the OWA. However, I can see how there may be issues in some cases if servers publish subsets of known properties for an object. For example, a server might not send the same information for an authenticated request as for an anonymous one. Mastodon did this at one point for actors and it caused problems when servers cached only the restricted actor data.

I wonder if that could have been addressed with a “no cache” directive for the restricted data? I sometimes think AP is a (bad) social graph synchronization algorithm that doesn’t know that’s what it is (and that’s why it’s “bad”).

silverpill · December 30, 2024, 9:47pm

A full representation can be used as is.
A partial representation can only be used for processing “hints”, and for everything else the consumer would need to fetch the full representation from the origin.

It means the Follow activity is special, and I think this is bad protocol design. That path leads to an ever-expanding list of special entities in the spec with lots of “ifs” and “thens”. Instead, I want to develop a single pattern that can be applied in many different situations.

I’m not particularly happy with partial: true, but it would do the job.

silverpill · December 30, 2024, 9:58pm

Could you elaborate? With your solution one still needs to be able to parse Announce(type: Like) and Announce(referredType: Like).

All things being equal, I would prefer partial: true because it doesn’t require defining a new referredX property for every possible property of a referenced object.

silverpill · December 30, 2024, 10:12pm

This is a nice temporary workaround, but again it is bad protocol design. If id is included, we should respect the publisher’s intent and perform the described state transition. If we don’t respect the intent, this is not a protocol but a mess.

I think “transient” activities shouldn’t exist. All independent (non-anonymous) objects should be retrievable from their origins, for authentication purposes.

trwnh · December 30, 2024, 10:17pm

why not say that you can always use any representation as-is, provided that you trust it? if you’re missing information, then you can try to get more information, e.g. by fetching from your next level of cache or origin.

what job does it do, though? “signal that you should fetch more information”, even though you should basically always do this if the provided information is not enough? or are you instead trying to describe a shape constraint?

this is for the most part unavoidable i think – in a generic sense you can Follow the same object multiple times but in a practical sense this might not make much sense. this is where the protocol ideally steps in to clarify that you can deduplicate Follows or make them idempotent. for example, if you get a Follow and you Accept it and then you get another Follow from the same actor for the same object, can you respond with the same Accept or should you generate a new one? if this is “bad protocol design” (and i don’t fully disagree with you), then what is the problem and what might a better solution look like?

of what use is the id? what “intent” does its inclusion signal?

activities are also objects, and there’s no difference between a “transient” activity and an “anonymous” activity.

stevebate · December 31, 2024, 5:44am

The intent of an Undo/Follow is to remove the following relationship, if any. No id is required to implement this side-effect if the Follow object URI is included in the activity serialization. The spec doesn’t require an id and AS2 examples for Undo are consistent with not requiring an id. Therefore, requiring only the Follow object is the better protocol design.

Transient activities make sense for at least some of the interaction activities. The important requirement for those activities is the side-effects rather than the presence of an activity id. It would be more radical, but I think one could make an argument that even activities like Create could be transient if clients had access to Object timelines instead of activity inboxes (effectively what Mastodon does in a non-AP way).

silverpill · January 4, 2025, 8:21pm

Often it is not possible to know what is missing. Therefore some indicator is needed, like partial: true.

If id of Undo.object is specified, the publisher tells recipients that she undoes the exact activity identified by that id.
If id is not specified, then it can be interpreted as undoing all activities that have the given shape.
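
A small sketch of these two rules in Python. `undo_target` is a hypothetical helper; falling back to the actor of the Undo when the inner activity has no actor is an assumption of the sketch, not something stated above.

```python
from typing import Any


def undo_target(undo: dict[str, Any]) -> tuple[str, Any]:
    """Classify what an Undo refers to: one exact activity, or a shape."""
    inner = undo["object"]
    if isinstance(inner, str):
        # A bare string is the ID of the activity being undone.
        return ("exact", inner)
    if inner.get("id"):
        # id present: the publisher undoes exactly this activity.
        return ("exact", inner["id"])
    # No id: undo all activities that have the given shape.
    actor = inner.get("actor", undo.get("actor"))  # assumption, see lead-in
    return ("shape", (inner["type"], actor, inner["object"]))
```

A recipient could then dispatch on the first element of the result, looking up a single stored activity in the "exact" case and matching on (type, actor, object) in the "shape" case.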

I’m using these terms in the same way as ActivityPub spec:

transient: “short lived activities that are not intended to be able to be looked up, such as some kinds of chat messages or game notifications”. I think this is wrong because all activities must be retrievable. Retrieving by ID is the primary authentication method in ActivityPub, all other methods are built on top of it.
anonymous: “An ID explicitly specified as the JSON null object, which implies an anonymous object (a part of its parent context)”. Nobody knows what “parent context” means, but the only interpretation that makes sense is that anonymous objects are simply embedded objects without ID. Those are widely used in the Fediverse: attachments and tags.

trwnh · January 4, 2025, 11:28pm

It is never possible to know what is missing.

i meant, why would the publisher ever give the original Follow an id?

there are exceptions to this, and this “transient” language is meant to apply to those exceptions. for example, imagine negotiating a transport session after which you could deliver activities without having to authenticate each individual payload; perhaps authentication might be done out-of-band, for example via HTTP headers (as in how signatures are currently used). it’s entirely possible to design a protocol such that retrieving a certain activity is impossible, even while the activity is still valid. if you were distributing a presence indicator like “alice is currently typing…” then it would be ridiculous to expect a requirement that such activities be persisted. if you were playing a game, do you really expect all activities from that game session to persist outside that session?

like, maybe it makes sense to say that all published activities must be retrievable and therefore fall under the “MUST have publicly dereferenceable id” rule. but these are not the only types of activities possible.

this is an error that is being removed via errata.

what do you mean, “nobody knows”??? it’s clear to me, at least, especially given that this is a parenthetical meant to clarify the definition of “anonymous object”. in this sense, the “parent context” is similar to a “link context” in web linking – it’s “whatever you are currently considering”. at the top level, you have a general requirement to give that activity an ID unless it qualifies for some exception. but you are not required to use id anywhere else in the document. the “context” for your entry point would be the Undo, which has Undo.id, but once you enter the “child context” by navigating to Undo.object, you are now considering a different object, because you have crossed over to a different node. this node doesn’t need an id because it is part of its “parent context”, which is that of the document itself, unless you override it by setting an explicit id.

so your interpretation is basically correct, but the premise of this being some unknown concept is not.

for a top-level Activity that is the entry point of the document, if you wanted it to be transient, you would remove its id, making it anonymous. Therefore, in this case, “transient” and “anonymous” are equivalent.

this is one interpretation, yes. but more generally, this applies to cases where there is some alternative way to identify the activity by protocol. “given shape” is one such way. this works when the shape is enough to uniquely determine the semantics.