Traversing the reply chain when working with topics

angus · May 3, 2024, 7:46am

I’ve just been cleaning up threadiverse-wg@socialhub.activitypub.rocks as it had a number of orphaned replies that had been turned into topics.

The immediate fix for this is to discard a Note if I don’t have the Note it’s in reply to (instead of creating a new topic) (PR for that is here); however, I know that some implementations try to “walk up” the reply chain.

I guess I’m thinking out loud here as to how much reply chain walking makes sense in a forum context. I mean ideally we have a collection in a context property to work with, and the Discourse plugin actually checks this already, but we don’t live in an ideal world. This is what I’m currently thinking for reply chain walking:

Go back N number of replies (perhaps 5) to see if there is a Note already associated with an existing post.
If we find a Note (say at the 4th iteration) we import ALL of the intervening Notes, and add ALL notes as new posts in the relevant topic.
So we’d end up with 5 new posts in the existing topic in this example.
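The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Discourse plugin's code: `find_topic`, `fetch_note` and `topic_of` are hypothetical names, and Notes are modelled as plain dicts with `id` and `inReplyTo` keys.

```python
from typing import Callable, Optional

Note = dict  # a parsed Note, e.g. {"id": "n3", "inReplyTo": "n2"}


def find_topic(
    note: Note,
    fetch_note: Callable[[str], Optional[Note]],  # hypothetical: fetch a remote Note by id
    topic_of: Callable[[str], Optional[str]],     # hypothetical: topic of a Note we already store
    max_depth: int = 5,
) -> tuple[Optional[str], list[Note]]:
    """Return (topic_id, notes_to_import) oldest first, or (None, []) if no topic is found."""
    pending = [note]
    current = note
    for _ in range(max_depth):
        parent_id = current.get("inReplyTo")
        if parent_id is None:
            break  # reached the top of the chain without finding a known Note
        topic_id = topic_of(parent_id)
        if topic_id is not None:
            # The parent already lives in a topic: import everything collected so far.
            return topic_id, pending[::-1]
        parent = fetch_note(parent_id)
        if parent is None:
            break  # parent could not be fetched or parsed
        pending.append(parent)
        current = parent
    return None, []


# Made-up data: n0 is already a post in "topic-1"; n1..n3 are unknown locally.
notes = {
    "n1": {"id": "n1", "inReplyTo": "n0"},
    "n2": {"id": "n2", "inReplyTo": "n1"},
    "n3": {"id": "n3", "inReplyTo": "n2"},
}
known = {"n0": "topic-1"}
topic_id, to_import = find_topic(notes["n3"], notes.get, known.get)
```

When nothing is found within `max_depth` steps the sketch returns no topic, which matches the "discard the Note" fallback described earlier.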

Curious what others think on this.

trwnh · May 3, 2024, 7:51am

seems reasonable enough, at least in the absence of a clear signal that posts should be grouped together (i.e. context), you end up having to handle contextless posts somewhat differently. walking up the reply chain to an arbitrary depth is one way to do that. i’m guessing at least 3, but 5 would probably work too.

silverpill · May 3, 2024, 12:09pm

It looks like in most cases the parent object was present and was a Note, but the ActivityPub plugin failed to fetch or parse it.
In particular, replies from Streams are always breaking the thread.

angus · May 3, 2024, 12:49pm

Yeah, I didn’t mean to suggest the inReplyTo object wasn’t available.

Just that if it’s not already present locally, or it doesn’t have a context, the plugin is currently creating a new topic with that Note. Hence the need to traverse the reply chain to determine if there’s an existing topic to put it in.

I guess the point here is that reply chain traversal in a forum is more specifically for the purpose of topic detection and curation (making sure the right Notes end up in the right Topics). A post cannot exist outside of the context of a topic like it can on social media.

silverpill · May 3, 2024, 1:45pm

What can implementations that don’t support groups do to improve interop? Should we copy the value of a context property from a parent Note?

julian · May 3, 2024, 2:52pm

Specifically, NodeBB doesn't provide a context right now because there was movement on that area and I simply opted to wait.

I could provide a resolvable context. Right now if you query a NodeBB topic, it returns as:Page, but if Discourse can handle an OrderedCollection, I will do that ASAP.

Especially if @trwnh@mastodon.social can confirm that that's roughly the direction they're thinking of as well.

julian · May 3, 2024, 2:55pm

@angus@socialhub.activitypub.rocks, specific to (streams), I discovered last week with @mikedev@fediversity.site's help that his content was failing NodeBB's key ownership checks because (streams) uses the equal sign in their HTTP signature values.

I was naively doing a .split('='); and accidentally discarded part of the URL that requested it return a public key.
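A small sketch of that pitfall, in Python rather than NodeBB's JavaScript. The header value below is made up for illustration (it is not a real (streams) key URL), and `parse_signature_header` is a hypothetical helper. Splitting on every `=` truncates a value that itself contains `=`; splitting only on the first one keeps it whole.

```python
def parse_signature_header(header: str) -> dict[str, str]:
    """Parse key="value" pairs, splitting each pair on the FIRST '=' only.

    Splitting on ',' is itself simplistic; it is enough for this illustration.
    """
    params = {}
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        params[key] = value.strip('"')
    return params


# Hypothetical header: the keyId URL carries its own '=' in the query string.
header = 'keyId="https://example.org/channel/alice?operation=getkey",algorithm="rsa-sha256"'

naive_key_id = header.split(",")[0].split("=")[1]      # cut off at the second '='
fixed_key_id = parse_signature_header(header)["keyId"]  # full URL preserved
```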

trwnh · May 3, 2024, 3:01pm

to summarize 7888 yet again (:P)

you can set the same context. this is the equivalent of posting in the same topic.
you can set a new context. this is the equivalent of starting a new topic.
you can set no context. this is the equivalent of having no topic.

the caveat is that there are some posts that have no context/topic but are still intended to be in the same context/topic. it’s up to implementers to decide which heuristics they want to use for implicit inclusion. inReplyTo chains is one way.
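a minimal sketch of the three cases above, assuming notes are plain dicts with an optional `context` key. `context_relation` is a hypothetical helper, not something defined by 7888; the "none" case is where a receiver would fall back to a heuristic such as walking inReplyTo.

```python
from typing import Optional


def context_relation(note: dict, parent: Optional[dict]) -> str:
    """Classify how a note's context relates to its parent's context."""
    context = note.get("context")
    if context is None:
        return "none"  # no topic declared; receiver may apply its own heuristics
    if parent is not None and context == parent.get("context"):
        return "same"  # equivalent to posting in the same topic
    return "new"       # equivalent to starting a new topic


# Made-up identifiers for illustration.
parent = {"id": "p1", "context": "https://example.org/context/1"}
same = context_relation({"id": "r1", "context": "https://example.org/context/1"}, parent)
new = context_relation({"id": "r2", "context": "https://example.net/context/9"}, parent)
none = context_relation({"id": "r3"}, parent)
```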

julian · May 3, 2024, 3:11pm

The thing is, different implementors might opt to set different contexts for remote content. That's where I think things might get tricky.

For example, if SocialHub has a context socialhub.com/context/1 with posts socialhub.com/post/1, socialhub.com/post/2, socialhub.com/post/3

And NodeBB receives them all, the post ids would remain the same, but the context might be updated to nodebb.com/context/39

I haven't quite thought it through but that would suggest that I would need to maintain a mapping of remote contexts to my own.
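A rough sketch of what such a mapping could look like, assuming an in-memory dict and sequential local ids. `ContextMap` and its method are hypothetical and not NodeBB code; a real implementation would persist the mapping.

```python
class ContextMap:
    """Maps each remote context URL to one stable local context URL."""

    def __init__(self, local_prefix: str) -> None:
        self._local_prefix = local_prefix
        self._local_by_remote: dict[str, str] = {}

    def local_context(self, remote_context: str) -> str:
        # Allocate a local context the first time a remote one is seen,
        # then keep returning the same one for every later post.
        if remote_context not in self._local_by_remote:
            next_id = len(self._local_by_remote) + 1
            self._local_by_remote[remote_context] = f"{self._local_prefix}{next_id}"
        return self._local_by_remote[remote_context]


contexts = ContextMap("https://nodebb.com/context/")
first = contexts.local_context("https://socialhub.com/context/1")
again = contexts.local_context("https://socialhub.com/context/1")
other = contexts.local_context("https://socialhub.com/context/2")
```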

trwnh · May 3, 2024, 3:16pm

the assumption is that a context/topic is owned by only one server, but it is possible to e.g. use alsoKnownAs or aliases or similar in order to assign multiple identifiers to the same collection.

i realize the “peering agreement” bits are still not written down, but you can have the context collections follow each other. that way, they should stay in sync (assuming no delivery failures).

julian · May 3, 2024, 3:22pm

So in that case would I be incorrect in changing the context to the local NodeBB context collection?

I was thinking specifically of this line in 7888:

You MAY set your own context, if you wish for your object to be in a separate context owned by you.

Which I read as "I can set my own context in parallel", but I realize now you might've meant for that to read "separate context in the case of topic fork/split"

... but maybe you did mean the former....

trwnh · May 3, 2024, 6:17pm

i meant the latter but you could probably do the former if there were a mechanism to link together equivalent contexts as aliases of each other. for now, the easiest thing to do would be to just copy the “authoritative” one by whoever created the thread.

julian · May 3, 2024, 6:47pm

@trwnh@socialhub.activitypub.rocks NodeBB now supplies context with every as:Note object, and is resolvable as an OrderedCollection.

One thing that is not currently done is what we talked about here, inheriting the authoritative context and serving that instead. I will need to think that through a bit more.

angus · May 9, 2024, 3:23pm

The Discourse plugin will implement reply chain traversal for the purpose of topic detection when this is merged:

github.com/discourse/discourse-activity-pub

Add context resolver

discourse:main ← angusmcleod:add_reply_chain_traversing

opened 03:21PM - 09 May 24 UTC

angusmcleod

+548 -58

@pmusaraj This resolves the issue described here: https://socialhub.activitypub.…rocks/t/traversing-the-reply-chain-when-working-with-topics/4187 In short, if the plugin receives a Note inReplyTo a remote Note, we will resolve up to 3 replies to see if there is a Note in the chain in an existing topic, in which case we'll convert the intervening replies, and the new reply, into posts in the existing topic.

Essentially it implements what I proposed at the top of this topic (but with a limit of 3 instead of 5).

If you’re curious about the detail see the ContextResolver spec: spec/lib/discourse_activity_pub/context_resolver_spec.rb

julian · May 9, 2024, 3:27pm

@angus@socialhub.activitypub.rocks may I ask why you add a limit to the traversal logic?

I can see an argument made against doing so if it locks up the process, but the downside is you'd still have some cases where you don't get the full context.

Either way this may be moot if an iterable context is found, so inReplyTo traversal is ideal as a fallback mechanism.

Edit: in NodeBB's case, we call an internal recursive method called getParentChain which just makes the S2S call and adds it to a Set. The method terminates when it encounters an object that has no inReplyTo or one that is unprocessable.
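For comparison, an unbounded walk along those lines might look like the sketch below. This is not NodeBB's getParentChain: it is an iterative Python approximation, `parent_chain` and `fetch_note` are hypothetical names, and the cycle guard is an added assumption rather than something described above.

```python
from typing import Callable, Optional


def parent_chain(note: dict, fetch_note: Callable[[str], Optional[dict]]) -> list[str]:
    """Collect ancestor ids, nearest first, until the chain ends or cannot be followed."""
    chain: list[str] = []
    seen = {note["id"]}  # assumption: guard against inReplyTo loops
    current = note
    while True:
        parent_id = current.get("inReplyTo")
        if parent_id is None or parent_id in seen:
            break  # top of the chain, or a loop
        parent = fetch_note(parent_id)
        if parent is None:
            break  # unprocessable or unreachable object
        seen.add(parent_id)
        chain.append(parent_id)
        current = parent
    return chain


# Made-up data: c replies to b, b replies to a; x and y reply to each other.
notes = {
    "a": {"id": "a"},
    "b": {"id": "b", "inReplyTo": "a"},
    "c": {"id": "c", "inReplyTo": "b"},
    "x": {"id": "x", "inReplyTo": "y"},
    "y": {"id": "y", "inReplyTo": "x"},
}
ancestors = parent_chain(notes["c"], notes.get)
```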

angus · May 9, 2024, 3:37pm

The honest answer is that a limit makes some intuitive sense to me, but I have medium to low confidence in the cogency of my thinking on both the limit and where it’s set. I’ve set it at 3 as that seems to be the more “conservative” (read “safer”) approach while I think it through further / see how this first version works in practice.

In terms of the “risks” (to the extent they exist) I think I have a version of the following in mind:

You could be sent a random Note inReplyTo an unrelated Note that’s part of a large chain which you end up traversing for no reason.
Even if you eventually get to a Note in an existing topic, say 20 replies in, is it still right to say that those replies are part of your topic in a coherent sense? In what scenario would you be missing 20 odd replies? Perhaps there is one.

angus · May 9, 2024, 4:15pm

I guess one of the things I’m assuming is that other services are implementing the Inbox Forwarding spec correctly, which would mean that, in an ideal world, you should already have the replies you should have anyway and this is more of a “stop gap”.

However, I note that Mastodon violates the spec here, which means that more replies from Mastodon might be missed than is ideal.

github.com/mastodon/mastodon

Submit ActivityPub implementation report

julian · May 9, 2024, 4:26pm

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

more replies from Mastodon might be missed than is ideal

You are not incorrect. In practice the following situation happens occasionally, especially in larger/busy topics:

You post a reply to a topic/thread (branch A), but a different branch (B) of the topic occurs outside of your view (since the activities are not forwarded to you)
Later on, someone you do follow replies in branch B, and you receive it.
Traversal finds 20 posts in between you missed, and they are all added at once, and you receive the notification of new posts in the topic, except now all of the "new" posts are scattered throughout the linear flow
- Additionally, some of these new posts might appear in places higher up than where you last read

So this violates the assumption (at least in NodeBB) that if you have a "read up to" point in a topic, that there will not be new content above that point.

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

is it still right to say that those replies are part of your topic in a coherent sense?

From a purely technical point of view, yes, they are part of the same context (at least as derived via reply chain traversal), but from a UX POV, you could make that argument.

A forum with a linear flow of posts tends to diverge less often due to the nature of the presentation of posts themselves; something threaded models don't need to contend with.

julian · May 9, 2024, 4:29pm

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

You could be sent a random Note inReplyTo an unrelated Note that's part of a large chain which you end up traversing for no reason.

Another legitimate concern. My counter is that traversing the chain is rather inexpensive: XHR => (do other things while waiting) => inReplyTo? XHR... etc.

Actual note processing is done only once the chain is complete, and a positive relation is found.

... but I can see how this could lock up the process in other languages where processing literally stops when waiting for the XHR to complete.

angus · May 9, 2024, 5:05pm

Yeah, I agree with you on both points. It’s similarly inexpensive to make 20 requests in Discourse (it’s in a background process on a separate thread).

On reflection I think part of my conservatism here is that I don’t like that Mastodon doesn’t forward activities properly, and I don’t like starting from a position that I need to import 20 notes that Mastodon failed to forward to me. It causes similar metadata issues in Discourse.