A single post is somehow creating multiple topics with no replies

to: @devnull
cc: @angus


Any idea what’s going on here? It seems to be a different bug than the one where Discourse generates duplicate posts for a Create that later gets Announced back at it.

1 Like

Did it just start today?

Could be the changes I made to send type Conversation instead of OrderedCollection when you attempt to resolve a context.

But I’m not sure whether Discourse reads context.

It started 20 hours ago, if that means anything to you.

I’m also not sure what Discourse does here, but if anything, it ought to be limited to just checking the id

Hey guys, I’ll be able to get a better sense when @how or @aschrijver rebuilds the server.

I added dedicated AP log storage last week (including the raw json received). That was deployed here a few days ago. I need a rebuild to pick up a change that fixes a small bug with the log visualiser in the admin panel.

Once I can see what the plugin is actually receiving in this case, and in the case of double posts, I’ll be able to write some tests, recreate it and resolve any issues on Discourse’s end of things.

(please nobody delete or clean up those topics in the next 24-48 hours. It’ll help the debugging to keep them in place).

1 Like

@devnull @how @trwnh

Ok, I finally had the time to recreate this case in a spec.

The reason this happened is because:

  1. Discourse was unable to resolve a Conversation type for context.
  2. The error thrown in 1 was too nested to properly roll back the relevant transactions.

This will address the immediate issue

@devnull I haven’t been keeping up with the latest on context / collections etc. Could you briefly explain the thinking behind using “Conversation” instead of “OrderedCollection” in context. Is it meant to be a type of collection…or?

2 Likes

Thank you all for taking care of this!

As you can see I’m not super responsive here. Maybe we should figure a quick line for sysadmins.

sort of… indirectly… basically here’s what led up to this:

  • about 2 weeks ago, i realized a quirk of ActivityPub would make it impossible to Follow a Collection without potentially undesirable and unavoidable side effects, because giving a Collection an inbox and addressing it directly would not only attempt delivery to the Collection inbox, it would also attempt delivery to any inboxes of every single item in the collection! this happens only for Collection and its subtypes, and is a MUST in the spec. of course, a Follow on a Collection is not necessarily relevant to every item in that collection… but there’s no way to address only the collection or only its items. the way addressing and delivery are defined, both will necessarily happen. the full discussion of the issue is ongoing, but it can be found here: Delivering to Collections can be ambiguous · Issue #486 · w3c/activitypub · GitHub
  • in passing, i mentioned the above issue to @devnull as a reason that context maybe shouldn’t be directly a Collection, but instead be its own Object that has a defined relationship with a canonical Collection. we could call this contextual object a Conversation and we could refer to the canonical collection via a posts relation/property. this mirrors some other data models and is one potential solution for the weirdness of following a conversation represented by a Collection – basically make it something other than a Collection (so it can be addressed as an individual without also addressing its members), or otherwise extend Follow to be sent to the attributedTo actor if any (which is not a well defined pattern, unnecessarily introduces more indirection and more complexity in what an actor might receive, and could be recursive). in general, i am inclined to say that it’s easier to just change the type of the context object rather than to change the way follows work and introduce following non-actor objects. i am not sure about anything yet, and this whole issue requires a bit more thought.
  • based on the things i said in that private discussion, @devnull went ahead and committed a change that introduced Conversation and Conversation.posts along the lines of one of the potential solutions. i consider this premature, but hey, it led to this bug being discovered in the Discourse plugin, so i guess it wasn’t all bad!
4 Likes

Implementing the Conversation-type object allowed me to further my own agenda of being able to explicitly pull in topics by the context URL (which is what people would undoubtedly try to do) instead of by individual post URLs.

It was not a large change, and should the spec shift it should be fairly easy to keep up :slight_smile: — at least now you know what it looks like in practice as well as in theory!

4 Likes

Thanks for the explanations guys. I’ll add Conversation support to the Discourse plugin when I get a spare mo this week.

2 Likes

That makes your implementation incompatible with others, because context is expected to be a collection.

If you need to create a link between post and a topic, you can use a different property (topic?).

1 Like

context can be anything, not just a Collection. the important thing is that it has an id — and you can use this id to group related things together, similar to how you might query for all objects with a certain tag.

in any case, i would hold off on the Conversation type for a different reason, which is that it’s not defined anywhere yet, and i’m not 100% set on whether it’s a good idea to go forward with it. it’s currently the least problematic way forward, but maybe wait for a FEP or something and we can discuss this all in the ensuing discussion topic :sweat_smile:

1 Like

Ok. @devnull on balance, my preference is to hold off, but if you’re keen on going ahead now, I’ll also go ahead to maintain NodeBB <> Discourse compatibility. Thoughts?

1 Like

It seems to be getting worse, about ~15 identical topics have been created today.

And I still don’t understand why this change was made. Why break our past agreement about context being a collection, @devnull ?
Collection worked for everyone. Object adds an unnecessary layer of indirection, and doesn’t work for blogging platforms where “topics” may not exist.

I think context should remain a collection. Other features like “topics” and subscribing to conversations should be implemented independently. I mentioned a possible solution for the latter problem on GitHub, and will describe it in more detail here:

  • context property points to a collection
  • top-level post and/or context collection may also have an observer property, which points to an actor
  • this actor can be followed to receive conversation updates
  • it can also represent a topic

Example:

{
  "type": "Note"
  "context": {
    "type": "Collection",
    "items": []
  },
  "observer": {
    "type": "Application",
    "inbox": "..."
  }
}

What do you think?

There’s no intermediary between post and conversation collection. All components can be implemented independently. Only one new property. Compatible with ActivityPub spec.

2 Likes

@angus @silverpill I’m happy to roll back the change, especially if it’s causing issues with other software or expectations with where we’d eventually end up.

It’s in a v4.1 branch that is as yet unreleased.

4 Likes

i think Discourse AP should for now only use context.id, because this is the foundational usage (figuring out which conversation or topic a post should belong to). keep in mind that remote servers can use different URIs though – and some of those URIs might be equivalent to or aliases of your local URI, so it’s not as simple as a 1:1 match or missing. you need a slightly more robust strategy that accounts for multiple identifiers for the same topic.

the context.type should not be used for now, so it should not cause issues. whatever code path Discourse AP is reaching that is causing this duplication bug should not be reachable.

i think NodeBB should consider rolling back the change for now, but it’s ultimately not a huge issue, and if it’s causing issues in other projects then those issues should be fixed in other projects. requiring context to be a Collection would be like requiring audience/to/cc to be a Collection: not a good idea, and not founded in anything.

2 Likes

I think, on balance, this may make sense.

I’ve cleaned this up and have just deployed the fix here, so it won’t happen again in any event.

1 Like