Delivering to Collections can be ambiguous #486

Open
trwnh opened this issue Jan 10, 2025 · 17 comments
Labels
Needs Primer Page (Needs a page in the ActivityPub primer), Next version (Normative change, requires new version of spec), Waiting for Commenter

Comments

@trwnh

trwnh commented Jan 10, 2025

Reference

When objects are received in the outbox (for servers which support both Client to Server interactions and Server to Server Interactions), the server MUST target and deliver to:

  • The to, bto, cc, bcc or audience fields if their values are individuals or Collections owned by the actor.

An activity is delivered to its targets (which are actors) by first looking up the targets' inboxes and then posting the activity to those inboxes. Targets for delivery are determined by checking the ActivityStreams audience targeting; namely, the to, bto, cc, bcc, and audience fields of the activity.
The inbox is determined by first retrieving the target actor's JSON-LD representation and then looking up the inbox property. If a recipient is a Collection or OrderedCollection, then the server MUST dereference the collection (with the user's credentials) and discover inboxes for each item in the collection.

The issue

For a Collection that also has an inbox,

  • The server will deliver to the inbox
  • The server will expand the items and try to discover any inbox for each item

There is no way to deliver to just the Collection.inbox, because the collection expansion during delivery is a MUST. This can cause issues especially when some or all of the Collection's items don't have an inbox, or even when the items have an inbox but aren't intended recipients.

Examples

For example, say a Collection represents a feed in the same way one might construct an RSS or Atom feed. Delivering to this Collection might trigger additional behaviors, such as adding a post to the collection. For every post added to the collection, the subsequent delivery will have an additional undeliverable target to iterate through. A collection with n posts will cause the server to discover the 1 inbox of the collection itself, and then iterate through n items, which might either lack an inbox, or they might have an inbox which wasn't intended for delivery.

More concretely, consider the article https://csarven.ca/linked-data-notifications and a Collection that contains this Article. Both of these have an inbox.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://csarven.ca/linked-data-notifications",
  "type": [
    "Article",
    // ...
  ],
  "inbox": "https://linkedresearch.org/inbox/csarven.ca/linked-data-notifications/"
}
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://server.example/some-collection",
  "type": [
    "Collection",
    // ...
  ],
  "inbox": "https://inbox.example",
  "items": [
    "https://csarven.ca/linked-data-notifications",
    // ...
  ]
}

I construct an Activity and address it to the Collection:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Activity",
  "summary": "This is intended to notify the Collection that it needs to do something.",
  "to": "https://server.example/some-collection"
}

However, when I POST this activity to an ActivityPub outbox, the ActivityPub server will do two things when only one is desired:

  • POST to https://inbox.example (intended)
  • Expand collection items
    • POST to https://linkedresearch.org/inbox/csarven.ca/linked-data-notifications/ (undesirable, as this is notifying the article of an activity that was completely unrelated to it!)
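The double delivery can be reproduced with a minimal sketch of the delivery rule quoted in the Reference section. This is an illustration under stated assumptions, not any server's actual implementation: the in-memory OBJECTS table stands in for HTTP dereferencing, and the names resolve_inboxes and as_list are hypothetical. A real server would also deduplicate, authenticate, and handle paging.

```python
# Hypothetical in-memory stand-in for dereferencing IRIs over HTTP.
OBJECTS = {
    "https://csarven.ca/linked-data-notifications": {
        "type": ["Article"],
        "inbox": "https://linkedresearch.org/inbox/csarven.ca/linked-data-notifications/",
    },
    "https://server.example/some-collection": {
        "type": ["Collection"],
        "inbox": "https://inbox.example",
        "items": ["https://csarven.ca/linked-data-notifications"],
    },
}

COLLECTION_TYPES = {"Collection", "OrderedCollection"}
ADDRESSING = ("to", "bto", "cc", "bcc", "audience")


def as_list(value):
    """Normalize a missing, single, or multi-valued JSON property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def resolve_inboxes(activity):
    """Collect target inboxes the way the quoted delivery text reads."""
    inboxes = []
    for prop in ADDRESSING:
        for iri in as_list(activity.get(prop)):
            target = OBJECTS.get(iri, {})
            # Rule 1: any addressee that has an inbox is a delivery target.
            if "inbox" in target:
                inboxes.append(target["inbox"])
            # Rule 2: a Collection addressee MUST also be expanded to its items.
            if COLLECTION_TYPES & set(as_list(target.get("type"))):
                for item_iri in as_list(target.get("items")):
                    item = OBJECTS.get(item_iri, {})
                    if "inbox" in item:
                        inboxes.append(item["inbox"])
    return inboxes


activity = {"type": "Activity", "to": "https://server.example/some-collection"}
print(resolve_inboxes(activity))
```

Both rules fire for the same addressee, so the Article's inbox ends up in the result even though only the Collection was addressed. Nothing the client can put in the activity switches off Rule 2.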

Further analysis

This problem applies more generally to anything that by some protocol is expected to expand to something else, while it simultaneously has its own inbox. If a protocol for Group membership, addressing, delivery, distribution, etc. were to be specced out (as https://github.com/swicg/groups aims to do), then it would face the same issue if the Group had both an inbox and members and delivering servers were expected to expand any Group.members in the same way that they currently are required to expand Collection.items.

Also, there is a concern that the addressing properties are trying to do too much. In ActivityPub, we consider to, cc, audience, bto, bcc for delivery, but "delivery" means more than just "discover an inbox and POST to it". There are edge cases and special cases that dilute the utility of this mechanism, such as expanding collection items, or skipping the Public pseudo-collection (which is itself bugged, as #404 describes).

Something noteworthy about other specs like Web Access Control is that they make an explicit distinction between various forms of targeting, by separating what would have otherwise been a single property into specifically three different things:

  • agent, which is a list of individuals by their WebID
  • agentGroup, which is a list of groups that expand to member individuals and their WebID
  • agentClass, which is a list of vocabulary terms that represent conceptual categories that an agent might match
    • Agent is basically the same as what we call Public (no identity required)
    • AuthenticatedAgent is anyone who can be identified (e.g. with a WebID)

Resolution

In a primer page, this issue should be described.

In a next version, it might be worth considering a similar split to Web Access Control, because these really are different things, and concerns should be separated.

@evanp evanp added the Needs Primer Page Needs a page in the ActivityPub primer label Jan 10, 2025
@evanp
Collaborator

evanp commented Jan 10, 2025

I think this issue is particular to the Collection and OrderedCollection types. As noted, because these types require expansion and delivery to their items, it's not possible to address activities to a Collection-type actor without that expansion. The workaround here is to use another type for representing a composite type, like a set or bag or series, that is not specifically called out for this behaviour. I think this would benefit from a primer page on addressing collections.

@ThisIsMissEm

There's a very strong difference between addressing a collection (and the expansion of the collection to inboxes used for delivery) and POSTing an activity to a collection's inbox endpoint (in which case its behavior is defined by the receiving server).

@evanp evanp added the Next version Normative change, requires new version of spec label Jan 10, 2025
@ThisIsMissEm

In a Groups context, a Group Actor would probably have a members collection. If, and only if, you address the group's members collection directly, and not the Group Actor, would your server be responsible for expanding the Collection of members and sending to all their inboxes. Otherwise, if you address the Group Actor, then the Group Actor is responsible for sharing that activity with the Group members collection as necessary.

@trwnh
Author

trwnh commented Jan 10, 2025

w/r/t Group protocol i am going to defer to swicg/groups discussion but the general pattern is that if something has an inbox AND if that same thing is an indirect standin for some other expansion, then you will run into the same issue for addressing and delivery.

you might be able to do it solely for access control, but that would require an access control mechanism separate from addressing (since addressing triggers delivery).

example possible flows:

  • to:[Collection] triggers a POST to Collection.inbox, and then by expansion triggers a POST to Collection.items[*].inbox
  • to:[Group, Group.members] triggers a POST to Group.inbox, and then Group is responsible for inbox-forwarding to its own Group.members

example impossible flows:

  • to:[Collection] triggers a POST to Collection.inbox, and then DOES NOT trigger a POST to Collection.items[*].inbox (because you MUST expand, and expansion triggers the undesired delivery)
  • to:[Group] triggers a POST to Group.inbox; if we then define that a Group should expand to Group.members during delivery, we replicate the same issue.

essentially the workaround for Group addressing would be to define that Group DOES NOT EXPAND to Group.members, and that you must explicitly address Group.members to trigger delivery (handled by inbox forwarding on behalf of the Group, but this creates a different issue where a Group might spoof private activities by its own members)

but the Collection has no such workaround, because Collection is defined to always expand to Collection.items (and items is just a JSON-LD set or JSON array, not itself a Collection)

the only way you could fix this issue is by revising the requirement to expand Collections in to/cc and make it apply to some other property that is defined to specifically expand Collections (or Groups, or whatever). this is what WAC did with agent / agentGroup / agentClass. do we need to do this for every single one of the addressing properties? for example audience -> audience / audienceGroup / audienceClass, but then also to -> to / toGroup / toClass, and also cc -> cc / ccGroup / ccClass (and so on for bto/bcc)? this seems unwieldy if taken to the extreme, but it might be needed. at the very least, tangent to #404 we would need at least one term explicitly defined as @type: @vocab instead of @type: @id (to enable Public to work correctly, alongside other potential classes)

@silverpill

An activity is delivered to its targets (which are actors)

Activities are delivered to actors, the spec is very clear about it. Collection is not an actor, it is a different core type (by definition), and collections with inboxes shouldn't be treated as actors.

@trwnh
Author

trwnh commented Jan 11, 2025

@silverpill The spec is also clear that anything can be an actor. https://www.w3.org/TR/activitypub/#actors

An AP Actor is defined as "MUST have inbox and outbox". It is explicitly called out that it is not required to be one of the 5 AS2 types.

Regardless of whether you consider a Collection with an inbox to be an "actor" or not (by whatever definition), the issue remains. Would it help you to consider a Collection as a "resource" instead of an "actor"? It makes no difference -- it remains impossible to address (to/cc/audience/bto/bcc) such a "Collection with an inbox" in any Activity posted to a C2S outbox without also inadvertently triggering delivery for inboxes on any/all items.


At present, this also makes it impossible to Follow a Collection, because you can't send the Follow to that Collection without also potentially sending it to thousands of unrelated resources that also have inboxes. Or at the very least, if not sending to thousands of unrelated inboxes, then checking thousands of unrelated resources for inboxes that may or may not be there (and are unintended recipients in either case).

Given all this, it is clear that Collections are in some way straddling the line between "resource"/"object" and "view"/"presentation". You can refer to Collections as resources by their id, but they might not be fully real/reified. In particular, the reference to the Collection in the addressing properties is an indirection that MUST be unpacked. This means that Collections are not fully real with respect to addressing: they are both "real" (wrt inbox) and "not real" (wrt iterating over items).

@silverpill

silverpill commented Jan 13, 2025

The spec is also clear that anything can be an actor.

No, it is not clear. The next big section after 4. Actors is 5. Collections, implying that they are different things. This interpretation is consistent with ActivityStreams 2.0 which defines different core types and doesn't say that anything can be anything. Existing ActivityPub implementations also don't mix core types.

The real problem that should be addressed here is a missing definition of an actor in ActivityPub spec. Currently it is too vague and hints at the possibility of nonsensical objects such as collection-actors, activity-actors and link-actors, leading to issues like this one.

@trwnh
Author

trwnh commented Jan 13, 2025

The next big section after 4. Actors is 5. Collections, implying that they are different things

"Objects" gets its own section, which doesn't imply anything about Actors/Activities/Collections not being Objects.

Existing ActivityPub implementations also don't mix core types.

Note that in https://www.w3.org/TR/activitystreams-core/#model there is no "Actor" core type. But as https://www.w3.org/TR/activitypub/#actors notes,

ActivityPub actors are generally one of the ActivityStreams Actor Types, but they don't have to be. For example, a Profile object might be used as an actor, or a type from an ActivityStreams extension. Actors are retrieved like any other Object in ActivityPub. Like other ActivityStreams objects, actors have an id, which is a URI.

Actor objects MUST have, in addition to the properties mandated by 3.1 Object Identifiers, the following properties:

  • inbox: [...]
  • outbox: [...]

So ActivityPub is clear that actors are just objects that have id, inbox, outbox. We might consider whether it makes conceptual sense for something to carry out an activity or not, but there is not a limit nor a prohibition on any specific type. The only disjoint relation is that an Object cannot also be a Link, or vice versa -- and actors are Objects. But aside from that, you could make anything an actor. Whether it is a good idea to make certain things into actors is a separate concern.

Outside of ActivityPub, inbox is a property that can be applied to any resource that accepts Linked Data Notifications. And the way that the ActivityPub delivery sections are written, delivery will target any entity addressed as long as it has an inbox. This is fine on its own. It's the additional requirement to expand Collection addressees that can be problematic.

Ultimately, addressing properties like to, cc, audience and so on are not required to denote actors, they denote entities or individuals https://www.w3.org/TR/activitypub/#outbox-delivery -- specifically "things that have an inbox" or "things whose type includes Collection". But there happens to be an undesirable interaction that arises for the union of these two things.

@nightpool
Collaborator

I agree with (afaict) everybody else in this thread that using a Collection as an actor is probably a bad idea. I see the impulse but I don't think it's the right road to go down.

@trwnh
Author

trwnh commented Jan 14, 2025

It might be a bad idea right now, but what's the solution here? My thinking is that either:

  • we define a special property whose semantics explicitly include expansion (similar to the agent vs agentGroup vs agentClass split in WAC)
  • or we rethink or rework Collections themselves such that in the data model they are somehow not considered objects or resources?
    • ...so should they be redefined to be views? this seems murky... RDF doesn't really have the concept of a "view", it only really has Resources and Literals.
    • RDFS introduces Containers (such as unordered Bag or ordered Seq or option-based Alt), which describe membership in a potentially infinite set, and Collections (such as a linked List), which describe a finite grouping of elements and let you traverse between elements until terminated.
      • it's not clear if AS2 Collection can be mapped onto RDFS Collection, since RDFS Collection has more structure, but it seems possible (although perhaps inconvenient).
    • even with all this, the issue remains that you cannot notify a Collection resource via AP addressing without additional undesired behavior. so we should warn about that undesired behavior.
  • or we ...?

I understand that the likely intent of including Collections in the addressing properties was to enable features like addressing the Public collection or addressing specific followers collections, but there's a problem with just doing that at face value -- and in another spec like WAC, this would be where agentClass is useful for specifying Public in the same way they specify Agent or AuthenticatedAgent. Or with the concept of a "Group" in basically any other spec (vcard:Group, foaf:Group, ...), you have a member relation that lets you specify who's in the Group, and then properties like agentGroup explicitly expand to the members (in the same way AP addressing currently expands a Collection to its items, but distinct from regular inbox selection). But in AP/AS2, we don't really make this distinction. Hence this issue.


Unfortunately I don't really know how to proceed thinking about this without clarity on what exactly a Collection is -- the underlying data model, if any. Also, when should we be indirect? When should this indirection apply? Because taking the followers collection example again, we currently have a lot of gaps with semantic closure when it comes to the Collection concept. By claiming a relation between an Object and a Collection, we are never actually relating the Object to the items of that Collection at all!

In other words, given the following:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://domain.example/~alice/",
  "type": "Person",
  "followers": {
    "id": "https://domain.example/~alice/followers/",
    "type": "Collection",
    "items": ["https://domain.example/~bob", "https://domain.example/~charlie"]
  }
}

We actually cannot conclude anything about any possible relations between ~alice and ~bob, or between ~alice and ~charlie. Sure, we can establish application logic to post-process this and determine that ~alice is followed by ~bob and that ~alice is followed by ~charlie, but we cannot directly infer this based on the provided information alone.

If our data model instead looked like this:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://domain.example/~alice/",
  "type": "Person",
  "IsFollowedBy": ["https://domain.example/~bob", "https://domain.example/~charlie"]
}

...then we could make that conclusion.
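The application-level post-processing mentioned above might look like the following sketch. The function name followed_by is hypothetical, and the IsFollowedBy predicate is borrowed from the second JSON example purely for illustration. The key assumption, stated in the docstring, is exactly the out-of-band knowledge the data model does not carry.

```python
alice = {
    "id": "https://domain.example/~alice/",
    "type": "Person",
    "followers": {
        "id": "https://domain.example/~alice/followers/",
        "type": "Collection",
        "items": ["https://domain.example/~bob", "https://domain.example/~charlie"],
    },
}


def followed_by(actor):
    """Derive direct (subject, predicate, object) statements from `followers`.

    Assumption: every item of the `followers` Collection follows the actor.
    That rule lives in this code, not in the data itself.
    """
    collection = actor.get("followers")
    if not isinstance(collection, dict):
        # Only an IRI reference was given; it would need dereferencing first.
        return []
    return [(actor["id"], "IsFollowedBy", item) for item in collection.get("items", [])]


for statement in followed_by(alice):
    print(statement)
```

The statements only exist after the application applies its own rule, which is the gap in semantic closure being described: a consumer without this rule sees a Person, a Collection, and two unrelated IRIs.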

A further complication is that looking solely at a Collection, it is not possible to immediately know what purpose that Collection serves; for example, we don't know that an addressed Collection is specifically a followers Collection, nor do we know whose followers Collection it is. As far as we're aware, it's just an independent Object like any other.

Consequently, this makes handling addressees which happen to be Collections more difficult than it needs to be. And in the intended use case where the Collection is actually an indirection for addressing its items, then this means that any servers involved need to recognize which of the addressed IRIs are indeed Collections. (This may not be public information.)

All this makes me wonder if there's a better way... Some unambiguous mechanism such that:

  • A certain property's semantics and accompanying behaviors are clearly defined and expected
    • For example, always expand all values, don't just expand some values; values of a given property should either always expand or never expand
  • An expansion algorithm is deterministic and consistent
    • For example, given a Collection, always return every single item in that Collection; or given a Group, always return every single member of that Group
  • The expanded individuals are considered part of the audience, but the grouping resource is not
    • In other words, there is no ambiguity between the resource itself versus its indirect expansion; the grouping resource is not part of the audience unless otherwise included in a different property that doesn't expand

The polymorphism here seems to be the root of the issue. Or rather, the inconsistency of behavior that arises from that polymorphism.

@silverpill

Note that in https://www.w3.org/TR/activitystreams-core/#model there is no "Actor" core type.

@trwnh Looks like it was not included by mistake. I opened an issue: w3c/activitystreams#633

or we rethink or rework Collections themselves such that in the data model they are somehow not considered objects or resources?
...so should they be redefined to be views?

In practice, collections usually are views, so I think such re-definition may make sense.

@trwnh
Author

trwnh commented Jan 23, 2025

@silverpill i still think that an AS2 "Actor" and an AP "actor" are not the same thing.

regardless, the issue is about outbox delivery to associated inboxes, so a concrete Actor type doesn't actually solve anything. in general, you can deliver to any resource with an ldp:inbox, but if the resource is typed as as:Collection, it will trigger additional deliveries which may be unwanted.

making as:Collection uniquely subvert the delivery model feels like a bug that should be fixed. it shouldn't "magically" expand. at most, i think that this expansion should be limited to a property that specifically always expands, and as described above, this expansion should be deterministic and consistent, similar to how WAC disambiguates between agent/agentGroup/agentClass.

breaking it down into RDF triples, you basically have certain triples which mean different things than other triples of the same form... which is not a good way to model information:

@base <https://thing.example/> .
@prefix as: <https://www.w3.org/ns/activitystreams#> .

<> a as:Activity .
<> as:audience <https://alice.example> .  # this delivers to alice
<> as:audience <https://bob.example> .  # this delivers to bob
<> as:audience <https://collection.example> .  # you have to detect this as a Collection, which isn't explicitly stated anywhere and therefore you need to know this somehow via external knowledge

now compare with using a different property:

@base <https://thing.example/> .
@prefix as: <https://www.w3.org/ns/activitystreams#> .

<> a as:Activity .
<> as:audience <https://alice.example> .  # this delivers to alice
<> as:audience <https://bob.example> .  # this delivers to bob
<> as:audienceIncludesCollectionItems <https://collection.example> .  # this can be inferred to be a Collection, especially if the property's range is defined to be Collection

in the first example, we don't actually know that https://collection.example is a Collection that needs to be expanded; we need to therefore check some external knowledge source (perhaps an indexed database) to determine which if any of the referenced resources are indeed Collections that need to be expanded. but note that this check MUST happen for every single referent. this means we need to check if https://alice.example is a Collection, and if https://bob.example is a Collection.

but in the second example, we actually do know that https://collection.example is a Collection that needs to be expanded, because we defined that ahead-of-time and have come to a semantic agreement. consequently, we therefore no longer need to check anything beyond the inbox discovery process for https://alice.example, https://bob.example, and the contents of https://collection.example (but not https://collection.example itself -- if we wanted to deliver to https://collection.example, we would address it as a regular resource).
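as a rough sketch of what the second model buys you, here is a resolver where audience never expands and audienceIncludesCollectionItems always expands. everything here is an assumption for illustration: the RESOURCES table stands in for HTTP dereferencing, the inbox URLs and https://carol.example are made up, the function names are hypothetical, and audienceIncludesCollectionItems is only the placeholder property name from the example above, not a defined term.

```python
# Hypothetical lookup table standing in for HTTP dereferencing.
RESOURCES = {
    "https://alice.example": {"inbox": "https://alice.example/inbox"},
    "https://bob.example": {"inbox": "https://bob.example/inbox"},
    "https://collection.example": {
        "inbox": "https://collection.example/inbox",
        "items": ["https://carol.example"],
    },
    "https://carol.example": {"inbox": "https://carol.example/inbox"},
}


def inbox_of(iri):
    """Return the inbox of a resource, or None if it has none."""
    return RESOURCES.get(iri, {}).get("inbox")


def resolve(activity):
    """`audience` never expands; `audienceIncludesCollectionItems` always does."""
    targets = []
    # Direct addressees: inbox discovery only, no type check required.
    for iri in activity.get("audience", []):
        targets.append(inbox_of(iri))
    # Indirect addressees: deliver to the items, never to the grouping resource.
    for iri in activity.get("audienceIncludesCollectionItems", []):
        for item in RESOURCES.get(iri, {}).get("items", []):
            targets.append(inbox_of(item))
    return [t for t in targets if t is not None]


# Notify only the collection itself.
print(resolve({"audience": ["https://collection.example"]}))
# Notify alice, bob, and the collection's items, but not the collection.
print(resolve({
    "audience": ["https://alice.example", "https://bob.example"],
    "audienceIncludesCollectionItems": ["https://collection.example"],
}))
```

the resolver never inspects a type: which property a value appears in fully determines whether it expands. to reach both the collection and its items, you would list the collection under both properties.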

@evanp
Collaborator

evanp commented Jan 24, 2025

One thing discussed in triage was making a future update to the spec with these changes:

  1. Better explanation of how Collection and its ilk are special in this regard.
  2. Some other mechanism for delivery to a Collection only, without delivering to its members.

@trwnh
Author

trwnh commented Jan 24, 2025

User stories

At request of Evan in issue triage, to cover user stories...

The main user story is "following a conversation".

The naive interpretation is that a conversation can be represented by a Collection, but this approach makes it unclear where to send a Follow and have it be handled.

The main workaround right now is to instead differentiate between the conversation and its posts; we might do this with a Conversation type that has a property called posts that points to a(n Ordered)Collection of posts within the conversation, and the Conversation object can then be given actor properties like followers, inbox and/or outbox. The inbox would make it addressable as an AP actor without the undesirable expansion to member items, which are no longer expressed via items directly but are instead expressed via the posts property, which is an indirection layer between the Conversation and the Collection.
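A minimal sketch of why this workaround sidesteps expansion, assuming the delivering server decides to expand purely on the addressee's type. The forum.example URLs and the function names is_collection and delivery_plan are hypothetical, and the Conversation type and posts property are not defined terms in any current vocabulary.

```python
COLLECTION_TYPES = {"Collection", "OrderedCollection"}

# Workaround shape: the addressee is not a Collection; it points to one.
conversation = {
    "id": "https://forum.example/conversations/1",
    "type": "Conversation",
    "inbox": "https://forum.example/conversations/1/inbox",
    "followers": "https://forum.example/conversations/1/followers",
    "posts": {
        "type": "OrderedCollection",
        "orderedItems": ["https://forum.example/posts/1", "https://forum.example/posts/2"],
    },
}

# Naive shape: the conversation is itself a Collection that has an inbox.
naive = {
    "id": "https://forum.example/conversations/2",
    "type": "OrderedCollection",
    "inbox": "https://forum.example/conversations/2/inbox",
    "orderedItems": ["https://forum.example/posts/3"],
}


def is_collection(obj):
    """True when the delivery rule would expand this addressee's items."""
    types = obj["type"] if isinstance(obj["type"], list) else [obj["type"]]
    return bool(COLLECTION_TYPES & set(types))


def delivery_plan(obj):
    """Return (inboxes to POST to, item IRIs that must be checked for inboxes)."""
    inboxes = [obj["inbox"]] if "inbox" in obj else []
    to_expand = []
    if is_collection(obj):
        to_expand = obj.get("items", []) + obj.get("orderedItems", [])
    return inboxes, to_expand


print(delivery_plan(conversation))
print(delivery_plan(naive))
```

Addressing the Conversation yields its inbox and nothing to expand, while addressing the naive Collection yields its inbox plus every item to check. The posts stay reachable through the posts property without ever entering the delivery path.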

More generally, a Collection is a resource like any other, and like any other resource, it can have an inbox property at least according to Linked Data Notifications (LDN -- see #470 tangentially). If we wish to notify a Collection resource of something, we can do so using "raw LDN" by making an HTTP POST request to the ldp:inbox, but we cannot do so using AP delivery via making an HTTP POST request to the as:outbox (without triggering the undesired behavior). We can generally say that using AP as an addressing or multi-delivery mechanism layered on top of LDN is almost possible... except for Collection resources having unintended side effects. I think if this were to be pursued further by anyone, it would make more sense to define something other than as:outbox, which doesn't have this behavioral requirement... but this is out of scope for ActivityPub and more in scope of some hypothetical "LDN proxy" which just takes an HTTP request and forwards it to certain targets on your behalf. But I don't know if this is particularly interesting since it amounts to more or less just being a plain old HTTP proxy with a slight variation.


Conclusions

For conclusions, I think the main takeaway here is that we should consider a Primer and some language in Next Version along the following lines:

  • "Be wary of giving a Collection an inbox. Due to the way that ActivityPub delivery works, it is not possible to deliver to a Collection resource without also attempting delivery to all items in the Collection. Addressing a Collection with an inbox will both discover that inbox and also dereference the collection to discover inboxes for each item in the collection."

I'm not sure to what extent it makes sense to consider new types or new properties in a Next Version, but if there was appetite for that then we could consider those at a later point.

@silverpill

The main user story is "following a conversation"

I can think of two good options:

  1. Create a separate actor and attach it to a "context" collection or to a top-level post, and set it as object of Follow. A new property would be needed to attach such actor (something like Collection.observer).
  2. Set object of Follow to "context" collection or to a top-level post, send Follow to attributedTo's inbox (some applications already do this, including Friendica).

These options are compatible with existing implementations, and don't require major changes in specs.
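Option 2 could be sketched roughly as follows. This is not taken from Friendica or any other implementation; the social.example and other.example URLs, the ACTORS table, and the function name build_follow are all assumptions made for illustration.

```python
# Hypothetical actor lookup standing in for HTTP dereferencing.
ACTORS = {
    "https://social.example/users/alice": {
        "inbox": "https://social.example/users/alice/inbox",
    },
}

# A "context" collection owned by alice; it has no inbox of its own.
thread = {
    "id": "https://social.example/contexts/42",
    "type": "OrderedCollection",
    "attributedTo": "https://social.example/users/alice",
}


def build_follow(follower_id, target):
    """Build a Follow whose object is the collection, addressed to its owner."""
    owner = target["attributedTo"]
    follow = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Follow",
        "actor": follower_id,
        "object": target["id"],
        # Only the owner is addressed, so no Collection appears in `to`
        # and the delivering server has nothing to expand.
        "to": [owner],
    }
    return follow, ACTORS[owner]["inbox"]


follow, inbox = build_follow("https://other.example/users/bob", thread)
print(inbox)
```

The collection is named only as the object, which the delivery rule does not look at, so the expansion problem never arises. The cost is that the receiving actor has to understand a Follow whose object is something other than itself.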

@trwnh
Author

trwnh commented Jan 26, 2025

@silverpill "You can send a Follow of any object to its attributedTo" is a major change to the spec. Who's to say that this is supported? We could try to spec out this kind of flow where actors might be responsible for managing updates to their owned objects (how much can the server help with here?), but "You can send a Follow to anything with inbox and followers" is the ideal here. It's far more direct, and the indirection of crawling up attributedTo introduces ambiguity and uncertainty -- how far up the chain do you crawl? 1 level? 2 levels? Until you find something with an inbox? We would need a much clearer framework on how actors manage their owned objects or how they might spawn child actors in a hierarchy. That's a lot of theoretical stuff that needs to be thought out and specced out! Instead, it would be far simpler to just give everything an inbox and if you want it to support the Follow flow then give it a followers collection. The question is if this simpler model is expressive enough to handle advanced use cases, or if the extra complexity of introducing a hierarchy is necessary.

Consequently, as I detailed in the rest of my previous comment beyond the first sentence, the model I am toying with right now is to represent a conversation not by a Collection but by an Object, and to instead associate a Collection with that Object via some property. I'll not repeat myself on the particulars, but this is all that can really be done because Collection remains unique with respect to the outbox delivery algorithm and its discovery of inboxes. The only way to treat a Collection with an inbox is to deliver to both the Collection's inbox and to the Collection's items' inboxes. I've already argued the particulars of this as well, about 2 comments ago on this issue -- including how the problem was solved in other specs by having different vocab terms for different types of values (agents, groups, and classes are distinct authorization targets in WAC, with agent being a direct relation to agents, agentGroup being an indirect relation to a Group's members, and agentClass being a logical relation to anything matching the definition of a class). This is tangential as well to:

@silverpill

"You can send a Follow of any object to its attributedTo" is a major change to the spec. Who's to say that this is supported?

This is what FEPs are for. I don't understand what you mean by ambiguity and uncertainty and why would anyone need to travel across multiple levels to get an actor.
The real problem with this particular solution (number 2 in my previous comment) is that according to AP, Follow should be used with actors:

The Follow activity is used to subscribe to the activities of another actor.

https://www.w3.org/TR/activitypub/#follow-activity-outbox

To work around this limitation, a different activity type can be used, e.g. ThreadSubscribe. Or we can use solution number 1 (separate collection and actor), which I am in favor of anyway.

the model I am toying with right now is to represent a conversation not by a Collection but by an Object, and to instead associate a Collection with that Object via some property.

Sounds similar to solution 1, except you want to use Object instead of Actor.

5 participants