Batch policy evaluation for bulk authz callers #17

@hardbyte

PermissionChecker::evaluate_access is per-call: one subject, one resource, one action, one context. Authorization paths that load N resources from a backing store and need to return the visible subset currently have two bad options:

  1. Call evaluate_access N times in a loop. This is correct: every policy in the checker is respected, including policies added later. But a single logical list/batch decision produces N evaluations, N traces, and potentially N async policy calls even when subject and action are invariant across the batch.
  2. Push the policy into SQL or another backing predicate language. This can be faster, but it duplicates today's policy shape outside gatehouse. A later policy added to the checker can be silently bypassed by the bulk path while still firing in single-resource calls. That loses the central-policy guarantee gatehouse is supposed to provide.

This shows up naturally in list/scope/batch endpoints that need to answer "which of these resources can this caller see?". A multi-resource subscription handler, for example, might load a scope of ids and then need to authorize each before returning data. Today callers either pay the N-fan-out overhead or escape the policy abstraction.

Goal

Add a first-class batch evaluation path that:

  • preserves the same per-item OR-across-policies semantics as evaluate_access;
  • keeps resource/context construction owned by the caller;
  • preserves input ordering in returned results;
  • provides one batch-level trace/span with summary counts;
  • avoids encouraging ad hoc SQL-side policy duplication;
  • lets policy/resolver implementations opt into real backend batching when they can actually exploit it.

Proposed caller API

Prefer an item-oriented API over tuple-shaped AsRef<(R, C)> inputs or id-specific mapper helpers. The caller usually already has domain objects, rows, or loaded resources; gatehouse should just ask how to borrow the resource and context for each item.

impl<S, R, A, C> PermissionChecker<S, R, A, C> {
    pub async fn evaluate_batch_by<I, F>(
        &self,
        subject: &S,
        action: &A,
        items: I,
        parts: F,
    ) -> Vec<(I::Item, AccessEvaluation)>
    where
        I: IntoIterator,
        F: for<'item> Fn(&'item I::Item) -> (&'item R, &'item C),
    {
        /* evaluate each item with the same decision semantics as evaluate_access */
    }

    pub async fn filter_authorized_by<I, F>(
        &self,
        subject: &S,
        action: &A,
        items: I,
        parts: F,
    ) -> Vec<I::Item>
    where
        I: IntoIterator,
        F: for<'item> Fn(&'item I::Item) -> (&'item R, &'item C),
    {
        /* return only items whose evaluation is granted */
    }
}

Example usage:

let visible_posts = checker
    .filter_authorized_by(&user, &Action::View, posts, |post| {
        (post, &request_context)
    })
    .await;

This keeps the common "candidate rows/items in, authorized rows/items out" workflow simple without making gatehouse responsible for fetching resources or constructing contexts.
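
As a rough illustration of how the two entry points could relate, here is a minimal synchronous sketch in which filter_authorized_by is layered on top of evaluate_batch_by. The Evaluation enum and the check closure are hypothetical stand-ins for AccessEvaluation and for the checker's per-item decision; the real methods would be async and live on PermissionChecker.

```rust
/// Hypothetical stand-in for gatehouse's AccessEvaluation; only the
/// granted/denied distinction matters in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Evaluation {
    Granted,
    Denied,
}

/// Synchronous analogue of `evaluate_batch_by`. `parts` borrows the resource
/// and context from each item; `check` stands in for the checker's decision.
pub fn evaluate_batch_by<I, R, C, P, K>(items: I, parts: P, check: K) -> Vec<(I::Item, Evaluation)>
where
    I: IntoIterator,
    P: for<'item> Fn(&'item I::Item) -> (&'item R, &'item C),
    K: Fn(&R, &C) -> Evaluation,
{
    items
        .into_iter()
        .map(|item| {
            // The borrow of `item` ends with this inner block, so the item
            // can be moved into the returned pair afterwards.
            let evaluation = {
                let (resource, context) = parts(&item);
                check(resource, context)
            };
            (item, evaluation)
        })
        .collect()
}

/// Synchronous analogue of `filter_authorized_by`: keep granted items only,
/// in their original order.
pub fn filter_authorized_by<I, R, C, P, K>(items: I, parts: P, check: K) -> Vec<I::Item>
where
    I: IntoIterator,
    P: for<'item> Fn(&'item I::Item) -> (&'item R, &'item C),
    K: Fn(&R, &C) -> Evaluation,
{
    evaluate_batch_by::<I, R, C, P, K>(items, parts, check)
        .into_iter()
        .filter(|(_, evaluation)| *evaluation == Evaluation::Granted)
        .map(|(item, _)| item)
        .collect()
}
```

Because the filter is just a projection of the full evaluation, both entry points share one decision path and cannot drift apart.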

Actual bulk backend design

A checker-level helper can reduce caller boilerplate and telemetry volume, but it cannot make arbitrary policy implementations do fewer backend calls. For real bulk execution, the checker needs an object-safe policy-level batch hook with a default implementation that falls back to per-item evaluation.

Sketch:

use async_trait::async_trait;

/// Borrowed (resource, context) pair for one item in a batch.
pub struct PolicyBatchItem<'a, R, C> {
    pub resource: &'a R,
    pub context: &'a C,
}

#[async_trait]
pub trait Policy<Subject, Resource, Action, Context>: Send + Sync {
    async fn evaluate_access(
        &self,
        subject: &Subject,
        action: &Action,
        resource: &Resource,
        context: &Context,
    ) -> PolicyEvalResult;

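    /// Batch hook. Must return exactly one result per item, in input order.
    /// The default implementation falls back to per-item evaluation.
    async fn evaluate_access_batch<'item>(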
        &self,
        subject: &'item Subject,
        action: &'item Action,
        items: &'item [PolicyBatchItem<'item, Resource, Context>],
    ) -> Vec<PolicyEvalResult> {
        let mut results = Vec::with_capacity(items.len());
        for item in items {
            results.push(
                self.evaluate_access(subject, action, item.resource, item.context)
                    .await,
            );
        }
        results
    }

    fn policy_type(&self) -> String;
}

PermissionChecker::evaluate_batch_by can then invert the loop order:

  1. Convert caller items into borrowed (resource, context) batch items.
  2. Keep all item indices pending initially.
  3. For each policy in checker order, call policy.evaluate_access_batch(...) once with the still-pending items.
  4. Append each returned PolicyEvalResult to that item's trace.
  5. If a result grants, finalize that item and remove it from the pending set, preserving OR short-circuiting per item.
  6. After all policies, remaining pending items are denied with the same "All policies denied access" shape as evaluate_access.

This preserves policy order and per-item short-circuiting, but it deliberately does not preserve the exact global interleaving of today's naive loop. Instead of evaluating item1/policy1, item1/policy2, then item2/policy1, a true bulk backend evaluates policy1 across all pending items, then policy2 across the items not yet granted. Policy implementations should not rely on cross-item side effects or global evaluation order.
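
The six steps above can be sketched synchronously as follows. This is an illustration under simplifying assumptions, not the proposed implementation: BatchPolicy, PolicyResult, ItemOutcome, Post, PublicPolicy and OwnerPolicy are hypothetical names, results are reduced to a bool, and context is omitted. The real hook would be async and return PolicyEvalResult.

```rust
use std::cell::Cell;

/// Hypothetical, simplified stand-in for gatehouse's PolicyEvalResult.
#[derive(Debug, Clone, PartialEq)]
pub struct PolicyResult {
    pub policy: String,
    pub granted: bool,
}

/// Final decision for one item plus the policy results recorded for it.
#[derive(Debug, Clone, PartialEq)]
pub struct ItemOutcome {
    pub granted: bool,
    pub trace: Vec<PolicyResult>,
}

/// Synchronous stand-in for the proposed policy-level batch hook.
pub trait BatchPolicy<S, R> {
    fn name(&self) -> String;
    fn evaluate_one(&self, subject: &S, resource: &R) -> bool;

    /// Default fallback: one result per input item, in input order.
    fn evaluate_batch(&self, subject: &S, resources: &[&R]) -> Vec<bool> {
        resources.iter().map(|r| self.evaluate_one(subject, r)).collect()
    }
}

/// Policy-major evaluation over a shrinking pending set.
pub fn evaluate_batch<S, R>(
    policies: &[&dyn BatchPolicy<S, R>],
    subject: &S,
    resources: &[R],
) -> Vec<ItemOutcome> {
    let mut outcomes: Vec<ItemOutcome> = resources
        .iter()
        .map(|_| ItemOutcome { granted: false, trace: Vec::new() })
        .collect();
    // Step 2: every item index starts out pending.
    let mut pending: Vec<usize> = (0..resources.len()).collect();

    for policy in policies {
        if pending.is_empty() {
            break; // everything was already granted by earlier policies
        }
        // Step 3: one batch call per policy, limited to still-pending items.
        let batch: Vec<&R> = pending.iter().map(|&i| &resources[i]).collect();
        let results = policy.evaluate_batch(subject, &batch);
        // Fail closed if the policy returned the wrong number of results.
        let well_formed = results.len() == batch.len();

        let mut still_pending = Vec::with_capacity(pending.len());
        for (slot, &index) in pending.iter().enumerate() {
            let granted = well_formed && results[slot];
            // Step 4: record this policy's result in the item's trace.
            outcomes[index].trace.push(PolicyResult { policy: policy.name(), granted });
            if granted {
                outcomes[index].granted = true; // Step 5: finalize the item
            } else {
                still_pending.push(index);
            }
        }
        pending = still_pending;
    }
    // Step 6: items still pending keep `granted: false`.
    outcomes
}

pub struct Post {
    pub owner: &'static str,
    pub public: bool,
}

/// Example policy that overrides the batch hook and counts its invocations.
pub struct PublicPolicy {
    pub batch_calls: Cell<usize>,
}

impl BatchPolicy<String, Post> for PublicPolicy {
    fn name(&self) -> String { "PublicPolicy".to_string() }
    fn evaluate_one(&self, _subject: &String, post: &Post) -> bool { post.public }
    fn evaluate_batch(&self, _subject: &String, posts: &[&Post]) -> Vec<bool> {
        self.batch_calls.set(self.batch_calls.get() + 1);
        posts.iter().map(|post| post.public).collect()
    }
}

/// Example policy that keeps the default hook and counts the items it sees.
pub struct OwnerPolicy {
    pub items_seen: Cell<usize>,
}

impl BatchPolicy<String, Post> for OwnerPolicy {
    fn name(&self) -> String { "OwnerPolicy".to_string() }
    fn evaluate_one(&self, subject: &String, post: &Post) -> bool {
        self.items_seen.set(self.items_seen.get() + 1);
        post.owner == subject.as_str()
    }
}
```

With PublicPolicy ordered before OwnerPolicy, a public post is finalized after the first batch call and never reaches OwnerPolicy, which is the per-item short-circuit the steps describe.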

The method contract should require evaluate_access_batch to return exactly one result per input item in the same order. If an implementation violates that, gatehouse should fail closed for the affected policy/items rather than accidentally granting.
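
One possible way to enforce that contract is a small guard applied to every batch result before it is merged into item traces. The sketch below assumes a simplified PolicyEvalResult with only a reason field; enforce_batch_contract is a hypothetical helper name, and denying the whole batch for the offending policy is one interpretation of "fail closed".

```rust
/// Hypothetical, simplified stand-in for gatehouse's PolicyEvalResult.
#[derive(Debug, Clone, PartialEq)]
pub enum PolicyEvalResult {
    Granted { reason: String },
    Denied { reason: String },
}

/// Returns `results` unchanged when the policy honoured the contract.
/// Otherwise denies every item for this policy instead of guessing which
/// result belongs to which item.
pub fn enforce_batch_contract(
    policy_type: &str,
    expected: usize,
    results: Vec<PolicyEvalResult>,
) -> Vec<PolicyEvalResult> {
    if results.len() == expected {
        return results;
    }
    let reason = format!(
        "{} returned {} results for {} items; failing closed",
        policy_type,
        results.len(),
        expected
    );
    vec![PolicyEvalResult::Denied { reason }; expected]
}
```

A denial from one misbehaving policy does not deny the item outright: later policies in the checker still get a chance to grant it, which matches the OR-across-policies semantics.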

SQL example: N queries today

For a SQL-backed ReBAC-style policy, imagine an ACL table like:

CREATE TABLE post_grants (
    tenant_id uuid NOT NULL,
    subject_id uuid NOT NULL,
    post_id uuid NOT NULL,
    action text NOT NULL,
    PRIMARY KEY (tenant_id, subject_id, post_id, action)
);

A naive per-resource resolver does this query N times:

SELECT EXISTS (
    SELECT 1
    FROM post_grants
    WHERE tenant_id = $1
      AND subject_id = $2
      AND action = $3
      AND post_id = $4
) AS allowed;

For a list endpoint with 100 candidate posts, that is 100 round trips or 100 database executions hidden under 100 evaluate_access calls.

SQL example: one bulk query

A bulk resolver can accept all candidate resource IDs and map results back to input order with WITH ORDINALITY:

WITH candidate_posts AS (
    SELECT post_id, ord
    FROM unnest($4::uuid[]) WITH ORDINALITY AS input(post_id, ord)
)
SELECT
    c.ord,
    c.post_id,
    COALESCE(bool_or(g.post_id IS NOT NULL), false) AS allowed
FROM candidate_posts c
LEFT JOIN post_grants g
  ON g.tenant_id = $1
 AND g.subject_id = $2
 AND g.action = $3
 AND g.post_id = c.post_id
GROUP BY c.ord, c.post_id
ORDER BY c.ord;

Important details:

  • ord preserves the caller's input order, and because every array element gets its own ord, duplicate candidate IDs each produce their own result row.
  • Missing rows are denies.
  • bool_or tolerates joins that can produce multiple matching grant rows in richer schemas.
  • The resolver returns Vec<bool> or Vec<PolicyEvalResult> in the same order as the input items.
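
On the Rust side, mapping the rows back to input order might look like the sketch below. BulkRow and decisions_in_input_order are hypothetical names, and the rows are assumed to have already been fetched by whatever database client the resolver uses. The sketch relies only on ord being the 1-based position produced by WITH ORDINALITY, and it defaults every slot to deny so that a missing or out-of-range row can never grant.

```rust
use std::convert::TryFrom;

/// One row of the bulk query: `ord` is the 1-based input position produced
/// by WITH ORDINALITY, `allowed` is the per-candidate decision.
pub struct BulkRow {
    pub ord: i64,
    pub allowed: bool,
}

/// Rebuilds a decision vector in caller input order. Slots start as `false`,
/// so candidates without a row stay denied.
pub fn decisions_in_input_order(candidate_count: usize, rows: &[BulkRow]) -> Vec<bool> {
    let mut decisions = vec![false; candidate_count];
    for row in rows {
        // Convert the 1-based ordinal to a 0-based index, skipping anything
        // that is zero, negative, or too large for usize.
        let index = row.ord.checked_sub(1).and_then(|i| usize::try_from(i).ok());
        if let Some(index) = index {
            // Ordinals beyond the candidate list are ignored rather than trusted.
            if let Some(slot) = decisions.get_mut(index) {
                *slot = row.allowed;
            }
        }
    }
    decisions
}
```

Indexing by ord rather than by post_id is what keeps duplicate candidates independent of each other.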

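For inherited group/role relationships, the same shape works with a CTE that first expands the subject's effective principals, then joins grants once. The sketch below expands only direct group memberships; nested groups would need a recursive CTE (WITH RECURSIVE):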

WITH candidate_posts AS (
    SELECT post_id, ord
    FROM unnest($4::uuid[]) WITH ORDINALITY AS input(post_id, ord)
), effective_subjects AS (
    SELECT $2::uuid AS subject_id
    UNION
    SELECT gm.group_id
    FROM group_memberships gm
    WHERE gm.member_id = $2
), matching_grants AS (
    SELECT DISTINCT g.post_id
    FROM post_grants g
    JOIN effective_subjects s ON s.subject_id = g.subject_id
    WHERE g.tenant_id = $1
      AND g.action = $3
)
SELECT
    c.ord,
    c.post_id,
    (m.post_id IS NOT NULL) AS allowed
FROM candidate_posts c
LEFT JOIN matching_grants m ON m.post_id = c.post_id
ORDER BY c.ord;

This is the real performance win: the policy/resolver gets one set-oriented backend query instead of N point queries, while callers still go through the central gatehouse checker.

Where built-in policies fit

  • AbacPolicy cannot generally optimize arbitrary Rust closures. It should use the default batch implementation unless a separate bulk-aware ABAC type is introduced.
  • RbacPolicy can at least avoid recomputing subject roles for every item, but true backend batching needs a bulk-capable resolver for required roles/resources.
  • RebacPolicy is the best first target. Add a default has_relationship_batch method to RelationshipResolver that loops over has_relationship, then let SQL/graph-backed resolvers override it with one query.
  • Combinators (AndPolicy, OrPolicy, NotPolicy) can also override evaluate_access_batch by applying their existing short-circuit rules across pending items.
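
For the RebacPolicy case, the default-plus-override shape might look like the following synchronous sketch. The trait here is a simplified stand-in for RelationshipResolver (the real one is async), and Relation, GrantStore and the lookups counter are hypothetical, with an in-memory set playing the role of the SQL or graph backend.

```rust
use std::cell::Cell;
use std::collections::HashSet;

/// Synchronous, simplified stand-in for a relationship resolver trait.
pub trait RelationshipResolver<S, R, Re> {
    fn has_relationship(&self, subject: &S, resource: &R, relationship: &Re) -> bool;

    /// Default batch method: loops over `has_relationship`, so existing
    /// resolvers keep working without changes.
    fn has_relationship_batch(&self, subject: &S, resources: &[&R], relationship: &Re) -> Vec<bool> {
        resources
            .iter()
            .map(|resource| self.has_relationship(subject, resource, relationship))
            .collect()
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Relation {
    Viewer,
    Editor,
}

/// In-memory stand-in for a SQL- or graph-backed store. `lookups` counts
/// simulated backend round trips.
pub struct GrantStore {
    pub grants: HashSet<(u32, u32, Relation)>, // (subject_id, post_id, relation)
    pub lookups: Cell<usize>,
}

impl RelationshipResolver<u32, u32, Relation> for GrantStore {
    fn has_relationship(&self, subject: &u32, resource: &u32, relationship: &Relation) -> bool {
        self.lookups.set(self.lookups.get() + 1);
        self.grants.contains(&(*subject, *resource, *relationship))
    }

    /// Override: answer the whole batch in one simulated round trip.
    fn has_relationship_batch(&self, subject: &u32, resources: &[&u32], relationship: &Relation) -> Vec<bool> {
        self.lookups.set(self.lookups.get() + 1);
        resources
            .iter()
            .map(|resource| self.grants.contains(&(*subject, **resource, *relationship)))
            .collect()
    }
}
```

Resolvers that do not override the batch method pay one lookup per resource, exactly as today, so the change is backwards compatible for existing implementations.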

Comparison with other auth systems

Other systems tend to expose three distinct patterns:

  1. Batch check: evaluate many concrete authorization questions in one call. Casbin has BatchEnforce() for this shape. Amazon Verified Permissions has BatchIsAuthorized, which returns ordered allow/deny decisions for multiple requests and requires either the principal or resource to be identical across the batch. This is the closest match for gatehouse's first step.
  2. Lookup/list authorized resources: ask the auth system which objects a subject can access, then use those IDs to filter application data. SpiceDB exposes LookupResources; OpenFGA exposes ListObjects. This is powerful for relationship systems, but it assumes the auth backend owns enough relationship data to enumerate resources.
  3. Policy-to-query filtering: compile policy into a data-layer predicate, like OPA partial evaluation producing SQL WHERE clauses. This can be excellent for list endpoints, but only for a policy fragment that is known to be translatable. Arbitrary Rust closures in gatehouse are not safely translatable.

Implication for gatehouse: start with a batch-check API plus policy/resolver batch hooks. Do not promise generic SQL pushdown. If query generation is ever added, it should be a separate opt-in trait for policies that can explicitly produce a safe backend predicate.

Tracing

Batch evaluation should emit one outer span/event with fields such as:

  • item_count
  • granted_count
  • denied_count
  • policy_count
  • optionally per-policy batch counts, e.g. policy.pending_count, policy.granted_count, policy.denied_count

Each returned AccessEvaluation should still contain an EvalTrace for that item. Per-item security events can remain available at trace/debug level, but the normal audit story should have one logical batch decision rather than hundreds of top-level events for one list request.
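
The batch-level counts could be computed once from the final per-item decisions, as in this minimal sketch. BatchSummary and summarize are hypothetical names; how the values are attached to a span or event depends on the tracing setup and is left out here.

```rust
/// Hypothetical container for the batch-level span fields listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BatchSummary {
    pub item_count: usize,
    pub granted_count: usize,
    pub denied_count: usize,
    pub policy_count: usize,
}

/// Derives the summary from final per-item decisions (`true` means granted).
pub fn summarize(decisions: &[bool], policy_count: usize) -> BatchSummary {
    let granted_count = decisions.iter().filter(|&&granted| granted).count();
    BatchSummary {
        item_count: decisions.len(),
        granted_count,
        // Every item is either granted or denied, so the rest are denials.
        denied_count: decisions.len() - granted_count,
        policy_count,
    }
}
```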

Acceptance criteria

  • Empty checker denies every item, preserving input order.
  • Batch results match a naive loop over evaluate_access for representative policy stacks.
  • Returned (item, AccessEvaluation) pairs preserve input order, including duplicate resources.
  • OR short-circuiting remains per item: later policies only evaluate items not granted by earlier policies.
  • A test policy can prove evaluate_access_batch is called once per policy for the pending set rather than once per item.
  • A bulk ReBAC resolver can prove the N-query SQL shape is replaced by one set-oriented query.
  • Tracing emits batch-level summary counts while preserving per-item EvalTrace data.

Non-goals

  • Generic SQL/predicate pushdown for arbitrary policies.
  • A special "any policy grants all items" short-circuit mode.
  • Caching subject-only sub-decisions in this issue, except where a concrete batch override naturally does it internally.
  • Parallel execution in the first version. Preserve deterministic per-item policy order first; concurrency can be considered separately if there is a measured need.

Workaround until this lands

Callers that need bulk authorization should keep looping through evaluate_access rather than duplicating policy logic in SQL or another backing store. The trace volume and fan-out cost are the price of faithfully running every configured policy today; this issue is about reducing that cost without sacrificing faithfulness.
