Question: Pattern for cross-job fan-in coordination with machine-specific workers? #637
Hi! I'm building a distributed job processing pipeline with Apalis and have a question about the best pattern for cross-job coordination.

**Use Case**

I have a pipeline where:

**Requirements**

**What I've Tried**

Option 1: `apalis-workflow` with
Replies: 5 comments 2 replies
As indicated in the docs of

You don't need to use
Thanks for the quick response! Scenario:

```rust
// Worker Type A (GPU machines)
let storage_a: PostgresStorage<JobTypeA> = PostgresStorage::new(&pool);

async fn handler_a(
    job: JobTypeA,
    storage_b: Data<PostgresStorage<JobTypeB>>,
) -> Result<(), BoxDynError> {
    let result = process(job).await?;
    // Push 4 separate JobTypeB instances
    for variant in [V1, V2, V3, V4] {
        storage_b.push(JobTypeB { result: result.clone(), variant }).await?;
    }
    Ok(())
}

WorkerBuilder::new("worker-a")
    .backend(storage_a) // Polls JobTypeA
    .data(storage_b)
    .build(handler_a)

// Worker Type B (CPU machines, different code!)
let storage_b: PostgresStorage<JobTypeB> = PostgresStorage::new(&pool);

async fn handler_b(job: JobTypeB) -> Result<(), BoxDynError> {
    // Process ONE JobTypeB instance
    process(job).await?;
    // How to know when ALL 4 JobTypeB instances (from the same batch) are done?
    Ok(())
}

WorkerBuilder::new("worker-b")
    .backend(storage_b) // Polls JobTypeB (different from Worker A!)
    .build(handler_b)
```

The 4 `JobTypeB` instances are picked up by different workers (possibly on different machines).

Question: Can I use `.filter_map()` across different worker types? Or is manual coordination (a PostgreSQL table with a COUNT) the recommended pattern?

(Why separate worker types: Worker A needs GPUs, Worker B runs on CPU-only machines.)

By the way, I really appreciate your work on this library for the year-plus I've been following it.
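For what it's worth, the manual COUNT coordination mentioned above boils down to a decrement-and-check on a shared counter keyed by batch id. Below is a minimal in-process sketch of that logic (assumptions: `BatchTracker`, `register`, and `complete_one` are illustrative names, not apalis API; since the workers run on different machines, the real map would live in a PostgreSQL table and the decrement would be a single atomic `UPDATE ... RETURNING`):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Tracks how many jobs remain in each batch. `complete_one` returns true
/// exactly once per batch: when the final job in that batch finishes.
struct BatchTracker {
    remaining: Mutex<HashMap<u64, usize>>,
}

impl BatchTracker {
    fn new() -> Self {
        Self { remaining: Mutex::new(HashMap::new()) }
    }

    /// Called on the fan-out side (handler_a): register a batch of `n` jobs.
    fn register(&self, batch_id: u64, n: usize) {
        self.remaining.lock().unwrap().insert(batch_id, n);
    }

    /// Called by each fan-in worker (handler_b) when its job finishes.
    /// Returns true only for the last job of the batch.
    fn complete_one(&self, batch_id: u64) -> bool {
        let mut map = self.remaining.lock().unwrap();
        let done = match map.get_mut(&batch_id) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false, // unknown or already-completed batch
        };
        if done {
            map.remove(&batch_id);
        }
        done
    }
}
```

Whichever worker observes `complete_one(..) == true` would then enqueue the follow-up "all done" job; the database version makes that observation atomic across machines.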
You cannot pass this to a workflow: workflows accept a backend where

This is because jobs have to be stored in a generalized manner but converted to the right type at the execution point. You should be able to do this:

```rust
async fn handler_a(job: JobTypeA) -> Result<Vec<JobTypeB>, BoxDynError> {
    let result = process(job).await?;
    let mut next = vec![];
    // Push 4 separate JobTypeB instances
    for variant in [V1, V2, V3, V4] {
        next.push(JobTypeB { result: result.clone(), variant });
    }
    Ok(next)
}

async fn handler_b(job: JobTypeB) -> Result<Option<String>, BoxDynError> {
    // Process ONE JobTypeB instance
    let variant = job.variant.clone();
    process(job).await?;
    Ok(Some(variant))
}

async fn collect(res: Vec<String>) {
    // get your results here
}

#[tokio::main]
async fn main() {
    let workflow = Workflow::new("odd-numbers-workflow")
        .and_then(handler_a)
        .filter_map(handler_b)
        .and_then(collect);
}
```
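For intuition about the fan-in step: `filter_map` behaves like Option-flattening over the batch results. Each `JobTypeB` handler yields an `Option<String>`, the `Some(_)` values are kept, the `None` values are dropped, and the next step receives the collected `Vec<String>`. A plain-Rust analogy (an assumption for illustration, not the apalis-workflow API itself):

```rust
// Keep the Some(_) outputs of a batch of handlers, drop the None outputs,
// and hand the survivors to the next stage as one Vec.
fn fan_in(outputs: Vec<Option<String>>) -> Vec<String> {
    outputs.into_iter().flatten().collect()
}
```

So a handler that returns `Ok(None)` simply contributes nothing to the collected batch, while `Ok(Some(value))` passes `value` through to `collect`.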
PS, you don't need to do coordination manually; use workflows, because they are checked at compile time and were made for this exact use case.
Just realized you wanted to do some execution on CPU and some on GPU. Currently you might have to do some extra work, e.g. something like:

```rust
async fn handler_b(job: JobTypeB) -> Result<Option<String>, BoxDynError> {
    // Run the blocking GPU call off the async executor threads
    let result = tokio::task::spawn_blocking(|| run_in_cuda_kernel()).await.unwrap();
    // Assuming run_in_cuda_kernel() produces the String to pass downstream
    Ok(Some(result))
}
```
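For readers without a tokio runtime at hand, here is the same idea in std-only form: move a blocking, compute-bound call onto a dedicated OS thread so it cannot stall the async executor's worker threads. This is a sketch; `tokio::task::spawn_blocking` does this with a managed blocking thread pool rather than a fresh thread per call, and `run_in_cuda_kernel` is mocked here:

```rust
// Mock stand-in for the blocking GPU call from the reply above.
fn run_in_cuda_kernel() -> String {
    "kernel finished".to_string()
}

// Run the blocking call on its own OS thread and wait for its result,
// keeping the (hypothetical) async executor threads free in the meantime.
fn run_blocking_off_executor() -> String {
    std::thread::spawn(run_in_cuda_kernel)
        .join()
        .expect("blocking worker thread panicked")
}
```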
Currently the feature to choose which worker consumes a task is not implemented; it will be implemented as a Pro feature. Option 2 is possibly the way to go for you, then use `wait_for`.