Skip to content

replace custom Azure Redis auth -> redis-entraid #308

Open
rootflo-hardik wants to merge 9 commits into develop from fix/redis_connection_for_azure_entra_auth

@rootflo-hardik

@rootflo-hardik rootflo-hardik commented Jul 1, 2026

Contributor

The previous AzureManagedRedisProvider fetched Entra ID tokens reactively (only on new connections), so long-lived connection pools would silently use expired tokens. It also had no path for Celery workers, since Kombu doesn't expose redis-py's CredentialProvider hook through the broker URL.

This PR replaces it with redis-entraid, which proactively refreshes tokens in a background thread. A single patch_redis_for_azure() utility monkey-patches redis.ConnectionPool.__init__ globally at process startup, injecting the credential provider before any connection is made. One call per process entry point covers CacheManager, Kombu's broker pool, and the Celery task producer in the FastAPI app.
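
A minimal sketch of the idempotent, process-wide patch described above. It is not the PR's patch_redis_for_azure(): the helper name patch_pool_init and its make_provider argument are hypothetical, and the pool class is passed in so the sketch runs without redis installed. It assumes, as redis-py does, that a connection pool forwards extra keyword arguments such as credential_provider to every connection it creates.

```python
import functools
import threading
from typing import Any, Callable

# Hypothetical module-level guard, mirroring the `_patched` flag in the PR.
_patched = False
_lock = threading.Lock()


def patch_pool_init(pool_cls: type, make_provider: Callable[[], Any]) -> bool:
    """Wrap pool_cls.__init__ so new pools receive a shared credential_provider.

    Returns True if this call installed the patch, False if it was already
    installed, so calling it from several startup hooks is harmless.
    """
    global _patched
    with _lock:
        if _patched:
            return False
        original_init = pool_cls.__init__
        provider = make_provider()  # built once and shared by every pool

        @functools.wraps(original_init)
        def patched_init(self, *args, **kw):
            # Keep an explicitly passed provider; otherwise inject the shared one.
            kw.setdefault("credential_provider", provider)
            original_init(self, *args, **kw)

        pool_cls.__init__ = patched_init
        _patched = True
        return True
```

With redis-py and redis-entraid installed, a call might look like `patch_pool_init(redis.ConnectionPool, lambda: create_from_default_azure_credential(("https://redis.azure.com/.default",)))`, assuming redis-entraid's `create_from_default_azure_credential` factory; check the library's documentation for the exact signature. Because only `__init__` is wrapped, pools created before the patch runs are unaffected, which is why it has to run at process startup.
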

Summary by CodeRabbit

  • Bug Fixes

    • Improved Redis connectivity for Azure-hosted deployments by applying Azure-compatible Redis credential handling during application startup and Celery worker startup.
    • Simplified Redis connection pool authentication to rely on standard credentials, removing Azure-specific authentication wiring.
  • Chores / Configuration

    • Updated Celery worker startup to disable mingle and gossip for more predictable behavior.
    • Set Celery’s default task queue to '{celery}' and disabled Celery remote control.
    • Added the redis-entraid dependency.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR adds a Redis/Azure monkey-patch module, wires it into server and worker startup, removes the previous Azure Redis credential path from cache_manager.py, and updates the worker image command and Redis dependency.

Changes

Azure Redis Authentication Migration

  • New Azure Redis auth patch module
    Files: wavefront/server/modules/db_repo_module/db_repo_module/cache/azure_redis_auth.py, wavefront/server/modules/db_repo_module/pyproject.toml
    Summary: Adds patch_redis_for_azure(), which monkey-patches redis.ConnectionPool.__init__ with Azure credential handling and timeout protection; adds redis-entraid.
  • Wire patch into startup
    Files: wavefront/server/apps/floware/floware/server.py, wavefront/server/background_jobs/celery_worker/celery_worker/celery_app.py, wavefront/server/modules/agents_module/agents_module/utils/celery_client.py, wavefront/server/docker/celery_worker.Dockerfile
    Summary: Calls the patch during FastAPI lifespan startup, registers it on Celery worker process init, updates Celery task queue configuration, and changes the worker container command flags.
  • Remove old Azure auth logic
    Files: wavefront/server/modules/db_repo_module/db_repo_module/cache/cache_manager.py
    Summary: Removes the Azure-managed Redis provider and the Azure-specific pool branch, leaving password-only pool setup.

Estimated code review effort: 4 (Complex) | ~45 minutes

Possibly related PRs

  • rootflo/wavefront#272: Replaces the earlier Azure-managed Redis authentication path in cache_manager.py, which this PR removes.

Suggested reviewers: vishnurk6247, vizsatiz

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly summarizes the main change: replacing custom Azure Redis auth with redis-entraid.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

original_init(self, *args, **kw)

redis.ConnectionPool.__init__ = patched_init
_patched = True

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@wavefront/server/background_jobs/celery_worker/celery_worker/celery_app.py`:
- Around line 7-11: Call patch_redis_for_azure() at module import in
celery_app.py so the Redis monkey-patch is applied before the main worker
process opens broker/backend connections, and keep setup_azure_redis_auth
connected to worker_process_init for pool children. Use the existing
setup_azure_redis_auth hook and patch_redis_for_azure symbol to add the
import-time call without removing the signal-based initialization.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 25371270-5b01-47e3-8877-70d98e8e327d

📥 Commits

Reviewing files that changed from the base of the PR and between 21e191d and 9642377.

⛔ Files ignored due to path filters (1)
  • wavefront/server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • wavefront/server/apps/floware/floware/server.py
  • wavefront/server/background_jobs/celery_worker/celery_worker/celery_app.py
  • wavefront/server/modules/db_repo_module/db_repo_module/cache/azure_redis_auth.py
  • wavefront/server/modules/db_repo_module/db_repo_module/cache/cache_manager.py
  • wavefront/server/modules/db_repo_module/pyproject.toml

Comment on lines +7 to +11
@worker_process_init.connect
def setup_azure_redis_auth(**kwargs):
    from db_repo_module.cache.azure_redis_auth import patch_redis_for_azure

    patch_redis_for_azure()

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Celery 5.4 worker_process_init signal documentation child process main worker broker connection timing

💡 Result:

In Celery 5.4, the worker_process_init signal is dispatched within each individual pool child process immediately after the prefork pool initializes it, but before it begins processing tasks [1][2]. Key technical details regarding your query:

  1. Signal timing: The worker_process_init signal is triggered by the process_initializer function [2]. This function sets up the environment for the child process (e.g., resetting signals, initializing logging, and configuring task execution) [2]. Because it is called as part of the process initialization sequence, it runs in the child process context, not the main worker process context [1][2].
  2. Broker connection timing: The main Celery worker process manages the connection to the message broker [3]. Broker connections are established by the Consumer bootstep in the main process [3][4]. Child processes in the prefork pool do not establish their own primary connections to the broker to fetch tasks; instead, they receive tasks from the main process via an internal pipe/queue mechanism [2]. Therefore, worker_process_init completes before the child processes are ready to receive and execute tasks, and it operates independently of the main process's broker connection lifecycle [2].
  3. Development constraint: Handlers attached to the worker_process_init signal must not block for more than 4 seconds [1][5]. If a handler takes longer, the worker assumes the child process failed to start and kills it [1][5].

In summary, worker_process_init is a child-process-specific hook that runs during the boot phase of pool workers, before they start consuming tasks, and it is entirely separate from the main process's broker connection logic [1][2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== celery_app.py ==\n'
sed -n '1,120p' wavefront/server/background_jobs/celery_worker/celery_worker/celery_app.py

printf '\n== azure_redis_auth references ==\n'
rg -n "patch_redis_for_azure|azure_redis_auth|worker_process_init|Celery\\(" wavefront/server/background_jobs -S

Repository: rootflo/wavefront

Length of output: 1875


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path('wavefront/server/background_jobs')
for path in p.rglob('azure_redis_auth.py'):
    print(path)
PY

rg -n "def patch_redis_for_azure|redis" wavefront/server/background_jobs -S

Repository: rootflo/wavefront

Length of output: 1155


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== worker_setup.py ==\n'
sed -n '1,160p' wavefront/server/background_jobs/celery_worker/celery_worker/worker_setup.py

printf '\n== env.py ==\n'
sed -n '1,160p' wavefront/server/background_jobs/celery_worker/celery_worker/env.py

printf '\n== app import usages ==\n'
rg -n "from celery_worker\.celery_app import app|import celery_worker\.celery_app|celery_app import app" wavefront/server/background_jobs/celery_worker -S

Repository: rootflo/wavefront

Length of output: 9143


Patch Azure Redis auth at module import. worker_process_init only runs in pool children, so the main worker process can open the Redis broker/backend connection before this monkey-patch applies. Call patch_redis_for_azure() at import time and keep the signal handler for child processes.

Azure Cache for Redis in cluster mode rejects multi-key pipelines that span
hash slots. Celery's mingle and gossip startup steps issue cross-slot MULTI/EXEC
transactions (e.g. reply.celery.pidbox vs reply.celery.pidbox3), causing workers
to crash on boot. The --without-mingle and --without-gossip flags skip these
handshakes; task execution is unaffected.

Wraps redis-entraid's get_credentials() in a daemon thread with a 10-second
timeout. Without this, the proactive background token fetch could stall
startup if Azure workload identity is slow to respond, leaving the server
hanging silently for the full socket_timeout (60s).
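
The timeout wrapper this commit describes can be sketched generically with the standard library. call_with_timeout is a hypothetical helper, not the PR's code; with redis-entraid one would presumably pass `lambda: provider.get_credentials()`, where provider is the credential provider instance.

```python
import queue
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout: float = 10.0) -> T:
    """Run fn() in a daemon thread and wait at most `timeout` seconds.

    Raises TimeoutError if fn has not finished in time and re-raises any
    exception fn itself raised. On timeout the worker thread is abandoned,
    not killed; being a daemon, it will not keep the process alive at exit.
    """
    result: "queue.Queue[tuple[bool, object]]" = queue.Queue(maxsize=1)

    def runner() -> None:
        try:
            result.put((True, fn()))
        except BaseException as exc:  # hand the error back to the caller
            result.put((False, exc))

    threading.Thread(target=runner, daemon=True).start()
    try:
        ok, value = result.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"call did not finish within {timeout}s") from None
    if ok:
        return value  # type: ignore[return-value]
    raise value  # type: ignore[misc]
```

Failing fast after a bounded wait turns a silent hang into a visible startup error, at the cost of leaving the slow call running in the background.
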

Azure Managed Redis rejects MULTI pipelines across keys in different
hash slots. Celery's priority-queue suffixes (celery, celery3, etc.)
land in different slots. Setting task_default_queue='{celery}' forces
all derived keys to hash on 'celery', placing them in the same slot.
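
To see why the '{celery}' hash tag helps, here is a sketch of the key-to-slot mapping from the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, where only the substring inside the first non-empty {...} is hashed. The helper names are illustrative, and the '\x06\x16' separator used in the test is assumed to be Kombu's default priority-queue separator.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the whole key
            raw = raw[start + 1 : end]
    return crc16_xmodem(raw) % 16384
```

Plain names such as "celery" and its priority variants are hashed in full and can land in different slots, while every key containing "{celery}" maps to the same slot as key_slot("celery").
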

Azure Managed Redis rejects pipelines across different hash slots.
Celery's pidbox priority variants hash to different slots, causing
CROSSSLOT errors inside MULTI. Disabling remote control removes pidbox entirely.

task_acks_late=True causes Kombu to pipeline unacked, unacked_index,
and the task queue in a single MULTI block. On Azure Managed Redis,
these keys hash to different slots, causing CROSSSLOT. Giving them
the {celery} hash tag via broker_transport_options forces all
three into the same slot.
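
The Celery-side settings from these commits can be gathered into one mapping. This is a sketch, not the PR's configuration: worker_enable_remote_control is assumed to be how remote control was disabled, and the unacked_key and unacked_index_key transport options and their "{celery}..." values are assumptions about how the hash tag was applied (verify the option names against your Kombu version).

```python
# Hypothetical settings collected from the commits above; names and values
# for broker_transport_options are assumptions, not the PR's exact code.
AZURE_REDIS_CELERY_SETTINGS = {
    # Priority sub-queues derived from "{celery}" all hash on "celery".
    "task_default_queue": "{celery}",
    # No remote control means no pidbox keys, so no cross-slot pidbox MULTI.
    "worker_enable_remote_control": False,
    # Late acks make Kombu touch the unacked bookkeeping keys in one MULTI.
    "task_acks_late": True,
    "broker_transport_options": {
        # Give the unacked keys the same hash tag as the task queue.
        "unacked_key": "{celery}unacked",
        "unacked_index_key": "{celery}unacked_index",
    },
}
```

In the worker this would be applied with app.conf.update(AZURE_REDIS_CELERY_SETTINGS); the FastAPI-side Celery client needs the same queue and key names so producer and worker agree.
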