
fix: make corrections log writes atomic, race-safe, and PII-aware#524

Merged
ritesh-1918 merged 19 commits into
ritesh-1918:gssocfrom
ionfwsrijan:fix/issue-368-atomic-corrections
May 29, 2026

Conversation


@ionfwsrijan ionfwsrijan commented May 28, 2026

Description

Fixes #368 — makes corrections log writes atomic, race-safe, and PII-aware.

Changes

  • backend/main.py (/ai/log_correction):
    • Atomic writes: writes to a .tmp file then os.replace() — no partial/corrupt JSON on crash
    • Async lock: asyncio.Lock serializes concurrent requests so reads/writes don't interleave
    • PII redaction: strips emails, phone numbers, and IPs from original_text/ocr_text before persisting
    • Structured logging: uses logging.info/error instead of print() for consistency
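A minimal sketch of how the atomic-write and async-lock changes listed above fit together. The helper names, signatures, and file layout here are illustrative assumptions, not the PR's actual code:

```python
import asyncio
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data) -> None:
    """Write data to a temp file beside path, then swap it into place."""
    tmp_path = path.with_suffix(".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    # os.replace is atomic when both paths live on the same filesystem,
    # so readers see either the old file or the new one, never a partial write.
    os.replace(tmp_path, path)


async def append_entry(lock: asyncio.Lock, path: Path, entry: dict) -> None:
    """Serialize read -> append -> write so concurrent tasks cannot interleave."""
    async with lock:
        entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
        await asyncio.sleep(0)  # yield to other tasks; the lock keeps them out
        entries.append(entry)
        atomic_write_json(path, entries)


async def demo() -> list:
    lock = asyncio.Lock()  # hypothetical stand-in for a module-level lock
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "corrections.json"
        await asyncio.gather(
            *(append_entry(lock, log_path, {"id": i}) for i in range(20))
        )
        return json.loads(log_path.read_text(encoding="utf-8"))


entries = asyncio.run(demo())
```

Without the lock, the `await` between the read and the write would let another task read the same stale list, and one of the two appends would be lost.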

Acceptance criteria met

  • Concurrent correction submissions do not corrupt the log (atomic write + lock)
  • No lost updates under parallel requests (serialized via asyncio.Lock)
  • Storage is durable across restarts (atomic write guarantees consistent file state)

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error handling and response consistency for correction log submissions
    • Improved concurrency support for correction log operations with better data handling stability
  • Chores

    • Updated correction endpoint to return structured status responses


namann5 and others added 19 commits May 22, 2026 11:30
- Replace raw user_id with SHA256 hash (8-char prefix) in all log statements
- Maintains audit trail capability while protecting user identifiers (PII)
- Complies with GDPR/CCPA privacy requirements
- Hash is deterministic for correlation without exposing PII

Resolves CodeRabbit PII logging concern
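
The hashing described in this commit message could be sketched as follows. The function name is hypothetical, and the commit's actual code is not shown in this thread:

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Return the first 8 hex characters of the SHA-256 digest of user_id."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:8]


# The same input always yields the same prefix, so log lines for one user
# can still be correlated without writing the raw identifier.
token = hash_user_id("user-123")
```

Because the hash is unsalted, anyone who can enumerate likely user IDs can recompute the prefixes, so this is closer to pseudonymization than to anonymization.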
…backfill

Fix tenant ticket orphaning by persisting company_id on save
…ashboard

feat: Real-time Support Dashboard Updates Using Supabase Realtime Channels

vercel Bot commented May 28, 2026

@ionfwsrijan is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented May 28, 2026

Walkthrough

The PR hardens /ai/log_correction against concurrent write race conditions by adding an async lock, an atomic write helper, and PII redaction. The read/append/write sequence on the correction log is now serialized through _corrections_lock, the file is replaced atomically via _atomic_write_json, and email addresses, phone numbers, and IP addresses are redacted before storage.
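
A rough illustration of the redaction step described in the walkthrough. The patterns and placeholder strings below are assumptions made for this sketch; the PR's actual regexes are not shown in this thread and may differ:

```python
import re

# Illustrative patterns only (assumed, not taken from the PR).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE_RE = re.compile(r"(?<![\w.])\+?\d[\d\s().-]{7,}\d(?![\w.])")


def redact_pii(text: str) -> str:
    """Replace emails, IPv4 addresses, and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # IPs are handled before phones so dotted quads are not
    # swallowed by the looser phone pattern.
    text = IPV4_RE.sub("[IP]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


redacted = redact_pii("Mail bob@example.com from 10.0.0.1 or call +1 415-555-0100")
```

Regex-based redaction like this is best-effort: unusual phone formats will slip through, and long digit runs that are not phone numbers may be masked.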

Changes

Atomic correction logging with PII redaction

| Layer / File(s) | Summary |
| --- | --- |
| Imports, PII helpers, and atomic write utilities (backend/main.py) | Added re and tempfile imports. Defined _corrections_lock async lock, compiled regex patterns for email/phone/IP detection, and implemented _redact_pii and _atomic_write_json helpers. Updated request parsing to use structured logging and redact original_text/ocr_text before constructing the persisted entry. |
| Correction endpoint persistence with lock and atomicity (backend/main.py) | Wrapped correction-log read/append/write in _corrections_lock acquisition and replaced direct file writes with _atomic_write_json (temp file + rename). Returns {"status":"saved", ...} on success or error payload on failure, with structured logging for both outcomes. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A lock and temp file walk into a log,
"No race conditions here!" they blog.
PII redacted, writes atomic, strong—
Corrections flow safely, all day long!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: make corrections log writes atomic, race-safe, and PII-aware' directly and comprehensively summarizes the main changes: atomic writes, race condition prevention via async locks, and PII redaction. |
| Linked Issues check | ✅ Passed | The PR implements all major coding requirements from issue #368: atomic writes via temp file and os.replace(), async lock for serialization, PII redaction for emails/phones/IPs, and structured logging replacing print() statements. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing issue #368 in backend/main.py's /ai/log_correction endpoint; no unrelated modifications or scope creep detected. |


@ionfwsrijan
Author

@ritesh-1918 You may review and merge this


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/main.py (1)

492-496: ⚡ Quick win

Add flush/fsync for true crash durability.

os.replace makes the rename atomic, but unless the temp file's contents are flushed to disk first, a crash or power loss shortly after the replace can leave an empty or torn file on disk, because the data may still have been sitting only in the page cache. That weakens the "durable across restarts" acceptance criterion. Flush + fsync the file before replacing, and ideally fsync the parent directory afterwards.

♻️ Durable atomic write
 def _atomic_write_json(path: Path, data) -> None:
     tmp_path = path.with_suffix(".tmp")
     with open(tmp_path, "w", encoding="utf-8") as f:
         json.dump(data, f, indent=2)
+        f.flush()
+        os.fsync(f.fileno())
     os.replace(tmp_path, path)
+    # Persist the directory entry so the rename survives a crash.
+    dir_fd = os.open(path.parent, os.O_RDONLY)
+    try:
+        os.fsync(dir_fd)
+    finally:
+        os.close(dir_fd)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/main.py` around lines 492 - 496, The _atomic_write_json function
currently writes to a temp file and calls os.replace but does not flush or
fsync, so add an explicit flush() and os.fsync(f.fileno()) after json.dump
(while the temp file is still open), then call os.replace, and finally open the
parent directory and call os.fsync(dir_fd) on that directory descriptor so that
both the file contents and the renamed directory entry are persisted for crash
durability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/main.py`:
- Line 482: The current _IP_RE only matches IPv4 so IPv6 addresses in
original_text/ocr_text remain unredacted; add a compact IPv6 regex (or a
separate _IPV6_RE) and ensure the redaction logic applies both patterns (e.g.,
use _IP_RE and _IPV6_RE when scanning/replacing in functions that redact
original_text/ocr_text) so both IPv4 and IPv6 are covered; update any code that
references only _IP_RE to run the IPv6 pattern as well.
- Around line 538-546: log_correction currently uses an in-process
_corrections_lock but still does read→append→rewrite on CORRECTIONS_LOG_PATH
which is unsafe across processes/replicas; replace the in-process lock usage
with an inter-process safe approach (either: acquire a file-based advisory lock
for CORRECTIONS_LOG_PATH around the read/append/_atomic_write_json sequence
using a library like portalocker or fcntl, referencing the same
CORRECTIONS_LOG_PATH and keeping _atomic_write_json for safe replacement, OR
migrate persistence out of the local JSON file into a transactionally-safe
backend and update log_correction to write to that backend); also add a short
comment/docstring in log_correction describing the new cross-process safety
guarantee or the single-process assumption if you choose to keep the JSON file.
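
The file-based advisory lock option mentioned in the comment above could look roughly like this. It is a sketch under stated assumptions: `file_lock`, `append_entry`, and the `.lock` sidecar file are hypothetical names, and `fcntl` is POSIX-only, so it would not work on Windows:

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def append_entry(log_path: Path, entry: dict) -> None:
    # Lock a sidecar file rather than the log itself: os.replace swaps the
    # log's inode, so a lock held on the old log file would not exclude a
    # writer that opens the new one.
    with file_lock(log_path.with_suffix(".lock")):
        entries = json.loads(log_path.read_text(encoding="utf-8")) if log_path.exists() else []
        entries.append(entry)
        tmp_path = log_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
        os.replace(tmp_path, log_path)
```

`flock` blocks the calling thread, so inside an async endpoint a call like this would likely need to run in a worker thread (for example via `asyncio.to_thread`). It also only coordinates processes that share the same filesystem; separate replicas with separate disks would still need a shared backend.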

---

Nitpick comments:
In `@backend/main.py`:
- Around line 492-496: The _atomic_write_json function currently writes to a
temp file and calls os.replace but does not flush or fsync, so add an explicit
flush() and os.fsync(f.fileno()) after json.dump (while the temp file is still
open), then call os.replace, and finally open the parent directory and call
os.fsync(dir_fd) on that directory descriptor so that both the file contents and
the renamed directory entry are persisted for crash durability.
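
For the IPv6 inline comment above, one alternative to a hand-written IPv6 regex is to pick out candidate tokens with a loose pattern and let the standard-library `ipaddress` module decide whether each one is a valid address. This swaps in a different technique from the "compact IPv6 regex" the comment suggests, and the names and `[IP]` placeholder are assumptions for illustration:

```python
import ipaddress
import re

# Loose candidate pattern: runs of hex digits, colons, and dots that are not
# glued to surrounding word characters. Validation happens in _redact.
_CANDIDATE_RE = re.compile(r"(?<![0-9A-Za-z_:])[0-9A-Fa-f:.]{2,}(?![0-9A-Za-z_])")


def _redact(match: re.Match) -> str:
    token = match.group(0)
    core = token.rstrip(".")  # drop sentence-ending dots
    if core.count(":") < 2:
        return token
    try:
        ipaddress.IPv6Address(core)
    except ValueError:
        return token  # looks colon-separated but is not an address (e.g. a time)
    return "[IP]" + token[len(core):]


def redact_ipv6(text: str) -> str:
    """Replace valid IPv6 addresses in text with a placeholder."""
    return _CANDIDATE_RE.sub(_redact, text)


sample = redact_ipv6("Client 2001:db8::1 connected at 12:30:45 from fe80::1.")
```

Delegating validation to `ipaddress` avoids maintaining a long regex, at the cost of a callback per candidate token. Short strings that happen to be valid addresses (such as `a::b`) would still be masked.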

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93cae4df-8621-44b3-9415-aec8d3920686

📥 Commits

Reviewing files that changed from the base of the PR and between fb6a950 and 74a470d.

📒 Files selected for processing (1)
  • backend/main.py

@ritesh-1918 ritesh-1918 changed the base branch from main to gssoc May 29, 2026 19:26

gitguardian Bot commented May 29, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 29368972 | Triggered | Supabase Service Role JWT | b460068 | scratch/test_companies.js |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


@ritesh-1918 ritesh-1918 added the gssoc (GirlScript Summer of Code), gssoc:approved (GSSoC Approved PR), level:intermediate (Intermediate level difficulty), quality:exceptional (Exceptional code quality), and type:bug (Bug fix) labels May 29, 2026
@ritesh-1918
Owner

Hi @ionfwsrijan! Thanks for the contribution. I have successfully converted your PR's target branch to gssoc to keep our codebase unified.

PR approved and merged! Welcome to the family! 🚀💻


🌟 Developer Action Network

Before starting or submitting updates, please complete these quick onboarding steps:

  1. Star this repository: https://github.com/ritesh-1918/HELPDESK.AI
  2. 👤 Follow the Project Admin: https://github.com/ritesh-1918
  3. 💼 Connect on LinkedIn: https://www.linkedin.com/in/ritesh1908/

Note: All PR branches must target the gssoc branch, NOT main.

@ritesh-1918 ritesh-1918 added the level:critical (Critical level difficulty) and type:feature (New feature) labels May 29, 2026
@ritesh-1918 ritesh-1918 merged commit a83b5d4 into ritesh-1918:gssoc May 29, 2026
1 of 3 checks passed

Labels

  • gssoc:approved (GSSoC Approved PR)
  • gssoc (GirlScript Summer of Code)
  • level:critical (Critical level difficulty)
  • level:intermediate (Intermediate level difficulty)
  • quality:exceptional (Exceptional code quality)
  • type:bug (Bug fix)
  • type:feature (New feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Corrections log writes are non-atomic (race leads to corrupted training/audit data)

4 participants