fix: make corrections log writes atomic, race-safe, and PII-aware#524
Conversation
- Replace raw user_id with SHA256 hash (8-char prefix) in all log statements - Maintains audit trail capability while protecting user identifiers (PII) - Complies with GDPR/CCPA privacy requirements - Hash is deterministic for correlation without exposing PII Resolves CodeRabbit PII logging concern
…backfill Fix tenant ticket orphaning by persisting company_id on save
…ashboard feat: Real-time Support Dashboard Updates Using Supabase Realtime Channels
…y bugs on Ticket Detail page
…bles across UI and config
… with premium header
…link them in landing footer
…and style premium elements
|
@ionfwsrijan is attempting to deploy a commit to the ritesh Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThe PR hardens ChangesAtomic correction logging with PII redaction
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@ritesh-1918 You may review and merge this |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/main.py (1)
492-496: ⚡ Quick winAdd
flush/fsyncfor true crash durability.
os.replacemakes the rename atomic, but without flushing the temp file's contents to disk first, a crash or power loss after the replace can leave an empty or torn file in the page cache — which weakens the "durable across restarts" acceptance criterion. Flush +fsyncthe file (and ideally the parent directory) before replacing.♻️ Durable atomic write
def _atomic_write_json(path: Path, data) -> None: tmp_path = path.with_suffix(".tmp") with open(tmp_path, "w", encoding="utf-8") as f: json.dump(data, f, indent=2) + f.flush() + os.fsync(f.fileno()) os.replace(tmp_path, path) + # Persist the directory entry so the rename survives a crash. + dir_fd = os.open(path.parent, os.O_RDONLY) + try: + os.fsync(dir_fd) + finally: + os.close(dir_fd)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 492 - 496, The _atomic_write_json function currently writes to a temp file and calls os.replace but does not flush or fsync, so add an explicit flush() and os.fsync(f.fileno()) after json.dump (while the temp file is still open) and then, after closing the temp file, open the parent directory and call os.fsync(dir_fd) (or os.fsync on an open directory descriptor) before calling os.replace to ensure both the file contents and directory metadata are persisted for crash durability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/main.py`:
- Line 482: The current _IP_RE only matches IPv4 so IPv6 addresses in
original_text/ocr_text remain unredacted; add a compact IPv6 regex (or a
separate _IPV6_RE) and ensure the redaction logic applies both patterns (e.g.,
use _IP_RE and _IPV6_RE when scanning/replacing in functions that redact
original_text/ocr_text) so both IPv4 and IPv6 are covered; update any code that
references only _IP_RE to run the IPv6 pattern as well.
- Around line 538-546: log_correction currently uses an in-process
_corrections_lock but still does read→append→rewrite on CORRECTIONS_LOG_PATH
which is unsafe across processes/replicas; replace the in-process lock usage
with an inter-process safe approach (either: acquire a file-based advisory lock
for CORRECTIONS_LOG_PATH around the read/append/_atomic_write_json sequence
using a library like portalocker or fcntl, referencing the same
CORRECTIONS_LOG_PATH and keeping _atomic_write_json for safe replacement, OR
migrate persistence out of the local JSON file into a transactionally-safe
backend and update log_correction to write to that backend); also add a short
comment/docstring in log_correction describing the new cross-process safety
guarantee or the single-process assumption if you choose to keep the JSON file.
---
Nitpick comments:
In `@backend/main.py`:
- Around line 492-496: The _atomic_write_json function currently writes to a
temp file and calls os.replace but does not flush or fsync, so add an explicit
flush() and os.fsync(f.fileno()) after json.dump (while the temp file is still
open) and then, after closing the temp file, open the parent directory and call
os.fsync(dir_fd) (or os.fsync on an open directory descriptor) before calling
os.replace to ensure both the file contents and directory metadata are persisted
for crash durability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 29368972 | Triggered | Supabase Service Role JWT | b460068 | scratch/test_companies.js | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
|
Hi @ionfwsrijan! Thanks for the contribution. I have successfully converted your PR's target branch to PR approved and merged! Welcome to the family! 🚀💻 🌟 Developer Action NetworkBefore starting or submitting updates, please complete these quick onboarding steps:
Note: All PR branches must target the |
Description
Fixes #368 — makes corrections log writes atomic, race-safe, and PII-aware.
Changes
backend/main.py(/ai/log_correction):.tmpfile thenos.replace()— no partial/corrupt JSON on crashasyncio.Lockserializes concurrent requests so reads/writes don't interleaveoriginal_text/ocr_textbefore persistinglogging.info/errorinstead ofprint()for consistencyAcceptance criteria met
asyncio.Lock)Summary by CodeRabbit
Bug Fixes
Chores