fix: adds guardrails against CPU/memory DoS #523

Merged
ritesh-1918 merged 19 commits into ritesh-1918:gssoc from ionfwsrijan:fix/issue-367-ocr-dos
May 29, 2026

@ionfwsrijan ionfwsrijan commented May 28, 2026

Description

Fixes #367 — adds guardrails against CPU/memory DoS via unbounded base64 image OCR.

Changes

  • backend/services/ocr_service.py:

    • Rejects base64 payloads exceeding 10 MB before decode
    • Rejects decoded byte blobs exceeding 8 MB
    • Validates image with Pillow (Image.verify() + dimension check, max 4096px)
    • Limits concurrent OCR to 2 via asyncio.Semaphore
    • Wraps OCR in 60s timeout via asyncio.wait_for + executor
    • extract_text() made async — returns early on validation failure
  • backend/main.py:

    • Adds Pydantic @field_validator on TicketRequest.image_base64 for early rejection at the validation layer
    • Updates the analyze_ticket call site to await the async extract_text()
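
The two size checks listed above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `decode_image_payload` is hypothetical, the constant names are borrowed from the CodeRabbit walkthrough, and reading "10 MB" and "8 MB" as binary megabytes is an assumption.

```python
import base64
import binascii
from typing import Optional

# Limits from the PR description; binary megabytes are an assumption.
MAX_BASE64_LENGTH = 10 * 1024 * 1024  # characters of base64 text
MAX_DECODED_BYTES = 8 * 1024 * 1024   # bytes after decoding


def decode_image_payload(image_base64: str) -> Optional[bytes]:
    """Hypothetical helper: return decoded bytes, or None if the input is rejected."""
    # Check the encoded length first so an oversized payload is never decoded.
    if len(image_base64) > MAX_BASE64_LENGTH:
        return None
    try:
        raw = base64.b64decode(image_base64, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Second check on the decoded blob, kept as defence in depth.
    if len(raw) > MAX_DECODED_BYTES:
        return None
    return raw
```

Because base64 expands data by a factor of 4/3, a payload that passes the 10 MB check decodes to at most about 7.5 MB, so with these particular limits the decoded-size check only fires if the first limit is raised or bypassed.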

Acceptance criteria met

  • Oversized OCR requests are rejected quickly (Pydantic + service-level)
  • OCR cannot exhaust CPU/RAM under adversarial inputs (size/dimension limits, concurrency control)
  • Service remains responsive under repeated OCR calls (semaphore prevents resource starvation)
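
A rough sketch of how the semaphore and timeout could combine to meet these criteria is shown below. It assumes the values from the description (2 concurrent jobs, 60 s timeout); `slow_ocr` is a stand-in for the blocking OCR call and `guarded_ocr` is a hypothetical name, neither taken from the PR.

```python
import asyncio
import time

MAX_CONCURRENT_OCR = 2   # value from the PR description
OCR_TIMEOUT = 60.0       # seconds, value from the PR description


def slow_ocr(data: bytes) -> str:
    """Stand-in for a blocking OCR call (assumption, not the real reader)."""
    time.sleep(0.05)
    return f"{len(data)} bytes"


async def guarded_ocr(data: bytes, semaphore: asyncio.Semaphore,
                      timeout: float = OCR_TIMEOUT) -> str:
    """Run blocking OCR in a worker thread, gated by a semaphore and a timeout."""
    loop = asyncio.get_running_loop()
    async with semaphore:  # at most MAX_CONCURRENT_OCR callers get past this point
        try:
            return await asyncio.wait_for(
                loop.run_in_executor(None, slow_ocr, data), timeout
            )
        except asyncio.TimeoutError:
            return ""  # empty string on failure, as the walkthrough describes


async def demo() -> list:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_OCR)
    return await asyncio.gather(*(guarded_ocr(b"abc", semaphore) for _ in range(4)))
```

One caveat with this pattern: `asyncio.wait_for` stops waiting when the timeout expires, but it cannot interrupt a thread that is already running, so the worker keeps consuming CPU until the blocking call returns.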

Summary by CodeRabbit

  • Bug Fixes

    • Added validation to reject image uploads larger than 10 MB, preventing processing of oversized payloads.
    • Implemented OCR timeout protection to prevent indefinite processing hangs.
    • Improved error handling with better logging for OCR failures.
  • Improvements

    • Enhanced concurrency limits and resource constraints for more stable OCR operations.

namann5 and others added 19 commits May 22, 2026 11:30
- Replace raw user_id with SHA256 hash (8-char prefix) in all log statements
- Maintains audit trail capability while protecting user identifiers (PII)
- Complies with GDPR/CCPA privacy requirements
- Hash is deterministic for correlation without exposing PII

Resolves CodeRabbit PII logging concern
…backfill

Fix tenant ticket orphaning by persisting company_id on save
…ashboard

feat: Real-time Support Dashboard Updates Using Supabase Realtime Channels

vercel Bot commented May 28, 2026

@ionfwsrijan is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented May 28, 2026

Walkthrough

The changes implement multi-layered DoS protection for image-based OCR requests. Request validation rejects oversized base64 payloads at the API boundary; OCR service constants define resource limits and concurrency controls; extract_text adds granular input validation, image integrity checks, and async-safe execution with timeouts and semaphore gating; the endpoint is updated to properly await async OCR extraction.

Changes

Image OCR DoS Protection

| Layer / File(s) | Summary |
| --- | --- |
| Request image validation: backend/main.py | Pydantic field_validator on TicketRequest.image_base64 rejects base64 payloads larger than 10 MB, providing early rejection before reaching the OCR service. |
| OCR service resource infrastructure: backend/services/ocr_service.py | Module-level constants MAX_BASE64_LENGTH, MAX_DECODED_BYTES, MAX_IMAGE_DIMENSION, MAX_CONCURRENT_OCR, and OCR_TIMEOUT establish resource boundaries. OCRService.__init__ creates an asyncio.Semaphore for concurrency control, and the _run_ocr helper runs the lazy-loaded EasyOCR reader synchronously. |
| Validated async extraction with concurrency and timeout: backend/services/ocr_service.py | extract_text validates base64 size and normalizes input (strips data-URI prefixes, repairs padding), decodes and validates image dimensions, uses PIL to verify image integrity, invokes _run_ocr via the event loop's run_in_executor with semaphore gating and an asyncio.wait_for timeout, and returns concatenated text on success or an empty string on failure. |
| Async endpoint integration: backend/main.py | The /ai/analyze_ticket endpoint now awaits ocr_service.extract_text(...) to align with the async executor-based execution model. |
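
The input normalization mentioned in the walkthrough (stripping a data-URI prefix and repairing padding) might look roughly like this. The function name `normalize_base64` is hypothetical and the exact rules used in the PR are not shown on this page, so treat it as an illustrative sketch only.

```python
import base64


def normalize_base64(payload: str) -> str:
    """Hypothetical helper: strip a data-URI prefix and restore '=' padding."""
    payload = payload.strip()
    # "data:image/png;base64,AAAA..." -> keep only the part after the first comma.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    # Base64 text must be a multiple of 4 characters long; pad if it is not.
    remainder = len(payload) % 4
    if remainder:
        payload += "=" * (4 - remainder)
    return payload


decoded = base64.b64decode(normalize_base64("data:image/png;base64,aGVsbG8"))
```

Padding repair only helps when the trailing `=` characters were dropped in transit; a length that leaves a remainder of 1 is not valid base64 and still fails at decode time.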

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 A rabbit once feared DoS attacks with glee,
So it built up defenses with semaphores three,
Base64 limits and image size checks—
Now timeouts protect from malicious wrecks! 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: adds guardrails against CPU/memory DoS' accurately and concisely summarizes the main change: adding protections against denial-of-service attacks via oversized image inputs. |
| Linked Issues check | ✅ Passed | The code changes address all primary coding requirements from Issue #367: enforcing base64/decoded size limits, validating images with Pillow, limiting concurrent OCR with semaphores, and adding timeouts via asyncio.wait_for. |
| Out of Scope Changes check | ✅ Passed | All changes in backend/main.py and backend/services/ocr_service.py are directly related to implementing DoS protection as required by Issue #367; no out-of-scope modifications detected. |

@ionfwsrijan
Author

@ritesh-1918 You may review and merge this

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/services/ocr_service.py`:
- Line 83: Replace the asyncio.get_event_loop() call, flagged here as deprecated,
with asyncio.get_running_loop() at the point where the variable loop is assigned.
Inside a coroutine, get_running_loop() returns the loop that is currently
running and raises RuntimeError if there is none. Confirm that the enclosing
function is async so the call always executes inside an active event loop.
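
As a small illustration of the suggested change, the sketch below obtains the loop with `asyncio.get_running_loop()` before handing blocking work to the default executor. The function names are hypothetical and the snippet is not the code at line 83 of `ocr_service.py`.

```python
import asyncio


def blocking_work(x: int) -> int:
    """Stand-in for a synchronous, CPU-bound call (assumption)."""
    return x * 2


async def run_blocking(x: int) -> int:
    # get_running_loop() only succeeds inside a coroutine or callback,
    # which makes misuse outside an event loop fail loudly.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_work, x)


def has_running_loop() -> bool:
    """Show that get_running_loop() raises RuntimeError when no loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True
```

Called from ordinary synchronous code, `has_running_loop()` returns `False`, whereas `asyncio.run(run_blocking(21))` completes normally because the coroutine runs inside the loop that `asyncio.run` creates.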

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05b905b0-63d5-4a30-9069-43e7ec82200d

📥 Commits

Reviewing files that changed from the base of the PR and between fb6a950 and b6e09a5.

📒 Files selected for processing (2)
  • backend/main.py
  • backend/services/ocr_service.py

Comment thread backend/services/ocr_service.py
@ritesh-1918 ritesh-1918 changed the base branch from main to gssoc May 29, 2026 19:26

gitguardian Bot commented May 29, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 29368972 | Triggered | Supabase Service Role JWT | b460068 | scratch/test_companies.js |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@ritesh-1918 ritesh-1918 added the labels gssoc (GirlScript Summer of Code), gssoc:approved (GSSoC Approved PR), level:intermediate (Intermediate level difficulty), quality:exceptional (Exceptional code quality), and type:bug (Bug fix) on May 29, 2026
@ritesh-1918
Owner

Hi @ionfwsrijan! Thanks for the contribution. I have successfully converted your PR's target branch to gssoc to keep our codebase unified.

PR approved and merged! Welcome to the family! 🚀💻


🌟 Developer Action Network

Before starting or submitting updates, please complete these quick onboarding steps:

  1. Star this repository: https://github.com/ritesh-1918/HELPDESK.AI
  2. 👤 Follow the Project Admin: https://github.com/ritesh-1918
  3. 💼 Connect on LinkedIn: https://www.linkedin.com/in/ritesh1908/

Note: All PR branches must target the gssoc branch, NOT main.

@ritesh-1918 ritesh-1918 added the labels level:critical (Critical level difficulty) and type:feature (New feature) on May 29, 2026
@ritesh-1918 ritesh-1918 merged commit 34e88ff into ritesh-1918:gssoc May 29, 2026
1 of 3 checks passed

Labels

  • gssoc:approved (GSSoC Approved PR)
  • gssoc (GirlScript Summer of Code)
  • level:critical (Critical level difficulty)
  • level:intermediate (Intermediate level difficulty)
  • quality:exceptional (Exceptional code quality)
  • type:bug (Bug fix)
  • type:feature (New feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unbounded base64 image OCR enables CPU/memory DoS

4 participants