fix: validate OCR file upload type and size to prevent server crashes (Fixes #741) #767
- Add Pydantic field_validator for image_base64 field in TicketRequest
- Validate MIME type: only PNG, JPEG, TIFF, PDF allowed
- Validate decoded file size: max 10MB
- Return descriptive error messages for invalid uploads
- Prevents server crashes from processing extremely large or non-image files

Fixes ritesh-1918#741
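The decoded-size check described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `check_decoded_size` and `MAX_DECODED_BYTES` are hypothetical names, and the exact error wording is an assumption. A Pydantic `field_validator` could call a helper like this and let the `ValueError` surface as a validation error.

```python
import base64
import binascii

# 10MB limit as stated in the PR description; the constant name is hypothetical.
MAX_DECODED_BYTES = 10 * 1024 * 1024


def check_decoded_size(image_base64: str, limit: int = MAX_DECODED_BYTES) -> bytes:
    """Decode a base64 payload and reject it if the decoded size exceeds `limit`."""
    # Drop an optional data URI prefix such as "data:image/png;base64,".
    payload = image_base64.split(",", 1)[1] if "," in image_base64 else image_base64
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("Invalid base64 image data") from exc
    if len(raw) > limit:
        raise ValueError(f"File size {len(raw)} bytes exceeds the {limit}-byte limit")
    return raw
```

Because base64 expands data by roughly 4/3, a cheaper variant might reject on `len(payload)` before decoding at all, so that an oversized upload is never fully materialized in memory.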
📝 Walkthrough
The PR adds image base64 validation to …
Changes
Image Base64 Validation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/main.py (1)
106-106: ⚡ Quick win
Improve data URI parsing robustness.
The MIME extraction assumes a well-formed data URI. Consider validating the prefix format more strictly to provide clearer error messages for malformed inputs.
♻️ More robust parsing
    - mime = prefix.split(":")[1].split(";")[0] if ":" in prefix else ""
    - if mime and mime not in allowed_types:
    + # Validate data URI format: data:<mime>;base64
    + if ":" not in prefix or not prefix.startswith("data:"):
    +     mime = ""
    + else:
    +     mime = prefix.split(":", 1)[1].split(";")[0]
    +
    + if mime and mime not in allowed_types:
          raise ValueError(
              f"Unsupported file type '{mime}'. Allowed: PNG, JPEG, TIFF, PDF"
          )
    + elif ":" in prefix and not mime:
    +     raise ValueError("Invalid data URI format: could not extract MIME type")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` at line 106, The current one-liner extracting mime from the data URI prefix (mime = prefix.split(":")[1].split(";")[0] if ":" in prefix else "") is fragile; update parsing in the function that handles data URIs to validate the prefix format before splitting (e.g., ensure it starts with "data:" and contains both ":" and ";" in the expected places or use a small regex like ^data:([^;]+);) and if invalid raise a clear ValueError/BadRequest including the offending prefix; replace the fragile split logic with this validated extraction so the mime variable is only set when the prefix matches the expected pattern and provide a helpful error message when it does not.
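A regex-based extraction along the lines suggested in this comment might look like the sketch below. The function name `extract_mime` and the concrete MIME strings in `ALLOWED_TYPES` are assumptions for illustration (the PR only names PNG, JPEG, TIFF, and PDF); the pattern is a slightly tightened form of the `^data:([^;]+);` regex mentioned above.

```python
import re

# Assumed MIME strings for the four formats the PR allows.
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/tiff", "application/pdf"}

# Matches the part of a data URI before the comma, e.g. "data:image/png;base64".
_DATA_URI_PREFIX = re.compile(r"^data:([^;,]+);")


def extract_mime(prefix: str) -> str:
    """Return the MIME type from a data URI prefix, or raise ValueError."""
    match = _DATA_URI_PREFIX.match(prefix)
    if match is None:
        # Include the offending prefix so the caller can see what was rejected.
        raise ValueError(f"Invalid data URI format: {prefix!r}")
    mime = match.group(1).lower()
    if mime not in ALLOWED_TYPES:
        raise ValueError(
            f"Unsupported file type '{mime}'. Allowed: PNG, JPEG, TIFF, PDF"
        )
    return mime
```

With this shape, a malformed prefix and an unsupported type produce two distinct error messages, which is the behavior the comment asks for.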
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/main.py`:
- Around line 101-110: The current check only inspects a data URI prefix (the
"," branch) so raw base64 in v bypasses MIME checks; fix by validating inputs
without a data: prefix: when "," not in v, decode the base64 into bytes and
perform magic-byte/type checks against allowed types (e.g., PNG signature 89 50
4E 47, JPEG FF D8 FF, TIFF 49 49/4D 4D, PDF "%PDF") and if the signature doesn't
match any allowed type raise the same ValueError (or a 422-equivalent) so
non-image bytes are rejected; keep the existing prefix-based MIME check for data
URIs and reuse allowed_types/mime variables for consistent error messages.
- Around line 122-126: In the except block handling Exception as e, preserve the
original exception chain by raising the new ValueError from the caught
exception: replace the final "raise ValueError(\"Invalid base64 image data\")"
with "raise ValueError(\"Invalid base64 image data\") from e"; keep the existing
bare "raise" inside the if branch that checks for "exceeds" or "Unsupported" so
the original exception is propagated unchanged (refer to the exception variable
e and the ValueError being raised).
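The magic-byte check described in the first inline comment could be sketched as follows. The helper names (`sniff_mime`, `require_allowed_bytes`) and the MIME strings are hypothetical. The comment lists two-byte TIFF prefixes (49 49 / 4D 4D); this sketch assumes the full four-byte TIFF headers instead, which is stricter and avoids matching arbitrary data that merely begins with "II" or "MM".

```python
from typing import Optional

# (signature, assumed MIME string) pairs for the formats the PR allows.
MAGIC_SIGNATURES = (
    (b"\x89PNG", "image/png"),       # 89 50 4E 47
    (b"\xff\xd8\xff", "image/jpeg"),  # FF D8 FF
    (b"II*\x00", "image/tiff"),      # little-endian TIFF: 49 49 2A 00
    (b"MM\x00*", "image/tiff"),      # big-endian TIFF: 4D 4D 00 2A
    (b"%PDF", "application/pdf"),
)


def sniff_mime(raw: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for signature, mime in MAGIC_SIGNATURES:
        if raw.startswith(signature):
            return mime
    return None


def require_allowed_bytes(raw: bytes) -> str:
    """Raise ValueError unless the decoded bytes start with an allowed signature."""
    mime = sniff_mime(raw)
    if mime is None:
        raise ValueError("Unsupported file type. Allowed: PNG, JPEG, TIFF, PDF")
    return mime
```

Running the same check on data URI uploads as well would also catch a payload whose declared MIME type does not match its actual bytes, though the review only asks for it on inputs without a prefix.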
---
Nitpick comments:
In `@backend/main.py`:
- Line 106: The current one-liner extracting mime from the data URI prefix (mime
= prefix.split(":")[1].split(";")[0] if ":" in prefix else "") is fragile;
update parsing in the function that handles data URIs to validate the prefix
format before splitting (e.g., ensure it starts with "data:" and contains both
":" and ";" in the expected places or use a small regex like ^data:([^;]+);) and
if invalid raise a clear ValueError/BadRequest including the offending prefix;
replace the fragile split logic with this validated extraction so the mime
variable is only set when the prefix matches the expected pattern and provide a
helpful error message when it does not.
…chain
- Validate file magic bytes (PNG, JPEG, TIFF, PDF) when base64 input lacks data: URI prefix, closing the MIME validation bypass gap
- Use 'raise ... from e' to preserve exception context for debugging

Addresses CodeRabbit review on PR ritesh-1918#767
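The exception-chaining change in this commit can be illustrated with a small sketch. It is written from the review's description rather than the actual diff, so the function name `decode_image` and the message text for the size error are assumptions. The bare `raise` re-raises the validator's own errors unchanged, while `from e` attaches the underlying decode error as `__cause__` on the generic message.

```python
import base64


def decode_image(payload: str, limit: int = 10 * 1024 * 1024) -> bytes:
    """Decode base64, keeping specific errors and chaining unexpected ones."""
    try:
        raw = base64.b64decode(payload, validate=True)
        if len(raw) > limit:
            raise ValueError(f"File size exceeds {limit} bytes")
        return raw
    except Exception as e:
        # Errors raised deliberately above already carry a descriptive message.
        if "exceeds" in str(e) or "Unsupported" in str(e):
            raise
        # Anything else is a decode failure; keep the original as __cause__.
        raise ValueError("Invalid base64 image data") from e
```

Matching on message substrings is fragile; a dedicated exception subclass for the deliberate errors would let the handler distinguish them by type instead.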
|
Hi @zeroknowledge0x! Thanks for the contribution. I have triaged your PR and set it to merge into the
Welcome to the HELPDESK.AI developer family! 🚀💻 |
Summary
Adds server-side validation for the OCR file upload endpoint to prevent crashes from processing extremely large files or non-image file types.
Changes
- `field_validator` for `image_base64` field in `TicketRequest` model
Validation Details
- Parses the data URI prefix (`data:image/png;base64,...`) to extract and validate MIME type
Testing
Related Issues
Fixes #741