Description
The /scan endpoint accepts any file upload without validating its MIME type or extension. If a user accidentally uploads a .pdf, a .docx, or a corrupted file, the server attempts to extract it as a ZIP archive and fails with a Python exception deep in the extraction logic. The response is a raw 500 whose stack trace leaks internal path information.
Steps to reproduce
1. Upload any non-ZIP file (e.g. a .pdf) via the frontend scan input
2. Observe the response: a raw 500 with a stack trace
Expected behaviour
A 400 response: "Invalid file type. Please upload a ZIP archive of your codebase."
What to implement
- At the top of the /scan handler, check file.content_type and the filename extension before attempting extraction
- Accepted: application/zip, application/x-zip-compressed, .zip extension
- Return 400 with a clear message for anything else
- Catch zipfile.BadZipFile as a fallback and return 400 instead of 500
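The checks above might look roughly like the sketch below. It is framework-agnostic and uses only the standard library, so it would need adapting to the real /scan handler. The names `validate_zip_upload`, `ALLOWED_CONTENT_TYPES`, `INVALID_TYPE_MSG`, and `CORRUPTED_MSG`, and the `(status, message)` return shape, are hypothetical. It also assumes that both the extension and the declared content type must match; this issue does not say whether either one alone should be enough.

```python
import io
import zipfile

# Hypothetical names; the accepted values come from the list above.
ALLOWED_CONTENT_TYPES = {"application/zip", "application/x-zip-compressed"}
INVALID_TYPE_MSG = "Invalid file type. Please upload a ZIP archive of your codebase."
CORRUPTED_MSG = "File appears to be corrupted or is not a valid ZIP archive"


def validate_zip_upload(filename, content_type, data):
    """Return (status_code, error_message); the message is None when the upload is OK."""
    # Cheap metadata checks first, before any extraction is attempted.
    # Assumption: both the extension and the declared content type must match.
    has_zip_ext = (filename or "").lower().endswith(".zip")
    has_zip_type = (content_type or "").lower() in ALLOWED_CONTENT_TYPES
    if not (has_zip_ext and has_zip_type):
        return 400, INVALID_TYPE_MSG

    # Fallback: metadata can be wrong, so confirm the bytes really parse as a ZIP.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first member with a bad CRC, or None.
            if archive.testzip() is not None:
                return 400, CORRUPTED_MSG
    except zipfile.BadZipFile:
        return 400, CORRUPTED_MSG

    return 200, None
```

The handler would call this before extraction and turn any 400 result into the error response, so the exception never reaches the default 500 path.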
Acceptance criteria
- Non-ZIP upload returns 400 with a human-readable message
- Corrupted ZIP returns 400 with "File appears to be corrupted or is not a valid ZIP archive"
- No stack trace or internal path is ever exposed in the response
- Frontend shows the error message to the user
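One way to check the "no stack trace or internal path" criterion in a test is to scan error response bodies for telltale patterns. The helper below is only a heuristic sketch: `leaks_internals` and its patterns are hypothetical and would need tuning to the paths the server actually uses.

```python
import re

# Hypothetical heuristics: a Python traceback header, a traceback frame line,
# or an absolute POSIX/Windows path ending in .py.
_LEAK_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r'File "[^"]+", line \d+'),
    re.compile(r"(?:/[\w.\-]+)+\.py|[A-Za-z]:\\[\w.\\\-]+\.py"),
]


def leaks_internals(body: str) -> bool:
    """Return True if the response body looks like it exposes a stack trace or source path."""
    return any(pattern.search(body) for pattern in _LEAK_PATTERNS)
```

A test for /scan could then assert that the status is 400 and that `leaks_internals(response_text)` is false for both the non-ZIP and the corrupted-ZIP cases.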