- Using lifespan context manager to replace deprecated startup event
- Import pydantic_settings to fix the incorrect pydantic import
- Add required dependencies to requirements.txt
- Mount only the app/ subdirectory and remove the reinstall step to fix the volume overwrite and race condition
…event performance issues
…n CameraSettingsRead and ExifDataRead
- Introduces user authentication endpoints and models, project management endpoints and models, and document and camera APIs with user and project associations.
- Adds password hashing and token-based security utilities, and updates Pydantic schemas for new fields.
- Includes a test script for API validation.
- Includes API documentation and backend implementation details (located in `digitization-toolkit-software/docs/developers`) for maintainability and clearer frontend integration.
- Updates requirements and environment variables for new features.
…ndependent camera settings
… capture_image function
…or capture metadata
…unctions for capture metadata management
…ate image metadata handling
…capture functionality
…ad of RecordImage
Pull request overview
This pull request implements a comprehensive refactoring of the backend data model, transitioning from a flat "document" structure to a hierarchical "record/record_image" architecture with collection support. The changes enable better organization of digitized archival materials by separating descriptive metadata (records) from physical image files (record images), and introducing collections for hierarchical archival organization.
Changes:
- Renamed and restructured the data model from `DocumentImage` to separate `Record` and `RecordImage` entities, with Records representing conceptual archival objects and RecordImages representing individual captured files
- Introduced a `Collection` model for hierarchical organization with support for nested subcollections (e.g., fonds → series → box → folder)
- Enhanced camera API to create or link captures to records, track manifest IDs (`capture_id`, `pair_id`), and return record/image IDs in responses
- Modified authentication to accept tokens via query parameters in addition to Authorization headers for browser image requests
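To make the intended hierarchy concrete, here is a minimal in-memory sketch of how the three entities relate: a collection may nest inside a parent collection, a record may sit in a collection, and each image belongs to exactly one record. It uses plain dataclasses rather than the project's SQLAlchemy models; the field names (`parent`, `title`, `file_path`) and the `add_image` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for the Collection model."""
    id: int
    name: str
    # None marks a top-level collection (e.g. a fonds).
    parent: Optional[Collection] = None

    def path(self) -> List[str]:
        """Return collection names from the root down to this collection."""
        names: List[str] = []
        node: Optional[Collection] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


@dataclass
class Record:
    """Conceptual archival object: descriptive metadata only."""
    id: int
    title: str
    collection: Optional[Collection] = None
    images: List[RecordImage] = field(default_factory=list)


@dataclass
class RecordImage:
    """A single captured file; always belongs to exactly one Record."""
    id: int
    file_path: str
    # Back-reference excluded from repr/eq to avoid infinite recursion.
    record: Record = field(repr=False, compare=False)


def add_image(record: Record, image_id: int, file_path: str) -> RecordImage:
    """Attach a new image to an existing record (hypothetical helper)."""
    image = RecordImage(id=image_id, file_path=file_path, record=record)
    record.images.append(image)
    return image
```

Keeping only a parent link on each collection lets the hierarchy go to any depth (fonds → series → box → folder) without extra tables; a database model would typically express this as a self-referencing `parent_id` foreign key.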
Reviewed changes
Copilot reviewed 25 out of 27 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/setup_camera_backends.sh | New setup script for Raspberry Pi camera dependencies |
| pixi.toml | New Pixi package manager configuration for the project |
| app/models/record.py | New model separating Record (metadata) from RecordImage (files) |
| app/models/collection.py | New hierarchical collection model for archival organization |
| app/models/camera.py | Updated to reference record_image_id instead of document_image_id |
| app/models/project.py | Updated relationships to use records and collections |
| app/models/document.py | Deleted - replaced by record.py |
| app/schemas/record.py | Refactored schemas for Record, RecordImage, ExifData, and CameraSettings |
| app/schemas/collection.py | New schemas for collection CRUD operations |
| app/schemas/camera.py | Updated to use record_image_id |
| app/schemas/user.py | Added UserLogin schema for login endpoint |
| app/api/records.py | New endpoint replacing documents.py with record-centric operations |
| app/api/collections.py | New endpoint for collection hierarchy management |
| app/api/cameras.py | Updated to create/link records and return record/image IDs |
| app/api/projects.py | Updated references from documents to records |
| app/api/auth.py | Added query parameter token support for image requests |
| app/api/system.py | New system monitoring endpoint for CPU temperature |
| app/api/documents.py | Deleted - replaced by records.py |
| app/main.py | Updated router imports and CORS origins |
| app/core/db.py | Updated model imports |
| app/core/thumbnail.py | Updated comments to reference records |
| alembic/env.py | Updated model imports for migrations |
| alembic/versions/c3d4e5f6a7b8_*.py | Migration to rename document_images to record_images |
| alembic/versions/48189f9482e3_*.py | Migration to add collections table |
| alembic/versions/19e2aefe5b17_*.py | Migration to separate Record from RecordImage |
| capture/service.py | Updated return values to include capture_id and pair_id |
| .gitignore | Added Pixi-related files |
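The `app/api/cameras.py` and `capture/service.py` rows describe a capture flow that either links a new image to an existing record or creates a record first, then reports the record/image IDs alongside the manifest IDs. Below is a minimal sketch of that decision, assuming an in-memory store in place of the database session; `InMemoryStore`, `store_capture`, and the `CaptureResult` fields are hypothetical, not the project's actual API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Optional


@dataclass
class CaptureResult:
    """IDs a capture response might return (hypothetical shape)."""
    record_id: int
    record_image_id: int
    capture_id: str
    pair_id: Optional[str]


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the database session."""
    records: Dict[int, dict] = field(default_factory=dict)
    images: Dict[int, dict] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: count(1))

    def next_id(self) -> int:
        return next(self._ids)


def store_capture(
    store: InMemoryStore,
    file_path: str,
    capture_id: str,
    pair_id: Optional[str] = None,
    record_id: Optional[int] = None,
) -> CaptureResult:
    """Link a captured file to an existing record, or create a new record for it."""
    if record_id is None:
        # No record given: create one so the image always has a parent.
        record_id = store.next_id()
        store.records[record_id] = {"title": f"Capture {capture_id}"}
    elif record_id not in store.records:
        # An API endpoint would translate this into a 404 response.
        raise KeyError(f"record {record_id} does not exist")

    image_id = store.next_id()
    store.images[image_id] = {"record_id": record_id, "file_path": file_path}
    return CaptureResult(record_id, image_id, capture_id, pair_id)
```

For the two halves of a paired capture, the second call would pass the `record_id` returned by the first, so both images end up under one record.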
# Accept token from Authorization header OR ?token= query param (needed for <img src>)
raw_token = credentials.credentials if credentials else token
if not raw_token:
    raise HTTPException(status_code=401, detail="Not authenticated")
payload = verify_access_token(raw_token)
Accepting authentication tokens via query parameters poses a security risk. Query parameters are often logged in server logs, browser history, and referrer headers, potentially exposing sensitive tokens. While this may be necessary for browser image src attributes, consider implementing short-lived temporary tokens specifically for image access, or use a separate endpoint that validates the token and redirects to a signed URL.
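A minimal sketch of the short-lived signed-URL alternative, using only the standard-library `hmac` module: the server signs an image path together with an expiry time, and the image endpoint checks the signature instead of a full access token. `IMAGE_URL_SECRET`, `sign_image_path`, and `verify_image_signature` are hypothetical names, not part of the project's `verify_access_token` code.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; in practice load it from settings, separate from the access-token key.
IMAGE_URL_SECRET = b"change-me"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, URL-safe base64 without padding."""
    digest = hmac.new(IMAGE_URL_SECRET, f"{path}:{expires}".encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def sign_image_path(path: str, ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Return a query string that grants access to `path` for `ttl_seconds`."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    return f"expires={expires}&sig={_signature(path, expires)}"


def verify_image_signature(path: str, expires: int, sig: str, now: Optional[float] = None) -> bool:
    """Accept the request only if the link has not expired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(_signature(path, expires), sig)
```

The frontend would request the signed query string from an authenticated endpoint (using the Authorization header) and append it to the `<img src>` URL. If that URL lands in logs or browser history, it exposes only one image for a short window rather than a reusable access token.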
# Step 6: Drop old columns from record_images
op.drop_column('record_images', 'modified_at')
op.drop_column('record_images', 'material')
op.drop_column('record_images', 'collection_id')
op.drop_column('record_images', 'project_id')
The migration drops columns project_id and collection_id from record_images (lines 111-112) without first dropping the check_record_single_parent constraint that was created in the previous migration (48189f9482e3) which references these columns. This could cause the migration to fail. The constraint should be explicitly dropped before dropping the columns it references.
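A sketch of the reordering the comment asks for, assuming the constraint is the CHECK constraint named `check_record_single_parent` on `record_images`, as the comment describes; only the `drop_constraint` call is new relative to the excerpt above. Backends differ here (PostgreSQL, for instance, drops table constraints involving a dropped column automatically, while SQLite refuses to drop a column named in a CHECK constraint), so dropping it explicitly keeps the migration portable and its intent visible.

```python
from alembic import op


def upgrade() -> None:
    # ... earlier steps of this migration ...

    # Drop the CHECK constraint from 48189f9482e3 first, because it
    # references project_id and collection_id.
    op.drop_constraint("check_record_single_parent", "record_images", type_="check")

    # Step 6: Drop old columns from record_images
    op.drop_column("record_images", "modified_at")
    op.drop_column("record_images", "material")
    op.drop_column("record_images", "collection_id")
    op.drop_column("record_images", "project_id")
```

The matching `downgrade()` would re-add the columns before recreating the constraint (for example with `op.create_check_constraint`, reusing the condition from 48189f9482e3). On SQLite, these operations would also need to run inside `op.batch_alter_table("record_images")`, since SQLite cannot alter constraints in place.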
record_count = db.query(func.count(RecordImage.id)).filter(
    RecordImage.collection_id == collection_id
The record count query is incorrect. It queries RecordImage.collection_id but RecordImage doesn't have a collection_id field according to the new data model. Records are linked to collections via Record.collection_id, and RecordImages are linked to Records via RecordImage.record_id. The query should be: db.query(func.count(Record.id)).filter(Record.collection_id == collection_id).scalar()
Suggested change:
- record_count = db.query(func.count(RecordImage.id)).filter(
-     RecordImage.collection_id == collection_id
+ record_count = db.query(func.count(Record.id)).filter(
+     Record.collection_id == collection_id
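To check the corrected logic without the project's models, here is the same count in plain SQL against an in-memory SQLite database, using the standard-library `sqlite3` module. The table names follow the migrations listed above; the minimal columns are assumptions. It also shows how images would be counted per collection: that count has to join through `records`, because `record_images` only carries `record_id`.

```python
import sqlite3

# Minimal, assumed schema: images reach a collection only through their record.
SCHEMA = """
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER REFERENCES collections (id)
);
CREATE TABLE record_images (
    id INTEGER PRIMARY KEY,
    record_id INTEGER NOT NULL REFERENCES records (id)
);
"""


def count_records(conn: sqlite3.Connection, collection_id: int) -> int:
    """SQL equivalent of the suggested Record.collection_id filter."""
    (n,) = conn.execute(
        "SELECT COUNT(id) FROM records WHERE collection_id = ?",
        (collection_id,),
    ).fetchone()
    return n


def count_images(conn: sqlite3.Connection, collection_id: int) -> int:
    """Count images in a collection by joining through records."""
    (n,) = conn.execute(
        """
        SELECT COUNT(record_images.id)
        FROM record_images
        JOIN records ON records.id = record_images.record_id
        WHERE records.collection_id = ?
        """,
        (collection_id,),
    ).fetchone()
    return n
```

In SQLAlchemy terms, the image count corresponds to joining `RecordImage` to `Record` before filtering on `Record.collection_id`.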
This pull request introduces a major refactor to the backend data model, focusing on renaming and restructuring the "document" concept to "record," separating descriptive metadata from image data, and introducing collections as a new organizational entity. It also updates authentication logic and camera API endpoints to support these changes.
Data model refactor and migration:
- Renamed the `document_images` table to `record_images` and updated all related foreign keys, indexes, and references in associated tables (`camera_settings`, `exif_data`) to use `record_image_id` instead of `document_image_id`.
- Added a `records` table to separate descriptive metadata from images, migrated existing data by creating one `Record` per `RecordImage`, and updated `record_images` to link to `records`. Old metadata columns were removed from `record_images`.
- Added a `collections` table to organize records, with support for hierarchical collections and linking records to collections or projects. Updated `record_images` to support `collection_id` and enforced single-parent constraints.

API and authentication updates:
- Updated the login endpoint to use a `UserLogin` schema for clarity and security.

Camera API enhancements:
- Updated the camera API to work with `Record` and `RecordImage`, allowing capture requests to specify or create records and to return record/image IDs in responses.

Miscellaneous:
- Updated model imports (`record`, `collection`) for migration registration.

These changes lay the groundwork for more robust metadata management, collection organization, and improved API usability.
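The data migration step described under "Data model refactor and migration" (creating one `Record` per existing `RecordImage` and pointing the image at it) can be sketched with the standard-library `sqlite3` module. The copied columns (`title`, `project_id`) are hypothetical placeholders for whatever metadata the real migration moves; inside an Alembic migration the same statements would run on the connection returned by `op.get_bind()`.

```python
import sqlite3


def backfill_records(conn: sqlite3.Connection) -> int:
    """Create one record per unlinked image and link the image to it.

    Only images with record_id IS NULL are touched, so rerunning is harmless.
    Returns the number of images that were linked.
    """
    images = conn.execute(
        "SELECT id, title, project_id FROM record_images WHERE record_id IS NULL"
    ).fetchall()
    for image_id, title, project_id in images:
        # Copy the descriptive metadata onto a new record...
        cur = conn.execute(
            "INSERT INTO records (title, project_id) VALUES (?, ?)",
            (title, project_id),
        )
        # ...and point the image at that record.
        conn.execute(
            "UPDATE record_images SET record_id = ? WHERE id = ?",
            (cur.lastrowid, image_id),
        )
    return len(images)
```

Once every image has a `record_id`, the old metadata columns can be dropped from `record_images`, which matches the order the description gives.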