
Dev #12

Open
jairomelo wants to merge 94 commits into main from dev

@jairomelo
Member

This pull request introduces a major refactor to the backend data model, focusing on renaming and restructuring the "document" concept to "record," separating descriptive metadata from image data, and introducing collections as a new organizational entity. It also updates authentication logic and camera API endpoints to support these changes.

Data model refactor and migration:

  • Renamed the document_images table to record_images, updated all related foreign keys, indexes, and references in associated tables (camera_settings, exif_data) to use record_image_id instead of document_image_id.
  • Introduced a new records table to separate descriptive metadata from images, migrated existing data by creating one Record per RecordImage, and updated record_images to link to records. Old metadata columns were removed from record_images.
  • Added a collections table to organize records, with support for hierarchical collections and linking records to collections or projects. Updated record_images to support collection_id and enforced single-parent constraints.
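The structure described above can be pictured as a small SQLite schema plus the "one Record per RecordImage" backfill. This is a sketch under stated assumptions, not the PR's migration: the column set is simplified, `legacy_project_id` is a hypothetical stand-in for the old per-image project link, and the single-parent rule is assumed to mean "exactly one of `project_id` / `collection_id` is set" and is placed on `records`, where the final model keeps the collection link.

```python
import sqlite3

# Simplified, hypothetical DDL. Only the table names and the project/collection/record
# links come from the PR description; everything else is illustrative.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE collections (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    parent_id  INTEGER REFERENCES collections(id),  -- NULL means top-level
    name       TEXT NOT NULL
);

CREATE TABLE records (
    id            INTEGER PRIMARY KEY,
    title         TEXT,
    project_id    INTEGER REFERENCES projects(id),
    collection_id INTEGER REFERENCES collections(id),
    -- assumed single-parent rule: exactly one of the two links is set
    CHECK ((project_id IS NULL) <> (collection_id IS NULL))
);

CREATE TABLE record_images (
    id                INTEGER PRIMARY KEY,
    record_id         INTEGER REFERENCES records(id),
    filename          TEXT NOT NULL,
    legacy_project_id INTEGER  -- stands in for the pre-refactor per-image column
);
"""


def backfill_one_record_per_image(conn: sqlite3.Connection) -> int:
    """Create one record for each image that is not yet linked to a record."""
    orphans = conn.execute(
        "SELECT id, legacy_project_id FROM record_images WHERE record_id IS NULL"
    ).fetchall()
    for image_id, project_id in orphans:
        # The new record inherits the image's old project link as its single parent.
        cur = conn.execute("INSERT INTO records (project_id) VALUES (?)", (project_id,))
        conn.execute(
            "UPDATE record_images SET record_id = ? WHERE id = ?",
            (cur.lastrowid, image_id),
        )
    return len(orphans)
```

Running the backfill a second time is harmless in this sketch, because it only touches images whose `record_id` is still NULL.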

API and authentication updates:

  • Modified authentication to accept tokens via a query parameter as well as the Authorization header, so that browser image requests (e.g. <img src>), which cannot set that header, can still authenticate.
  • Changed the login endpoint to use a new UserLogin schema for clarity and security.
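A browser cannot attach an Authorization header to an `<img src>` request, which is why the query-parameter fallback exists. The precedence (header first, `?token=` second) can be sketched with the standard library alone; the image route below is hypothetical, and raising the 401 when no token is found is left to the caller.

```python
from typing import Optional
from urllib.parse import urlencode


def image_src_url(image_id: int, token: str) -> str:
    """Build a URL usable in <img src>, carrying the token as a query parameter."""
    # Hypothetical route; the PR text does not spell out the image endpoint path.
    return f"/api/records/images/{image_id}/file?{urlencode({'token': token})}"


def resolve_token(authorization: Optional[str], query_token: Optional[str]) -> Optional[str]:
    """Prefer 'Authorization: Bearer <token>'; otherwise fall back to ?token=."""
    if authorization:
        scheme, _, credentials = authorization.partition(" ")
        if scheme.lower() == "bearer" and credentials.strip():
            return credentials.strip()
    # Empty strings are treated the same as a missing token.
    return query_token or None
```

A non-Bearer header (for example `Basic ...`) falls through to the query parameter, mirroring the `credentials.credentials if credentials else token` fallback quoted in the review below.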

Camera API enhancements:

  • Updated camera API endpoints and models to reference Record and RecordImage, allowing capture requests to specify an existing record or create a new one, and returning record/image IDs in responses.
  • Refactored image capture logic to create or link records, associate captured images with records, and extract additional metadata such as file size and EXIF data.
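The create-or-link capture flow can be sketched with an in-memory store standing in for the database session. All names here (`InMemoryStore`, `register_capture`, the response keys) are hypothetical; only the behavior comes from the description: link the capture to a given record or create one, read the file size from the captured file, and return both IDs. EXIF extraction is left out.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


@dataclass
class Record:
    id: int
    title: Optional[str] = None


@dataclass
class RecordImage:
    id: int
    record_id: int
    path: str
    file_size: int


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the database session (IDs assigned sequentially)."""
    records: Dict[int, Record] = field(default_factory=dict)
    images: Dict[int, RecordImage] = field(default_factory=dict)


def register_capture(
    store: InMemoryStore,
    image_path: Path,
    record_id: Optional[int] = None,
    title: Optional[str] = None,
) -> Dict[str, int]:
    """Link a captured file to an existing record, or create a new record for it."""
    if record_id is None:
        # No record specified: create one for this capture.
        record = Record(id=len(store.records) + 1, title=title)
        store.records[record.id] = record
    elif record_id in store.records:
        record = store.records[record_id]
    else:
        raise LookupError(f"record {record_id} does not exist")

    image = RecordImage(
        id=len(store.images) + 1,
        record_id=record.id,
        path=str(image_path),
        file_size=image_path.stat().st_size,  # metadata read from the file itself
    )
    store.images[image.id] = image
    # Returning both IDs lets the client attach later captures to the same record.
    return {"record_id": record.id, "record_image_id": image.id}
```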

Miscellaneous:

  • Updated Alembic environment to import new models (record, collection) for migration registration.
  • Added thumbnail generation utility import for future use in camera endpoints.

These changes lay the groundwork for more robust metadata management, collection organization, and improved API usability.

jairomelo and others added 30 commits January 17, 2026 12:17
- Use a lifespan context manager to replace the deprecated startup event
- Import pydantic_settings to fix an incorrect pydantic import
- Add required dependencies to requirements.txt
- Mount only the app/ subdirectory and remove the reinstall step to fix a volume overwrite and race condition
- Introduces user authentication endpoints and models, project management endpoints and models, document and camera APIs with user and project associations.
- Adds password hashing and token-based security utilities, updates Pydantic schemas for new fields
- Includes a test script for API validation.
- Includes API documentation and backend implementation details (located inside `digitization-toolkit-software/docs/developers`) for maintainability and clearer frontend integration.
- Updates requirements and environment variables for new features.
jairomelo and others added 25 commits January 17, 2026 12:24
Contributor

Copilot AI left a comment


Pull request overview

This pull request implements a comprehensive refactoring of the backend data model, transitioning from a flat "document" structure to a hierarchical "record/record_image" architecture with collection support. The changes enable better organization of digitized archival materials by separating descriptive metadata (records) from physical image files (record images), and introducing collections for hierarchical archival organization.

Changes:

  • Renamed and restructured the data model from DocumentImage to separate Record and RecordImage entities, with Records representing conceptual archival objects and RecordImages representing individual captured files
  • Introduced a Collection model for hierarchical organization with support for nested subcollections (e.g., fonds → series → box → folder)
  • Enhanced camera API to create or link captures to records, track manifest IDs (capture_id, pair_id), and return record/image IDs in responses
  • Modified authentication to accept tokens via query parameters in addition to Authorization headers for browser image requests
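Nested collections such as fonds → series → box → folder are commonly stored as an adjacency list, where each row points at its parent. Assuming a `parent_id` column (a hypothetical name here), the two queries such a model usually needs, the breadcrumb path up to the root and the subtree below a node, can be sketched with recursive CTEs in SQLite:

```python
import sqlite3
from typing import List

# Hypothetical adjacency-list table: each collection points at its parent.
SCHEMA = """
CREATE TABLE collections (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES collections(id)  -- NULL for a top-level collection
);
"""


def collection_path(conn: sqlite3.Connection, collection_id: int) -> List[str]:
    """Names from the root collection down to `collection_id` (breadcrumb order)."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM collections WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, c.parent_id, a.depth + 1
            FROM collections AS c JOIN ancestors AS a ON c.id = a.parent_id
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (collection_id,),
    ).fetchall()
    return [name for (name,) in rows]


def subtree_ids(conn: sqlite3.Connection, collection_id: int) -> List[int]:
    """IDs of `collection_id` and every collection nested below it."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM collections WHERE id = ?
            UNION ALL
            SELECT c.id FROM collections AS c JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT id FROM subtree ORDER BY id
        """,
        (collection_id,),
    ).fetchall()
    return [cid for (cid,) in rows]
```

Because these CTEs use `UNION ALL`, a cycle in `parent_id` would make them recurse indefinitely, so an API built this way would need to reject moves that make a collection its own ancestor.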

Reviewed changes

Copilot reviewed 25 out of 27 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| scripts/setup_camera_backends.sh | New setup script for Raspberry Pi camera dependencies |
| pixi.toml | New Pixi package manager configuration for the project |
| app/models/record.py | New model separating Record (metadata) from RecordImage (files) |
| app/models/collection.py | New hierarchical collection model for archival organization |
| app/models/camera.py | Updated to reference record_image_id instead of document_image_id |
| app/models/project.py | Updated relationships to use records and collections |
| app/models/document.py | Deleted; replaced by record.py |
| app/schemas/record.py | Refactored schemas for Record, RecordImage, ExifData, and CameraSettings |
| app/schemas/collection.py | New schemas for collection CRUD operations |
| app/schemas/camera.py | Updated to use record_image_id |
| app/schemas/user.py | Added UserLogin schema for login endpoint |
| app/api/records.py | New endpoint replacing documents.py with record-centric operations |
| app/api/collections.py | New endpoint for collection hierarchy management |
| app/api/cameras.py | Updated to create/link records and return record/image IDs |
| app/api/projects.py | Updated references from documents to records |
| app/api/auth.py | Added query parameter token support for image requests |
| app/api/system.py | New system monitoring endpoint for CPU temperature |
| app/api/documents.py | Deleted; replaced by records.py |
| app/main.py | Updated router imports and CORS origins |
| app/core/db.py | Updated model imports |
| app/core/thumbnail.py | Updated comments to reference records |
| alembic/env.py | Updated model imports for migrations |
| alembic/versions/c3d4e5f6a7b8_*.py | Migration to rename document_images to record_images |
| alembic/versions/48189f9482e3_*.py | Migration to add collections table |
| alembic/versions/19e2aefe5b17_*.py | Migration to separate Record from RecordImage |
| capture/service.py | Updated return values to include capture_id and pair_id |
| .gitignore | Added Pixi-related files |


Comment on lines +84 to +88
# Accept token from Authorization header OR ?token= query param (needed for <img src>)
raw_token = credentials.credentials if credentials else token
if not raw_token:
    raise HTTPException(status_code=401, detail="Not authenticated")
payload = verify_access_token(raw_token)

Copilot AI Feb 21, 2026


Accepting authentication tokens via query parameters poses a security risk. Query strings are often recorded in server access logs and browser history and can leak through Referer headers, potentially exposing sensitive tokens. While this may be necessary for browser image src attributes, consider implementing short-lived temporary tokens specifically for image access, or use a separate endpoint that validates the token and redirects to a signed URL.

Copilot uses AI. Check for mistakes.
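The short-lived-token suggestion above can be sketched as HMAC-signed image URLs using only the standard library. The secret, route, and function names are hypothetical; the point is that the URL placed in `<img src>` carries an expiry and a signature bound to one image, never the long-lived access token.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; in practice load it from settings, never hard-code it.
IMAGE_URL_SECRET = b"change-me"


def make_signature(image_id: int, expires: int, secret: bytes = IMAGE_URL_SECRET) -> str:
    """HMAC-SHA256 over the image ID and the expiry timestamp."""
    message = f"{image_id}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def signed_image_url(image_id: int, ttl_seconds: int = 60,
                     now: Optional[float] = None) -> str:
    """Return a short-lived URL; the access token itself never appears in it."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_seconds
    sig = make_signature(image_id, expires)
    # Hypothetical route for serving image files.
    return f"/api/images/{image_id}?expires={expires}&sig={sig}"


def verify_image_url(image_id: int, expires: int, sig: str,
                     now: Optional[float] = None) -> bool:
    """Accept only unexpired URLs whose signature matches this exact image."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = make_signature(image_id, expires)
    return hmac.compare_digest(expected, sig)  # constant-time comparison
```

In this flow an authenticated client (sending the Authorization header) asks an endpoint for a signed URL and puts it into `<img src>`; the image route then checks only the signature and expiry. Including `image_id` in the signed message prevents a URL from being reused for a different image.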
Comment on lines +108 to +112
# Step 6: Drop old columns from record_images
op.drop_column('record_images', 'modified_at')
op.drop_column('record_images', 'material')
op.drop_column('record_images', 'collection_id')
op.drop_column('record_images', 'project_id')

Copilot AI Feb 21, 2026


The migration drops columns project_id and collection_id from record_images (lines 111-112) without first dropping the check_record_single_parent constraint that was created in the previous migration (48189f9482e3) which references these columns. This could cause the migration to fail. The constraint should be explicitly dropped before dropping the columns it references.

Comment on lines +123 to +124
record_count = db.query(func.count(RecordImage.id)).filter(
    RecordImage.collection_id == collection_id

Copilot AI Feb 21, 2026


The record count query is incorrect. It queries RecordImage.collection_id but RecordImage doesn't have a collection_id field according to the new data model. Records are linked to collections via Record.collection_id, and RecordImages are linked to Records via RecordImage.record_id. The query should be: db.query(func.count(Record.id)).filter(Record.collection_id == collection_id).scalar()

Suggested change
-record_count = db.query(func.count(RecordImage.id)).filter(
-    RecordImage.collection_id == collection_id
+record_count = db.query(func.count(Record.id)).filter(
+    Record.collection_id == collection_id


Labels

None yet

Projects

None yet


3 participants