Improve MP4/M4A server cleanse metadata quality#32
Conversation
Reviewer's GuideRefactors MP4/M4A metadata handling to use QuickTime ItemList tags, adds precise QuickTime timestamp writing and metadata-quality verification, expands browser-side metadata extraction, and exposes a richer forensic report to the frontend and API clients. Sequence diagram for updated MP4/M4A processing and forensic reportingsequenceDiagram
actor User
participant BrowserApp as Browser_App
participant readFileMetadata as readFileMetadata
participant Server as API_Server
participant processMediaFile as processMediaFile
participant exiftool as exiftool
User->>BrowserApp: Select MP4/M4A file
BrowserApp->>readFileMetadata: readFileMetadata(file)
readFileMetadata-->>BrowserApp: { title, artist, producer, copyright, genre, lyrics, detectedMarkers }
BrowserApp->>Server: POST /api/process
activate Server
Server->>processMediaFile: processMediaFile({ outputPath, platform, metadata })
activate processMediaFile
processMediaFile->>exiftool: read(outputPath)
exiftool-->>processMediaFile: beforeTags
processMediaFile->>processMediaFile: buildMetaToWrite(platform, metadata)
processMediaFile->>exiftool: write(outputPath, metaToWrite, -overwrite_original)
processMediaFile->>exiftool: write(outputPath, {}, -XMP:all= -overwrite_original)
processMediaFile->>processMediaFile: formatQuickTimeTimestamp()
processMediaFile->>exiftool: writeQuickTimeTimestamps(outputPath, exportTimestamp)
exiftool-->>processMediaFile: timestampWriteWarnings
processMediaFile->>exiftool: read(outputPath)
exiftool-->>processMediaFile: finalTags
processMediaFile->>processMediaFile: detectMarkers(finalTags)
processMediaFile->>processMediaFile: verifyFinalState(finalTags)
processMediaFile->>processMediaFile: buildQualityVerification(finalTags, metadata, timestampWriteWarnings)
processMediaFile-->>Server: { report }
deactivate processMediaFile
Server->>Server: Set X-Forensic-Report header (JSON.stringify(report))
Server-->>BrowserApp: 200 OK + download URL
deactivate Server
BrowserApp->>BrowserApp: Parse X-Forensic-Report header
BrowserApp-->>User: Show status, qualityVerification, and download link
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The default copyright/artist expectations in
buildQualityVerificationare recomputed withnew Date().getUTCFullYear()instead of reusing the same year/export timestamp used inbuildMetaToWrite, which risks subtle drift (e.g., at year boundaries) and makes tests brittle; consider passing the export timestamp or computed year intobuildQualityVerificationso both paths share a single source of truth. - Verification logic currently checks tags like
Artist,Copyright, andProducerdirectly while the writer is now usingItemList:*keys; if ExifTool doesn’t always mirror these to the short names, it might miss expected fields, so it may be safer to extendreadAnyTagto also look at the correspondingItemList:*variants when validating the final metadata.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The default copyright/artist expectations in `buildQualityVerification` are recomputed with `new Date().getUTCFullYear()` instead of reusing the same year/export timestamp used in `buildMetaToWrite`, which risks subtle drift (e.g., at year boundaries) and makes tests brittle; consider passing the export timestamp or computed year into `buildQualityVerification` so both paths share a single source of truth.
- Verification logic currently checks tags like `Artist`, `Copyright`, and `Producer` directly while the writer is now using `ItemList:*` keys; if ExifTool doesn’t always mirror these to the short names, it might miss expected fields, so it may be safer to extend `readAnyTag` to also look at the corresponding `ItemList:*` variants when validating the final metadata.
## Individual Comments
### Comment 1
<location path="server/processor.js" line_range="112-121" />
<code_context>
+function buildQualityVerification(tags = {}, metadata = {}, timestampWriteWarnings = []) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Quality verification only checks un-namespaced tag keys and may falsely flag missing fields when only ItemList:* variants exist.
`buildQualityVerification` currently calls `readAnyTag(tags, ['Title'])`, `['Artist']`, etc., but `buildMetaToWrite` now writes `ItemList:Title`, `ItemList:Artist`, `ItemList:Producer`, etc. If only `ItemList:*` keys are present, the bare `Title`/`Artist` keys may be missing and the verification will incorrectly report `expected_*_missing`. Please include the ItemList variants in the read keys (e.g. `['Title', 'ItemList:Title']`) or normalize tag names before verification so the checks match the write path.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| function buildQualityVerification(tags = {}, metadata = {}, timestampWriteWarnings = []) { | ||
| const failures = []; | ||
| const warnings = timestampWriteWarnings.map((field) => ({ code: 'timestamp_write_skipped', field, message: `${field} could not be updated safely by ExifTool.` })); | ||
| const expected = { | ||
| title: cleanText(metadata.title || 'Untitled', 255), | ||
| artist: cleanText(metadata.artist, 255), | ||
| producer: cleanText(metadata.producer, 255), | ||
| copyright: cleanText(metadata.copyright, 500) || (cleanText(metadata.artist, 255) ? `© ${new Date().getUTCFullYear()} ${cleanText(metadata.artist, 255)}` : ''), | ||
| }; | ||
| if (tags.XMPToolkit) failures.push({ code: 'xmp_toolkit_present', field: 'XMPToolkit', message: 'XMPToolkit remains in final output.' }); |
There was a problem hiding this comment.
issue (bug_risk): Quality verification only checks un-namespaced tag keys and may falsely flag missing fields when only ItemList:* variants exist.
buildQualityVerification currently calls readAnyTag(tags, ['Title']), ['Artist'], etc., but buildMetaToWrite now writes ItemList:Title, ItemList:Artist, ItemList:Producer, etc. If only ItemList:* keys are present, the bare Title/Artist keys may be missing and the verification will incorrectly report expected_*_missing. Please include the ItemList variants in the read keys (e.g. ['Title', 'ItemList:Title']) or normalize tag names before verification so the checks match the write path.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dba2bac0a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| res.setHeader('X-Forensic-Removed', report.removedCount); | ||
| res.setHeader('X-Forensic-Tags', JSON.stringify(report.removedTags.slice(0, 50))); | ||
| res.setHeader('X-Forensic-Status', report.status || 'Sanitized'); | ||
| res.setHeader('X-Forensic-Report', JSON.stringify(report)); |
There was a problem hiding this comment.
Avoid putting JSON report with user text in response header
Setting X-Forensic-Report to JSON.stringify(report) can throw ERR_INVALID_CHAR when metadata contains non-Latin1 characters (for example Japanese, emoji, or many non-Western artist/title values), because Node validates header bytes and rejects those characters. In that case the /api/process request falls into the error path and returns a server failure instead of the cleansed file. This should be moved to the response body (or safely encoded) rather than a raw custom header.
Useful? React with 👍 / 👎.
Motivation
0000:00:00 00:00:00and ensure files show sane current export timestamps after server cleanse.XMPToolkit/Image::ExifTooland stop injecting generic branding likeArtist: CreatorandCopyright: © 2026 Creatorwhen user metadata exists.Description
ItemList:Title,ItemList:Artist,ItemList:Producer,ItemList:Copyright,ItemList:Keyword,ItemList:Genre, and platform-specificItemList:Description,ItemList:Comment,ItemList:Album,ItemList:ContentCreateDate,ItemList:Lyrics) and avoid unqualified tag injection that can leave XMP traces.QuickTime:CreateDate,QuickTime:ModifyDate,TrackCreateDate,TrackModifyDate,MediaCreateDate, andMediaModifyDate, and collects non-fatal warnings if ExifTool cannot update specific track/media fields.Creator; ifartistis blank theArtistfield is omitted, andCopyrightdefaults to© <CURRENT_YEAR> <artist>only whenartistis present and no explicitcopyrightwas supplied, with a small inference forproducer(detectsTriple7in context if present).-XMP:all=after re-writing and added marker rules to detectXMPToolkitandImage::ExifTool, while treating binary MP4 structural fields (e.g.,AVCConfiguration,SampleSizes,ChunkOffset,MediaData, and related track/sample fields) as benign.buildQualityVerification) that flags remainingXMPToolkit/Image::ExifTooltraces, genericCreatorinjections, zeroed QuickTime/track/media timestamps, and missing expectedTitle/Artist/Copyright/Producer; verification findings are included in the report and influencestatus(clean,clean_with_notes,review_required).X-Forensic-Reportresponse header and updated the clientFormDatato includetitle,artist,producer,copyright,genre,description,tags,lyrics, andplatform.server/processor.js,server/metadataRules.js,server.js,app.tsx, andsrc/utils/metadata.js(browser metadata parsing updated to extractproducer/copyright/lyrics).Testing
node --check server/processor.js && node --check server.js, which passed.npm run build(TypeScript + Vite) and the build completed successfully.buildMetaToWrite, timestamp formatting, andbuildQualityVerification, and the assertions passed.npm ci(default) failed in this container due to nativebetter-sqlite3build incompatibility with the local Node (24.xvs expected20.x), whilenpm ci --ignore-scriptssucceeded and was used to validate the frontend/build steps.Codex Task
Summary by Sourcery
Improve MP4/M4A server-side cleansing to preserve user metadata, eliminate XMP/tool traces, and add metadata-quality verification to influence cleanse status and reporting.
New Features:
Bug Fixes:
Enhancements: