Fix MP4/M4A metadata persistence diagnostics classification #38

Merged
ChrisAdamsdevelopment merged 1 commit into main from codex/fix-mp4/m4a-diagnostics-classification on May 20, 2026

@ChrisAdamsdevelopment ChrisAdamsdevelopment commented May 20, 2026

Motivation

  • MP4/M4A processing was misclassified as metadata_present_in_snapshots_but_report_or_download_mismatch when the only difference was normal inter-stage SHA-256 drift, even though the final snapshots contained the expected metadata.
  • deepSnapshot sometimes left selectedMetadata empty when exiftool.read returned structured output instead of line text, reducing diagnostic usefulness.
  • Diagnostics should preserve privacy by omitting lyrics and redacting long text fields while avoiding false mismatch signals.

Description

  • Added hasExpectedCoreMetadata and updated classifyMetadataPersistenceStage to return metadata_present_and_verified when the final snapshot contains core fields (Title, Artist, Producer, Copyright).
  • Stopped using normal SHA-256 differences between processing stages as an automatic mismatch signal and introduced an explicit mismatchEvidence gate (e.g., client hash mismatch or external verification contradiction) required to emit metadata_present_in_snapshots_but_report_or_download_mismatch.
  • Improved deepSnapshot extraction to de-duplicate entries, omit lyrics, redact Description/Comment as length+SHA-256, and add an object-key fallback extraction when the structured read output would otherwise produce no selectedMetadata.
  • Kept scope strictly to diagnostics in server/processor.js and did not change metadata-writing, XMP cleanup, timestamp writes, auth, CORS, frontend, MP3 Quick Cleanse, or package files.
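The classification rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in server/processor.js: the flat snapshot shape, the two evidence flag names (clientHashMismatch, externalVerificationContradiction), the reduced function signature, the ordering of the checks, and the placeholder label for the missing-metadata case are all assumptions made for the example.

```javascript
// Hypothetical sketch; the real helpers in server/processor.js may differ.
const CORE_FIELDS = ['Title', 'Artist', 'Producer', 'Copyright'];

// True when every core field is a non-empty string in the final snapshot.
// Assumes the snapshot is a flat object of tag name -> value.
function hasExpectedCoreMetadata(snapshot) {
  const meta = snapshot || {};
  return CORE_FIELDS.every(
    (field) => typeof meta[field] === 'string' && meta[field].trim() !== ''
  );
}

// Assumed evidence shape: boolean flags set by checks outside the pipeline.
function hasMismatchEvidence(evidence = {}) {
  return Boolean(
    evidence.clientHashMismatch || evidence.externalVerificationContradiction
  );
}

// Simplified signature: the real function takes additional arguments.
function classifyMetadataPersistenceStage(finalSnapshot, mismatchEvidence = {}) {
  if (!hasExpectedCoreMetadata(finalSnapshot)) {
    // Placeholder label for this sketch only; not a label from the PR.
    return 'core_metadata_missing_in_final_snapshot';
  }
  // Inter-stage SHA-256 drift is deliberately not consulted here.
  if (hasMismatchEvidence(mismatchEvidence)) {
    return 'metadata_present_in_snapshots_but_report_or_download_mismatch';
  }
  return 'metadata_present_and_verified';
}
```

With an empty evidence object, a snapshot that carries all four core fields classifies as metadata_present_and_verified regardless of how the file hash changed between stages.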

Testing

  • node --check server/processor.js succeeded with no syntax errors.
  • A runtime helper invocation that calls require('./server/processor') was attempted but could not run in this environment because the exiftool-vendored dependency was missing, so behavioral runtime assertions that need ExifTool were not executed.
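One possible way to keep pure diagnostic helpers loadable when exiftool-vendored is not installed is to resolve the dependency lazily. The sketch below is an assumption about how that could look, not something this PR does: tryRequire and getExiftool are hypothetical names, and the only library detail relied on is that exiftool-vendored exports an exiftool singleton.

```javascript
// Return the module if it can be loaded, or null if it is not installed.
// Errors other than MODULE_NOT_FOUND are rethrown so real failures stay visible.
function tryRequire(name) {
  try {
    return require(name);
  } catch (err) {
    if (err && err.code === 'MODULE_NOT_FOUND') return null;
    throw err;
  }
}

// Hypothetical lazy accessor: nothing is loaded until a caller needs ExifTool.
let cachedExiftool;
function getExiftool() {
  if (cachedExiftool === undefined) {
    const mod = tryRequire('exiftool-vendored');
    cachedExiftool = mod ? mod.exiftool : null;
  }
  return cachedExiftool;
}
```

With this shape, classification helpers that never touch ExifTool could be required and asserted on in an environment without the dependency, while code paths that do need it can check for a null return.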

Codex Task

Summary by Sourcery

Refine media metadata persistence diagnostics to reduce false mismatches and improve privacy-aware snapshot reporting.

New Features:

  • Introduce core-metadata verification so final snapshots with expected title, artist, producer, and copyright are classified as metadata_present_and_verified.
  • Require explicit external mismatch evidence to emit report-or-download mismatch diagnostics instead of relying on inter-stage hash differences.
  • Enhance deepSnapshot metadata extraction to de-duplicate entries, skip lyrics, redact long text fields with length and hash, and fall back to structured ExifTool output when line-based parsing yields no data.

sourcery-ai Bot commented May 20, 2026

Reviewer's Guide

Refines MP4/M4A metadata diagnostics to rely on core field presence and explicit mismatch evidence rather than hash drift, and hardens deep metadata snapshots for privacy, de-duplication, and structured ExifTool output, while keeping scope limited to diagnostic logic in server/processor.js.

File-Level Changes

Change: Tightened metadata persistence classification to use core field checks and explicit mismatch evidence instead of inter-stage hash differences.
Details:
  • Introduced hasExpectedCoreMetadata to require Title, Artist, Producer, and Copyright in the final snapshot.
  • Extended classifyMetadataPersistenceStage signature to accept a mismatchEvidence object and compute finalHasExpectedCore.
  • Removed hash-based mismatch detection between after_xmp_cleanup and after_timestamp_write_final and replaced it with metadata_present_and_verified when core fields are present.
  • Added explicit mismatchEvidence checks (client hash, external report, download verification) before returning metadata_present_in_snapshots_but_report_or_download_mismatch.
  • Updated processMediaFile to call classifyMetadataPersistenceStage with an explicit empty mismatchEvidence object.
Files: server/processor.js

Change: Improved deepSnapshot metadata extraction for de-duplication, privacy redaction, and structured ExifTool outputs.
Details:
  • Added a seenSelected Set and helper addSelected to avoid duplicate entries and filter out lyrics.
  • Changed Description/Comment handling to redact values as length plus sha256 while using addSelected for consistent de-duplication and filtering.
  • Replaced direct pushes to selectedMetadata with addSelected to centralize filtering logic.
  • Added a fallback path that, when line-based parsing yields no selectedMetadata, iterates over object-style raw ExifTool output, applying the same include field/prefix filters.
  • In the fallback, redacts Description/Comment fields via redactLongTextField and stringifies other values with stringifyValue before adding them.
Files: server/processor.js
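The deepSnapshot behavior listed above (de-duplication, lyrics filtering, length-plus-hash redaction, and the object-key fallback) could look roughly like the sketch below. The helper names addSelected, seenSelected, redactLongTextField, and stringifyValue come from the guide; their bodies, the selectMetadata wrapper, the INCLUDE_FIELDS list, the "Name: value" line format, and the redaction string format are assumptions for illustration only.

```javascript
const crypto = require('node:crypto');

// Redact free text to its length plus SHA-256 digest (output format assumed).
function redactLongTextField(value) {
  const text = String(value);
  const sha256 = crypto.createHash('sha256').update(text, 'utf8').digest('hex');
  return `[redacted length=${text.length} sha256=${sha256}]`;
}

function stringifyValue(value) {
  return typeof value === 'string' ? value : JSON.stringify(value);
}

// Assumed include list; the real field/prefix filters are not shown in the PR text.
const INCLUDE_FIELDS = new Set([
  'Title', 'Artist', 'Producer', 'Copyright', 'Description', 'Comment',
]);
const REDACTED_FIELDS = new Set(['Description', 'Comment']);

// lines: "Name: value" strings; raw: object-style ExifTool output (or null).
function selectMetadata(lines, raw) {
  const selectedMetadata = [];
  const seenSelected = new Set();

  // Central gate: drops lyrics and exact duplicate entries.
  const addSelected = (name, value) => {
    if (/lyrics/i.test(name)) return;
    const entry = `${name}: ${value}`;
    if (seenSelected.has(entry)) return;
    seenSelected.add(entry);
    selectedMetadata.push(entry);
  };

  const consider = (name, value) => {
    if (value == null || !INCLUDE_FIELDS.has(name)) return;
    const shown = REDACTED_FIELDS.has(name)
      ? redactLongTextField(value)
      : stringifyValue(value);
    addSelected(name, shown);
  };

  for (const line of lines) {
    const idx = line.indexOf(':');
    if (idx > 0) consider(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }

  // Fallback: line parsing found nothing, so walk the structured object keys.
  if (selectedMetadata.length === 0 && raw && typeof raw === 'object') {
    for (const [name, value] of Object.entries(raw)) consider(name, value);
  }
  return selectedMetadata;
}
```

Routing both the line-based path and the fallback through the same gate means a field cannot bypass redaction or the lyrics filter just because the read output arrived in a different shape.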


@ChrisAdamsdevelopment ChrisAdamsdevelopment merged commit 597fb17 into main May 20, 2026
4 of 5 checks passed

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high-level feedback:

  • The new mismatchEvidence gating in classifyMetadataPersistenceStage is never actually used (always passed {} from processMediaFile), so the metadata_present_in_snapshots_but_report_or_download_mismatch path is currently unreachable; consider plumbing through real evidence or at least adding a clear TODO where the object is constructed.
  • The hashes parameter to classifyMetadataPersistenceStage is now unused; if inter-stage SHA-256 is no longer part of the decision, consider removing this parameter (and its call-site argument) to avoid confusion about whether hashes still influence classification.
  • The new hasExpectedCoreMetadata check requires all four fields (Title, Artist, Producer, Copyright) to be present before returning metadata_present_and_verified, which is stricter than hasDescriptiveMetadata; ensure this all-or-nothing requirement matches the intended business logic, or relax it if some fields are optional in practice.
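As an illustration of the first point, real evidence could be built at the call site instead of passing {}. The sketch below is hypothetical: buildMismatchEvidence, the clientReportedSha256 input, and the clientHashMismatch flag are invented names, and the PR does not say how a client hash would reach the server.

```javascript
const crypto = require('node:crypto');

function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// Hypothetical: clientReportedSha256 is a hex digest the client computed over
// the file it actually downloaded. Without it there is no evidence either way.
function buildMismatchEvidence({ finalBuffer, clientReportedSha256 }) {
  if (!clientReportedSha256) return {};
  const serverSha256 = sha256Hex(finalBuffer);
  return {
    // Compare against the final served bytes only, never an earlier stage.
    clientHashMismatch: serverSha256 !== clientReportedSha256.toLowerCase(),
  };
}
```

Comparing the client digest against the final output only keeps normal inter-stage drift out of the decision while still making the mismatch path reachable.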
