Product Quality Control Agent

Overview

This project provides a modular pipeline, built on BigQuery AI, for validating product data quality. It covers unified embedding generation, consistency checks, product fusion, and mismatch scoring, and it can be evaluated at scale on both real-world and synthetic data.

Pipeline Modules & Functions

1. `pipeline/embeddings.py`

Functions:
- generate_text_embeddings(df, ...) – Generates text embeddings using BigQuery ML.GENERATE_EMBEDDING.
- generate_image_embeddings(df, ...) – Generates image embeddings (local or via BigQuery if supported).
- write_embeddings_to_bq(df, table_name) – Writes unified embeddings to BigQuery.
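
As a rough illustration of the text path, the query below shows the general shape of an ML.GENERATE_EMBEDDING call. This is a minimal sketch, not the code in `pipeline/embeddings.py`: the helper name `build_text_embedding_sql`, the table and model names, and the `description` column are all assumptions made for the example.

```python
def build_text_embedding_sql(source_table: str, model: str, text_column: str) -> str:
    """Return a query that embeds one text column with ML.GENERATE_EMBEDDING.

    Hypothetical helper: the real module may build its query differently.
    """
    # Text embedding models read their input from a column named `content`,
    # so the chosen text column is aliased to that name.
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_EMBEDDING(\n"
        f"  MODEL `{model}`,\n"
        f"  (SELECT *, {text_column} AS content FROM `{source_table}`),\n"
        "  STRUCT(TRUE AS flatten_json_output)\n"
        ")"
    )


# Placeholder names, chosen only for illustration.
sql = build_text_embedding_sql(
    source_table="my_project.qc.products",
    model="my_project.qc.text_embedding_model",
    text_column="description",
)
print(sql)
```

The resulting string can be passed to whichever BigQuery client the project uses; building it separately keeps the query easy to inspect before anything is billed.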

2. `pipeline/consistency.py`

Functions:
- run_consistency_checks(df, ...) – Runs vector similarity and BigQuery AI-based consistency checks.
- write_consistency_results_to_bq(df, table_name) – Stores check results in BigQuery.
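
The vector-similarity half of a consistency check can be pictured as follows. This sketch assumes that text and image embeddings live in a shared space (for example, from a multimodal model) and that a fixed cosine threshold separates consistent from inconsistent rows. The function names, row layout, and threshold of 0.5 are illustrative assumptions, not the behaviour of `run_consistency_checks`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_inconsistent(rows, threshold=0.5):
    """Return ids of rows whose text and image embeddings fall below the threshold."""
    return [
        row["id"]
        for row in rows
        if cosine_similarity(row["text_emb"], row["image_emb"]) < threshold
    ]


# Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
rows = [
    {"id": "sku-1", "text_emb": [1.0, 0.0], "image_emb": [0.9, 0.1]},
    {"id": "sku-2", "text_emb": [1.0, 0.0], "image_emb": [0.0, 1.0]},
]
flagged = flag_inconsistent(rows)
print(flagged)
```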

3. `pipeline/vector_index.py`

Functions:
- create_vector_index(table_name, ...) – Creates vector indexes in BigQuery for fast similarity search.
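
For orientation, a vector index in BigQuery is created with a `CREATE VECTOR INDEX` statement. The sketch below only assembles such a statement; the helper name, the default `embedding` column, and the index and table names are assumptions, and `create_vector_index` may expose different options.

```python
def build_vector_index_sql(index_name, table_name, column="embedding", distance_type="COSINE"):
    """Return a CREATE VECTOR INDEX statement for an IVF index (hypothetical helper)."""
    allowed = {"COSINE", "EUCLIDEAN", "DOT_PRODUCT"}
    if distance_type not in allowed:
        raise ValueError(f"distance_type must be one of {sorted(allowed)}")
    return (
        f"CREATE VECTOR INDEX IF NOT EXISTS {index_name}\n"
        f"ON `{table_name}`({column})\n"
        f"OPTIONS(index_type = 'IVF', distance_type = '{distance_type}')"
    )


# Placeholder index and table names.
index_sql = build_vector_index_sql("product_embedding_idx", "my_project.qc.product_embeddings")
print(index_sql)
```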

4. `pipeline/fusion.py`

Functions:
- align_products(df, ...) – Aligns/fuses products using embedding similarity and crosswalk logic.
- write_crosswalk_to_bq(df, table_name) – Stores product crosswalks in BigQuery.
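
One simple way to picture embedding-based alignment is a best-match search with a similarity floor: each product in one catalog is paired with its most similar product in the other, and weak matches are dropped. The sketch below is an assumption about the general idea, not the logic in `align_products`; the function name, the dictionary inputs, and the 0.8 floor are illustrative.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_crosswalk(left, right, min_similarity=0.8):
    """Pair each left id with its most similar right id, keeping only strong matches."""
    crosswalk = []
    for left_id, left_emb in left.items():
        best_id, best_sim = None, -1.0
        for right_id, right_emb in right.items():
            sim = _cosine(left_emb, right_emb)
            if sim > best_sim:
                best_id, best_sim = right_id, sim
        if best_id is not None and best_sim >= min_similarity:
            crosswalk.append((left_id, best_id, round(best_sim, 4)))
    return crosswalk


# Toy catalogs keyed by product id.
catalog_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
catalog_b = {"b1": [0.0, 0.9], "b2": [0.7, 0.7]}
crosswalk = build_crosswalk(catalog_a, catalog_b)
print(crosswalk)
```

Note that this greedy approach lets several left products map to the same right product; a one-to-one crosswalk would need an extra de-duplication step.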

5. `pipeline/scoring.py`

Functions:
- compute_mismatch_scores(df, ...) – Computes unified mismatch scores combining vector and rule-based checks.
- write_scores_to_bq(df, table_name) – Writes scores to BigQuery.
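
A unified score that combines vector and rule-based signals could, for example, be a weighted average of the two. The formula below is an assumption made for illustration: the function name, the 0.6 weight, and the use of the failed-rule fraction are not taken from `compute_mismatch_scores`.

```python
def mismatch_score(similarity, rule_failures, rule_count, vector_weight=0.6):
    """Blend embedding dissimilarity with the fraction of failed rules into [0, 1]."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    # Clamp similarity to [0, 1] so the vector part stays in [0, 1].
    vector_part = 1.0 - max(0.0, min(1.0, similarity))
    # With no rules defined, the rule part contributes nothing.
    rule_part = rule_failures / rule_count if rule_count else 0.0
    return vector_weight * vector_part + (1.0 - vector_weight) * rule_part


# High similarity, one of four rules failed.
score = mismatch_score(similarity=0.9, rule_failures=1, rule_count=4)
print(round(score, 4))
```

A higher score means a stronger mismatch; where to set an alerting threshold depends on the data and is best tuned in the evaluation notebook.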

Scripts

- scripts/generate_text_embeddings.py – Calls embedding functions for text fields.
- scripts/generate_image_embeddings.py – Calls embedding functions for images.
- scripts/consistency_checks.py – Runs consistency checks.
- scripts/create_vector_indexes.py – Creates vector indexes.
- scripts/vector_search.py – Runs vector search queries.
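
To give a sense of what a vector search query looks like, the sketch below assembles a BigQuery `VECTOR_SEARCH` call. The helper name, the table names, and the `product_id` column are assumptions for illustration; `scripts/vector_search.py` may select different columns or options.

```python
def build_vector_search_sql(base_table, query_table, column="embedding", top_k=5):
    """Return a VECTOR_SEARCH query matching query rows to their nearest base rows."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # VECTOR_SEARCH exposes the matched rows as `query` and `base` structs
    # plus a `distance` column. `product_id` is a hypothetical column name.
    return (
        "SELECT query.product_id AS query_id, base.product_id AS match_id, distance\n"
        "FROM VECTOR_SEARCH(\n"
        f"  TABLE `{base_table}`, '{column}',\n"
        f"  TABLE `{query_table}`,\n"
        f"  top_k => {top_k}, distance_type => 'COSINE'\n"
        ")"
    )


# Placeholder table names.
search_sql = build_vector_search_sql(
    "my_project.qc.product_embeddings", "my_project.qc.new_product_embeddings", top_k=3
)
print(search_sql)
```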

Workflow

1. Ingest Data: Load product data into BigQuery.
2. Generate Embeddings: Use pipeline/embeddings.py to create and store unified embeddings.
3. Create Vector Indexes: Use pipeline/vector_index.py to index embeddings.
4. Run Consistency Checks: Use pipeline/consistency.py for AI-based validation.
5. Product Fusion: Use pipeline/fusion.py to align products and create crosswalks.
6. Mismatch Scoring: Use pipeline/scoring.py to compute and store scores.
7. Evaluation: Use the Jupyter notebook to run the end-to-end pipeline and report metrics.
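
The ordering of the workflow can be sketched as a small sequential runner. Everything here is hypothetical: the `run_pipeline` helper and the lambda stubs stand in for the real module functions, and the actual orchestration lives in the evaluation notebook.

```python
def run_pipeline(steps, state=None):
    """Run (name, callable) steps in order, passing a state dict from one to the next."""
    state = dict(state or {})
    completed = []
    for name, step in steps:
        state = step(state)
        completed.append(name)
    return state, completed


# Stub steps that only record that they ran; real steps would call the
# functions in pipeline/embeddings.py, pipeline/vector_index.py, and so on.
steps = [
    ("generate_embeddings", lambda s: {**s, "embeddings": True}),
    ("create_vector_indexes", lambda s: {**s, "indexes": True}),
    ("run_consistency_checks", lambda s: {**s, "checks": True}),
    ("align_products", lambda s: {**s, "crosswalk": True}),
    ("compute_mismatch_scores", lambda s: {**s, "scores": True}),
]
final_state, completed = run_pipeline(steps, {"table": "my_project.qc.products"})
print(completed)
```

Keeping each stage as a separate callable makes it straightforward to rerun a single stage on a small test table during development.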

BigQuery Cost & Runtime Notes

Embedding Generation:
- Uses ML.GENERATE_EMBEDDING for text and, where applicable, local image embedding logic.
- Cost: Billed at standard BigQuery ML prediction rates per row, so large datasets can incur significant costs.
- Runtime: Scales with data size; expect several minutes for 10k+ rows.

Vector Index Creation:
- Cost: Standard storage and compute for index creation.
- Runtime: Fast for small and medium tables; can take longer for millions of rows.

Consistency Checks:
- Uses AI.GENERATE_BOOL, AI.GENERATE_TEXT, and vector similarity functions.
- Cost: Each AI function call incurs BigQuery AI costs; vector similarity is standard SQL compute.
- Runtime: Typically seconds to minutes per 10k rows.

Fusion & Scoring:
- Cost: Standard SQL compute for joins, scoring, and crosswalk creation.
- Runtime: Fast for most use cases; scales with data size.

General Advice:
- Use small test datasets during development to minimize costs.
- Monitor BigQuery job history for cost and performance insights.
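
For a back-of-the-envelope estimate of the SQL compute portion, the bytes a query would scan can be converted to an on-demand cost. This sketch assumes on-demand pricing billed per TiB scanned; the price is passed in by the caller because rates vary by region and over time, and the figure excludes any model or AI function charges. The helper name is hypothetical.

```python
def estimate_on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate query cost from bytes scanned, given a caller-supplied price per TiB."""
    if bytes_processed < 0 or price_per_tib < 0:
        raise ValueError("bytes_processed and price_per_tib must be non-negative")
    return bytes_processed / 2**40 * price_per_tib


# The bytes figure would normally come from a dry run, e.g. the
# `total_bytes_processed` attribute of a job submitted with
# `QueryJobConfig(dry_run=True)` in the google-cloud-bigquery client.
# The price below is a made-up example value, not a quoted rate.
cost = estimate_on_demand_cost(bytes_processed=512 * 2**30, price_per_tib=5.0)
print(f"{cost:.2f}")
```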

References

For more details, see the code in the pipeline/ and scripts/ directories and the evaluation notebook in notebooks/.
