feat(firestore-bigquery-export): Add Gemini agent to gen-schema-view #2242

Closed

20 commits:

- `41978c4` chore: run npm audit fix --force (cabljac, Dec 18, 2024)
- `6967401` chore: run npm audit fix --force again (cabljac, Dec 18, 2024)
- `b238d33` feat(firestore-bigquery-export): add AI agent option to gen-schema-sc… (cabljac, Dec 18, 2024)
- `0ccb55e` feat(firestore-bigquery-export): add human-in-the-loop and docs to ge… (cabljac, Dec 18, 2024)
- `08c7439` chore(gen-schema-view): add todo for checking the table prefix option (cabljac, Dec 18, 2024)
- `0c2fa91` test(firestore-bigquery-export): update schema-loader-utils tests (cabljac, Dec 19, 2024)
- `c8d34ff` test(firestore-bigquery-export): fix e2e tests (cabljac, Dec 19, 2024)
- `2085a5b` Update firestore-bigquery-export/scripts/gen-schema-view/src/schema/i… (cabljac, Dec 19, 2024)
- `44f1724` chore(gen-schema-view): format (cabljac, Dec 19, 2024)
- `f37cc30` refactor(firestore-bigquery-export): update gen-schema gemini approach (cabljac, Dec 23, 2024)
- `b90661f` chore(firestore-bigquery-export): remove traces of ai agent wording (cabljac, Dec 23, 2024)
- `1be7755` Update firestore-bigquery-export/guides/GENERATE_SCHEMA_VIEWS.md (cabljac, Dec 23, 2024)
- `f7a9aae` refactor(gen-schema-view): extract config parsing to their own module… (cabljac, Mar 4, 2025)
- `3940aba` refactor(gen-schema-view): simplify genkit flow (cabljac, Mar 4, 2025)
- `d8f28cb` fix(gen-schema-view): made some changes (CorieW, Mar 4, 2025)
- `3016862` fix(gen-schema-view): get rid of some redundancy and fix problem with… (CorieW, Mar 4, 2025)
- `d2e1901` WIP (CorieW, Mar 5, 2025)
- `7f395b6` test(gen-schema-view): fix e2e testing (cabljac, Mar 6, 2025)
- `6d19d5b` test(gen-schema-view): add config testing (cabljac, Mar 10, 2025)
- `2a98f18` fix(gen-schema-view): update complete gen log and point to filename i… (cabljac, Mar 10, 2025)

Changes to `firestore-bigquery-export/guides/GENERATE_SCHEMA_VIEWS.md` (56 additions, 0 deletions):

Since all document data is stored in the schemaless changelog, mistakes in
schema configuration don't affect the underlying data and can be resolved by
re-running the schema-views script against an updated schema file.

### Using Gemini to generate draft schema files

Instead of writing schema files by hand, you can use the built-in Gemini agent to analyze your Firestore collection and generate a draft schema. The agent will:

1. Sample documents from your collection
2. Analyze the data structure
3. Generate a schema file with a description for each field
4. Create the corresponding BigQuery views

You can use Gemini in either interactive or non-interactive mode:

```bash
# Interactive mode
npx @firebaseextensions/fs-bq-schema-views
```

Review comment (Member): I think it's better to include a copy-paste minimal usage here, e.g. add `--use-gemini-agent` and other required params.

Reply (Contributor Author): We have a non-interactive example below. Can you explain what you mean?

```bash
# Non-interactive mode
npx @firebaseextensions/fs-bq-schema-views \
--non-interactive \
--project=${param:PROJECT_ID} \
--big-query-project=${param:BIGQUERY_PROJECT_ID} \
--dataset=${param:DATASET_ID} \
--table-name-prefix=${param:TABLE_ID} \
--use-gemini \
--collection-path=your_collection_path \
--google-ai-key=your_api_key \
--agent-sample-size=50 \
--schema-dir=./schemas
```

#### Agent Parameters

- `--use-gemini`: Enable Gemini-assisted schema generation
- `--collection-path`: Path to the Firestore collection to analyze
- `--google-ai-key`: Your Google AI API key for the Gemini model
- `--agent-sample-size`: Number of documents to sample (default: 10, max: 100)
- `--schema-dir`: Directory where generated schema files will be stored (default: `./schemas`)
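
If you run the non-interactive command from a script or CI job, it can help to collect the flags in one place and fail early when a value is missing. The sketch below is illustrative only. `build_schema_view_args`, `SCHEMA_VIEW_ARGS`, and the environment variable names are hypothetical, and `fs-bq-schema-views` does not read them itself. The flags are the ones from the non-interactive example above, the defaults mirror the parameter list, and the minimum sample size of 1 is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect the non-interactive flags into a Bash array.
# The environment variable names are illustrative; fs-bq-schema-views does not read them.

build_schema_view_args() {
  SCHEMA_VIEW_ARGS=()

  # Fail early if any required value is unset or empty.
  local var
  for var in PROJECT_ID BIGQUERY_PROJECT_ID DATASET_ID TABLE_ID COLLECTION_PATH GOOGLE_AI_KEY; do
    if [[ -z "${!var:-}" ]]; then
      echo "build_schema_view_args: $var is not set" >&2
      return 1
    fi
  done

  # Defaults taken from the parameter list above.
  local sample_size="${SAMPLE_SIZE:-10}"
  local schema_dir="${SCHEMA_DIR:-./schemas}"

  # Documented max is 100; a minimum of 1 is assumed here.
  # Digits only, no leading zeros (so Bash arithmetic never reads it as octal).
  if ! [[ "$sample_size" =~ ^[1-9][0-9]*$ ]] || (( sample_size > 100 )); then
    echo "build_schema_view_args: SAMPLE_SIZE must be 1-100, got '$sample_size'" >&2
    return 1
  fi

  SCHEMA_VIEW_ARGS=(
    --non-interactive
    "--project=$PROJECT_ID"
    "--big-query-project=$BIGQUERY_PROJECT_ID"
    "--dataset=$DATASET_ID"
    "--table-name-prefix=$TABLE_ID"
    --use-gemini
    "--collection-path=$COLLECTION_PATH"
    "--google-ai-key=$GOOGLE_AI_KEY"
    "--agent-sample-size=$sample_size"
    "--schema-dir=$schema_dir"
  )
}
```

For example, after setting the variables, run `build_schema_view_args && npx @firebaseextensions/fs-bq-schema-views "${SCHEMA_VIEW_ARGS[@]}"`. Because the arguments are stored in a Bash array, values that contain spaces are passed to the script intact.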

#### Generated Schema

The agent will create a schema file named `${param:TABLE_ID}.json` in your specified schema directory. This schema should include:

- Appropriate field types based on your data
- Detailed descriptions for each field
- Proper handling of nested objects and arrays
- BigQuery-compatible type mappings

The generated schema follows the same format as manually created schemas and supports all standard Firestore data types. The agent aims to produce schemas that are both accurate and performant for BigQuery views.

Although Gemini generally produces usable schema files and the prompts have been tested, generative AI models are inherently probabilistic, so double-check the generated schema before continuing.
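
Reviewing a generated schema is easier with a flat listing of every field. The sketch below is a hypothetical helper, not part of the script. It assumes `jq` is installed and that fields are stored in a `fields` array, with nested `fields` for maps and an optional `description` key. The list of expected type names is also an assumption, so adjust it to the types your version supports. The helper prints each field's dotted path, type, and description. It returns a non-zero status when a field's type is missing or falls outside the expected list.

```bash
#!/usr/bin/env bash
# Hypothetical helper for reviewing a generated schema file; requires jq.

review_schema() {
  local schema_file="$1"
  # Expected type names: an assumption, adjust to the types your version supports.
  local ok_types='["string","array","map","boolean","number","timestamp","geopoint","reference","null"]'

  if ! command -v jq >/dev/null 2>&1; then
    echo "review_schema: jq is required" >&2
    return 2
  fi

  # One tab-separated line per field, including fields nested inside maps:
  # dotted path, type ("?" if missing), description ("" if missing).
  jq -r '
    def rows($prefix):
      .fields[]? | ($prefix + .name) as $path
      | ([$path, (.type // "?"), (.description // "")] | @tsv),
        rows($path + ".");
    rows("")
  ' "$schema_file" || return 2

  # Names of fields (at any depth) whose type is missing or not in ok_types.
  local unexpected
  unexpected=$(jq -r --argjson ok "$ok_types" '
    def all_fields: .fields[]? | ., all_fields;
    all_fields | select(.type as $t | any($ok[]; . == $t) | not) | .name
  ' "$schema_file") || return 2

  if [[ -n "$unexpected" ]]; then
    echo "review_schema: check the type of these fields:" >&2
    echo "$unexpected" >&2
    return 1
  fi
}
```

For example, run `review_schema ./schemas/my_table.json`, replacing `my_table` with your table prefix. A non-zero status only means that a field deserves a closer look. It does not prove that the schema is wrong.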

### Next Steps

- [Learn about the columns in a schema view](#columns-in-a-schema-view)
- [Take a look at more SQL examples](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/EXAMPLE_QUERIES.md)
- [Troubleshoot common issues](#common-schema-file-configuration-mistakes)

## About Schema Views

### Views created by the script