Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(firestore-bigquery-export): Add Gemini agent to gen-schema-view #2242

Open
wants to merge 12 commits into
base: next
Choose a base branch
from

Conversation

cabljac
Copy link
Contributor

@cabljac cabljac commented Dec 18, 2024

This PR adds the option to use a Gemini Agent to generate the schema files, fixes some tests, and corrects a parameter description.

@cabljac cabljac marked this pull request as ready for review December 19, 2024 12:12
@cabljac cabljac requested a review from a team as a code owner December 19, 2024 12:12
@cabljac cabljac changed the title @invertase/gen schema agent feat(firestore-bigquery-export: add Gemini agent to gen-schema-view Dec 19, 2024
@Ehesp Ehesp changed the title feat(firestore-bigquery-export: add Gemini agent to gen-schema-view feat(firestore-bigquery-export): Add Gemini agent to gen-schema-view Dec 19, 2024
@Gustolandia
Copy link
Contributor

Gustolandia commented Dec 19, 2024

Testing

Deeply Nested Documents:

No explicit test cases for collections with deeply nested fields, such as maps within arrays or arrays within maps.

Large Collections

There are no tests for large collections that exceed the agent-sample-size. A test case should simulate the agent gracefully sampling the first N documents and stopping without errors.

test("should handle collections with more than 100 documents", async () => {
  const largeSampleData = Array.from({ length: 200 }, (_, i) => ({ id: i, value: `doc${i}` }));
  const response = await runAgent(apiKey, "./schemas", "large_collection", "test_table", largeSampleData);
  expect(response).toBeDefined();
  expect(response.success).toBeTruthy();
});

@cabljac
Copy link
Contributor Author

cabljac commented Dec 19, 2024

  • Deeply Nested Documents:

    • No explicit test cases for collections with deeply nested fields, such as maps within arrays or arrays within maps.
    • Suggestion: Add a test in e2e.test.ts with a deeply nested document like:
      const sampleData = [
          {
              user: {
                  profile: {
                      name: "Alice",
                      age: 25,
                  },
                  settings: {
                      notifications: {
                          email: true,
                          push: false,
                      },
                  },
              },
          },
      ];
  • Large Collections:

    • There are no tests for large collections that exceed the agent-sample-size. A test case should simulate the agent gracefully sampling the first N documents and stopping without errors.
      test("should handle collections with more than 100 documents", async () => {
          const largeSampleData = Array.from({ length: 200 }, (_, i) => ({ id: i, value: `doc${i}` }));
          const response = await runAgent(apiKey, "./schemas", "large_collection", "test_table", largeSampleData);
          expect(response).toBeDefined();
          expect(response.success).toBeTruthy();
      });

I think i'll leave these out, they're fair points but not relevant to this PR in particular.

Actually the "Large Collections" comment isn't correct; we're taking just a sample

@Gustolandia
Copy link
Contributor

Gustolandia commented Dec 19, 2024

Actually the "Large Collections" comment isn't correct; we're taking just a sample

The current tests seem to be missing a verification for whether the sampling behavior correctly handles large collections. The tests should explicitly check that only the specified number of samples (agent-sample-size) are being processed, regardless of the total size of the collection.

What I wrote should be like this instead (apologies):

  test("should handle collections with more than 100 documents by sampling", async () => {
      const largeSampleData = Array.from({ length: 200 }, (_, i) => ({ id: i, value: `doc${i}` }));
      const agentSampleSize = 50; // Example sample size
      const response = await runAgent(apiKey, "./schemas", "large_collection", "test_table", largeSampleData);

      // Verify correct sampling behavior
      const sampledData = largeSampleData.slice(0, agentSampleSize); // Expected sample
      expect(response).toBeDefined();
      expect(response.success).toBeTruthy();

      // Assert sampled data is passed for schema generation
      expect(response.schemaGeneratedFrom).toEqual(sampledData); // Assuming `response.schemaGeneratedFrom` reflects sampled data
  });


```bash
# Interactive mode
npx @firebaseextensions/fs-bq-schema-views
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to include a copy-paste minimal usage here, e.g. add --use-gemini-agent and other required params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a non-interactive example below, can you explain what you mean? confused

config.schemas[schemaName]
);

if (config.useGemini) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check if the schema file already exist and exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants