Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Pglite Support Type Generation Without a Running PostgreSQL Instance #528

Open
techmannih opened this issue Feb 7, 2025 · 3 comments

Comments

@techmannih
Copy link

techmannih commented Feb 7, 2025

I want to use pglite to generate types without requiring a live PostgreSQL instance. Currently, most tools that interact with PostgreSQL for type generation need a running PostgreSQL server. Would it be possible for pglite to support this use case, allowing type generation without needing PostgreSQL to be installed or running in the background?

CC: @copiltembel @samwillis

@samwillis
Copy link
Collaborator

Hey @techmannih

I experiment a little with this a few moths back, repo is here: https://github.com/samwillis/pglite-type-inference

My experiments lead to adding the describeQuery method, this can be used as the basis of an inference system: https://pglite.dev/docs/api#describequery

There are some videos of it and more info on Twitter:

https://x.com/samwillis/status/1848672368819892233

https://x.com/samwillis/status/1848684120341152120

It's not something that we're likely to try and do as part of the PGlite project, but it absolutely a use case we would like to support. Anything tweak or changes, let me know.

@techmannih
Copy link
Author

Just I am trying with this simple example, what do you think?

import * as zg from "zapatos/generate"
import {
  getConnectionStringFromEnv,
  getPgConnectionFromEnv,
} from "pg-connection-from-env"
import { Context } from "./get-project-context"
import { dumpTree } from "pg-schema-dump"
import path from "path"

export const generate = async ({
  schemas,
  defaultDatabase,
  dbDir,
}: Pick<Context, "schemas" | "defaultDatabase" | "dbDir">) => {
  dbDir = dbDir ?? "./src/db"

  await zg.generate({
    db: {
      connectionString: getConnectionStringFromEnv({
        fallbackDefaults: {
          database: defaultDatabase,
        },
      }),
    },
    schemas: Object.fromEntries(
      schemas.map((s) => [
        s,
        {
          include: "*",
          exclude: [],
        },
      ])
    ),
    outDir: dbDir,
  })

  await dumpTree({
    targetDir: path.join(dbDir, "structure"),
    defaultDatabase,
    schemas,
  })
}

@copiltembel
Copy link
Collaborator

Here's a wild idea on how to do it without even a database:
Build a repository of the most common combinations of column name -> type. Query it each time you see a new column name in a query. Or use it to query an LLM if you don't feel like building that repo. Maybe you can even pass the entire query to the LLM and tell it to extract name-type tuples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants