Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize @valibot/to-json-schema package #802

Conversation

gcornut
Copy link
Contributor

@gcornut gcornut commented Aug 26, 2024

Following the discussions on gcornut/valibot-json-schema#84, here is my proposal on a new package for converting Valibot to JSON schema to be moved here next to Valibot core library code :)

The setup

  1. New packages/to-json-schema NPM package named @valibot/to-json-schema
  2. Some basic schema & validation conversions
    • any schema
    • null schema
    • string schema
    • object schema
    • email validation
  3. Lint, Types and Tests (100% coverage)

I tried to match the code style of the core Valibot library.

What is next ?

  • More conversions: all the supported schemas, validations and metadata already supported by @gcornut/valibot-json-schema
  • More conversion options
    • Convert schemas into JSON schema definitions
    • Convert non JSON types to similar, yet different types (for example: Date into format: 'date-time' or format: 'unix-time', BigInt into format: 'int64' or string, explicit undefined into null, etc.)
  • Customize which JSON schema features to use ?
    => Some tools (like OpenAI structured output) does not support all the features of JSON schema. Maybe we could add a configuration to skip those features ?

I'll deprecate and archive @gcornut/valibot-json-schema once this lib gets to feature parity. Although I will keep the CLI in a new repo because it's a use case I still need.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
@fabian-hiller fabian-hiller self-assigned this Aug 27, 2024
@fabian-hiller fabian-hiller added enhancement New feature or request priority This has priority labels Aug 27, 2024
@fabian-hiller
Copy link
Owner

Thank you! I will probably give you feedback on everything by tomorrow.

@fabian-hiller
Copy link
Owner

Thanks again for the PR. I have reviewed everything and also investigated several alternative implementations. I even tried a fully typesafe implementation where the output type was the actual JSON schema output. Unfortunately, the typesafe implementation got quite complex and probably too hard to maintain at this point. Since Valibot is not a JSON-schema-first library, I don't expect many users to expect such an implementation. However, I have added an alternative implementation to /src2 with a focus on simplicity, bundle size and performance. Can you take a look at it and let me know what you think? Once we have decided on the basic structure of the source code, we can continue to add support for more schemas and actions.

@gcornut
Copy link
Contributor Author

gcornut commented Aug 29, 2024

@fabian-hiller You alternative implementation looks good 👍
I can add more schemas and actions conversion and clean up the code if you want

@fabian-hiller
Copy link
Owner

Thank you! Would you change anything? Otherwise, feel free to clean up the code and add support for more schemas and actions after we answer my next question.

I saw that TypeBox's Type.Undefined() returns { type: 'undefined' }, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7 type). Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?

@fabian-hiller
Copy link
Owner

We should also check the integration with OpenAI and Vercel AI (related Issue) before extending our implementation.

@gcornut
Copy link
Contributor Author

gcornut commented Aug 29, 2024

saw that TypeBox's Type.Undefined() returns { type: 'undefined' }, but this does not seem to be supported by JSON Schema Draft 7 (according to the JSONSchema7 type)

I don't see how JSON schema could support undefined since JSON does not support undefined (only null) 🤷

Should we upgrade to a newer one? Should we even specify the version we are using? Are you aware of any breaking changes in newer versions of JSON Schema that would result in different output?

Yes their are newer versions of the JSON schema specs (which some major changes 😬) but most tools use the draft-07 version (OpenAI, Vercel AI, etc.)

And actually if we explicitly want to support OpenAI structured output we would need to have a subset of JSON schema since it does not support every feature. For Vercel AI I can't see document that suggest they don't support all JSON schema features 🤷 (they use the same TypeScript JSONSchema7 type as we do).

We could add an option target?: 'json-schema-draft-07' | 'openai-structured-outputs' that could change the output and fail on unsupported feature (with still the possibility of forcing the conversion).

@gcornut
Copy link
Contributor Author

gcornut commented Aug 29, 2024

Some people JSON schema only for documentation purpose so maybe for them we could add another output "target" that could output things like type: 'undefined' which are unusable to validation a JSON payload but maybe useful for documentation.

@fabian-hiller
Copy link
Owner

fabian-hiller commented Aug 29, 2024

Thanks for your detailed answer. For now, I think it is best to optimize everything for JSON Schema Draft 7 and thread everything else as not supported (force can still be used). In the long run, we will evaluate how people use this package and change or add functionality based on that. Feel free to clean up the code and add support for more schemas and actions. You can always reach out if you have any Valibot related questions or if you think I can help you in any other way. I am very grateful for your contribution!

@gcornut
Copy link
Contributor Author

gcornut commented Aug 30, 2024

How strict do you think this lib should be on the input schema ?

I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values. And actually these values needs to be strictly compatible, excluding types like Date (that does serialize to JSON but as a string) or Infinity (that serializes to null in JSON).

Maybe the actions we convert should check on which type they are applied (email is only applicable to string, integer on number, etc.) ? Valibot does mostly guard against action/schema missmatch, but only via TS type ?

It's probably overkill but we could validate the input schema via a valibot meta schema 😄

@fabian-hiller
Copy link
Owner

I was thinking that the default value in optional/null/nullish schemas and the value in the literal schema should only be JSON compatible values...

I think that toJsonSchema will throw anyway if the wrapped schema is not JSON schema compatible, or am I wrong? If I'm right, I think we don't need to worry about the type of the default value. Infinity and NaN are probably the only exceptions, but it is a very very rare case.

Maybe the actions we convert should check on which type they are applied...

I think we should not worry about that. Our API is meant to be used in a certain way, and we should not have to deal with people using it incorrectly.

@gcornut gcornut force-pushed the initialize_valibot_to-json-schema_package branch from cd11852 to d556711 Compare September 2, 2024 16:13
Copy link

pkg-pr-new bot commented Sep 2, 2024

Open in Stackblitz

pnpm add https://pkg.pr.new/valibot@802

commit: 87f419c

@gcornut
Copy link
Contributor Author

gcornut commented Sep 2, 2024

Alright so I've rebased from the main branch. I've added almost all possible schema & action conversions.
If it's too big, I can split the PR. I already tried to split this in commits.

I need to add optional conversions for date and bigint since they are not technically supported in JSON but can be converted to number or string depending on the use case.

@fabian-hiller
Copy link
Owner

Thank you so much! It is ok for me if this PR is "big". I will try to review everything in the next days with the goal to release this package next week.

@gcornut
Copy link
Contributor Author

gcornut commented Sep 4, 2024

Ok I've added optional date and bigint schema conversion (requiring "strategy" configs)

@fabian-hiller
Copy link
Owner

I am not sure if we should add the strategy configs from the beginning. I would prefer to wait until people come up with good use cases and then decide on the implementation. But feel free to share your opinion since your conversion library has a longer history.

@fabian-hiller
Copy link
Owner

I started refactoring some parts and will commit my changes later. We will probably have a few implementation interactions in the next days before releasing the first version of the package next week.

@fabian-hiller
Copy link
Owner

I refactored the code and made the following bigger changes:

Definitions: I removed the definitions property from config as I think we can create them automatically as needed. Basically, whenever convertSchema sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.

Assert JSON: I replaced or removed the assertJSON utility because I could not find a case where a correct use of Valibot could actually lead to an invalid case. I may be wrong, so feel free to add comments and discuss this with me.

Sorry for partially overwriting or changing your code without asking, but I think this will help us ship the first version of this package faster. Please let me know if you want to revert or change any of my modifications.

I did not touch the tests. We can do that after we agree on the final implementation.

@gcornut
Copy link
Contributor Author

gcornut commented Sep 9, 2024

Definitions: I removed the definitions property from config as I think we can create them automatically as needed. Basically, whenever convertSchema sees a lazy schema that is not already part of the definitions, it will add it with a generated reference counting up from 0. If users want to explicitly define the definitions and references, we can add that later as needed.

Named definitions seems an essential feature to me. I know I would have to hack it back into my scripts that exports my valibot schemas into JSON schema.

But I won't be blocking just for that if you want to proceed 👍

@fabian-hiller
Copy link
Owner

I agree that this could be an essential feature. Therefore, I added support for explicitly specified definitions with my last commit.

@fabian-hiller
Copy link
Owner

Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?

@gcornut
Copy link
Contributor Author

gcornut commented Sep 10, 2024

Should I continue with the final review and unit tests, or do you have more feedback on the current implementation?

All good for me 👍

@fabian-hiller
Copy link
Owner

fabian-hiller commented Sep 12, 2024

I think that I will merge this PR later (I am not done yet) and ship our v0.1.0 release 🎉

@fabian-hiller
Copy link
Owner

@gcornut do you think we should completely ignore unsupported schemas and actions, or is it better to try to get it right? A concrete example is minLength. Should we add nothing to the output (and loose the information) if the schema type is not supported or should we add both minLength and minItems to the JSON Schema output?

@fabian-hiller
Copy link
Owner

fabian-hiller commented Sep 12, 2024

If we choose the second strategy, we might consider removing the force configuration and just console.warn when something is not officially supported.

@gcornut
Copy link
Contributor Author

gcornut commented Sep 13, 2024

Both solutions sound ok to me as long as the conversion is strict by default.

Like mentioned earlier there is definitely some people who just want to output schemas for documentation purpose and don't care about having 100% correct JSON schema definitions.

But maybe we could look into a "loose" output format later

@fabian-hiller
Copy link
Owner

What do you think about removing force and only console.warn in such cases?

@gcornut
Copy link
Contributor Author

gcornut commented Sep 13, 2024

What do you think about removing force and only console.warn in such cases?

Sounds ok. If some user need to be noticed that the conversion was not completely successful we can think of adding another option later 👍

@fabian-hiller
Copy link
Owner

fabian-hiller commented Sep 13, 2024

I think I will keep force and throw when it is false or undefined and console.warn otherwise. I will push my changes soon

@fabian-hiller
Copy link
Owner

Thanks so much @gcornut for your contribution! We are ready to merge! 🍾

@fabian-hiller fabian-hiller merged commit 2eefe51 into fabian-hiller:main Sep 13, 2024
11 checks passed
@fabian-hiller
Copy link
Owner

v0.1.0 is available

@gcornut gcornut deleted the initialize_valibot_to-json-schema_package branch September 14, 2024 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority This has priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants