updated schema #486

Draft
wants to merge 1 commit into
base: main
10 changes: 9 additions & 1 deletion SCHEMA-SPEC.md
@@ -6,7 +6,7 @@ constrain the allowed semantics of a KDL document. This can be used for many
purposes: documentation for users, automated verification, or even automated
generation of bindings!

-This document describes KDL Schema version `1.0.0`. It was released on September 11, 2021.
+This document describes KDL Schema version `2.0.0`. It is unreleased.

## The Formal Schema

@@ -39,6 +39,14 @@ None.
* `tag-names` (optional): [Validations](#validation-nodes) to apply to the _names_ of tags of child nodes.
* `other-tags-allowed` (optional): Whether to allow node tags other than the ones explicitly listed here. Defaults to `#false`.

#### Example

```kdl
document {

}
```

### `info` node

The `info` node describes the schema itself.
172 changes: 172 additions & 0 deletions schema/cargo.kdl
@@ -0,0 +1,172 @@
@kdl:schema "https://github.com/kdl-org/kdl/blob/main/schema/kdl-schema.kdl"

Where will this @kdl:schema be used? If it doesn't have to be a reachable URL, then do you still think that it's valuable to have as a top-level key like this?

Do you envision KDL documents adding this to their top-level? Does that mean users are expected to handle this node when parsing with kdl-rs? Or should KDL parsers remove this node from the AST before passing it on to user code? If so, then I think that would require a change to the KDL spec?

I'm not opposed to this, though I'd personally probably keep them separate and give any sort of validator two files (the data and the schema it follows). Unless I'm misunderstanding, I think this might be better as a field in metadata.

Member

This is similar to JSONSchema's $schema, which points to a valid (relative) URL.

This is not required, e.g. I can edit my angular.json and get LSP support because the Angular VSCode plugin interprets any file called angular.json as an Angular configuration document.
But, for schemas that don't come with editor plugins or that don't have a filename convention, it is very handy that you can add that $schema property and the JSON plugin will kick in and provide LSP support.

Okay, I can definitely see that as nice to have for editor / tooling support! Would implementations like kdl-rs strip this out of the AST returned to users? Essentially just because I agree it would be something nice to add to get LSP support, but I wouldn't want that addition to affect any result of parsing!

Is this the sort of thing that could / should be a standard-formatted comment? Like how sometimes people seem to set vi / vim settings in comments with vi: or whatever at the top of a file?

Member Author

Users would have to ignore this field, if they want it to be present. Just like folks are free to ignore $schema when using JSONSchema.

I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

That's fair... I'll sleep on it — I'd personally find it a little wacky to have to describe in my schema for a KDL format the node that will be used to refer back to that schema. So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @kdl:schema node in it as well? If it's even a URL linked to a file I can edit?

But, before I whinge any more, I'll read up properly on $schema in JSON and do some proper thinking!

Member Author

I've changed this so you no longer need to remember to include this in your own schemas: it is now a default item for all schemas, and can be disabled by doing node @kdl:schema { undefine }.

Member

> So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @kdl:schema node in it as well?

There are multiple versions of JSONSchema. The $schema in the schema is used to define the version of JSONSchema in which it is written, so it serves an extra purpose on top of enabling your editor to provide support while writing schemas.

For example, the data file my-data.kdl has @kdl:schema "./my-schema.kdl"; the schema file my-schema.kdl would then have @kdl:schema "https://kdl.dev/schema/v0.2.kdl" (fictitious URL).

> I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

Agreed, both because it feels bad and because it's actually useful data. Just like the version of JSONSchema, this could be used to differentiate between different versions of configuration file formats. If this were a comment, you'd need both a comment pointing to a versioned URL and a version indicator in the data.

Okay, I've done some pondering and perhaps come to the conclusion that the first draft of this was best, but with some caveats. If I'm understanding correctly, it sounds like there are two slightly distinct reasons to want something like this:

  1. It's a way to link files to a schema that can be used to enrich editor / LSP-type support
  2. It can also be used as a sort of idiomatic / canonical version field, if a format's schema has changed over time

But one of my problems remains if the @kdl:schema must be an uncommented node: say I'm working on a certain type of KDL config file — it's the type of thing that's complex enough that I'd like my LSP to know the schema and validate my edits. The issue, however, is that this config file is read by some code that looks like this:

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
    bonds: BondData,
    modifications: ModData,
    // Etc...
}

Where the important part is the #[serde(deny_unknown_fields)], which I think is a nice thing to have if you want to enforce that config files are free of unrecognized sections, something that could help a lot in catching user typos. Now I'm in a bit of a bind because, once again, I have to switch between 1) having @kdl:schema uncommented, so LSP support works but the config can't actually be loaded, or 2) commenting out that node, losing LSP support, but actually being able to test the config with the program it's meant for.

Obviously, however, this is a non-problem if the program already uses @kdl:schema as a standard convention for specifying a version:

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
    #[serde(rename = "@kdl:schema")]
    version: Url,
    bonds: BondData,
    modifications: ModData,
    // Etc...
}

Then leaving @kdl:schema uncommented 100% of the time is perfectly fine!
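
The two situations above can be illustrated without serde at all. The following is a minimal, std-only sketch, assuming a hypothetical `check_top_level` helper that mimics the effect of `deny_unknown_fields` on top-level node names; it is not how serde or any KDL crate actually implements the check.

```rust
/// Reject any top-level node name that is not in `known`.
/// Hypothetical stand-in for the effect of `#[serde(deny_unknown_fields)]`.
fn check_top_level(node_names: &[&str], known: &[&str]) -> Result<(), String> {
    for name in node_names {
        if !known.contains(name) {
            // The first unrecognized node aborts loading, typo or not.
            return Err(format!("unknown node `{name}`"));
        }
    }
    Ok(())
}
```

With `known` set to `["bonds", "modifications"]`, a document that carries an uncommented `@kdl:schema` node is rejected. Once `"@kdl:schema"` is added to `known` (the renamed `version` field above), the same document loads.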

To address both of these cases, we could:

  1. Make it clear that @kdl:schema is the idiomatic way to version KDL formats, and set a good example by having the kdl-schema.kdl define the optional @kdl:schema node
  2. Make sure it's clear to all LSP implementations that schema information can come in one of two forms: either an uncommented @kdl:schema node, I suppose anywhere in the file, or a /- @kdl:schema comment at the top of the file for when you want editor support, but the KDL format you're writing disallows it.
  3. Maybe revert the "it is now a default item for all schemas" change? I think I'd prefer to reduce the amount of magic, and if a format wants to use @kdl:schema to keep track of versions, then I think it should say so explicitly.

In summary then, I think rolling this back to how it started (schemas must explicitly allow @kdl:schema) and making it clear that language servers should recognize both @kdl:schema and /- @kdl:schema (maybe allow other commented forms?) would be my ideal solution.
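
As a rough illustration of what "recognize both forms" could mean for tooling, here is a std-only sketch. The `SchemaRef` type and `find_schema_ref` function are hypothetical names, and the code is a naive line scan rather than a real KDL parser: it assumes the node sits on its own line and that its argument is a plain double-quoted string.

```rust
/// Where a schema reference was found. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum SchemaRef {
    /// An ordinary `@kdl:schema "..."` node that the application also sees.
    Node(String),
    /// A slashdash-commented `/- @kdl:schema "..."` node, visible only to tooling.
    Commented(String),
}

/// Return the first schema reference in `source`, in either form.
/// Naive text scan: no escapes, raw strings, or multi-line nodes are handled.
fn find_schema_ref(source: &str) -> Option<SchemaRef> {
    for line in source.lines() {
        let trimmed = line.trim_start();
        // Strip an optional slashdash prefix and remember whether it was there.
        let (commented, rest) = match trimmed.strip_prefix("/-") {
            Some(rest) => (true, rest.trim_start()),
            None => (false, trimmed),
        };
        let Some(after_name) = rest.strip_prefix("@kdl:schema") else {
            continue;
        };
        // Expect the first argument to be a double-quoted string.
        let Some(body) = after_name.trim_start().strip_prefix('"') else {
            continue;
        };
        let Some(end) = body.find('"') else {
            continue;
        };
        let target = body[..end].to_string();
        return Some(if commented {
            SchemaRef::Commented(target)
        } else {
            SchemaRef::Node(target)
        });
    }
    None
}
```

A language server built along these lines would treat both variants as a pointer to the schema, while the application only ever sees the uncommented one.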

That way even if someone's program is explicitly disallowing unknown fields (to save users from typos) and doesn't care to version files, I can still get editor support.

Let me know what you think about demystifying the @kdl:schema node and "downgrading" it to something that's idiomatic convention / recognized by tooling.

SIDENOTE: Serde support for KDL will need to make decently heavy use of #[serde(rename = "...")], and quick-xml, for example, already uses the "@..." prefix for attributes. I was thinking perhaps we could use "$..." for properties (since they felt kind of shell-variable-ish), but we'll likely need at least one other special prefix reserved. If we used @ for that, like #[serde(rename="@arguments")] or whatever, then reading @kdl:schema would become a bit more awkward (though not impossible, I don't think?). Might be a bit bike-sheddy, but we should perhaps think about either removing the @, or ruling it out as a potential serde prefix!

Sorry for the flip-flop on this, but I think this is a solution I'd be 100% happy with <3

Member Author

Thinking through it, the main difference here with JSONSchema is that it allows other properties by default. That is, "$schema": "foo" won't fail validation.

It would be a weaker validation, but we could always just default to allowing all children/props/args, and replace allow-others with disallow-others. Basically, make things more permissive by default, but let people constrain stuff. We can also add an allow-ksl to specifically allow KSL-related keywords.
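
One possible reading of that proposal, sketched in std-only Rust. `ChildRules` and `child_allowed` are hypothetical names, and the exact interaction between `disallow-others` and `allow-ksl` is an assumption made for this sketch; nothing in the PR specifies it.

```rust
/// Hypothetical validation options for one node's children.
struct ChildRules<'a> {
    /// Child node names the schema lists explicitly.
    declared: &'a [&'a str],
    /// When true, names outside `declared` are rejected (the proposed `disallow-others`).
    disallow_others: bool,
    /// When true, `@kdl:schema` passes even under `disallow_others` (the proposed `allow-ksl`).
    allow_ksl: bool,
}

/// Decide whether a child node named `name` passes validation.
fn child_allowed(rules: &ChildRules<'_>, name: &str) -> bool {
    if rules.declared.contains(&name) {
        return true;
    }
    // Assumption: the only schema-related keyword considered here is `@kdl:schema`.
    if rules.allow_ksl && name == "@kdl:schema" {
        return true;
    }
    // Permissive by default: unknown names pass unless the schema opts out.
    !rules.disallow_others
}
```

Under this reading, a schema that sets neither flag accepts `@kdl:schema` like any other undeclared node, and a strict schema can still admit it by setting `allow_ksl`.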

I'll have more time to reply to other comments soon, but for this one, I do like KSL being much more stringent (disallowing others by default), but I would be okay with something like allow-ksl if you think it would be useful — though maybe that only makes sense if there are several KSL keywords eventually?

Unfortunately, I don't think that would solve the issue of an application not using KSL, just using serde and me wanting to add that LSP-type support (assuming I can't change the serde derive myself). Because in that case there is no schema validation being done (via KSL, anyways). For what it's worth, knus / knuffle by default disallow unknown nodes, so I couldn't get schema-enhanced editor support in any of my current projects unless I could have /-@ksl:schema commented.

But I'm happy with what you originally had + letting the LSP pick up the node in comment form as well? If the user wants to use @ksl:schema as a real node / version, then I do think they should be required to specify that in their schema — like you originally had it :)

That's my personal preference, anyways — keeping things very explicit and minimizing special cases :)


metadata {
    // TODO: update this link when we're ready to release something.
    link "https://github.com/kdl-org/kdl/blob/main/schema/cargo.kdl" rel=self
    title "Cargo Schema" lang=en
    description "KDL-based translation of the Cargo.toml schema." lang=en
    author "Kat Marchán" {
        link "https://github.com/zkat" rel=self
    }
    link "https://github.com/kdl-org/kdl" rel=documentation
    link "https://doc.rust-lang.org/cargo/reference/manifest.html" rel=documentation
    license "Creative Commons Attribution-ShareAlike 4.0 International License" spdx=CC-BY-SA-4.0 {
        link "https://creativecommons.org/licenses/by-sa/4.0/" lang=en
    }
}

children {
    node package title="Describes a package" {
        children {
            node name title="The name of the package" {
                required
                arg {
                    type string
                    pattern #"^[a-zA-Z0-9\-_]+$"#
                }
            }
            node version title="The version of the package." {
                arg {
                    type string
                    // From https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string
                    pattern #"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"#
                }
            }
            node authors title="The authors of the package." {
                repeatable
                args {
                    distinct
                    type string
                }
                children {
                    node - {
                        repeatable
                        arg title="Name" {
                            type string
                        }
                        prop email title="Email address" {
                            type string
                            format email
                        }
                        prop about title="Brief note about author (role, etc)" {
                            type string
                        }
                    }
                }
            }
            node edition title="The Rust edition." {
                arg {
                    type string
                    enum "2015" "2018" "2021" "2024"
                }
            }
            node rust-version title="The minimal supported Rust version." {
                arg {
                    type string
                }
            }
            node description title="A description of the package." {
                arg {
                    type string
                }
            }
            node documentation title="URL of the package documentation." {
                arg {
                    type string
                    format url
                }
            }
            node readme title="Path to the package’s README file." {
                arg {
                    type string boolean
                }
            }
            node homepage title="URL of the package homepage." {
                arg {
                    type string
                    format url
                }
            }
            node repository title="URL of the package source repository." {
                arg {
                    type string
                    format url
                }
            }
            node license title="The package license." {
                arg {
                    type string
                }
            }
            node license-file title="Path to the text of the license." {
                arg {
                    type string
                }
            }
            node keywords title="Keywords for the package." {
                args {
                    type string
                    // No pattern because keyword restrictions are only on
                    // crates.io
                }
            }
            node categories title="Categories of the package." {
                args {
                    type string
                    // No pattern because category restrictions are only on
                    // crates.io
                }
            }
            node workspace title="Path to the workspace for the package." {
                arg {
                    type string
                }
            }
            node build title="Path to the package build script." {
                arg {
                    type string boolean
                }
            }
            node links title="Name of the native library the package links with." {
                arg {
                    type string
                }
            }
            node exclude title="Files to exclude when publishing." {
                args {
                    type string
                }
            }
            node include title="Files to include when publishing." {
                args {
                    type string
                }
            }
            node publish title="Can be used to prevent publishing the package." {
                // TODO: This is a good example of where we might need smarter
                // constraints ("either a single boolean, or 1+ strings")
                args {
                    type string boolean
                }
            }
            node metadata title="Extra settings for external tools." {
                repeatable
                args
                props {
                    allow-others
                }
            }
            node default-run title="The default binary to run by cargo run." {
                arg {
                    type string
                }
            }
            node no-autolib title="Disables library auto discovery."
            node no-autobins title="Disables binary auto discovery."
            node no-autoexamples title="Disables example auto discovery."
            node no-autotests title="Disables test auto discovery."
            node no-autobenches title="Disables bench auto discovery."
            node resolver title="Sets the dependency resolver to use."
        }
    }
}