
Commit 4ab3ab9

alamb and iffyio authored
Update comments / docs for Spanned (#1549)
Co-authored-by: Ifeanyi Ubah <[email protected]>
1 parent f4f112d commit 4ab3ab9

8 files changed: +306 -91 lines changed

README.md

+12-7
@@ -100,13 +100,18 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
 issues and distinguish different syntaxes in the AST.
 
 
-## WIP: Extracting source locations from AST nodes
+## Source Locations (Work in Progress)
 
-This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress and many nodes report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for information on how to contribute missing improvements.
+This crate allows recovering source locations from AST nodes via the [Spanned]
+trait, which can be used for advanced diagnostics tooling. Note that this
+feature is a work in progress and many nodes report missing or inaccurate spans.
+Please see [this ticket] for information on how to contribute missing
+improvements.
 
-```rust
-use sqlparser::ast::Spanned;
+[Spanned]: https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html
+[this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
 
+```rust
 // Parse SQL
 let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();
 
@@ -123,9 +128,9 @@ SQL was first standardized in 1987, and revisions of the standard have been
 published regularly since. Most revisions have added significant new features to
 the language, and as a result no database claims to support the full breadth of
 features. This parser currently supports most of the SQL-92 syntax, plus some
-syntax from newer versions that have been explicitly requested, plus some MSSQL,
-PostgreSQL, and other dialect-specific syntax. Whenever possible, the [online
-SQL:2016 grammar][sql-2016-grammar] is used to guide what syntax to accept.
+syntax from newer versions that have been explicitly requested, plus various
+other dialect-specific syntax. Whenever possible, the [online SQL:2016
+grammar][sql-2016-grammar] is used to guide what syntax to accept.
 
 Unfortunately, stating anything more specific about compliance is difficult.
 There is no publicly available test suite that can assess compliance
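
Editor's note (not part of the commit): the README's `Spanned` code block is truncated in this diff; below is a minimal sketch of the kind of usage it illustrates, based only on the `Spanned` example added to `src/ast/spans.rs` later in this commit. The exact README contents are not shown here.

```rust
use sqlparser::ast::Spanned;
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;
use sqlparser::tokenizer::Location;

fn main() {
    // Parse SQL (as in the README excerpt above)
    let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

    // Recover the source location of the first statement via the Spanned trait
    let span = ast[0].span();
    // Locations are 1-based: the statement starts at line 1, column 1
    assert_eq!(span.start, Location::new(1, 1));
}
```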

docs/source_spans.md

-52
This file was deleted.

src/ast/helpers/attached_token.rs

+59-5
@@ -19,25 +19,73 @@ use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
 use core::fmt::{self, Debug, Formatter};
 use core::hash::{Hash, Hasher};
 
-use crate::tokenizer::{Token, TokenWithSpan};
+use crate::tokenizer::TokenWithSpan;
 
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};
 
 #[cfg(feature = "visitor")]
 use sqlparser_derive::{Visit, VisitMut};
 
-/// A wrapper type for attaching tokens to AST nodes that should be ignored in comparisons and hashing.
-/// This should be used when a token is not relevant for semantics, but is still needed for
-/// accurate source location tracking.
+/// A wrapper over [`TokenWithSpan`]s that ignores the token and source
+/// location in comparisons and hashing.
+///
+/// This type is used when the token and location is not relevant for semantics,
+/// but is still needed for accurate source location tracking, for example, in
+/// the nodes in the [ast](crate::ast) module.
+///
+/// Note: **All** `AttachedTokens` are equal.
+///
+/// # Examples
+///
+/// Same token, different location are equal
+/// ```
+/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
+/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
+/// // commas @ line 1, column 10
+/// let tok1 = TokenWithLocation::new(
+///     Token::Comma,
+///     Span::new(Location::new(1, 10), Location::new(1, 11)),
+/// );
+/// // commas @ line 2, column 20
+/// let tok2 = TokenWithLocation::new(
+///     Token::Comma,
+///     Span::new(Location::new(2, 20), Location::new(2, 21)),
+/// );
+///
+/// assert_ne!(tok1, tok2); // token with locations are *not* equal
+/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
+/// ```
+///
+/// Different token, different location are equal 🤯
+///
+/// ```
+/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
+/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
+/// // commas @ line 1, column 10
+/// let tok1 = TokenWithLocation::new(
+///     Token::Comma,
+///     Span::new(Location::new(1, 10), Location::new(1, 11)),
+/// );
+/// // period @ line 2, column 20
+/// let tok2 = TokenWithLocation::new(
+///     Token::Period,
+///     Span::new(Location::new(2, 10), Location::new(2, 21)),
+/// );
+///
+/// assert_ne!(tok1, tok2); // token with locations are *not* equal
+/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
+/// ```
+/// // period @ line 2, column 20
 #[derive(Clone)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
 pub struct AttachedToken(pub TokenWithSpan);
 
 impl AttachedToken {
+    /// Return a new Empty AttachedToken
     pub fn empty() -> Self {
-        AttachedToken(TokenWithSpan::wrap(Token::EOF))
+        AttachedToken(TokenWithSpan::new_eof())
     }
 }
 
@@ -80,3 +128,9 @@ impl From<TokenWithSpan> for AttachedToken {
         AttachedToken(value)
     }
 }
+
+impl From<AttachedToken> for TokenWithSpan {
+    fn from(value: AttachedToken) -> Self {
+        value.0
+    }
+}

src/ast/mod.rs

+14-2
@@ -596,9 +596,21 @@ pub enum CeilFloorKind {
 
 /// An SQL expression of any type.
 ///
+/// # Semantics / Type Checking
+///
 /// The parser does not distinguish between expressions of different types
-/// (e.g. boolean vs string), so the caller must handle expressions of
-/// inappropriate type, like `WHERE 1` or `SELECT 1=1`, as necessary.
+/// (e.g. boolean vs string). The caller is responsible for detecting and
+/// validating types as necessary (for example `WHERE 1` vs `SELECT 1=1`)
+/// See the [README.md] for more details.
+///
+/// [README.md]: https://github.com/apache/datafusion-sqlparser-rs/blob/main/README.md#syntax-vs-semantics
+///
+/// # Equality and Hashing Does not Include Source Locations
+///
+/// The `Expr` type implements `PartialEq` and `Eq` based on the semantic value
+/// of the expression (not bitwise comparison). This means that `Expr` instances
+/// that are semantically equivalent but have different spans (locations in the
+/// source tree) will compare as equal.
 #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[cfg_attr(
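
Editor's note (not part of the commit): a small hypothetical sketch making the new "Equality and Hashing Does not Include Source Locations" note concrete, constructing `Ident` with the fields shown in the `src/lib.rs` migration guide below:

```rust
use sqlparser::ast::{Expr, Ident};
use sqlparser::tokenizer::{Location, Span};

fn main() {
    // The same identifier, once with a real source span and once without
    let with_span = Expr::Identifier(Ident {
        value: "name".into(),
        quote_style: None,
        span: Span::new(Location::new(1, 8), Location::new(1, 12)),
    });
    let without_span = Expr::Identifier(Ident {
        value: "name".into(),
        quote_style: None,
        span: Span::empty(),
    });

    // Semantically equivalent expressions compare equal even though
    // their source locations differ
    assert_eq!(with_span, without_span);
}
```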

src/ast/query.rs

+3-2
@@ -282,6 +282,7 @@ impl fmt::Display for Table {
 pub struct Select {
     /// Token for the `SELECT` keyword
     pub select_token: AttachedToken,
+    /// `SELECT [DISTINCT] ...`
     pub distinct: Option<Distinct>,
     /// MSSQL syntax: `TOP (<N>) [ PERCENT ] [ WITH TIES ]`
     pub top: Option<Top>,
@@ -511,7 +512,7 @@ impl fmt::Display for NamedWindowDefinition {
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
 pub struct With {
-    // Token for the "WITH" keyword
+    /// Token for the "WITH" keyword
     pub with_token: AttachedToken,
     pub recursive: bool,
     pub cte_tables: Vec<Cte>,
@@ -564,7 +565,7 @@ pub struct Cte {
     pub query: Box<Query>,
     pub from: Option<Ident>,
     pub materialized: Option<CteAsMaterialized>,
-    // Token for the closing parenthesis
+    /// Token for the closing parenthesis
     pub closing_paren_token: AttachedToken,
 }

src/ast/spans.rs

+40-10
@@ -21,21 +21,51 @@ use super::{
 
 /// Given an iterator of spans, return the [Span::union] of all spans.
 fn union_spans<I: Iterator<Item = Span>>(iter: I) -> Span {
-    iter.reduce(|acc, item| acc.union(&item))
-        .unwrap_or(Span::empty())
+    Span::union_iter(iter)
 }
 
-/// A trait for AST nodes that have a source span for use in diagnostics.
+/// Trait for AST nodes that have a source location information.
 ///
-/// Source spans are not guaranteed to be entirely accurate. They may
-/// be missing keywords or other tokens. Some nodes may not have a computable
-/// span at all, in which case they return [`Span::empty()`].
+/// # Notes:
+///
+/// Source [`Span`] are not yet complete. They may be missing:
+///
+/// 1. keywords or other tokens
+/// 2. span information entirely, in which case they return [`Span::empty()`].
+///
+/// Note Some impl blocks (rendered below) are annotated with which nodes are
+/// missing spans. See [this ticket] for additional information and status.
+///
+/// [this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
+///
+/// # Example
+/// ```
+/// # use sqlparser::parser::{Parser, ParserError};
+/// # use sqlparser::ast::Spanned;
+/// # use sqlparser::dialect::GenericDialect;
+/// # use sqlparser::tokenizer::Location;
+/// # fn main() -> Result<(), ParserError> {
+/// let dialect = GenericDialect {};
+/// let sql = r#"SELECT *
+///   FROM table_1"#;
+/// let statements = Parser::new(&dialect)
+///     .try_with_sql(sql)?
+///     .parse_statements()?;
+/// // Get the span of the first statement (SELECT)
+/// let span = statements[0].span();
+/// // statement starts at line 1, column 1 (1 based, not 0 based)
+/// assert_eq!(span.start, Location::new(1, 1));
+/// // statement ends on line 2, column 15
+/// assert_eq!(span.end, Location::new(2, 15));
+/// # Ok(())
+/// # }
+/// ```
 ///
-/// Some impl blocks may contain doc comments with information
-/// on which nodes are missing spans.
 pub trait Spanned {
-    /// Compute the source span for this AST node, by recursively
-    /// combining the spans of its children.
+    /// Return the [`Span`] (the minimum and maximum [`Location`]) for this AST
+    /// node, by recursively combining the spans of its children.
+    ///
+    /// [`Location`]: crate::tokenizer::Location
    fn span(&self) -> Span;
 }
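
Editor's note (not part of the commit): a minimal sketch of the span-combining behavior that `union_spans` relies on, using only `Span::new`, `Location::new`, and `Span::union` as they appear in this diff:

```rust
use sqlparser::tokenizer::{Location, Span};

fn main() {
    // Span of "SELECT *" on line 1 and of "FROM table_1" on line 2
    let select = Span::new(Location::new(1, 1), Location::new(1, 9));
    let from = Span::new(Location::new(2, 3), Location::new(2, 15));

    // A parent node's span is the union of its children's spans
    let stmt = select.union(&from);
    assert_eq!(stmt.start, Location::new(1, 1));
    assert_eq!(stmt.end, Location::new(2, 15));
}
```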

src/lib.rs

+58-1
@@ -25,6 +25,9 @@
 //! 1. [`Parser::parse_sql`] and [`Parser::new`] for the Parsing API
 //! 2. [`ast`] for the AST structure
 //! 3. [`Dialect`] for supported SQL dialects
+//! 4. [`Spanned`] for source text locations (see "Source Spans" below for details)
+//!
+//! [`Spanned`]: ast::Spanned
 //!
 //! # Example parsing SQL text
 //!
@@ -61,13 +64,67 @@
 //! // The original SQL text can be generated from the AST
 //! assert_eq!(ast[0].to_string(), sql);
 //! ```
-//!
 //! [sqlparser crates.io page]: https://crates.io/crates/sqlparser
 //! [`Parser::parse_sql`]: crate::parser::Parser::parse_sql
 //! [`Parser::new`]: crate::parser::Parser::new
 //! [`AST`]: crate::ast
 //! [`ast`]: crate::ast
 //! [`Dialect`]: crate::dialect::Dialect
+//!
+//! # Source Spans
+//!
+//! Starting with version `0.53.0` sqlparser introduced source spans to the
+//! AST. This feature provides source information for syntax errors, enabling
+//! better error messages. See [issue #1548] for more information and the
+//! [`Spanned`] trait to access the spans.
+//!
+//! [issue #1548]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
+//! [`Spanned`]: ast::Spanned
+//!
+//! ## Migration Guide
+//!
+//! For the next few releases, we will be incrementally adding source spans to the
+//! AST nodes, trying to minimize the impact on existing users. Some breaking
+//! changes are inevitable, and the following is a summary of the changes:
+//!
+//! #### New fields for spans (must be added to any existing pattern matches)
+//!
+//! The primary change is that new fields will be added to AST nodes to store the source `Span` or `TokenWithLocation`.
+//!
+//! This will require
+//! 1. Adding new fields to existing pattern matches.
+//! 2. Filling in the proper span information when constructing AST nodes.
+//!
+//! For example, since `Ident` now stores a `Span`, to construct an `Ident` you
+//! must provide now provide one:
+//!
+//! Previously:
+//! ```text
+//! # use sqlparser::ast::Ident;
+//! Ident {
+//!     value: "name".into(),
+//!     quote_style: None,
+//! }
+//! ```
+//! Now
+//! ```rust
+//! # use sqlparser::ast::Ident;
+//! # use sqlparser::tokenizer::Span;
+//! Ident {
+//!     value: "name".into(),
+//!     quote_style: None,
+//!     span: Span::empty(),
+//! };
+//! ```
+//!
+//! Similarly, when pattern matching on `Ident`, you must now account for the
+//! `span` field.
+//!
+//! #### Misc.
+//! - [`TokenWithLocation`] stores a full `Span`, rather than just a source location.
+//!   Users relying on `token.location` should use `token.location.start` instead.
+//!
+//![`TokenWithLocation`]: tokenizer::TokenWithLocation
 
 #![cfg_attr(not(feature = "std"), no_std)]
 #![allow(clippy::upper_case_acronyms)]
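
Editor's note (not part of the commit): the migration guide mentions pattern matching but shows no example; a hypothetical sketch of using `..` so existing matches tolerate the new `span` field (and any future fields), built only from the `Ident` fields shown above:

```rust
use sqlparser::ast::Ident;
use sqlparser::tokenizer::Span;

// `..` ignores the new `span` field, so the pattern keeps compiling as fields are added
fn ident_name(ident: &Ident) -> &str {
    let Ident { value, .. } = ident;
    value
}

fn main() {
    let ident = Ident {
        value: "name".into(),
        quote_style: None,
        span: Span::empty(),
    };
    assert_eq!(ident_name(&ident), "name");
}
```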
