Expand parse without semicolons #1949

Open
wants to merge 8 commits into
base: main
22 changes: 20 additions & 2 deletions src/dialect/mod.rs
@@ -1036,8 +1036,14 @@ pub trait Dialect: Debug + Any {
/// Returns true if the specified keyword should be parsed as a table factor alias.
/// When explicit is true, the keyword is preceded by an `AS` word. Parser is provided
/// to enable looking ahead if needed.
fn is_table_factor_alias(&self, explicit: bool, kw: &Keyword, parser: &mut Parser) -> bool {
explicit || self.is_table_alias(kw, parser)
///
/// When the dialect supports statements without a semicolon delimiter, actual keywords aren't parsed as aliases.
fn is_table_factor_alias(&self, explicit: bool, kw: &Keyword, _parser: &mut Parser) -> bool {
if self.supports_statements_without_semicolon_delimiter() {
kw == &Keyword::NoKeyword
} else {
explicit || self.is_table_alias(kw, _parser)
}
}

/// Returns true if this dialect supports querying historical table data
@@ -1136,6 +1142,18 @@ pub trait Dialect: Debug + Any {
fn supports_notnull_operator(&self) -> bool {
false
}

/// Returns true if the dialect supports parsing statements without a semicolon delimiter.
Contributor

Would it be possible to give an example here?

Something like

Suggested change
/// Returns true if the dialect supports parsing statements without a semicolon delimiter.
/// Returns true if the dialect supports parsing statements without a semicolon delimiter.
///
/// If this returns `true`, the following SQL will parse. If it returns `false`, the SQL will not parse.
///
/// ```sql
/// SELECT 1
/// SELECT 2
/// ```

Contributor Author

Done 👍

///
/// If this returns `true`, the following SQL will parse. If it returns `false`, the SQL will not parse.
///
/// ```sql
/// SELECT 1
/// SELECT 2
/// ```
fn supports_statements_without_semicolon_delimiter(&self) -> bool {
false
}
}

/// This represents the operators for which precedence must be defined
9 changes: 8 additions & 1 deletion src/dialect/mssql.rs
@@ -67,7 +67,7 @@ impl Dialect for MsSqlDialect {
}

fn supports_connect_by(&self) -> bool {
true
false
}

fn supports_eq_alias_assignment(&self) -> bool {
@@ -123,6 +123,10 @@ impl Dialect for MsSqlDialect {
true
}

fn supports_statements_without_semicolon_delimiter(&self) -> bool {
Contributor

🤦 MS SQL!!!!

Contributor Author

🤷‍♂️🤣

true
}

/// See <https://learn.microsoft.com/en-us/sql/relational-databases/security/authentication-access/server-level-roles>
fn get_reserved_grantees_types(&self) -> &[GranteesType] {
&[GranteesType::Public]
@@ -280,6 +284,9 @@ impl MsSqlDialect {
) -> Result<Vec<Statement>, ParserError> {
let mut stmts = Vec::new();
loop {
while let Token::SemiColon = parser.peek_token_ref().token {
parser.advance_token();
}
if let Token::EOF = parser.peek_token_ref().token {
break;
}
9 changes: 9 additions & 0 deletions src/keywords.rs
@@ -1072,6 +1072,7 @@ pub const RESERVED_FOR_TABLE_ALIAS: &[Keyword] = &[
Keyword::ANTI,
Keyword::SEMI,
Keyword::RETURNING,
Keyword::RETURN,
Keyword::ASOF,
Keyword::MATCH_CONDITION,
// for MSSQL-specific OUTER APPLY (seems reserved in most dialects)
@@ -1097,6 +1098,11 @@ pub const RESERVED_FOR_TABLE_ALIAS: &[Keyword] = &[
Keyword::TABLESAMPLE,
Keyword::FROM,
Keyword::OPEN,
Keyword::INSERT,
Keyword::UPDATE,
Keyword::DELETE,
Keyword::EXEC,
Keyword::EXECUTE,
];

/// Can't be used as a column alias, so that `SELECT <expr> alias`
@@ -1126,6 +1132,7 @@ pub const RESERVED_FOR_COLUMN_ALIAS: &[Keyword] = &[
Keyword::CLUSTER,
Keyword::DISTRIBUTE,
Keyword::RETURNING,
Keyword::RETURN,
// Reserved only as a column alias in the `SELECT` clause
Keyword::FROM,
Keyword::INTO,
@@ -1140,6 +1147,7 @@ pub const RESERVED_FOR_TABLE_FACTOR: &[Keyword] = &[
Keyword::LIMIT,
Keyword::HAVING,
Keyword::WHERE,
Keyword::RETURN,
];

/// Global list of reserved keywords that cannot be parsed as identifiers
@@ -1150,4 +1158,5 @@ pub const RESERVED_FOR_IDENTIFIER: &[Keyword] = &[
Keyword::INTERVAL,
Keyword::STRUCT,
Keyword::TRIM,
Keyword::RETURN,
];
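
The reserved-list additions above matter once one statement can directly follow another. In `SELECT * FROM t INSERT INTO ...`, the word after `t` must not be read as an implicit table alias. A minimal sketch of that decision, assuming a hypothetical abbreviated list and a hypothetical helper name (neither is the crate's API):

```rust
/// Hypothetical, abbreviated stand-in for `RESERVED_FOR_TABLE_ALIAS`;
/// the list in the diff is much longer and uses `Keyword` values, not strings.
const TOY_RESERVED_FOR_TABLE_ALIAS: &[&str] = &[
    "WHERE", "FROM", "INSERT", "UPDATE", "DELETE", "EXEC", "EXECUTE", "RETURN",
];

/// Returns true if `word`, found right after a table name, may be treated
/// as an implicit alias. Reserved words are rejected case-insensitively so
/// they can start the next statement instead.
fn is_implicit_table_alias(word: &str) -> bool {
    !TOY_RESERVED_FOR_TABLE_ALIAS
        .iter()
        .any(|kw| kw.eq_ignore_ascii_case(word))
}
```

With this sketch, `is_implicit_table_alias("t2")` is accepted as an alias, while `is_implicit_table_alias("insert")` is rejected, so the parser can end the current statement there.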
73 changes: 68 additions & 5 deletions src/parser/mod.rs
@@ -266,6 +266,22 @@ impl ParserOptions {
self.unescape = unescape;
self
}

/// Set if semicolon statement delimiters are required.
Contributor

What is the use case for having the same configuration information on both the ParserOptions and the Dialect?

It seems there are a few places in the Parser that inconsistently check the options and/or the dialect flag. If we had only one way to specify the option, there would be no room for inconsistencies.

Or maybe I misunderstand what is going on here.

Contributor Author

What is the use case for having the same configuration information on both the ParserOptions and the Dialect?

It's similar to the existing `with_trailing_commas` logic, and has several practical use cases:

  1. It can be set to true for SQL Server and left false for all others (unless and until someone discovers this is supported in other dialects).
  2. It integrates with existing testing patterns, e.g., we can have `all_dialects_requiring_semicolon_statement_delimiter` and `all_dialects_not_requiring_semicolon_statement_delimiter`.
  3. It is used to set the corresponding default parser option.

It seems like testing parsing without semicolons would get a lot worse if it weren't a dialect option.
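
The relationship described in the list above (a dialect flag that seeds a default parser option, which a caller may still override) can be sketched as follows. This is a self-contained illustration: `ToyDialect`, `ToyGeneric`, `ToyMsSql`, `ToyOptions`, and `from_dialect` are hypothetical stand-ins, not the crate's types.

```rust
/// Hypothetical stand-in for the dialect trait.
trait ToyDialect {
    /// Dialect-level capability flag; defaults to `false` so dialects opt in.
    fn supports_statements_without_semicolon_delimiter(&self) -> bool {
        false
    }
}

struct ToyGeneric;
struct ToyMsSql;

impl ToyDialect for ToyGeneric {}

impl ToyDialect for ToyMsSql {
    fn supports_statements_without_semicolon_delimiter(&self) -> bool {
        true
    }
}

/// Hypothetical stand-in for the parser options.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyOptions {
    require_semicolon_stmt_delimiter: bool,
}

impl ToyOptions {
    /// Seed the option from the dialect: the option is the negation of the flag.
    fn from_dialect(dialect: &dyn ToyDialect) -> Self {
        ToyOptions {
            require_semicolon_stmt_delimiter: !dialect
                .supports_statements_without_semicolon_delimiter(),
        }
    }

    /// Builder-style override, independent of the dialect default.
    fn with_require_semicolon_stmt_delimiter(mut self, required: bool) -> Self {
        self.require_semicolon_stmt_delimiter = required;
        self
    }
}
```

Under these assumptions, the dialect decides the default and the option remains the single value the parsing code consults, which is one way to avoid the inconsistency raised in the review.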

///
/// If this option is `true`, the following SQL will not parse. If the option is `false`, the SQL will parse.
///
/// ```sql
/// SELECT 1
/// SELECT 2
/// ```
pub fn with_require_semicolon_stmt_delimiter(
mut self,
require_semicolon_stmt_delimiter: bool,
) -> Self {
self.require_semicolon_stmt_delimiter = require_semicolon_stmt_delimiter;
self
}
}

#[derive(Copy, Clone)]
@@ -362,7 +378,11 @@ impl<'a> Parser<'a> {
state: ParserState::Normal,
dialect,
recursion_counter: RecursionCounter::new(DEFAULT_REMAINING_DEPTH),
options: ParserOptions::new().with_trailing_commas(dialect.supports_trailing_commas()),
options: ParserOptions::new()
.with_trailing_commas(dialect.supports_trailing_commas())
.with_require_semicolon_stmt_delimiter(
!dialect.supports_statements_without_semicolon_delimiter(),
),
}
}

@@ -484,13 +504,18 @@ impl<'a> Parser<'a> {

match self.peek_token().token {
Token::EOF => break,

// end of statement
Token::Word(word) => {
if expecting_statement_delimiter && word.keyword == Keyword::END {
break;
}
}
// don't expect a semicolon statement delimiter after a newline when not otherwise required
Token::Whitespace(Whitespace::Newline) => {
if !self.options.require_semicolon_stmt_delimiter {
expecting_statement_delimiter = false;
}
}
_ => {}
}

@@ -500,7 +525,7 @@ impl<'a> Parser<'a> {

let statement = self.parse_statement()?;
stmts.push(statement);
expecting_statement_delimiter = true;
expecting_statement_delimiter = self.options.require_semicolon_stmt_delimiter;
}
Ok(stmts)
}
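
The effect of setting `expecting_statement_delimiter` from the option, as in the hunk above, can be illustrated with a toy loop. This is a sketch under simplifying assumptions: `Tok` and `split_statements` are hypothetical, and each statement is treated as a single pre-parsed token rather than something the parser has to recognise.

```rust
/// Hypothetical token type: a whole statement or a semicolon.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Stmt(&'static str),
    Semi,
}

/// Collect statements, demanding a semicolon between them only when
/// `require_semicolon` is true. Repeated semicolons are tolerated.
fn split_statements(
    tokens: &[Tok],
    require_semicolon: bool,
) -> Result<Vec<&'static str>, String> {
    let mut stmts = Vec::new();
    let mut expecting_delimiter = false;
    for tok in tokens {
        match tok {
            // A semicolon always satisfies a pending delimiter.
            Tok::Semi => expecting_delimiter = false,
            Tok::Stmt(text) => {
                if expecting_delimiter {
                    return Err(format!("Expected: end of statement, found: {}", text));
                }
                stmts.push(*text);
                // Mirrors the diff: only expect a delimiter when it is required.
                expecting_delimiter = require_semicolon;
            }
        }
    }
    Ok(stmts)
}
```

With `require_semicolon` set to `false`, two adjacent `Tok::Stmt` values are both collected. With `true`, the second one produces the "end of statement" error.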
@@ -4541,6 +4566,14 @@ impl<'a> Parser<'a> {
return Ok(vec![]);
}

if end_token == Token::SemiColon && !self.options.require_semicolon_stmt_delimiter {
if let Token::Word(ref kw) = self.peek_token().token {
if kw.keyword != Keyword::NoKeyword {
return Ok(vec![]);
}
}
}

if self.options.trailing_commas && self.peek_tokens() == [Token::Comma, end_token] {
let _ = self.consume_token(&Token::Comma);
return Ok(vec![]);
@@ -4558,6 +4591,9 @@ impl<'a> Parser<'a> {
) -> Result<Vec<Statement>, ParserError> {
let mut values = vec![];
loop {
// ignore empty statements (between successive statement delimiters)
while self.consume_token(&Token::SemiColon) {}

match &self.peek_nth_token_ref(0).token {
Token::EOF => break,
Token::Word(w) => {
@@ -4569,7 +4605,13 @@ impl<'a> Parser<'a> {
}

values.push(self.parse_statement()?);
self.expect_token(&Token::SemiColon)?;

if self.options.require_semicolon_stmt_delimiter {
self.expect_token(&Token::SemiColon)?;
}

// ignore empty statements (between successive statement delimiters)
while self.consume_token(&Token::SemiColon) {}
}
Ok(values)
}
@@ -16464,7 +16506,28 @@ impl<'a> Parser<'a> {

/// Parse [Statement::Return]
fn parse_return(&mut self) -> Result<Statement, ParserError> {
match self.maybe_parse(|p| p.parse_expr())? {
let rs = self.maybe_parse(|p| {
let expr = p.parse_expr()?;

match &expr {
Expr::Value(_)
| Expr::Function(_)
| Expr::UnaryOp { .. }
| Expr::BinaryOp { .. }
| Expr::Case { .. }
| Expr::Cast { .. }
| Expr::Convert { .. }
| Expr::Subquery(_) => Ok(expr),
// todo: how to restrict to variables?
Contributor

This seems more like a semantic check -- this crate is focused on syntax parsing.

Read more here:

https://github.com/apache/datafusion-sqlparser-rs?tab=readme-ov-file#syntax-vs-semantics

I think the error is inappropriate in this case

Contributor Author

I think the error is inappropriate in this case

What do you suggest?

This seems more like a semantic check -- this crate is focused on syntax parsing.

The difficulty here is that without semicolon tokens the return statement is ambiguous. Consider the following perfectly valid T-SQL statement (which is also a test case example) which has several variations of return:

CREATE OR ALTER PROCEDURE example_sp
AS
    IF USER_NAME() = 'X'
        RETURN
    
    IF 1 = 2
        RETURN (SELECT 1)

    RETURN CONVERT(INT, 123)

If you revert this change and run that test, you get this error:

thread 'test_supports_statements_without_semicolon_delimiter' panicked at tests/sqlparser_mssql.rs:2533:14:
called `Result::unwrap()` on an `Err` value: ParserError("Expected: an SQL statement, found: 1 at Line: 1, Column: 72")

The reason is that for the first `RETURN`, the `self.maybe_parse(|p| p.parse_expr())?` call "successfully" parses/consumes the `IF` keyword. However, this is contrary to the keyword documentation for SQL Server (ref), which requires an "integer expression".

I think I had said when introducing parse_return in a previous PR that we'd have to come back and tighten it up eventually, but I can't find that discussion 😐.
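
The ambiguity described above can be shown with a toy allow-list: once a keyword such as `IF` has been consumed as if it were an identifier, the only remaining signal is the shape of the parsed expression. A minimal sketch, assuming a hypothetical and heavily simplified `ToyExpr` (not the crate's `Expr`) and a hypothetical `returnable` helper:

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Value(i64),
    Identifier(String),
    Function(String),
    Subquery(String),
}

/// Accept only expression shapes that plausibly follow `RETURN`.
/// A bare identifier is accepted only if it looks like a T-SQL variable (`@name`).
fn returnable(expr: ToyExpr) -> Result<ToyExpr, String> {
    match &expr {
        ToyExpr::Value(_) | ToyExpr::Function(_) | ToyExpr::Subquery(_) => Ok(expr),
        ToyExpr::Identifier(name) if name.starts_with('@') => Ok(expr),
        _ => Err(format!(
            "Non-returnable expression found following RETURN: {:?}",
            expr
        )),
    }
}
```

Here `returnable(ToyExpr::Identifier("IF".to_string()))` is rejected, which lets a caller fall back to treating `RETURN` as having no value, while `@rc` is accepted. Whether such a check belongs in a syntax-only parser is exactly the question the review raises.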

Expr::Identifier(id) if id.value.starts_with('@') => Ok(expr),
_ => parser_err!(
"Non-returnable expression found following RETURN",
p.peek_token().span.start
),
}
})?;

match rs {
Some(expr) => Ok(Statement::Return(ReturnStatement {
value: Some(ReturnStatementValue::Expr(expr)),
})),
64 changes: 64 additions & 0 deletions src/test_utils.rs
@@ -25,6 +25,7 @@
#[cfg(not(feature = "std"))]
use alloc::{
boxed::Box,
format,
string::{String, ToString},
vec,
vec::Vec,
@@ -186,6 +187,32 @@ impl TestedDialects {
statements
}

/// The same as [`statements_parse_to`] but it will strip semicolons from the SQL text.
pub fn statements_without_semicolons_parse_to(
&self,
sql: &str,
canonical: &str,
) -> Vec<Statement> {
let sql_without_semicolons = sql.replace(";", " ");
let statements = self
.parse_sql_statements(&sql_without_semicolons)
.expect(&sql_without_semicolons);
if !canonical.is_empty() && sql != canonical {
assert_eq!(self.parse_sql_statements(canonical).unwrap(), statements);
} else {
assert_eq!(
sql,
statements
.iter()
// note: account for format_statement_list manually inserted semicolons
.map(|s| s.to_string().trim_end_matches(";").to_string())
.collect::<Vec<_>>()
.join("; ")
);
}
statements
}

/// Ensures that `sql` parses as an [`Expr`], and that
/// re-serializing the parse result produces canonical
pub fn expr_parses_to(&self, sql: &str, canonical: &str) -> Expr {
@@ -318,6 +345,43 @@ where
all_dialects_where(|d| !except(d))
}

/// Returns all dialects that don't support statements without semicolon delimiters.
/// (i.e. dialects that require semicolon delimiters.)
pub fn all_dialects_requiring_semicolon_statement_delimiter() -> TestedDialects {
let tested_dialects =
all_dialects_except(|d| d.supports_statements_without_semicolon_delimiter());
assert_ne!(tested_dialects.dialects.len(), 0);
tested_dialects
}

/// Returns all dialects that do support statements without semicolon delimiters.
/// (i.e. dialects not requiring semicolon delimiters.)
pub fn all_dialects_not_requiring_semicolon_statement_delimiter() -> TestedDialects {
let tested_dialects =
all_dialects_where(|d| d.supports_statements_without_semicolon_delimiter());
assert_ne!(tested_dialects.dialects.len(), 0);
tested_dialects
}

/// Asserts an error for `parse_sql_statements`:
/// - "end of statement" for dialects that require semicolon delimiters
/// - "an SQL statement" for dialects that don't require semicolon delimiters.
pub fn assert_err_parse_statements(sql: &str, found: &str) {
assert_eq!(
ParserError::ParserError(format!("Expected: end of statement, found: {found}")),
all_dialects_requiring_semicolon_statement_delimiter()
.parse_sql_statements(sql)
.unwrap_err()
);

assert_eq!(
ParserError::ParserError(format!("Expected: an SQL statement, found: {found}")),
all_dialects_not_requiring_semicolon_statement_delimiter()
.parse_sql_statements(sql)
.unwrap_err()
);
}
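
The two dialect groups used by the helpers above are complementary: one is built with a predicate and the other with its negation. A small sketch of that partitioning, assuming a hypothetical `DialectInfo` record and a made-up dialect list (the real helpers operate on boxed `Dialect` trait objects):

```rust
/// Hypothetical dialect record used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct DialectInfo {
    name: &'static str,
    semicolon_optional: bool,
}

/// Made-up dialect list; only "mssql" opts in, mirroring the diff.
fn toy_dialects() -> Vec<DialectInfo> {
    vec![
        DialectInfo { name: "generic", semicolon_optional: false },
        DialectInfo { name: "mssql", semicolon_optional: true },
        DialectInfo { name: "postgres", semicolon_optional: false },
    ]
}

/// Keep the dialects for which `pred` holds.
fn dialects_where(pred: impl Fn(&DialectInfo) -> bool) -> Vec<DialectInfo> {
    toy_dialects().into_iter().filter(|d| pred(d)).collect()
}

/// Keep the dialects for which `pred` does not hold.
fn dialects_except(pred: impl Fn(&DialectInfo) -> bool) -> Vec<DialectInfo> {
    dialects_where(|d| !pred(d))
}
```

Because the groups are complements, every dialect is exercised by exactly one of the two assertions in a helper like `assert_err_parse_statements`.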

pub fn assert_eq_vec<T: ToString>(expected: &[&str], actual: &[T]) {
assert_eq!(
expected,