Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataFusionError::Collection to return multiple DataFusionErrors #14439

Merged
merged 22 commits into from
Feb 7, 2025

Conversation

eliaperantoni
Copy link
Contributor

@eliaperantoni eliaperantoni commented Feb 3, 2025

This PR adds DataFusionError::Collection so that multiple errors can be returned at once, shortening the feedback loop for developers because they would now be able to fix more than error in their SQL query before trying to run it again.

Example:

All of these errors are returned in a single run:

image

For now, I just implemented this when:

  • Planning different SelectItems
  • Planning different SetExprs

Which issue does this PR close?

Closes #13676.

Rationale for this change

See #13676.

What changes are included in this PR?

  • Add DataFusionError::Collection
  • Add DataFusionError::iter
  • Add tests
  • Collect multiple errors, instead of returning at the first one, when iterating over SelectItems and SetExprs

Are these changes tested?

Yes, see datafusion/sql/tests/cases/collection.rs

Are there any user-facing changes?

There is one opt-in addition: DataFusionError::iter.

@github-actions github-actions bot added sql SQL Planner common Related to common crate labels Feb 3, 2025
@github-actions github-actions bot added the core Core DataFusion crate label Feb 3, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 3, 2025
@eliaperantoni
Copy link
Contributor Author

@alamb pr is ready for review. I'm getting this weird error that appears to be because of the TTY size being smaller in CI, making the error wrap on a new line? I can't reproduce it locally.

image

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Feb 4, 2025
#[error("DataFusion error: {0}")]
#[error("DataFusion error: {}", .0.strip_backtrace())]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so it turns out that the issue was that the expected error doesn't contain the backtrace but the actual one does. I couldn't reproduce locally at first because I wasn't using the backtrace feature.

I have no idea why running with --complete didn't write the backtrace to the .slt files, I find no call to strip_backtrace and I don't know what in my PR made this necessary. But it works 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add the backtrace to the errors in general (definitely not in slt) as the backtraces would include line numbers / call stacks that would change over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. This PR doesn't change that, the backtraces are in the wrapped SchemaError and PlanError. I had to strip them away, don't know why that wasn't necessary before

Comment on lines 151 to 160
query error No function matches
query error DataFusion error: Error during planning: Execution error: User\-defined coercion failed with "Error during planning: The substr function requires 2 or 3 arguments, but got 1\."
select 1 group by substr('');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what caused this change honestly.

@comphead
Copy link
Contributor

comphead commented Feb 4, 2025

Thanks @eliaperantoni
What comes to my mind is the PR might make the output too cumbersome, and some time back I think we already faced the issue that extra attention to error can slow down the planning time. I checked bunch of engines now, like DuckDB, Trino they return errors 1 by 1 which is not necessarily correct of course

@mkarbo
Copy link

mkarbo commented Feb 4, 2025

Thanks @eliaperantoni

What comes to my mind is the PR might make the output too cumbersome, and some time back I think we already faced the issue that extra attention to error can slow down the planning time. I checked bunch of engines now, like DuckDB, Trino they return errors 1 by 1 which is not necessarily correct of course

Just adding some motivation behind this

For instance, if you want to use diagnostics to power applications such as static validators or LSP's, then it will be necessary to report all errors.

Going from 1->n will open up a new set of applications of diagnostics not easily solvable outside of this ecosystem, which I would say is very valuable to the community, if we can make it happen :)

@eliaperantoni
Copy link
Contributor Author

Thanks @comphead for your feedback. We appreciate it 🙏

What comes to my mind is the PR might make the output too cumbersome

The actual error output doesn't change at all. i.e. if you get a DataFusionError and you use any of the methods from std::error::Error it'll still work the same way, exception made for the X errors, first one is: {err} prefix for the Display implementation.

It only changes if you explicitly call .iter, which I think is a very intentional act of extracting additional information from the error. But we took care to not make it cumbersome by default and to not make breaking changes.

extra attention to error can slow down the planning time

I think that's fair criticism. What could we do to solve this in a way that doesn't complicate the code too much? I think it's also important to notice that the performance impact occurs only when there are errors, and it would be greater than without this PR only if the errors are more than 1. This is because the only performance lost is due to the planner continuing after the first error. In performance critical applications that run the same carefully crafted queries, no extra code is executed and no extra allocations are made.

@alamb
Copy link
Contributor

alamb commented Feb 5, 2025

Thanks @comphead for your feedback. We appreciate it 🙏

What comes to my mind is the PR might make the output too cumbersome

The actual error output doesn't change at all. i.e. if you get a DataFusionError and you use any of the methods from std::error::Error it'll still work the same way, exception made for the X errors, first one is: {err} prefix for the Display implementation.

Yeah, I think this makes sense -- by default show the first error message unless the application wants to show more of them

It only changes if you explicitly call .iter, which I think is a very intentional act of extracting additional information from the error. But we took care to not make it cumbersome by default and to not make breaking changes.

extra attention to error can slow down the planning time

I think that's fair criticism. What could we do to solve this in a way that doesn't complicate the code too much? I think it's also important to notice that the performance impact occurs only when there are errors, and it would be greater than without this PR only if the errors are more than 1.

some time back I think we already faced the issue that extra attention to error can slow down the planning time.

This is an excellent point @comphead and I agree with @eliaperantoni 's analysis that the additional overhead will happen when the query was going to error anyways so I don't expect this to make a large difference

What we were seeing was that when planning created Err(DataFusionError::...) during normal (non erroring queries) it was quite a bit slower

@alamb alamb changed the title Collection of errors Support returning multiple DataFusionErrors Feb 5, 2025
@alamb alamb added the api change Changes the API exposed to users of the crate label Feb 5, 2025
@alamb
Copy link
Contributor

alamb commented Feb 5, 2025

Marking as an API change as the new enum will need to be handled by downstream users

@alamb alamb changed the title Support returning multiple DataFusionErrors Add DataFusionError::Collection to return multiple DataFusionErrors Feb 5, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eliaperantoni -- this looks great to me -- my only concern is the wrapping of single errors with Collection. Otherwise I think it is good to go 👍

I left some other comments as well

datafusion/common/src/error.rs Show resolved Hide resolved
datafusion/common/src/error.rs Outdated Show resolved Hide resolved
Comment on lines +581 to +603
pub fn iter(&self) -> impl Iterator<Item = &DataFusionError> {
struct ErrorIterator<'a> {
queue: VecDeque<&'a DataFusionError>,
}

impl<'a> Iterator for ErrorIterator<'a> {
type Item = &'a DataFusionError;

fn next(&mut self) -> Option<Self::Item> {
loop {
let popped = self.queue.pop_front()?;
match popped {
DataFusionError::Collection(errs) => self.queue.extend(errs),
_ => return Some(popped),
}
}
}
}

let mut queue = VecDeque::new();
queue.push_back(self);
ErrorIterator { queue }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can implement the same thing more simply via

    pub fn iter(&self) -> impl Iterator<Item = &DataFusionError> {
        let mut current_err = self;
        let errors: Vec<_> = match self {
            DataFusionError::Collection(errs) => errs.iter().collect(),
            _ => vec![self],
        };
        errors.into_iter()
    }

That doesn't handle recursive Collections but I think that is ok

Copy link
Contributor Author

@eliaperantoni eliaperantoni Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't handle recursive Collections but I think that is ok

But then in a query like:

SELECT
    bad,
    bad
UNION
SELECT
    bad

You would get one DataFusionError::Collection and one DataFusionError::Plan whereas you could've gotten three DataFusionError::Plan. I think that's worth having this iter that's slightly more complicated.

datafusion/sql/src/select.rs Outdated Show resolved Hide resolved
datafusion/sql/tests/cases/collection.rs Show resolved Hide resolved
@github-actions github-actions bot removed the core Core DataFusion crate label Feb 6, 2025
@eliaperantoni eliaperantoni requested a review from alamb February 6, 2025 07:50
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eliaperantoni -- this looks great to me

fn test_iter() {
let err = DataFusionError::Collection(vec![
DataFusionError::Plan("a".to_string()),
DataFusionError::Collection(vec![
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for the recursive check

@alamb
Copy link
Contributor

alamb commented Feb 7, 2025

I merged up from main to resolve a conflict

}
}

pub struct DataFusionErrorBuilder(Vec<DataFusionError>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a follow on PR to add docs and examples:

@alamb
Copy link
Contributor

alamb commented Feb 7, 2025

I am really sorry -- somehow I have messed up the tests on this PR. I will monkey around with them to get them passing 🐒 🤔

@alamb alamb merged commit d5f19f3 into apache:main Feb 7, 2025
25 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 7, 2025

Thanks again @eliaperantoni

@eliaperantoni
Copy link
Contributor Author

Thanks again @eliaperantoni

Thank you @alamb! It was very kind of you to fix the test 😊

@alamb
Copy link
Contributor

alamb commented Feb 7, 2025

Thanks again @eliaperantoni

Thank you @alamb! It was very kind of you to fix the test 😊

I don't really know why it took so much finagling to be honest 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate common Related to common crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report multiple errors, not just the first one
4 participants