Add DataFusionError::Collection to return multiple DataFusionErrors #14439
@alamb the PR is ready for review. I'm getting this weird error that appears to be caused by the TTY being narrower in CI, which makes the error wrap onto a new line? I can't reproduce it locally.
#[error("DataFusion error: {0}")]
#[error("DataFusion error: {}", .0.strip_backtrace())]
Okay so it turns out that the issue was that the expected error doesn't contain the backtrace but the actual one does. I couldn't reproduce locally at first because I wasn't using the backtrace feature.
I have no idea why running with --complete didn't write the backtrace to the .slt files. I find no call to strip_backtrace and I don't know what in my PR made this necessary. But it works 🤷‍♂️
I don't think we should add the backtrace to the errors in general (definitely not in slt) as the backtraces would include line numbers / call stacks that would change over time.
Absolutely. This PR doesn't change that, the backtraces are in the wrapped SchemaError and PlanError. I had to strip them away, don't know why that wasn't necessary before
query error No function matches
query error DataFusion error: Error during planning: Execution error: User\-defined coercion failed with "Error during planning: The substr function requires 2 or 3 arguments, but got 1\."
select 1 group by substr('');
I don't know what caused this change honestly.
Thanks @eliaperantoni
Just adding some motivation behind this. For instance, if you want to use diagnostics to power applications such as static validators or LSPs, then it will be necessary to report all errors. Going from 1 to n will open up a new set of applications of diagnostics not easily solvable outside of this ecosystem, which I would say is very valuable to the community, if we can make it happen :)
Thanks @comphead for your feedback. We appreciate it 🙏
The actual error output doesn't change at all. i.e. if you get a It only changes if you explicitly call
I think that's fair criticism. What could we do to solve this in a way that doesn't complicate the code too much? I think it's also important to note that the performance impact occurs only when there are errors, and it is greater than without this PR only when there is more than one error. This is because the only performance cost comes from the planner continuing after the first error. In performance-critical applications that run the same carefully crafted queries, no extra code is executed and no extra allocations are made.
Yeah, I think this makes sense -- by default show the first error message unless the application wants to show more of them
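To make the "show the first error by default" behaviour concrete, here is a minimal sketch. It uses a toy `ToyError` enum as a stand-in for `DataFusionError`; the enum, the message strings, and the `sample` helper are all hypothetical and only illustrate the idea discussed here, not the actual implementation in this PR.

```rust
use std::fmt;

// Toy stand-in for DataFusionError (assumption: only two variants matter here).
#[derive(Debug)]
enum ToyError {
    Plan(String),
    Collection(Vec<ToyError>),
}

impl fmt::Display for ToyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToyError::Plan(msg) => write!(f, "Error during planning: {msg}"),
            // Assumption: a Collection is displayed as its first member, so
            // callers that only print the error see the same text as before.
            ToyError::Collection(errs) => match errs.first() {
                Some(first) => write!(f, "{first}"),
                None => write!(f, "empty error collection"),
            },
        }
    }
}

// Hypothetical helper that builds a collection holding two planning errors.
fn sample() -> ToyError {
    ToyError::Collection(vec![
        ToyError::Plan("column a not found".to_string()),
        ToyError::Plan("column b not found".to_string()),
    ])
}

fn main() {
    // Only the first error is shown unless the caller inspects the collection.
    println!("{}", sample());
}
```

Under this sketch, an application that wants every error has to opt in by looking inside the collection; everything else keeps printing a single message.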
This is an excellent point @comphead and I agree with @eliaperantoni 's analysis that the additional overhead will happen when the query was going to error anyways so I don't expect this to make a large difference. What we were seeing was that when planning created
Marking as an API change as the new enum will need to be handled by downstream users.
Thanks @eliaperantoni -- this looks great to me -- my only concern is the wrapping of single errors with Collection. Otherwise I think it is good to go 👍
I left some other comments as well
pub fn iter(&self) -> impl Iterator<Item = &DataFusionError> {
    struct ErrorIterator<'a> {
        queue: VecDeque<&'a DataFusionError>,
    }

    impl<'a> Iterator for ErrorIterator<'a> {
        type Item = &'a DataFusionError;

        fn next(&mut self) -> Option<Self::Item> {
            loop {
                let popped = self.queue.pop_front()?;
                match popped {
                    DataFusionError::Collection(errs) => self.queue.extend(errs),
                    _ => return Some(popped),
                }
            }
        }
    }

    let mut queue = VecDeque::new();
    queue.push_back(self);
    ErrorIterator { queue }
}
I think you can implement the same thing more simply via
pub fn iter(&self) -> impl Iterator<Item = &DataFusionError> {
    let errors: Vec<_> = match self {
        DataFusionError::Collection(errs) => errs.iter().collect(),
        _ => vec![self],
    };
    errors.into_iter()
}
That doesn't handle recursive Collections but I think that is ok
That doesn't handle recursive Collections but I think that is ok
But then in a query like:
SELECT
bad,
bad
UNION
SELECT
bad
You would get one DataFusionError::Collection and one DataFusionError::Plan whereas you could've gotten three DataFusionError::Plan. I think that's worth having this iter that's slightly more complicated.
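The point about nested collections can be sketched with a small self-contained example. `ToyError` below is a hypothetical stand-in for `DataFusionError`, and the nesting built by `nested_example` is an assumption about how the UNION query above might group its errors; the queue-based traversal mirrors the approach in the PR's `iter`, written here with `std::iter::from_fn` for brevity.

```rust
use std::collections::VecDeque;

// Toy stand-in for DataFusionError (hypothetical, for illustration only).
#[derive(Debug)]
enum ToyError {
    Plan(String),
    Collection(Vec<ToyError>),
}

impl ToyError {
    // Yields every non-Collection error, descending into nested Collections.
    fn iter(&self) -> impl Iterator<Item = &ToyError> {
        let mut queue = VecDeque::from([self]);
        std::iter::from_fn(move || loop {
            match queue.pop_front()? {
                // Unpack a collection into the queue and keep looking.
                ToyError::Collection(errs) => queue.extend(errs),
                leaf => return Some(leaf),
            }
        })
    }
}

// Assumed shape: one inner collection for the first SELECT (two bad columns)
// and one plain error for the second SELECT.
fn nested_example() -> ToyError {
    ToyError::Collection(vec![
        ToyError::Collection(vec![
            ToyError::Plan("bad column in first SELECT".to_string()),
            ToyError::Plan("another bad column in first SELECT".to_string()),
        ]),
        ToyError::Plan("bad column in second SELECT".to_string()),
    ])
}

fn main() {
    let err = nested_example();
    for e in err.iter() {
        if let ToyError::Plan(msg) = e {
            println!("{msg}");
        }
    }
}
```

A non-recursive version would stop at the top level and hand back the inner collection as a single item, which is the trade-off being discussed.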
Thanks @eliaperantoni -- this looks great to me
fn test_iter() {
    let err = DataFusionError::Collection(vec![
        DataFusionError::Plan("a".to_string()),
        DataFusionError::Collection(vec![
💯 for the recursive check
I merged up from main to resolve a conflict.
    }
}

pub struct DataFusionErrorBuilder(Vec<DataFusionError>);
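The diff above shows only the declaration of the builder. As a rough sketch of how such an accumulator might behave, the example below uses toy types; `ToyError`, `ToyErrorBuilder`, the method names `add_error` and `error_or`, and the `check_columns` helper are assumptions made for illustration and are not taken from this diff. It also returns a lone error unwrapped, which reflects the review concern about wrapping single errors in a Collection.

```rust
// Toy stand-in for DataFusionError (hypothetical, for illustration only).
#[derive(Debug)]
enum ToyError {
    Plan(String),
    Collection(Vec<ToyError>),
}

// Toy accumulator; method names are assumptions, not the PR's confirmed API.
#[derive(Default)]
struct ToyErrorBuilder(Vec<ToyError>);

impl ToyErrorBuilder {
    fn add_error(&mut self, error: ToyError) {
        self.0.push(error);
    }

    // Returns `ok` when nothing was recorded, the lone error when exactly one
    // was recorded, and a Collection only when there are several.
    fn error_or<T>(mut self, ok: T) -> Result<T, ToyError> {
        match self.0.len() {
            0 => Ok(ok),
            1 => Err(self.0.remove(0)),
            _ => Err(ToyError::Collection(self.0)),
        }
    }
}

// Hypothetical validation step that keeps going after the first failure.
fn check_columns(cols: &[&str]) -> Result<usize, ToyError> {
    let known = ["id", "name"];
    let mut builder = ToyErrorBuilder::default();
    for col in cols {
        if !known.contains(col) {
            builder.add_error(ToyError::Plan(format!("unknown column {col}")));
        }
    }
    builder.error_or(cols.len())
}

fn main() {
    println!("{:?}", check_columns(&["id", "bad", "worse"]));
}
```

On the success path the builder holds an empty `Vec`, which does not allocate, in line with the earlier observation that error-free queries pay essentially nothing extra.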
I made a follow on PR to add docs and examples:
I am really sorry -- somehow I have messed up the tests on this PR. I will monkey around with them to get them passing 🐒 🤔
Thanks again @eliaperantoni
Thank you @alamb! It was very kind of you to fix the test 😊
I don't really know why it took so much finagling to be honest 🤷
This PR adds DataFusionError::Collection so that multiple errors can be returned at once. This shortens the feedback loop for developers, because they can now fix more than one error in their SQL query before trying to run it again.
Example:
All of these errors are returned in a single run:
For now, I just implemented this when:
SelectItems
SetExprs
Which issue does this PR close?
Closes #13676.
Rationale for this change
See #13676.
What changes are included in this PR?
DataFusionError::Collection
DataFusionError::iter
SelectItems and SetExprs
Are these changes tested?
Yes, see datafusion/sql/tests/cases/collection.rs
Are there any user-facing changes?
There is one opt-in addition: DataFusionError::iter.