Skip to content

Conversation

@2010YOUY01
Copy link
Contributor

Which issue does this PR close?

An initial attempt towards #18467

Rationale for this change

Rationale for the additional lint rule clippy::needless_pass_by_value

There is a clippy lint rule that is not turned on by the current strictness level in CI: https://rust-lang.github.io/rust-clippy/master/index.html#needless_pass_by_value
Note it has the Clippy category pedantic, and its description is lints which are rather strict or have occasional false positives from https://doc.rust-lang.org/nightly/clippy

It seems we have been suffering from the excessive copying issue for quite some time, and @alamb is on the front line now #18413. I think this extra lint rule is able to help.

Implementation plan

This PR only enables this rule in datafusion-common package, and apply #[allow(clippy::needless_pass_by_value)] for all violations.
If this PR makes sense, we can open a tracking issue and roll out this check to the remaining workspace packages. At least this can help prevent new inefficient patterns and identify existing issues that we can fix gradually.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added development-process Related to development process of DataFusion common Related to common crate labels Nov 3, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @2010YOUY01 -- I think this looks like an improvement to me

If we are going to add this lint I think we should also update the various APIs to pass (not just add #[allow(clippy::needless_pass_by_value)]

If this PR makes sense, we can open a tracking issue and roll out this check to the remaining workspace packages. At least this can help prevent new inefficient patterns and identify existing issues that we can fix gradually.

I think it makes a lot of sense

use std::sync::Arc;

#[allow(clippy::needless_pass_by_value)]
fn create_qualified_schema(qualifier: &str, names: Vec<&str>) -> Result<DFSchema> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could change this to impl IntoIterator<Item = &str> for example

Suggested change
fn create_qualified_schema(qualifier: &str, names: Vec<&str>) -> Result<DFSchema> {
fn create_qualified_schema(qualifier: &str, names: impl IntoIterator<Item = &str>) -> Result<DFSchema> {

/// If `rehash==true` this combines the previous hash value in the buffer
/// with the new hash using `combine_hashes`
#[cfg(not(feature = "force_hash_collisions"))]
#[allow(clippy::needless_pass_by_value)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? What is clippy's alternate suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am wondering if this has found a good potential for optimization...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The arg array can safely be taken by reference.

However, I believe all current call sites already pass it by moving, so they’re fine.

That said, there might be optimization opportunities elsewhere — this needless_pass_by_value warning could indicate that some callers are cloning unnecessarily just to make the rust compiler happy.

cargo clippy --all-targets --workspace --features avro,pyarrow,integration-tests,extended_tests -- -D warnings

# Update packages incrementally for stricter Clippy checks
# TODO: add tracking issue for the remaining workspace packages like `datafusion-catalog`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be better to add at the module level (rather than in the CI script) so that it would also be flagged locally when people ran clippy.

Similar to this:

#![deny(clippy::clone_on_ref_ptr)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted the CI script, and update the datafusion/common/cargo.toml instead in 2736b39, since we can update the whole datafusion-common crate at once.

When updating a larger crate, I think it's possible to configure like this to update module by module.

@github-actions github-actions bot added substrait Changes to the substrait crate proto Related to proto crate labels Nov 4, 2025
@github-actions github-actions bot removed substrait Changes to the substrait crate proto Related to proto crate labels Nov 4, 2025
@2010YOUY01
Copy link
Contributor Author

Thank you @2010YOUY01 -- I think this looks like an improvement to me

If we are going to add this lint I think we should also update the various APIs to pass (not just add #[allow(clippy::needless_pass_by_value)]

If this PR makes sense, we can open a tracking issue and roll out this check to the remaining workspace packages. At least this can help prevent new inefficient patterns and identify existing issues that we can fix gradually.

I think it makes a lot of sense

Thank you for the feedbacks!

For the following cases, I kept the #[allow(clippy::needless_pass_by_value)] to suppress it

  • Tests. I think this rule do seem a bit annoying for tests, and I couldn't find a way to turn it off for all test modules...however we can apply the suppressing macro for individual test modules.
  • Public APIs
  • Intentional moves

For future works, I think we could start with crates that are performance critical or currently suffering from unnecessary clones.

@xudong963
Copy link
Member

performance critical

One place is the logical optimizer I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate development-process Related to development process of DataFusion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants