Add Performance/ReduceMerge cop #328

Open: sambostock wants to merge 2 commits into master

@sambostock sambostock commented Jan 2, 2023

This cop detects and corrects code that builds up a Hash using reduce and merge, which copies the Hash accumulated so far on every iteration.

It is preferable to mutate a single Hash, successively adding entries, ideally avoiding small temporary hashes as well.
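As a minimal sketch of the two shapes described above, using a small in-memory array instead of a file (the `pairs` data and variable names are assumptions for illustration, not part of the PR), both forms produce equal hashes, but only the first allocates a fresh Hash per element:

```ruby
# Hypothetical input for illustration; not taken from the PR.
pairs = [["a", 1], ["b", 2], ["c", 3]]

# Shape the cop targets: Hash#merge returns a new Hash on every iteration,
# and each one-entry argument hash is a small temporary as well.
merged = pairs.reduce({}) { |hash, (key, value)| hash.merge(key => value) }

# Preferred shape: a single Hash is mutated in place.
assigned = pairs.each_with_object({}) { |(key, value), hash| hash[key] = value }

same_result = (merged == assigned)
```

Note that the block parameters swap order between the two methods: `reduce` yields the accumulator first, while `each_with_object` yields it last.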

Real world example/motivation

Reading several hundred thousand key-value pairs from a CSV file to build up a Hash in memory using the following approach takes on the order of 15 minutes:

CSV.foreach("path/to/file.csv").reduce({}) { |hash, row| hash.merge(row[0] => row[1]) }

Meanwhile, building the same Hash by mutation takes under a second:

CSV.foreach("path/to/file.csv").each_with_object({}) { |row, hash| hash[row[0]] = row[1] }
Benchmark
require 'tempfile'
require 'benchmark'
require 'csv'

num_rows = 100_000 # This takes a while after 100k records...

file = Tempfile.new('reduce_merge.csv')
file.write(Array.new(num_rows) { "#{_1},#{_1}\n" }.join)
file.flush

Benchmark.bm do |x|
  x.report('reduce + merge') do
    CSV.foreach(file.path).reduce({}) { |hash, row| hash.merge(row[0] => row[1]) }
  end

  x.report('each_with_object + assignment') do
    CSV.foreach(file.path).each_with_object({}) { |row, hash| hash[row[0]] = row[1] }
  end
end

#                                    user     system      total        real
# reduce + merge                15.068653  15.759753  30.828406 ( 31.584389)
# each_with_object + assignment  0.276015   0.013532   0.289547 (  0.298275)
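The gap in the benchmark is consistent with quadratic copying. As a rough model, assuming each `merge` duplicates every entry accumulated so far, building n entries copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 entries in total. The sketch below (the method names are hypothetical) counts those copies directly for a small n and compares against the closed form:

```ruby
# Counts how many existing entries are duplicated across all merges,
# under the assumption that Hash#merge copies the whole receiver each time.
def entries_copied_by_reduce_merge(n)
  copied = 0
  (0...n).reduce({}) do |hash, i|
    copied += hash.size # entries already present are copied into the new Hash
    hash.merge(i => i)
  end
  copied
end

# Closed form of 0 + 1 + ... + (n - 1).
def triangular_copies(n)
  n * (n - 1) / 2
end

small_count = entries_copied_by_reduce_merge(1_000)
model_count = triangular_copies(1_000)
large_model = triangular_copies(100_000)
```

Under this model, the 100,000-row benchmark above corresponds to roughly five billion entry copies, whereas the `each_with_object` version performs 100,000 insertions.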

Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@sambostock sambostock marked this pull request as ready for review January 2, 2023 07:43
`Set#merge` mutates the receiver, like `Hash#merge!`.
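A small illustration of the distinction this commit message draws, using only core and standard-library behaviour (the variable names are illustrative): `Set#merge` returns the receiver itself, so a `reduce` over a Set accumulator does not copy on each step, whereas `Hash#merge` returns a new object.

```ruby
require 'set'

set = Set[1, 2]
set_result = set.merge([3])
set_mutated = set_result.equal?(set) # Set#merge returns the same object

hash = { a: 1 }
hash_copy = hash.merge(b: 2)
hash_untouched = (hash == { a: 1 })  # Hash#merge leaves the receiver alone
hash_is_new = !hash_copy.equal?(hash)

hash_bang = hash.merge!(c: 3)
bang_mutated = hash_bang.equal?(hash) # Hash#merge! mutates and returns self
```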