
Description
The benchmark package does (at least) two questionable things:
- To decide whether a performance difference is significant, it uses the Mann-Whitney U test, which is meant for ordinal-scale data, instead of Student's t test, which is more appropriate for ratio-scale data such as the timing measurements we're taking here (see the sketch after this list).
- It recompiles the JavaScript code from a string on every test run in order to defeat engine optimizations. Usually we're interested in performance with engine optimizations enabled, since that is how code runs in the real world. Moreover, since most of the benchmark code is probably out of reach for this compilation trick, optimization is only partly disabled, producing inconsistent optimization characteristics. It's worth investigating whether this behavior can be turned off.
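
For illustration, here is a minimal sketch of how a t test could be computed directly on two sets of raw timing samples. I'm using Welch's variant of the t test (which doesn't assume equal variances) rather than the pooled Student's version; the function names and sample numbers are made up and not part of any existing package.

```js
// Minimal sketch: Welch's t test on two arrays of timing samples
// (e.g. ms per run). Purely illustrative, not an existing API.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function variance(xs) {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
}

// Returns the t statistic for the difference between the two sample means.
// A |t| well above ~2 (for reasonable sample sizes) suggests a real
// performance difference rather than noise.
function welchT(samplesA, samplesB) {
  const se = Math.sqrt(
    variance(samplesA) / samplesA.length + variance(samplesB) / samplesB.length
  );
  return (mean(samplesA) - mean(samplesB)) / se;
}

console.log(welchT([105, 98, 102, 101, 99], [110, 112, 108, 111, 109]));
```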
I can think of a few options (from least to most effort):
- Accept the quirks and do nothing.
- Switch to an alternative benchmark framework that doesn't have these quirks. I'm not (yet) aware of an alternative.
- Forgo a convenient benchmark framework. Instead, repeat the benchmark code a fixed number of times (say, 10) for each version of Underscore and report each individual result, so that people can compute their own statistics (see the sketch after this list).
- Fork the benchmark package, fix the issues and submit a PR upstream. Use our own fork regardless of whether the PR is accepted; if it isn't, publish the fork as a separate package on npm.
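
As a rough sketch of what the "no framework" option could look like (the `workload` function, run count, and units here are placeholders, not an actual proposal for the benchmark suite):

```js
// Minimal sketch: run the same workload a fixed number of times and report
// every raw measurement, so readers can compute their own statistics.
const _ = require('underscore');

function workload() {
  // Representative code under test; purely illustrative.
  const xs = _.range(10000);
  return _.reduce(_.map(xs, x => x * 2), (a, b) => a + b, 0);
}

const RUNS = 10;
const results = [];
for (let i = 0; i < RUNS; i++) {
  const start = process.hrtime.bigint();
  workload();
  const end = process.hrtime.bigint();
  results.push(Number(end - start) / 1e6); // milliseconds
}

// Print each individual result rather than a single aggregate.
console.log(results.map(ms => ms.toFixed(3)).join('\n'));
```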