Skip to content

Asymptotic complexity testing

Toby Dylan Hocking edited this page Sep 18, 2019 · 12 revisions

Background

R package developers currently use ad-hoc tests of asymptotic computational complexity via empirical timings of functions and visual diagnostic plots. However, there is no framework in R for systematically testing the empirical computational complexity of functions. This is a problem because such a testing framework could be essential for identifying big speed gains in R code. Examples include the quadratic time algorithms that were used prior to R-3.6.0 for gregexpr/substring, https://github.com/tdhock/namedCapture-article#6-mar-2019

This is not an isolated issue. Many R packages could benefit from an asymptotic testing framework. For example rhdf5 https://github.com/grimbough/rhdf5/issues/31 has quadratic time access to some kinds of data, which should be linear time.

Related work

There are many R packages for unit tests, and there is Rperform for performance testing.

Details of your coding project

Create a new package, testComplexity, which will provide a suite of functions that developers can use to systematically and reliably test empirical asymptotic complexity of their functions.

  • functions for quantifying empirical time complexity of R expressions/functions. This should be relatively straightforward using existing packages, e.g. microbenchmark. For example a function asymptoticTimings(fun(N)) should return a data.frame DF with numeric columns N (number of data) and T (time in seconds). Optional parameters will be range of N values, and we will work to ensure that defaults enable accurate/informative data collection in a minimal amount of time.
  • functions for determining asymptotic complexity classes (linear, log-linear, quadratic, etc) based on noisy empirical data. For example for the data.frame DF described above we will implement a function asymptoticComplexityClass(DF$N, DF$T) which will return a character string e.g. “linear” if T = O(N), “quadratic” if T = O(N^2), etc.
  • testing functions, using syntax similar to testthat package. For example expect_linear_time(fun(N)) should fail with an error if fun(N) is not linear in N, i.e. O(N). We also propose to create a suite of test cases of functions/algorithms in R with known asymptotic complexity, in order to test the testComplexity package functions. Finally we propose to use testComplexity on a wide range of R packages in order to identify potential improvements in performance.

Expected impact

This project will provide a new package that will be useful for testing and improving the speed of various R packages.

Mentors

MENTORS: fill in this part. each project needs 2 mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor). Please provide contact info for each mentor, along with qualifications.

IMPORTANT: you MUST write "EVALUATING" for one mentor, who will be required to do the three evaluations of the student during the summer. In previous years we have had issues with mentors who do not fill in evaluations, and when this happens R project is penalized (money is taken away). Therefore one mentor must take responsibility for doing the evaluations, and you must indicate that here, and your student must indicate that as well in the application. If it is not clear which mentor will be the EVALUATING mentor then your project will not be accepted. Example:

Students, please contact mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Toby Hocking [email protected] is the author of R packages X and Y.
  • Other Dev [email protected] is an expert in machine learning, and has previous GSOC experience with NAME_OF_OPEN_SOURCE_ORGANIZATION in 2015-2016.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

MENTORS: write several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project! Please modify the suggestions below to make them specific for your project.

  • Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
  • Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
  • Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?

Solutions of tests

Students, please post a link to your test results here.

Clone this wiki locally