diff --git a/content/en/benchmark.md b/content/en/benchmark.md new file mode 100644 index 0000000000..3c87b11781 --- /dev/null +++ b/content/en/benchmark.md @@ -0,0 +1,333 @@ +--- +title: NumPy Benchmarks +sidebar: false +--- + +Visualization + +## Overview + +This web page aims to benchmark NumPy's performance on the widely accepted N-body problem +[2]. This work also compares NumPy with Python & C++ +and with compilers like Numba and Pythran. + +The objective of benchmarking NumPy revolves around the efficiency of the library in quasi real-life situations, +and the N-body problem suits the purpose well. +Benchmarking is performed over several iterations for different datasets to ensure the accuracy of the results. + + + + + + + + + + + + + + + + + +## About N-Body Problem + + + + + +> In physics, the n-body problem is the problem of predicting the individual motions of a group of celestial objects interacting with each other gravitationally. + +
Source: Wikipedia
+ +From the definition above, the N-body problem includes the kinematics between the different bodies, +which involve various mathematical computations. +Solving this problem has been motivated by the desire to understand the motions of the celestial bodies. +Thus it serves as a robust entity between real-world applications and the computational world. + +A brief description of computations involved in solving the N-body problem is given below, +along with the pseudo-code in the next section: + +Consider $n$ bodies of masses $m_1, m_2, m_3, ... , m_n$, +moving under the mutual [gravitational force](https://en.wikipedia.org/wiki/Gravity) of attraction +between them in an [inertial frame of reference](https://en.wikipedia.org/wiki/Inertial_frame_of_reference) of three dimensions, +such that consecutive positions and velocities of an ${ith}$ body +are denoted by ($s_{k-1}$, $s_k$) and ($v_{k-1}$, $v_k$) respectively. +According to the [Newton's law of gravity](https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation), +the gravitational force felt on the $ith$ body of mass $m_i$ +by a single body of mass $m_j$ is denoted as $F_{ij}$ +and the acceleration of the $ith$ body is represented as $a_i$. +Let $r_i$ and $r_j$ be the position vectors of two body, such that: + +\begin{equation} {r_i} = {s_{k}} - {s_{k-1}} \tag{I} \end{equation} + +\begin{equation} {r_j} = {s_{k-1}} - {s_{k}} \tag{II} \end{equation} + +The final aim is to find time taken to evaluate the total energy of each particle +in the celestial space at a given time step. +The equations involved in solving the problem are listed below: + +\begin{equation} {s_k} = {s_{k-1}} + {u\times t} + \frac{a\times t^2}{2} \tag{III} \end{equation} + +\begin{equation}{v_k} = {v_{k-1}} + {a\times t} \tag{IV} \end{equation} + +\begin{equation} {F_{ij}} = \frac{{G\times {m_i}\times {m_j}}\times \mid {r_j}-{r_i} \mid}{{\mid {r_j}-{r_i} \mid}^3} \tag{V} \end{equation} + +\begin{equation} {a_{i}} = \frac{F_{ij}}{m_{j}} \tag{VI} \end{equation} + +\begin{equation} \textrm{Self Potential Energy} = \textrm{U} = -\frac{{m_i}\times {m_j}}{r^2} \tag{VII} \end{equation} + +\begin{equation} \textrm{Kinetic Energy} = \textrm{K.E} = \frac{\sum m\times v^2}{2} \tag{VIII} \end{equation} + +\begin{equation} \textrm{Total Energy} = \textrm{Kinetic Energy} + \textrm{Self Potential Energy} \tag{IX} \end{equation} + +### Pseudo Code of Solving N-body Problem + +``` +Set time to 0, time_step to 0.001 and time_end to 10s +THEN number_of_step is 10/0.001 +FOR time is less than or equal to time_end + Calculate accelerations (a[i], for given position r[i]) + Calculate total initial energies: + Calculate kinetic energy + Calculate potential energy + FOR k less than number_of_step + Calculate positions (r[k+1]) + Swap accelerations + Calculate accelerations + Calculate velocities (v[k+1]) + Increment time + IF number_of_step % 100 is not 0 THEN + Calculate total energy + Print energy + ENDIF + END FOR +END FOR +``` + +## Dataset Description + +* Nine different text files, named as `InputX.txt`, where $X$ is number of particles in the celestial space (for this problem, number of particles are: $16, 32, 64, 128, 256, 512, 1000, 2000$ and $16000$). +* Dataset[3] consists of the masses of each particle and information about their initial positions and velocities along the three-dimensional axis. + +An example from the dataset[3] used is given below: +(for a single particle, the values are approximated up to four decimal places for readability) + +``` +# Ordered as: label, mass (grams), position_x, position_y, position_z, velocity_x, velocity_y, velocity_z +-1 0.0625 0.2148 -0.1204 -0.2661 0.7578 0.1576 -0.0715 +``` + +## Compiled Methods + +We considered accelerators like [Numba](http://numba.pydata.org/), +[Pythran](https://transonic.readthedocs.io/), and [Transonic](https://transonic.readthedocs.io/) +for benchmarking. +This decision is inspired by [Ralf Gommer's Presentation on SciPy 1.0](https://www.slideshare.net/RalfGommers/scipy-10-and-beyond-a-story-of-community-and-code) +(conference [video](https://www.youtube.com/watch?v=oHmm3mPxg6Y)). +We give brief details on a few of the accelerators below: + +### Numba + +> Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. + +
Source: Numba's Website
+ +Since Numba is a compiler focused on accelerating Python and NumPy codes, +the user API of the library supports various decorators. +It uses the industry-standard LLVM compiler library. +It aims to translate the Python functions to optimized machine code during runtime. +It supports variety of decorators like `@jit`, `@vectorize`, `@guvectorize`, `@stencil`, `@jitclass`, `@cfunc`, `@overload`. +We are using `Just-In-Time` compilation in this work. +It also supports `nopython` mode to generate fully compiled results +without the need for intermediate Python interpreter calls. +Numba's assistance to NumPy arrays and functions also makes it a good candidate for comparison. + + + +### Pythran + +> Pythran is an ahead of time compiler for a subset of the Python language, with a focus on scientific computing. + +
Source: Pythran's Website
+ +Since the focus of Pythran was on accelerating Python and NumPy codes, +its C++ API is the same as that of NumPy. +Pythran also supports [Expression Templating](https://en.wikipedia.org/wiki/Expression_templates) and [SIMD](https://en.wikipedia.org/wiki/SIMD) instructions, which are its main advantages. +It converts annotated Python modules into native Python modules, which are comparatively faster. +But both have the same kind of interface. + + + +## Source Code + +* The code is inspired by Pierre Augier's work on N-Body Problem. +* Visualization Code: here. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Algorithm & Source CodeImplementation Details
NumPyVectorized Approach, Broadcasting Method, NumPy Arrays
PythonStandard Python Approach, Using List
C++C++ Implementation, GNU C++ Compiler
NumbaJust-In-time Compilation, Non-Vectorized Approach, Using Numba at the Backend via Transonic, NumPy Arrays
PythranJust-In-Time Compilation, Non-Vectorized Approach, Pythran at the Backend via Transonic, NumPy Arrays
+ + +## Results + +Table values represent the normalized time 'time / nParticles^{2}` taken in seconds +by each algorithm to run on the given datasets for $50$ number of iterations. +The raw timing data can be downloaded from here. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Input(s) $\rightarrow$3264128256
NumPy0.4340.2430.1390.0713
Python0.8380.7830.820.697
C++0.10010.0890.0890.075
Numba0.10070.1010.1060.104
Pythran0.020.020.0190.0203
+ + + +## Environment configuration + +* **CPU Model:** Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz +* **RAM GB:** 16 +* **RAM Model:** DDR4 +* **Speed:** 3200 MT/s +* **Operating System:** Manjaro Linux 21.1.1, Pahvo +* **Library Versions:** + * Python: 3.9.6 + * NumPy: 1.20.3 + * Numba: 0.54.0 + * Pythran: 0.9.12.post1 + * Transonic: 0.4.10 + * GCC: 11.1.0 +* **Note:** The benchmarking is performed on the $4$ isolated CPU cores for accurate results. + +## Conclusion + +* NumPy is very efficient, especially for larger datasets. +It performs $3.2$ times faster than Python for input size $64$, +$5.8$ times faster for a dataset of size, $128$. +It gives more than $9.7$ times better performance than Python for input size $256$. +The performance of NumPy increases drastically as the number of particles in the datasets increases. +Thanks to the vectorized approach in NumPy. Vectorization makes the code look clean and concise to read. That resulted in better performance without any explicit looping, indexing, etc. + +* It uses pre-compiled C code, which adds up to the performance of NumPy. +The table shows that the performance of the NumPy is approaching the speed of C++ on increasing the input size. +For a dataset of size $64$, NumPy is $2.7$ times slower than C++. For the dataset of size $128$, +it reaches equivalent to the speed of C++, with a running time of $1.56$ times, faster than the time taken by C++. +NumPy outperforms C++ by $1.05$ times for input size $256$. + +**How can we accelerate NumPy?** + +NumPy aims to improve itself and to give better performance for the end-users. It performs well in most cases. +But to fill the gaps where NumPy is not so good various compiled methods +like Numba, Pythran, etc are used. They play a huge role. Presently, +we used Transonic's JIT Compilation at the backend for NumPy arrays to implement Numba & Pythran. +To be specific, we want to compare NumPy's vectorized approach with the JIT-compiled non-vectorized approach. + +* We observed Numba performs $2.4$ times faster than NumPy for input size $64$ +and $1.31$ times faster for input size $128$. +But later, NumPy outperforms Numba by $1.45$ times faster for input size $256$. +* Pythran performs $12.15$ times faster for input size $64$, +$7.31$ times better for input size $128$, and $3.51$ times faster than NumPy for input size $256$. + +We have compared the performance of NumPy with two of the most popular languages +Python and C++, and with popular compiled methods like Numba and Pythran. +We distinguished the remarkable change in the behaviour of NumPy as we increment the number of input sizes. +NumPy initially performed equivalent to the speed of Python, +but, later it changed its behavior from "python-like" nature to "compiled-like" behavior. +The running time became similar to accelerators that are to increase the performance of the code. +It achieves better performance for scientific computations +as well as for solving real-life situations. That's NumPy. +It stands explicitly well in all kinds of circumstances. + +## References + +1. [The issue for adding content on performance](https://github.com/numpy/numpy.org/issues/370) +2. Wikipedia's Article on N-Body Problem +3. Dataset used from Pierre Augier's repository diff --git a/content/en/config.yaml b/content/en/config.yaml index 874ff08e89..284f4128ef 100644 --- a/content/en/config.yaml +++ b/content/en/config.yaml @@ -75,6 +75,7 @@ params: text: NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries. - title: Performant text: The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code. + url: /benchmark - title: Easy to use text: NumPy's high level syntax makes it accessible and productive for programmers from any background or experience level. - title: Open source diff --git a/layouts/partials/keyfeatures.html b/layouts/partials/keyfeatures.html index 2a869dead7..df2032b5db 100644 --- a/layouts/partials/keyfeatures.html +++ b/layouts/partials/keyfeatures.html @@ -5,12 +5,14 @@
{{- range $features }} -
-

-

{{ .title }}
+
+

+ +

+
{{ .text | markdownify }}
-

-
+

+
{{- end }}
diff --git a/static/images/content_images/performance_benchmarking.png b/static/images/content_images/performance_benchmarking.png new file mode 100644 index 0000000000..f449786404 Binary files /dev/null and b/static/images/content_images/performance_benchmarking.png differ