Commit adaeb88

Updated parallel principles
1 parent 98da7d5 commit adaeb88

File tree

2 files changed: +70 −3 lines changed

hpc_lecture_notes/parallel_principles.md

+68-2
@@ -111,10 +111,76 @@ So why does threading exist at all in Python? Python threads make sense for I/O
### Numba - Parallel threading without GIL

Numba is a library for just-in-time compilation of Python code into low-level machine code that does not require the Python interpreter. We will dive into Numba in a separate session. The beauty of Numba is that, since Numba-compiled functions do not require the Python interpreter, they execute without acquiring the GIL. This makes it possible to create threads that run in parallel independently of the Python interpreter, delivering optimal performance on multicore CPUs. A parallel version of the vector addition in Numba is given below.

```python
import numpy as np
import numba

n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

c = np.empty(n, dtype='float64')

@numba.njit(parallel=True)
def numba_fun(arr1, arr2, arr3):
    """The thread worker."""
    for index in numba.prange(n):
        arr3[index] = arr1[index] + arr2[index]

numba_fun(a, b, c)
```

In the above example we tell Numba to just-in-time compile the function `numba_fun`. The function `prange` tells Numba that the corresponding for-loop can be parallelised, and Numba automatically splits it across threads that work independently. Since Numba compiles the function into machine code that does not require the Python interpreter, the GIL does not interfere.

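To make the chunking that `prange` performs concrete, here is a plain-Python sketch of the same idea using a thread pool. This is a sketch only: it still runs under the GIL, so unlike Numba it gains no speed, and the thread count and helper name `add_chunk` are illustrative choices, not part of Numba's API.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)
c = np.empty(n, dtype='float64')

def add_chunk(start, stop):
    # Each thread handles one contiguous slice of the index range
    c[start:stop] = a[start:stop] + b[start:stop]

# Split the iteration space into equally sized chunks,
# similar to what prange does automatically
nthreads = 4
bounds = np.linspace(0, n, nthreads + 1, dtype=int)

with ThreadPoolExecutor(max_workers=nthreads) as pool:
    for i in range(nthreads):
        pool.submit(add_chunk, bounds[i], bounds[i + 1])
```

Leaving the `with` block waits for all submitted tasks to finish, so `c` holds the complete sum afterwards.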
### An alternative solution - Process based parallel processing
Python has an alternative solution for parallel execution. We discussed above that threading in Python is limited by the GIL. The solution is process-based parallelisation. Instead of multiple threads we use multiple Python processes, each with its own GIL and memory space. The `multiprocessing` module makes dealing with process-based parallelisation easy. Below you find the above threading example, but using the multiprocessing module.

```python
import numpy as np
import multiprocessing
import ctypes

def worker(arr1, arr2, arr3, chunk):
    """The process worker."""
    # Create Numpy views of the shared multiprocessing arrays
    arr1_np = np.frombuffer(arr1.get_obj())
    arr2_np = np.frombuffer(arr2.get_obj())
    arr3_np = np.frombuffer(arr3.get_obj())

    for index in chunk:
        arr3_np[index] = arr1_np[index] + arr2_np[index]

nprocesses = multiprocessing.cpu_count()

n = 1000000

a = multiprocessing.Array(ctypes.c_double, n)
b = multiprocessing.Array(ctypes.c_double, n)
c = multiprocessing.Array(ctypes.c_double, n)

a[:] = np.random.randn(n)
b[:] = np.random.randn(n)

chunks = np.array_split(range(n), nprocesses)

all_processes = []
for chunk in chunks:
    process = multiprocessing.Process(target=worker, args=(a, b, c, chunk))
    all_processes.append(process)
    process.start()

for process in all_processes:
    process.join()
```

This example is very similar to the threading example. The main difference is the variable initialisation. Processes do not share memory. The multiprocessing module can automatically copy variables over to the different processes on initialisation, but this is inefficient for large arrays, and it gives us no way to write results back into a common array. The solution is to create shared arrays: special structures that can be accessed from all processes. The `multiprocessing.Array` type serves this purpose. It is very low-level, but we can create a view of it as a Numpy array through the `np.frombuffer` command, which creates a Numpy array backed by the shared memory.
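As a minimal illustration of this zero-copy view (the variable names here are illustrative), writes made through the Numpy view are immediately visible through the shared array itself:

```python
import multiprocessing
import ctypes
import numpy as np

# Allocate 5 doubles in shared memory
shared = multiprocessing.Array(ctypes.c_double, 5)

# np.frombuffer creates a Numpy view of the same buffer;
# no data is copied
view = np.frombuffer(shared.get_obj())

view[:] = [1.0, 2.0, 3.0, 4.0, 5.0]

# Writes through the view are visible in the shared array
print(shared[:])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```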

requirements.txt

+2-1

```diff
@@ -1,4 +1,5 @@
 jupyter-book
 matplotlib
 numpy
-ghp-import
+numba
+ghp-import
```

0 commit comments
