So why does threading exist at all in Python? Python threads make sense for I/O-bound tasks.
### Numba - Parallel threading without GIL
Numba is a library for the just-in-time compilation of Python code into low-level machine code that does not require the Python interpreter. We will dive into Numba in a separate session. The beauty of Numba is that, since Numba-compiled functions do not need the Python interpreter, they execute without acquiring the GIL. This makes it possible to run parallel threads independently of the Python interpreter, delivering optimal performance on multicore CPUs. A parallel version of the vector addition in Numba is given below.
```python
import numpy as np
import numba

n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

c = np.empty(n, dtype='float64')

@numba.njit(parallel=True)
def numba_fun(arr1, arr2, arr3):
    """The thread worker."""
    for index in numba.prange(n):
        arr3[index] = arr1[index] + arr2[index]

numba_fun(a, b, c)
```
In the above example we tell Numba to just-in-time compile the function `numba_fun`. The function `prange` tells Numba that the corresponding for-loop can be parallelised. Numba automatically splits this for-loop into threads that work independently. Since Numba compiles the function into direct machine code that does not require the Python interpreter, the GIL does not interfere.
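To see what `prange` does conceptually, the iteration range can be split into contiguous chunks that are handed out to separate workers. The sketch below (plain Python with `concurrent.futures`, hypothetical names, not Numba's actual implementation) mimics that chunking; unlike Numba, it still runs under the interpreter and delegates the heavy lifting to NumPy slice operations.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)
c = np.empty(n)

def add_chunk(start, stop):
    # Each worker handles one contiguous slice; the slice
    # arithmetic itself runs in NumPy's compiled code.
    c[start:stop] = a[start:stop] + b[start:stop]

nworkers = 4
# Chunk boundaries, e.g. [0, 250000, 500000, 750000, 1000000].
bounds = np.linspace(0, n, nworkers + 1, dtype=int)

with ThreadPoolExecutor(max_workers=nworkers) as pool:
    for start, stop in zip(bounds[:-1], bounds[1:]):
        pool.submit(add_chunk, start, stop)

# The with-block waits for all workers before continuing.
assert np.allclose(c, a + b)
```

This only illustrates the chunking idea; Numba's threads avoid the interpreter entirely, which is what makes them immune to the GIL.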
### An alternative solution - Process-based parallel processing
Python has an alternative solution for parallel execution. We discussed above that threading in Python is limited by the GIL. The solution is process-based parallelisation: instead of multiple threads we use multiple Python processes, each with its own GIL and memory space. The `multiprocessing` module in Python makes process-based parallelisation easy to handle. Below you find the above threading example, rewritten using the multiprocessing module.
```python
import numpy as np
import multiprocessing
import ctypes

def worker(arr1, arr2, arr3, chunk):
    """The process worker."""
    # Create Numpy arrays from the
    # shared multiprocessing arrays
    arr1_np = np.frombuffer(arr1.get_obj())
    arr2_np = np.frombuffer(arr2.get_obj())
    arr3_np = np.frombuffer(arr3.get_obj())

    for index in chunk:
        arr3_np[index] = arr1_np[index] + arr2_np[index]

nprocesses = multiprocessing.cpu_count()

n = 1000000

a = multiprocessing.Array(ctypes.c_double, n)
b = multiprocessing.Array(ctypes.c_double, n)
c = multiprocessing.Array(ctypes.c_double, n)

a[:] = np.random.randn(n)
b[:] = np.random.randn(n)

chunks = np.array_split(range(n), nprocesses)

all_processes = []

for chunk in chunks:
    process = multiprocessing.Process(target=worker, args=(a, b, c, chunk))
    all_processes.append(process)
    process.start()

for process in all_processes:
    process.join()
```
This example is very similar to the threading example. The main difference is the variable initialisation. Processes do not share the same memory. The multiprocessing module can automatically copy variables over to the different processes on initialisation. However, this is inefficient for large arrays, and child processes could not write results back into the parent's arrays. The solution is to create shared arrays: special structures that can be accessed from all processes. The `multiprocessing.Array` type serves this purpose. It is very low-level, but we can create a view of it as a Numpy array through the `np.frombuffer` command, which creates a Numpy array backed by the shared memory.
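To make the shared-memory mechanism concrete, here is a minimal standalone sketch (variable names are illustrative) showing that a NumPy view created with `np.frombuffer` writes directly into the underlying `multiprocessing.Array`, with no copying involved:

```python
import ctypes
import multiprocessing
import numpy as np

# A small shared array of 5 doubles, initialised to zero.
shared = multiprocessing.Array(ctypes.c_double, 5)

# Wrap the shared buffer in a NumPy view; no data is copied.
# get_obj() exposes the raw ctypes array behind the lock wrapper.
view = np.frombuffer(shared.get_obj())

view[:] = [1.0, 2.0, 3.0, 4.0, 5.0]

# Writes through the NumPy view are visible in the raw
# multiprocessing.Array, because both refer to the same memory.
print(list(shared))  # → [1.0, 2.0, 3.0, 4.0, 5.0]
```

Because every process that receives `shared` maps the same memory, the same `np.frombuffer` trick inside a worker lets each process read its inputs and write its results in place.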