Find out if the CPython GIL is slowing you down with a single script
gilstats.py is a utility for dumping per-thread statistics for the CPython GIL using eBPF (Linux only). You only need to provide the process ID (pid) of your application to collect the data.
gilstats.py uses eBPF under the hood and therefore requires Linux.
The only prerequisite is to install bcc-tools on your system.
You can follow the instructions here to install bcc-tools on your Linux system: https://github.com/iovisor/bcc/blob/master/INSTALL.md
An example installation for Ubuntu 16.04:
> echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
> sudo apt-get update
> sudo apt-get install bcc-tools
You can then run the gilstats.py script, passing the pid of the process to profile:
> sudo /usr/bin/python gilstats.py -p 19402
Attaching /lib/x86_64-linux-gnu/libpthread.so.0:^pthread_cond_timedwait$. Hit Ctrl+C to stop.
^C
*** Per-thread Results
[
    {
        "tid": 19437,
        "ttot_secs": 7,
        "ncall": 2191
    },
    {
        "tid": 19438,
        "ttot_secs": 7,
        "ncall": 2191
    },
    {
        "tid": 19436,
        "ttot_secs": 7,
        "ncall": 2190
    },
    {
        "tid": 19435,
        "ttot_secs": 0,
        "ncall": 14
    }
]
*** Total elapsed: 11.4190039635 secs
You can see the time spent, in seconds, for every thread probed.
- tid is the thread id. See here for more details.
- ttot_secs is the number of seconds this thread waited to acquire the GIL.
- ncall is the number of times this thread tried to acquire the GIL.
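To judge whether the GIL is actually slowing a thread down, it helps to relate ttot_secs to the total elapsed time. The snippet below is a minimal sketch of that arithmetic using the sample output above. The helper name gil_wait_share is hypothetical and not part of gilstats.py, and since ttot_secs appears to be reported in whole seconds, the resulting ratios are only rough.

```python
import json

# Per-thread results and total elapsed time copied from the sample run above.
results_json = """
[
    {"tid": 19437, "ttot_secs": 7, "ncall": 2191},
    {"tid": 19438, "ttot_secs": 7, "ncall": 2191},
    {"tid": 19436, "ttot_secs": 7, "ncall": 2190},
    {"tid": 19435, "ttot_secs": 0, "ncall": 14}
]
"""
total_elapsed_secs = 11.4190039635


def gil_wait_share(results, elapsed_secs):
    """Hypothetical helper: map each tid to the fraction of the
    elapsed wall-clock time it spent waiting to acquire the GIL."""
    return {r["tid"]: r["ttot_secs"] / elapsed_secs for r in results}


shares = gil_wait_share(json.loads(results_json), total_elapsed_secs)
for tid, share in sorted(shares.items()):
    print("tid %d waited on the GIL for about %.0f%% of the run" % (tid, share * 100))
```

A thread whose share is close to zero (like 19435 here) is rarely contending for the GIL, while threads that spend more than half of the run waiting are good candidates for a closer look.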
gilstats.py first determines your Python interpreter's major version, because the GIL implementation differs a lot between Python 2 and 3. You can find the reason why here: https://www.youtube.com/watch?v=Obt-vMVdM8s. After retrieving the Python major version, we use eBPF to hook the following library functions:
pthread:sem_wait # Python2
pthread:pthread_cond_timedwait # Python3
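The version-based choice of probe target can be written down as a small lookup. This is only an illustrative sketch: the function name gil_wait_probe is hypothetical, it is not taken from gilstats.py, and it treats every Python 3 interpreter as 3.2 or newer.

```python
def gil_wait_probe(py_major):
    """Hypothetical helper: return the (library, symbol) pair of the
    function a thread blocks in while waiting for the GIL."""
    if py_major == 2:
        # Python 2 on Linux: the GIL wait ends up in sem_wait.
        return ("pthread", "sem_wait")
    if py_major == 3:
        # Python 3 (assumed 3.2 and up): the GIL wait ends up in
        # pthread_cond_timedwait.
        return ("pthread", "pthread_cond_timedwait")
    raise ValueError("unsupported Python major version: %r" % (py_major,))


lib, sym = gil_wait_probe(3)
print("would attach to %s:%s" % (lib, sym))
```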
sem_wait and pthread_cond_timedwait are the functions that actually wait on the GIL. On Python 2, the GIL is a simple semaphore on Linux, whereas on Python 3 (3.2 and up) it is a condition variable. If we can track how much time a thread spends in these functions, we can track how much time it waited to acquire the GIL. However, there is one more issue with this approach: code unrelated to the GIL may also call these functions. So how do we solve this? Here is my idea:
- Measure every sem_wait/pthread_cond_timedwait call and record call_count and total time spent, along with the first argument passed to these functions (via the PT_REGS_PARM1 call eBPF provides). For a sem_wait call, the first parameter will be a sem_t * pointer, whereas for pthread_cond_timedwait it will be a pthread_cond_t * pointer.
- When probing has finished, the GIL pointer will be the one with the maximum call_count. This holds because we assume that every other blocking call is also followed by a wait on the GIL. That means no semaphore other than the GIL itself should see more sem_wait calls.
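The selection step in the list above can be illustrated in plain Python, without any eBPF involved. The sketch below assumes each finished wait has already been recorded as a (tid, addr, wait_ns) tuple, where addr stands for the first argument captured with PT_REGS_PARM1. The function name, the tuple layout, and the sample values are all hypothetical and only serve to show the "most-called pointer is the GIL" heuristic.

```python
from collections import defaultdict


def pick_gil_and_summarize(samples):
    """samples: list of (tid, addr, wait_ns) tuples, one per finished
    sem_wait/pthread_cond_timedwait call (hypothetical layout).

    Returns (gil_addr, per_thread_stats), where gil_addr is the pointer
    with the highest call count and per_thread_stats only counts waits
    on that pointer."""
    calls_per_addr = defaultdict(int)
    for _tid, addr, _wait_ns in samples:
        calls_per_addr[addr] += 1
    if not calls_per_addr:
        return None, {}

    # Heuristic from the text: the most frequently waited-on pointer is the GIL.
    gil_addr = max(calls_per_addr, key=calls_per_addr.get)

    per_thread = defaultdict(lambda: {"ncall": 0, "ttot_ns": 0})
    for tid, addr, wait_ns in samples:
        if addr == gil_addr:
            per_thread[tid]["ncall"] += 1
            per_thread[tid]["ttot_ns"] += wait_ns
    return gil_addr, dict(per_thread)


# Made-up samples: two threads wait on 0xA, one unrelated wait on 0xB.
samples = [(1, 0xA, 100), (2, 0xA, 200), (1, 0xB, 50), (1, 0xA, 300)]
gil_addr, stats = pick_gil_and_summarize(samples)
print(hex(gil_addr), stats)
```

Waits on any other pointer (0xB in the made-up data) are simply dropped from the per-thread totals, which is how non-GIL callers of these functions are kept out of the results under this assumption.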
Since we are using eBPF, the performance overhead during monitoring is minimal.
I think that, with a little bit of help, gilstats.py could also support other languages, e.g. the Ruby MRI GIL.