This is the nicest profiler I’ve found for Python. It’s
cross-platform, doesn’t need me to change the code that’s being profiled, and
its output can be piped directly into flamegraph.pl. I just used it
to pinpoint a gross misuse of SQLAlchemy at work that’s run in some code at the
end of each day, and now I can go home earlier.
-- gthm on lobste.rs
If people are looking for a profiler, Austin looks pretty
cool. Check it out!
-- Michael Kennedy on Python Bytes 180
Austin is a Python frame stack sampler for CPython written in pure C. Samples are collected by reading the CPython interpreter virtual memory space in order to retrieve information about the currently running threads along with the stack of the frames that are being executed. Hence, one can use Austin to easily make powerful statistical profilers that have minimal impact on the target application and that don't require any instrumentation.
The key features of Austin are:
- Zero instrumentation;
- Minimal impact;
- Fast and lightweight;
- Time and memory profiling;
- Built-in support for multi-process applications (e.g. mod_wsgi).
The simplest way to turn Austin into a full-fledged profiler is to use it together with the VS Code extension or to combine it with FlameGraph or Speedscope. However, Austin's simple output format can be piped into any other external or custom tool for further processing. Look, for instance, at the following Python TUI.
Check out A Survey of Open-Source Python Profilers by Peter Norton for a general overview of Austin.
Keep reading for more tool ideas and examples!
💜
Austin is a free and open-source project. A lot of
effort goes into its development to ensure the best performance and that it
stays up-to-date with the latest Python releases. If you find it useful,
consider sponsoring this
project.
🙏
Austin is available from the major software repositories of the most popular platforms. Check out the latest release page for pre-compiled binaries and installation packages.
On Linux, it can be installed using autotools or as a snap from the Snap
Store. The latter will automatically perform the steps of the autotools
method with a single command. On distributions derived from Debian, Austin
can be installed from the official repositories with Aptitude. Anaconda
users can install Austin from Conda Forge.
On Windows, Austin can be easily installed from the command line using either Chocolatey or Scoop. Alternatively, you can download the installer from the latest release page.
On macOS, Austin can be easily installed from the command line using Homebrew. Anaconda users can install Austin from Conda Forge.
For any other platform, compiling Austin from sources is as easy as cloning the repository and running the C compiler. The Releases page has many pre-compiled binaries that are ready to be uncompressed and used.
Installing Austin using autotools amounts to the usual ./configure, make
and make install finger gymnastics. The only dependency is the standard C
library. Before proceeding with the steps below, make sure that the autotools
are installed on your system. Refer to your distro's documentation for details
on how to do so.
git clone --depth=1 https://github.com/P403n1x87/austin.git && cd austin
autoreconf --install
./configure
make
make install
NOTE Some Linux distributions, like Manjaro, might require the execution of automake --add-missing before ./configure.
Alternatively, sources can be compiled with just a C compiler (see below).
Austin can be installed on many major Linux distributions from the Snap Store with the following command
sudo snap install austin --classic
On March 30 2019 Austin was accepted into the official Debian repositories and
can therefore be installed with the apt utility.
sudo apt-get update -y && sudo apt-get install austin -y
Austin can be installed on macOS using Homebrew:
brew install austin
To install Austin from Chocolatey, run the following command from the command line or from PowerShell
choco install austin
To upgrade run the following command from the command line or from PowerShell:
choco upgrade austin
To install Austin using Scoop, run the following command from the command line or from PowerShell
scoop install austin
To upgrade run the following command from the command line or from PowerShell:
scoop update austin
Anaconda users on Linux and macOS can install Austin from Conda Forge with the command
conda install -c conda-forge austin
To install Austin from sources using the GNU C compiler, without autotools,
clone the repository with
git clone --depth=1 https://github.com/P403n1x87/austin.git
On Linux one can then use the command
gcc -O3 -Os -Wall -pthread src/*.c -o src/austin
whereas on macOS it is enough to run
gcc -O3 -Os -Wall src/*.c -o src/austin
On Windows, the -lpsapi -lntdll switches are needed
gcc -O3 -Os -Wall -lpsapi -lntdll src/*.c -o src/austin
Add -DDEBUG if you need a more verbose log. This is useful if you encounter a
bug with Austin and you want to report it here.
Usage: austin [OPTION...] command [ARG...]
Austin is a frame stack sampler for CPython that is used to extract profiling
data out of a running Python process (and all its children, if required) that
requires no instrumentation and has practically no impact on the tracee.

  -a, --alt-format           Alternative collapsed stack sample format.
  -C, --children             Attach to child processes.
  -e, --exclude-empty        Do not output samples of threads with no frame
                             stacks.
  -f, --full                 Produce the full set of metrics (time +mem -mem).
  -g, --gc                   Sample the garbage collector state.
  -h, --heap=n_mb            Maximum heap size to allocate to increase sampling
                             accuracy, in MB (default is 0).
  -i, --interval=n_us        Sampling interval in microseconds (default is
                             100). Accepted units: s, ms, us.
  -m, --memory               Profile memory usage.
  -o, --output=FILE          Specify an output file for the collected samples.
  -p, --pid=PID              Attach to the process with the given PID.
  -P, --pipe                 Pipe mode. Use when piping Austin output.
  -s, --sleepless            Suppress idle samples to estimate CPU time.
  -t, --timeout=n_ms         Start up wait time in milliseconds (default is
                             100). Accepted units: s, ms.
  -w, --where=PID            Dump the stacks of all the threads within the
                             process with the given PID.
  -x, --exposure=n_sec       Sample for n_sec seconds only.
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Report bugs to <https://github.com/P403n1x87/austin/issues>.
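If you drive Austin from a script, the options listed above can be assembled programmatically. The following is a minimal sketch: the helper name `austin_command` and its parameters are made up for illustration, and it only covers the `-i`, `-C`, `-s` and `-P` options from the help text.

```python
import shlex


def austin_command(target, interval=None, children=False,
                   sleepless=False, pipe=False):
    """Build an argv list for Austin (hypothetical helper, not part of Austin)."""
    cmd = ["austin"]
    if interval is not None:
        cmd += ["-i", str(interval)]  # e.g. "1ms"; accepted units: s, ms, us
    if children:
        cmd.append("-C")  # also attach to child processes
    if sleepless:
        cmd.append("-s")  # suppress idle samples
    if pipe:
        cmd.append("-P")  # pipe mode, for when the output is piped
    return cmd + list(target)


cmd = austin_command(["python3", "app.py"], interval="1ms", children=True)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` or `subprocess.Popen` as is, provided the `austin` binary is on the `PATH`.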
The output is a sequence of frame stack samples, one on each line. The format is
the collapsed one that is recognised by FlameGraph so that it can be piped
straight to flamegraph.pl for a quick visualisation, or redirected to a file
for some further processing.
By default, each line has the following structure:
P<pid>;T<tid>[;[frame]]* [metric]*
where the structure of [frame] and the number and type of metrics on each line
depend on the mode.
In normal mode, the [frame] part of each emitted sample has the structure
[frame] := <module>:<function>:<line number>
If you want the flame graph to show the total time spent in each function, plus
the finer detail of the time spent on each line, you can use the alternative
format by passing the -a option. In this mode, [frame] has the structure
[frame] := <module>:<function>;L<line number>
Each line then ends with a single [metric], i.e. the sampling time measured in
microseconds.
NOTE This was changed in Austin 3. In previous versions, the alternative format used to be the default one.
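As an illustration of the normal-mode format, here is a minimal parsing sketch. The names `Frame`, `Sample` and `parse_sample` are made up for this example, and the code assumes default options (no `-a`, `-g`, `-m` or `-f`), so that every frame is `<module>:<function>:<line number>` and the line ends with a single integer metric.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    module: str
    function: str
    line: int


class Sample(NamedTuple):
    pid: str
    tid: str
    frames: List[Frame]
    metric: int


def parse_sample(line: str) -> Sample:
    """Parse one normal-mode sample line (hypothetical helper)."""
    # The metric is the last space-separated token on the line.
    stack, _, metric = line.rstrip("\n").rpartition(" ")
    pid, tid, *raw_frames = stack.split(";")
    frames = []
    for raw in raw_frames:
        # Split from the right so that colons inside the module path
        # (e.g. a Windows drive letter) stay with the module.
        module, function, lineno = raw.rsplit(":", 2)
        frames.append(Frame(module, function, int(lineno)))
    # Drop the leading "P" and "T" markers; IDs are kept as strings.
    return Sample(pid[1:], tid[1:], frames, int(metric))
```

Samples collected with the alternative format need a different frame parser, since the line number is then a separate `;L<line number>` element.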
When profiling in memory mode with the -m or --memory switch, the metric
value at the end of each line is the memory delta between samples, measured in
bytes. In full mode (-f or --full switches), each sample ends with a
comma-separated list of three values: the time delta, the idle state (1 for
idle, 0 otherwise) and the RSS memory delta (positive for memory allocations,
negative for deallocations). This way it is possible to estimate wall-clock
time, CPU time and memory pressure, all from a single run.
NOTE The reported memory allocations and deallocations are obtained by computing resident memory deltas between samples. Hence these values give an idea of how much physical memory is being requested/released.
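The three full-mode metrics can be folded into totals with a few lines of Python. This is only a sketch: the function name `summarise_full_mode` is made up, lines starting with `#` are assumed to be metadata and are skipped, and the totals are summed over all sampled threads rather than per thread.

```python
def summarise_full_mode(lines):
    """Aggregate full-mode samples into time and memory totals (sketch)."""
    wall = cpu = allocated = released = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        # Metrics are the last token: <time delta>,<idle state>,<memory delta>
        _, _, metrics = line.rpartition(" ")
        time_delta, idle, mem_delta = (int(v) for v in metrics.split(","))
        wall += time_delta
        if not idle:
            cpu += time_delta  # only non-idle samples count as CPU time
        if mem_delta > 0:
            allocated += mem_delta
        else:
            released -= mem_delta
    return {"wall_us": wall, "cpu_us": cpu,
            "allocated_bytes": allocated, "released_bytes": released}
```

It can be fed a file object directly, e.g. `summarise_full_mode(open("samples.austin"))`, where `samples.austin` is a placeholder for a file collected with the `-f` option.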
Austin can be told to profile multi-process applications with the -C or
--children switch. This way Austin will look for new children of the parent
process.
Austin can sample the Python garbage collector state for applications running
with Python 3.7 and later versions. If the -g/--gc option is passed, Austin
will append :GC: at the end of each collected frame stack whenever the
garbage collector is in the collecting state. This gives you a measure of how
busy the Python GC is during a run.
Since Austin 3.1.0.
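To turn the GC marker into a number, one could compute the share of sampled time during which the collector was active. The sketch below assumes time-mode samples collected with `-g`, that the stack part of a line ends with `:GC:` when the collector is running (as described above), and that lines starting with `#` are metadata; `gc_time_share` is a made-up name.

```python
def gc_time_share(lines):
    """Return the fraction of sampled time spent with the GC collecting."""
    total = in_gc = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        stack, _, metric = line.rpartition(" ")
        elapsed = int(metric)
        total += elapsed
        if stack.endswith(":GC:"):
            in_gc += elapsed
    return in_gc / total if total else 0.0
```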
If you are only interested in what is currently happening inside a Python
process, you can have a quick overview printed on the terminal with the
-w/--where option. This takes the PID of the process whose threads you want to
inspect, e.g.
sudo austin -w `pgrep -f my-running-python-app`
Below is an example of what the output looks like
This works with the -C/--children option too. The emojis to the left indicate
whether the thread is active or sleeping and whether the process is a child or
not.
Since Austin 3.3.0.
Austin tries to keep perturbations to the tracee at a minimum. In order to do
so, the tracee is never halted. To improve sampling accuracy, Austin can
allocate a heap that is used to get large snapshots of the private VM of the
tracee that is likely to contain frame information in a single attempt. The
larger the heap is allowed to grow, the more accurate the results. The maximum
size of the heap that Austin is allowed to allocate can be controlled with the
-h/--heap option, followed by the maximum size in megabytes. By default Austin
does not allocate a heap, which is ideal on systems with limited resources. If
you think your results are not accurate, try setting this parameter.
Since Austin 3.2.0.
Changed in Austin 3.3.0: the default heap size is 0.
If you want observability into the native frame stacks, you can use the
austinp variant of austin, which can be obtained by compiling the source
with -DAUSTINP on Linux, or from the released binaries.
austinp makes use of ptrace to halt the application and grab a
snapshot of the call stack with libunwind. If you are compiling austinp from
sources, make sure that you have the development version of the libunwind
library available on your system, for example on Ubuntu,
sudo apt install libunwind-dev binutils-dev
and compile with
gcc -O3 -Os -Wall -pthread src/*.c -DAUSTINP -lunwind-ptrace -lunwind-generic -lbfd -o src/austinp
then use as per normal. The extra -k/--kernel option is available with
austinp, which allows sampling kernel call stacks as well.
WARNING Since austinp uses ptrace, the impact on the tracee is no longer minimal and it becomes higher at smaller sampling intervals. Therefore the use of austinp is not recommended in production environments. For this reason, the default sampling interval for austinp is 10 milliseconds.
The utils folder has the script resolve.py that can be used to resolve the
VM addresses to source and line numbers, provided that the referenced binaries
have DWARF debug symbols. To resolve the references, assuming you have collected
the samples in mysamples.austin, do
python3 utils/resolve.py mysamples.austin > mysamples_resolved.austin
Internally, the script uses addr2line(1) to determine source and line number
given an address, when possible.
Whilst austinp comes with a stripped-down implementation of addr2line, it is only used for the "where" option, as resolving symbols at runtime is expensive. This is to minimise the impact of austinp on the tracee, increase accuracy and maximise the sampling rate.
The where option is also available for the austinp variant and will
show both native and Python frames. Highlighting helps tell frames apart. The
-k option outputs Linux kernel frames too, as shown in this example
Austin uses syslog on Linux and macOS, and %TEMP%\austin.log on Windows
for log messages, so make sure to watch these to get execution details and
statistics. Bad frames are output together with the other frames. In general,
entries for bad frames will not be visible in a flame graph as all tests show
error rates below 1% on average.
All the above Austin options and arguments are summarised in a cheat sheet that you can find in the doc folder in SVG, PDF or PNG format.
Austin supports Python 2.3-2.7 and 3.3-3.10 and has been tested on the following platforms and architectures
Architecture | Linux* | Windows** | macOS*** |
---|---|---|---|
x86_64 | ✓ | ✓ | ✓ |
i686 | ✓ | ✓ | |
arm64 | ✓ | | |
ppc64le | ✓ | | |
* In order to attach to an external process, Austin requires the CAP_SYS_PTRACE
capability. This means that you will have to either use sudo when attaching
to a running Python process or grant the CAP_SYS_PTRACE capability to the Austin
binary with, e.g.
sudo setcap cap_sys_ptrace+ep `which austin`
In order for Austin to work with Docker, the --cap-add SYS_PTRACE option needs
to be passed when starting a container.
** Depending on how Python is installed on Windows, the invocation of the
python binary might actually happen via a proxy script or launcher (e.g.
py). Since these are not actual Python processes, Austin will fail to profile
them. To work around this, either use a path to the actual Python executable or
add the -C option to allow Austin to automatically discover the actual child
Python process.
*** Due to the System Integrity Protection introduced in macOS with El
Capitan, Austin cannot profile Python processes that use an executable located
in the /bin folder, even with sudo. Hence, either run the interpreter from a
virtual environment or use a Python interpreter that is installed in, e.g.,
/Applications or via alternative methods, like brew with the default prefix
(/usr/local), or pyenv. Even in these cases, though, the use of sudo is
required. Austin is unlikely to work with interpreters installed using
the official installers from python.org.
NOTE Austin might work with other versions of Python on all the platforms and architectures above. So it is worth giving it a try even if your system is not listed above.
When there already are similar tools out there, it's normal to wonder why one should be interested in yet another one. So here is a list of features that currently distinguish Austin.
- Written in pure C. Austin is written in pure C code. There are no dependencies on third-party libraries with the exception of the standard C library and the API provided by the Operating System.
- Just a sampler. Austin is just a frame stack sampler. It looks into a running Python application at regular intervals of time and dumps whatever frame stack it finds. The samples can then be analysed at a later time so that Austin can sample at rates higher than other non-C alternatives that also analyse the samples as they run.
- Simple output, powerful tools. Austin uses the collapsed stack format of FlameGraph that is easy to parse. You can then go and build your own tool to analyse Austin's output. You could even make a player that replays the application execution in slow motion, so that you can see what has happened in temporal order.
- Small size. Austin compiles to a single binary executable of just a bunch of KB.
- Easy to maintain. Occasionally, the Python C API changes and Austin will need to be adjusted to new releases. However, given that Austin, like CPython, is written in C, implementing the new changes is rather straightforward.
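To give an idea of how little is needed for a custom tool, here is a sketch that ranks functions by their own time. It assumes normal-mode, time-only samples, treats lines starting with `#` as metadata, and takes the last frame on each line as the one that was executing; `own_time_by_function` is a made-up name.

```python
from collections import Counter


def own_time_by_function(lines):
    """Sum the sampled time (in microseconds) per (module, function)."""
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        stack, _, metric = line.rpartition(" ")
        frames = stack.split(";")[2:]  # drop the P<pid> and T<tid> fields
        if not frames:
            continue  # thread sampled with an empty frame stack
        # Split from the right so colons in the module path are preserved.
        module, function, _lineno = frames[-1].rsplit(":", 2)
        totals[(module, function)] += int(metric)
    return totals
```

For example, `own_time_by_function(open("samples.austin")).most_common(10)` would list the ten functions with the most sampled time, with `samples.austin` standing in for a file written with the `-o` option.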
The following flame graph has been obtained with the command
austin -i 1ms ./test.py | sed '/^#/d' | ./flamegraph.pl --countname=μs > test.svg
where the sample test.py script has the execute permission and the following
content
#!/usr/bin/env python3
import dis
for i in range(1000):
    dis.dis(dis.dis)
To profile an Apache2 WSGI application, one can attach Austin to the web server with
austin -Cp `pgrep apache2 | head -n 1`
Any child processes will be automatically detected as they are created and Austin will sample them too.
It is easy to write your own extension for your favourite text editor. This, for example, is a demo of a Visual Studio Code extension that highlights the most frequently hit lines of code directly in the editor.
The Austin TUI is a text-based user interface for Austin that gives you a top-like view of what is currently running inside a Python application. It is most useful for scripts that have long-running procedures as you can see where execution is at without tracing instructions in your code. You can also save the collected data from within the TUI and feed it to Flame Graph for visualisation, or convert it to the pprof format.
If you want to give it a go you can install it using pip with
pip install austin-tui --upgrade
and run it with
austin-tui [OPTION...] command [ARG...]
with the same command line as Austin. Please note that the austin binary
should be available from within the PATH environment variable in order for the
TUI to work.
The TUI is based on python-curses. The version included with the standard Windows installations of Python is broken so it won't work out of the box. A solution is to install the wheel of the port to Windows from this page. Wheel files can be installed directly with pip, as described in the linked page.
Austin Web is a web application that wraps around Austin. At its core, Austin
Web is based on d3-flame-graph to display a live flame graph in the browser
that refreshes every 3 seconds with newly collected samples. Austin Web can also
be used for remote profiling by setting the --host and --port options.
If you want to give it a go you can install it using pip with
pip install austin-web --upgrade
and run it with
austin-web [OPTION...] command [ARG...]
with the same command line as Austin. This starts a simple HTTP server that
serves on localhost by default. When no explicit port is given, Austin Web
will use an ephemeral one.
Please note that the austin binary should be available from within the PATH
environment variable in order for Austin Web to work.
Austin output is now supported by Speedscope. However, the austin-python
library comes with format conversion tools that allow you to convert the output
from Austin to the Speedscope JSON format.
If you want to give it a go you can install it using pip with
pip install austin-python --upgrade
and run it with
austin2speedscope [-h] [--indent INDENT] [-V] input output
where input is a file containing the output from Austin and output is the
name of the JSON file to use to save the result of the conversion, ready to be
used on Speedscope.
Austin's format can also be converted to the Google pprof format using the
austin2pprof utility that comes with austin-python. If you want to give it
a go you can install it using pip with
pip install austin-python --upgrade
and run it with
austin2pprof [-h] [-V] input output
where input is a file containing the output from Austin and output is the
name of the protobuf file to use to save the result of the conversion, ready to
be used with Google's pprof tools.
If you like Austin and you find it useful, there are ways for you to contribute.
If you want to help with the development, have a look at the open issues and read the contributing guidelines before you open a pull request.
You can also contribute to the development of Austin by becoming a sponsor and/or by buying me a coffee on BMC or by chipping in a few pennies on PayPal.Me.