Skip to content

Commit 558ed22

Browse files
liguriolenkis
andauthoredFeb 5, 2025··
LuaJIT platform profiler documentation (#4733)
Introduce a new document on LuaJIT platform profiler * LuaJIT platform profiler is a new feature implemented in Tarantool 2.10.0. The document describes the profiler's behavior as of this and next Tarantool versions. * The document is placed in the Tooling chapter. Closes #2587 Co-authored-by: lenkis <[email protected]>
1 parent e6977b7 commit 558ed22

File tree

3 files changed

+475
-101
lines changed

3 files changed

+475
-101
lines changed
 

‎doc/tooling/index.rst

+1
Original file line numberDiff line numberDiff line change
@@ -11,4 +11,5 @@ to work with Tarantool.
1111
tcm/index
1212
interactive_console
1313
luajit_memprof
14+
luajit_sysprof
1415
luajit_getmetrics

‎doc/tooling/luajit_memprof.rst

+104-101
Original file line numberDiff line numberDiff line change
@@ -3,11 +3,13 @@
33
LuaJIT memory profiler
44
======================
55

6-
Starting from version :doc:`2.7.1 </release/2.7.1>`, Tarantool
6+
Since version :doc:`2.7.1 </release/2.7.1>`, Tarantool
77
has a built‑in module called ``misc.memprof`` that implements a LuaJIT memory
8-
profiler (which we will just call *the profiler* in this section). The profiler provides
9-
a memory allocation report that helps analyze Lua code and find the places
10-
that put the most pressure on the Lua garbage collector (GC).
8+
profiler (further in this section we call it *the profiler* for short).
9+
The profiler provides a memory allocation report that helps analyze Lua code
10+
and find the places that put the most pressure on the Lua garbage collector (GC).
11+
12+
Inside this section:
1113

1214
.. contents::
1315
:local:
@@ -18,7 +20,7 @@ that put the most pressure on the Lua garbage collector (GC).
1820
Working with the profiler
1921
-------------------------
2022

21-
Usage of the profiler involves two steps:
23+
The profiler usage involves two steps:
2224

2325
1. :ref:`Collecting <profiler_usage_get>` a binary profile of allocations,
2426
reallocations, and deallocations in memory related to Lua
@@ -28,13 +30,13 @@ Usage of the profiler involves two steps:
2830

2931
.. _profiler_usage_get:
3032

31-
Collecting binary profile
32-
~~~~~~~~~~~~~~~~~~~~~~~~~
33+
Collecting a binary profile
34+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
3335

3436
To collect a binary profile for a particular part of the Lua code,
3537
you need to place this part between two ``misc.memprof`` functions,
3638
namely, ``misc.memprof.start()`` and ``misc.memprof.stop()``, and then execute
37-
the code under Tarantool.
39+
the code in Tarantool.
3840

3941
Below is a chunk of Lua code named ``test.lua`` to illustrate this.
4042

@@ -43,26 +45,26 @@ Below is a chunk of Lua code named ``test.lua`` to illustrate this.
4345
.. code-block:: lua
4446
:linenos:
4547
46-
-- Prevent allocations on traces.
47-
jit.off()
48-
local str, err = misc.memprof.start("memprof_new.bin")
49-
-- Lua doesn't create a new frame to call string.rep, and all allocations
50-
-- are attributed not to the append() function but to the parent scope.
51-
local function append(str, rep)
52-
return string.rep(str, rep)
53-
end
48+
-- Prevent allocations on traces.
49+
jit.off()
50+
local str, err = misc.memprof.start("memprof_new.bin")
51+
-- Lua doesn't create a new frame to call string.rep, and all allocations
52+
-- are attributed not to the append() function but to the parent scope.
53+
local function append(str, rep)
54+
return string.rep(str, rep)
55+
end
5456
55-
local t = {}
56-
for i = 1, 1e4 do
57-
-- table.insert is the built-in function and all corresponding
58-
-- allocations are reported in the scope of the main chunk.
59-
table.insert(t,
60-
append('q', i)
61-
)
62-
end
63-
local stp, err = misc.memprof.stop()
57+
local t = {}
58+
for i = 1, 1e4 do
59+
-- table.insert is the built-in function and all corresponding
60+
-- allocations are reported in the scope of the main chunk.
61+
table.insert(t,
62+
append('q', i)
63+
)
64+
end
65+
local str, err = misc.memprof.stop()
6466
65-
The Lua code for starting the profiler -- as in line 3 in the test.lua example above -- is:
67+
The Lua code for starting the profiler -- as in line 3 in the ``test.lua`` example above -- is:
6668

6769
.. code-block:: lua
6870
@@ -75,27 +77,29 @@ for example if it is not possible to open a file for writing or if the profiler
7577
``misc.memprof.start()`` returns ``nil`` as the first result,
7678
an error-message string as the second result,
7779
and a system-dependent error code number as the third result.
80+
7881
If the operation succeeds, ``misc.memprof.start()`` returns ``true``.
7982

80-
The Lua code for stopping the profiler -- as in line 18 in the test.lua example above -- is:
83+
The Lua code for stopping the profiler -- as in line 18 in the ``test.lua`` example above -- is:
8184

8285
.. code-block:: lua
8386
84-
local stp, err = misc.memprof.stop()
87+
local str, err = misc.memprof.stop()
8588
8689
If the operation fails,
87-
for example if there is an error when the file descriptor is being closed
90+
for example if there is an error when the file descriptor is being closed
8891
or if there is a failure during reporting,
8992
``misc.memprof.stop()`` returns ``nil`` as the first result,
9093
an error-message string as the second result,
9194
and a system-dependent error code number as the third result.
95+
9296
If the operation succeeds, ``misc.memprof.stop()`` returns ``true``.
9397

9498
.. _profiler_usage_generate:
9599

96100
To generate the file with memory profile in binary format
97101
(in the :ref:`test.lua code example above <profiler_usage_example01>`
98-
the file name is ``memprof_new.bin``), execute the code under Tarantool:
102+
the file name is ``memprof_new.bin``), execute the code in Tarantool:
99103

100104
.. code-block:: console
101105
@@ -115,7 +119,7 @@ a profiling report:
115119
:ref:`note about jitoff <profiler_usage_internal_jitoff>` for more details.
116120

117121
* Lines 6-8: Tail call optimization doesn't create a new call frame, so all
118-
allocations inside the function called via the ``CALLT/CALLMT`` `bytecodes <http://wiki.luajit.org/Bytecode-2.0#calls-and-vararg-handling>`_
122+
allocations inside the function called via the ``CALLT/CALLMT`` `bytecodes <https://github.com/tarantool/tarantool/wiki/LuaJIT-Bytecodes>`_
119123
are attributed to the function's caller. See also the comments preceding these lines.
120124

121125
* Lines 14-16: Usually the information about allocations inside Lua built‑ins
@@ -127,8 +131,8 @@ a profiling report:
127131

128132
.. _profiler_usage_parse:
129133

130-
Parsing binary profile and generating profiling report
131-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
134+
Parsing a binary profile and generating a profiling report
135+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
132136

133137
.. _profiler_usage_parse_command:
134138

@@ -143,8 +147,11 @@ via Tarantool by using the following command
143147
144148
where ``memprof_new.bin`` is the binary profile
145149
:ref:`generated earlier <profiler_usage_generate>` by ``tarantool test.lua``.
146-
(Warning: there is a slight behavior change here, the ``tarantool -e ...``
147-
command was slightly different in Tarantool versions prior to Tarantool 2.8.1.)
150+
151+
.. note::
152+
153+
There is a slight behavior change here: the ``tarantool -e ...`` command
154+
was slightly different in Tarantool versions prior to Tarantool 2.8.1.
148155

149156
Tarantool generates a profiling report and displays it on the console before closing
150157
the session:
@@ -187,7 +194,6 @@ the session:
187194
On macOS, a report will be different for the same chunk of code because
188195
Tarantool and LuaJIT are built with the GC64 mode enabled for macOS.
189196

190-
191197
Let's examine the report structure. A report has four sections:
192198

193199
* ALLOCATIONS
@@ -203,21 +209,20 @@ An event record has the following format:
203209
.. code-block:: text
204210
205211
@<filename>:<line_number>: <number_of_events> events +<allocated> bytes -<freed> bytes
206-
207212
213+
where:
208214

209-
* <filename>a name of the file containing Lua code.
210-
* <line_number>the line number where the event is detected.
211-
* <number_of_events>a number of events for this code line.
212-
* +<allocated> bytesamount of memory allocated during all the events on this line.
213-
* -<freed> bytesamount of memory freed during all the events on this line.
215+
* ``<filename>`` -— a name of the file containing Lua code.
216+
* ``<line_number>`` -— the line number where the event is detected.
217+
* ``<number_of_events>`` —- a number of events for this code line.
218+
* ``+<allocated> bytes`` —- amount of memory allocated during all the events on this line.
219+
* ``-<freed> bytes`` —- amount of memory freed during all the events on this line.
214220

215221
The ``Overrides`` label shows what allocation has been overridden.
216222

217223
See the :ref:`test.lua chunk above <profiler_usage_example01>`
218224
with the explanation in the comments for some examples.
219225

220-
221226
.. _profiler_usage_internal_jitoff:
222227

223228
The ``INTERNAL`` label indicates that this event is caused by internal LuaJIT
@@ -256,10 +261,10 @@ structures.
256261
As for investigating the Lua code with the help of profiling reports,
257262
it is always code-dependent and there can't be hundred per cent definite
258263
recommendations in this regard. Nevertheless, you can see some of the things
259-
in the :ref:`Profiling report analysis example <profiler_analysis>` later.
264+
in the :ref:`Profiling a report analysis example <profiler_analysis>` later.
260265

261266
Also, below is the :ref:`FAQ <profiler_faq>` section with the questions that
262-
most probably can arise while using profiler.
267+
most probably can arise while using the profiler.
263268

264269
.. _profiler_faq:
265270

@@ -317,44 +322,44 @@ console for a running instance.
317322
.. code-block:: lua
318323
:linenos:
319324
320-
local fiber = require "fiber"
321-
local log = require "log"
325+
local fiber = require "fiber"
326+
local log = require "log"
322327
323-
fiber.create(function()
324-
fiber.name("memprof")
328+
fiber.create(function()
329+
fiber.name("memprof")
325330
326-
collectgarbage() -- Collect all objects already dead
327-
log.warn("start of profile")
331+
collectgarbage() -- Collect all objects already dead
332+
log.warn("start of profile")
328333
329-
local st, err = misc.memprof.start(FILENAME)
330-
if not st then
331-
log.error("failed to start profiler: %s", err)
332-
end
334+
local st, err = misc.memprof.start(FILENAME)
335+
if not st then
336+
log.error("failed to start profiler: %s", err)
337+
end
333338
334-
fiber.sleep(TIME)
339+
fiber.sleep(TIME)
335340
336-
collectgarbage()
337-
st, err = misc.memprof.stop()
341+
collectgarbage()
342+
st, err = misc.memprof.stop()
338343
339-
if not st then
340-
log.error("profiler on stop error: %s", err)
341-
end
344+
if not st then
345+
log.error("profiler on stop error: %s", err)
346+
end
342347
343-
log.warn("end of profile")
344-
end)
348+
log.warn("end of profile")
349+
end)
345350
346-
where
351+
where:
347352

348-
* ``FILENAME``—the name of the binary file where profiling events are written
349-
* ``TIME``—duration of profiling, in seconds.
353+
* ``FILENAME``—the name of the binary file where profiling events are written
354+
* ``TIME``—duration of profiling, in seconds.
350355

351356
Also, you can directly call ``misc.memprof.start()`` and ``misc.memprof.stop()``
352357
from a console.
353358

354359
.. _profiler_analysis:
355360

356-
Profiling report analysis example
357-
---------------------------------
361+
Profiling a report analysis example
362+
-----------------------------------
358363

359364
In the example below, the following Lua code named ``format_concat.lua`` is
360365
investigated with the help of the memory profiler reports.
@@ -364,39 +369,39 @@ investigated with the help of the memory profiler reports.
364369
.. code-block:: lua
365370
:linenos:
366371
367-
-- Prevent allocations on new traces.
368-
jit.off()
372+
-- Prevent allocations on new traces.
373+
jit.off()
369374
370-
local function concat(a)
371-
local nstr = a.."a"
372-
return nstr
373-
end
375+
local function concat(a)
376+
local nstr = a.."a"
377+
return nstr
378+
end
374379
375-
local function format(a)
376-
local nstr = string.format("%sa", a)
377-
return nstr
378-
end
380+
local function format(a)
381+
local nstr = string.format("%sa", a)
382+
return nstr
383+
end
379384
380-
collectgarbage()
385+
collectgarbage()
381386
382-
local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
387+
local binfile = "/tmp/memprof_"..(arg[0]):match("([^/]*).lua")..".bin"
383388
384-
local st, err = misc.memprof.start(binfile)
385-
assert(st, err)
389+
local st, err = misc.memprof.start(binfile)
390+
assert(st, err)
386391
387-
-- Payload.
388-
for i = 1, 10000 do
389-
local f = format(i)
390-
local c = concat(i)
391-
end
392-
collectgarbage()
392+
-- Payload.
393+
for i = 1, 10000 do
394+
local f = format(i)
395+
local c = concat(i)
396+
end
397+
collectgarbage()
393398
394-
local st, err = misc.memprof.stop()
395-
assert(st, err)
399+
local st, err = misc.memprof.stop()
400+
assert(st, err)
396401
397-
os.exit()
402+
os.exit()
398403
399-
When you run this code :ref:`under Tarantool <profiler_usage_generate>` and
404+
When you run this code :ref:`in Tarantool <profiler_usage_generate>` and
400405
then :ref:`parse <profiler_usage_parse_command>` the binary memory profile
401406
in /tmp/memprof_format_concat.bin,
402407
you will get the following profiling report:
@@ -423,9 +428,9 @@ you will get the following profiling report:
423428
424429
Reasonable questions regarding the report can be:
425430

426-
* Why are there no allocations related to the ``concat()`` function?
427-
* Why is the number of allocations not a round number?
428-
* Why are there about 20K allocations instead of 10K?
431+
* Why are there no allocations related to the ``concat()`` function?
432+
* Why is the number of allocations not a round number?
433+
* Why are there about 20K allocations instead of 10K?
429434

430435
First of all, LuaJIT doesn't create a new string if the string with the same
431436
payload exists (see details on `lua-users.org/wiki <http://lua-users.org/wiki/ImmutableObjects>`_).
@@ -469,8 +474,8 @@ To profile only the ``concat()`` function, comment out line 23 (which is
469474
**Q**: But what will change if JIT compilation is enabled?
470475

471476
**A**: In the :ref:`code <profiler_usage_example03>`, comment out line 2 (which is
472-
``jit.off()``) and run
473-
the profiler . Now there are only 56 allocations in the report, and all other
477+
``jit.off()``) and run the profiler.
478+
Now there are only 56 allocations in the report, and all the other
474479
allocations are JIT-related (see also the related
475480
`dev issue <https://github.com/tarantool/tarantool/issues/5679>`_):
476481

@@ -494,8 +499,6 @@ allocations are JIT-related (see also the related
494499
@format_concat.lua:0 holds 640 bytes: 4 allocs, 0 frees
495500
INTERNAL holds 360 bytes: 2 allocs, 1 frees
496501
497-
498-
499502
This happens because a trace has been compiled after 56 iterations (the default
500503
value of the ``hotloop`` compiler parameter). Then, the
501504
JIT-compiler removed the unused variable ``c`` from the trace, and, therefore,
@@ -538,7 +541,7 @@ identifier is not yet compiled via LuaJIT. So, a trace can't be recorded and
538541
the compiler doesn't perform the corresponding optimizations.
539542

540543
If we change the ``format()`` function in lines 9-12 of the
541-
:ref:`Profiling report analysis example <profiler_usage_example03>`
544+
:ref:`Profiling a report analysis example <profiler_usage_example03>`
542545
in the following way
543546

544547
.. code-block:: lua
@@ -594,4 +597,4 @@ for example
594597
595598
$ tarantool -e 'require("memprof")(arg)' - --leak-only memprof_new.bin
596599
597-
When `--leak-only`` is used, only the HEAP SUMMARY section is displayed.
600+
When ``--leak-only`` is used, only the HEAP SUMMARY section is displayed.

‎doc/tooling/luajit_sysprof.rst

+370
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,370 @@
1+
.. _luajit_sysprof:
2+
3+
LuaJIT platform profiler
4+
========================
5+
6+
The default profiling options for LuaJIT are not fine enough to
7+
get an understanding of performance. For example, ``perf`` is only
8+
able to show the host stack, so all the Lua calls are displayed as a single
9+
``pcall()``. Oppositely, the ``jit.p`` module provided with LuaJIT
10+
is not able to give any information about the host stack.
11+
12+
Since version :doc:`2.10.0 </release/2.10.0>`, Tarantool
13+
has a built‑in module called ``misc.sysprof`` that implements a
14+
LuaJIT sampling profiler (further in this section we call it *the profiler*
15+
for short). The profiler can capture both guest and
16+
host stacks simultaneously, along with virtual machine states, so
17+
it can show the whole picture.
18+
19+
Three profiling modes are available:
20+
21+
* **Default**: shows only virtual machine state counters.
22+
* **Leaf**: shows the last frame on the stack.
23+
* **Callchain**: performs a complete stack dump.
24+
25+
The profiler comes with a default parser, which produces output in
26+
a ``flamegraph.pl``-suitable format.
27+
28+
Inside this section:
29+
30+
.. contents::
31+
:local:
32+
:depth: 2
33+
34+
.. _profiler_usage:
35+
36+
Working with the profiler
37+
-------------------------
38+
39+
The profiler usage involves two steps:
40+
41+
1. :ref:`Collecting <profiler_usage_get>` a binary profile of
42+
stacks (further referred as *binary sampling profile* or *binary profile*
43+
for short).
44+
2. :ref:`Parsing <profiler_usage_parse>` the collected binary
45+
profile to get a human-readable profiling report.
46+
47+
.. _profiler_usage_get:
48+
49+
Collecting a binary profile
50+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
51+
52+
To collect a binary profile for a particular part of the Lua and C code,
53+
you need to place this part between two ``misc.sysprof`` functions --
54+
namely, ``misc.sysprof.start()`` and ``misc.sysprof.stop()`` -- and
55+
then execute the code in Tarantool.
56+
57+
Below is a chunk of Lua code named ``test.lua`` to illustrate this.
58+
59+
.. _profiler_usage_example01:
60+
61+
.. code-block:: lua
62+
:linenos:
63+
64+
local function payload()
65+
local function fib(n)
66+
if n <= 1 then
67+
return n
68+
end
69+
return fib(n - 1) + fib(n - 2)
70+
end
71+
return fib(32)
72+
end
73+
74+
payload()
75+
76+
local res, err = misc.sysprof.start({mode = 'C', interval = 1, path = 'sysprof.bin'})
77+
assert(res, err)
78+
79+
payload()
80+
81+
res, err = misc.sysprof.stop()
82+
assert(res, err)
83+
84+
The Lua code for starting the profiler -- as in line 1 in the
85+
``test.lua`` example above -- is:
86+
87+
.. code-block:: lua
88+
89+
local str, err = misc.sysprof.start({mode = 'C', interval = 1, path = 'sysprof.bin'})
90+
91+
where:
92+
93+
* ``mode`` is a profiling mode;
94+
* ``interval`` is a sampling interval;
95+
* ``sysprof.bin`` is the name of the binary file where profiling events are written.
96+
97+
If the operation fails, for example if it is not possible to open
98+
a file for writing or if the profiler is already running,
99+
``misc.sysprof.start()`` returns ``nil`` as the first result,
100+
an error-message string as the second result,
101+
and a system-dependent error code number as the third result.
102+
103+
If the operation succeeds, ``misc.sysprof.start()`` returns ``true``.
104+
105+
The Lua code for stopping the profiler -- as in line 15 in the
106+
``test.lua`` example above -- is:
107+
108+
.. code-block:: lua
109+
110+
local res, err = misc.sysprof.stop()
111+
112+
If the operation fails, for example if there is an error when the
113+
file descriptor is being closed or if there is a failure during
114+
reporting, ``misc.sysprof.stop()`` returns ``nil`` as the first
115+
result, an error-message string as the second result,
116+
and a system-dependent error code number as the third result.
117+
118+
If the operation succeeds, ``misc.sysprof.stop()`` returns ``true``.
119+
120+
.. _profiler_usage_generate:
121+
122+
To generate a file with the memory profile in the binary format
123+
(in the :ref:`test.lua code example above <profiler_usage_example01>`
124+
the file name is ``sysprof.bin``), execute the code in Tarantool:
125+
126+
.. code-block:: console
127+
128+
$ tarantool test.lua
129+
130+
Tarantool collects allocation events in ``sysprof.bin``, puts
131+
the file in its :ref:`working directory <cfg_basic-work_dir>`,
132+
and closes the session.
133+
134+
.. _profiler_usage_parse:
135+
136+
Parsing a binary profile and generating a profiling report
137+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
138+
139+
.. _profiler_usage_parse_command:
140+
141+
After getting the platform profile in the binary format, the next step is
142+
to parse it to get a human-readable profiling report. You can do this
143+
via Tarantool with the following command
144+
(mind the hyphen ``-`` before the filename):
145+
146+
.. code-block:: console
147+
148+
$ tarantool -e 'require("sysprof")(arg)' - sysprof.bin > tmp
149+
$ curl -O https://raw.githubusercontent.com/brendangregg/FlameGraph/refs/heads/master/flamegraph.pl
150+
$ perl flamegraph.pl tmp > sysprof.svg
151+
152+
where ``sysprof.bin`` is the binary profile
153+
:ref:`generated earlier <profiler_usage_generate>` by ``tarantool test.lua``.
154+
155+
.. note::
156+
157+
There is a slight behavior change here: the ``tarantool -e ...`` command
158+
was slightly different in Tarantool versions prior to Tarantool 2.8.1.
159+
The resulting SVG image contains a flamegraph with collected stacks
160+
and can be opened by a modern web-browser for analysis.
161+
162+
As for investigating the Lua code with the help of profiling reports,
163+
it is always code-dependent and there are no definite recommendations
164+
in this regard. Nevertheless, you can see some of the things
165+
in the :ref:`Profiling report analysis example <profiler_analysis>` below.
166+
167+
.. _profiler_lua_api:
168+
169+
Profiler Lua API
170+
----------------
171+
172+
The platform profiler provides a Lua interface:
173+
174+
* ``misc.sysprof.start(opts)``
175+
* ``misc.sysprof.stop()``
176+
* ``misc.sysprof.report()``
177+
178+
The first two functions return boolean ``res`` and ``err``, which is
179+
``nil`` on success and contains an error message on failure.
180+
181+
``misc.sysprof.report`` returns a Lua table containing the
182+
following counters:
183+
184+
.. code-block:: console
185+
186+
{
187+
"samples" = int,
188+
"INTERP" = int,
189+
"LFUNC" = int,
190+
"FFUNC" = int,
191+
"CFUNC" = int,
192+
"GC" = int,
193+
"EXIT" = int,
194+
"RECORD" = int,
195+
"OPT" = int,
196+
"ASM" = int,
197+
"TRACE" = int
198+
}
199+
200+
The ``opts`` argument of ``misc.sysprof.start`` can contain the
201+
following parameters:
202+
203+
* ``mode`` (required) -- one of the supported profiling modes:
204+
205+
* ``'D'`` = DEFAULT
206+
* ``'L'`` = LEAF
207+
* ``'C'`` = CALLGRAPH
208+
209+
* ``interval`` (optional) -- sampling interval in msec (default is 10 msec).
210+
* ``path`` (optional) -- path to a file to store profile data
211+
(default is ``sysprof.bin``).
212+
213+
.. _profiler_c_api:
214+
215+
Profiler C API
216+
--------------
217+
218+
The platform profiler provides a low-level C interface:
219+
220+
* ``int luaM_sysprof_set_writer(sp_writer writer)`` -- sets a writer function
221+
for sysprof.
222+
223+
* ``int luaM_sysprof_set_on_stop(sp_on_stop on_stop)`` -- sets
224+
an on-stop callback for ``sysprof`` to clear resources.
225+
226+
* ``int luaM_sysprof_set_backtracer(sp_backtracer backtracer)`` -- sets
227+
a backtracing function. If the ``backtracer`` argument is NULL,
228+
the default backtracer is set.
229+
230+
.. note::
231+
232+
There is no need to call the configuration functions multiple times
233+
if you are starting and stopping the profiler several times
234+
in a single program.
235+
236+
Also, it is not necessary to configure ``sysprof`` for the ``Default`` mode.
237+
However, you MUST configure it for other modes.
238+
239+
* ``int luaM_sysprof_start(lua_State *L, const struct luam_Sysprof_Options *opt)`` --
240+
see :ref:`Profiler options <profiler_c_options>`.
241+
242+
* ``int luaM_sysprof_stop(lua_State *L)``
243+
244+
* ``int luaM_sysprof_report(struct luam_Sysprof_Counters *counters)`` -- writes
245+
:ref:`profiling counters <profiler_c_counters>` for each vmstate.
246+
247+
All of the functions return 0 on success and an error code on failure.
248+
249+
.. _profiler_c_config_types:
250+
251+
Configuration C types
252+
~~~~~~~~~~~~~~~~~~~~~
253+
254+
Profiler configuration settings include:
255+
256+
* ``typedef size_t (*sp_writer)(const void **data, size_t len, void *ctx)`` --
257+
a writer function for profile events.
258+
259+
Must be async-safe, see also ``man 7 signal-safety``.
260+
261+
Should return the amount of written bytes on success, or zero in case of error.
262+
263+
Setting ``*data`` to NULL means end of profiling.
264+
For details see ``lj_wbuf.h``.
265+
266+
* ``typedef int (*sp_on_stop)(void *ctx, uint8_t *buf)`` -- a callback
267+
on profiler stopping. Required for a correct cleanup at VM finalization
268+
when the profiler is still running.
269+
270+
Returns zero on success.
271+
272+
* ``typedef void (*sp_backtracer)(void *(*frame_writer)(int frame_no, void *addr))`` --
273+
a backtracing function for the host stack.
274+
Should call ``frame_writer`` on each frame in the stack, in the order
275+
from the stack top to the stack bottom.
276+
277+
The ``frame_writer`` function is implemented inside ``sysprof``
278+
and will be passed to the ``backtracer`` function.
279+
280+
If ``frame_writer`` returns NULL, backtracing should be stopped.
281+
If ``frame_writer`` returns not NULL, the backtracing should be continued
282+
if there are frames left.
283+
284+
.. _profiler_c_options:
285+
286+
Profiler options
287+
~~~~~~~~~~~~~~~~
288+
289+
The options structure for ``luaM_sysprof_start`` is as follows:
290+
291+
.. code-block:: console
292+
293+
struct luam_Sysprof_Options {
294+
/* Profiling mode. */
295+
uint8_t mode;
296+
297+
/* Sampling interval in msec. */
298+
uint64_t interval;
299+
300+
/* Custom buffer to write data. */
301+
uint8_t *buf;
302+
303+
/* The buffer's size. */
304+
size_t len;
305+
306+
/* Context for the profile writer and final callback. */
307+
void *ctx;
308+
};
309+
310+
.. _profiler_c_modes:
311+
312+
Profiling modes
313+
~~~~~~~~~~~~~~~
314+
315+
The platform profiler supports three profiling modes:
316+
317+
* ``DEFAULT`` mode collects only data for ``luam_sysprof_counters``,
318+
which is stored in memory and can be collected with ``luaM_sysprof_report``
319+
after the profiler stops.
320+
321+
* ``LEAF`` mode = ``DEFAULT`` + streams samples with only top frames of the host
322+
and guests stacks in the format described in ``lj_sysprof.h``.
323+
324+
* ``CALLGRAPH`` mode = DEFAULT + streams samples with full callchains of the host
325+
and guest stacks in the format described in ``lj_sysprof.h``.
326+
327+
.. code-block:: console
328+
329+
#define LUAM_SYSPROF_DEFAULT 0
330+
#define LUAM_SYSPROF_LEAF 1
331+
#define LUAM_SYSPROF_CALLGRAPH 2
332+
333+
.. _profiler_c_counters:
334+
335+
Profiling counters
336+
~~~~~~~~~~~~~~~~~~
337+
338+
The counters structure for ``luaM_sysprof_report`` is as follows:
339+
340+
.. code-block:: console
341+
342+
struct luam_Sysprof_Counters {
343+
uint64_t vmst_interp;
344+
uint64_t vmst_lfunc;
345+
uint64_t vmst_ffunc;
346+
uint64_t vmst_cfunc;
347+
uint64_t vmst_gc;
348+
uint64_t vmst_exit;
349+
uint64_t vmst_record;
350+
uint64_t vmst_opt;
351+
uint64_t vmst_asm;
352+
uint64_t vmst_trace;
353+
354+
uint64_t samples;
355+
};
356+
357+
.. note::
358+
359+
The order of ``vmst_*`` counters is important: it should be the same as
360+
the order of the vmstates.
361+
362+
.. _profiler_caveats:
363+
364+
Caveats
365+
~~~~~~~
366+
367+
* Providing writers, backtracers and other settings in the ``Default`` mode
368+
is pointless, since it only collects counters.
369+
* There is NO default configuration for ``sysprof``, so ``luaM_Sysprof_Configure``
370+
must be called before the first run of ``sysprof``. Mind the async safety.

0 commit comments

Comments
 (0)
Please sign in to comment.