lkmpg.tex (+91)
@@ -1554,6 +1554,97 @@ \section{System Calls}
1554
1554
1555
1555
\samplec{examples/syscall.c}
1556
1556
1557
+
Another technique for controlling the execution flow of a syscall is \verb|ftrace|.
1558
+
It is an internal tracer designed to help developers and system designers find out what is going on inside the kernel.
1559
+
It can be used for debugging or for analyzing latencies and performance issues that take place outside of user space.
1560
+
It is usually used as an event tracer by attaching callbacks to the beginning of functions in order to record and trace the flow of the kernel.
1561
+
1562
+
The \cpp|ftrace_ops| structure and the basic prototype of the callback function are
1563
+
\begin{verbatim}
struct ftrace_ops {
    ftrace_func_t func;   // callback function
    unsigned long flags;  // ftrace flags
    void *private;        // any private data
};

void callback_func(unsigned long ip, unsigned long parent_ip,
                   struct ftrace_ops *ops, struct pt_regs *regs);
\end{verbatim}
1572
+
1573
+
where
1574
+
1575
+
\begin{itemize}
1576
+
\item\cpp|ip|: The instruction pointer of the function being traced.
1577
+
\item\cpp|parent_ip|: The instruction pointer of the caller of the traced function.
1578
+
\item\cpp|ops|: A pointer to \cpp|ftrace_ops| that was used to register the callback.
1579
+
\item\cpp|regs|: If \cpp|FTRACE_OPS_FL_SAVE_REGS| or \cpp|FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED| is set in the \cpp|ftrace_ops| structure, this points to a \cpp|pt_regs| structure holding the CPU registers, just as it would if a breakpoint had been placed at the start of the function that \verb|ftrace| is tracing. Otherwise it is either \cpp|NULL| or contains garbage. Note that starting with kernel v5.11, this parameter is replaced by \cpp|struct ftrace_regs *fregs|, and the original \cpp|pt_regs| is accessible through \cpp|fregs->regs| (the helper \cpp|ftrace_get_regs(fregs)| returns a pointer to it).
1580
+
\end{itemize}
1581
+
1582
+
Internally, on x86-64, every traceable kernel function begins with a 5-byte \cpp|call| to \cpp|__fentry__|, placed BEFORE the function prologue. During boot these calls are converted to \cpp|nop| instructions to avoid overhead. When a trace is registered, the \cpp|nop| is patched back into a call, and the registered callback is executed accordingly.
1583
+
1584
+
But callbacks can do more.
1585
+
Since a callback runs at the start of a function,
1586
+
and we have access to CPU registers,
1587
+
maybe we can ``hijack'' the traced function by modifying the instruction pointer?
1588
+
Yes, this is possible by setting the \cpp|FTRACE_OPS_FL_IPMODIFY| flag when registering a trace.
1589
+
It allows us to modify the instruction pointer register: once the \verb|ftrace| callback returns, execution continues at the new address, so the change acts as an unconditional jump.
1590
+
Note that while several tracers can be attached to one function, only one tracer that changes \cpp|ip| can be registered on it at any given time.
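
The redirection described above can be illustrated with a small user-space model. This is only a sketch of the idea, not kernel code: \cpp|toy_regs|, \cpp|toy_call|, \cpp|observe_cb| and \cpp|hijack_cb| are hypothetical names invented for this illustration, and the saved register set is reduced to a single function pointer.

```c
#include <stddef.h>

/* Signature shared by the "traced" function and its replacement (toy model). */
typedef long (*func_t)(long);

/* Hypothetical stand-in for the saved registers: only the instruction
 * pointer is modeled, as the function at which execution resumes. */
struct toy_regs {
    func_t ip;
};

/* Hypothetical callback type, loosely mirroring (ip, ..., regs). */
typedef void (*toy_callback_t)(func_t ip, struct toy_regs *regs);

static long original(long x) { return x + 1; }
static long replacement(long x) { return x * 100; }

static int seen; /* number of times the observing callback ran */

/* A plain tracer: records the call and leaves regs->ip untouched. */
static void observe_cb(func_t ip, struct toy_regs *regs)
{
    (void)ip;
    (void)regs;
    seen++;
}

/* A hijacking tracer: rewrites regs->ip, which is what IPMODIFY permits. */
static void hijack_cb(func_t ip, struct toy_regs *regs)
{
    if (ip == original)
        regs->ip = replacement;
}

/* Toy trampoline: run the callback (if any), then continue at regs.ip.
 * If the callback changed regs.ip, this behaves like an unconditional jump. */
static long toy_call(func_t traced, toy_callback_t cb, long arg)
{
    struct toy_regs regs = { .ip = traced };

    if (cb)
        cb(traced, &regs);
    return regs.ip(arg);
}
```

With \cpp|observe_cb| attached, the call still ends up in \cpp|original|; with \cpp|hijack_cb| attached, the very same call is answered by \cpp|replacement| instead.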
1591
+
1592
+
Figure~\ref{img:ftrace-hooking-example} gives an example of auditing \cpp|sys_execve| by hooking it using \verb|ftrace|.
1593
+
The callback function (\cpp|fh_ftrace_thunk|) checks whether the call is from the kernel or the module,
1594
+
and passes control accordingly.
1595
+
If the call is from the kernel, our auditing function is called.
1596
+
Otherwise, nothing happens.
1597
+
The check is important because we're only ``decorating'' the original syscall.
1598
+
Our auditing function contains a call to the original \cpp|sys_execve|,
1599
+
which will trigger the callback function again.
1600
+
Without this check, the result would be infinite recursion.
\caption{How live kernel patching works. \href{https://ubuntu.com/blog/an-overview-of-live-kernel-patching}{Source}}
1619
+
\label{img:kernel-livepatching}
1620
+
\end{figure}
1621
+
1622
+
For more information regarding \verb|ftrace|, check out \href{https://www.kernel.org/doc/html/latest/trace/ftrace.html}{the kernel documentation} and \href{https://youtu.be/93uE_kWWQjs}{this talk by Steven Rostedt}.
1623
+
1624
+
Before getting our hands dirty, here are some functions we need to know.
1625
+
1626
+
\begin{itemize}
1627
+
\item\cpp|register_ftrace_function(struct ftrace_ops *ops)|: Enables the tracing callback defined by \cpp|ops|.
1628
+
\item\cpp|unregister_ftrace_function(struct ftrace_ops *ops)|: Disables the tracing callback defined by \cpp|ops|.
1629
+
\item\cpp|ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, int len, int reset)|: Selects, by name, which functions should be traced. If \cpp|buf| is \cpp|NULL| and \cpp|reset| is set, all functions will be enabled.
1630
+
\item\cpp|ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip, int remove, int reset)|: Selects, by address, which function should be traced. \cpp|remove| should be \cpp|0| when adding a trace, and \cpp|1| when removing one. Note that \cpp|ip| must be the address at which the call to \cpp|__fentry__| is located in the function.
1631
+
\end{itemize}
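
The order in which these calls are usually combined can be sketched with a user-space model. Everything below is an assumption made for illustration: the \cpp|toy_*| functions are hypothetical stand-ins that only mimic the argument conventions listed above (\cpp|remove| is \cpp|0| to add and \cpp|1| to remove), and \cpp|toy_fail_register| exists solely to simulate a failing registration. The pattern shown (set the filter first, register second, undo the filter if registration fails, and reverse the order on removal) is a common convention, not something the API enforces.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ftrace_ops: it only tracks state. */
struct toy_ops {
    unsigned long filter_ip; /* address selected for tracing */
    bool has_filter;         /* true once a filter has been set */
    bool registered;         /* true once the ops is registered */
};

/* Test knob (assumption): nonzero makes toy_register() fail. */
static int toy_fail_register;

/* Mimics the (ops, ip, remove, reset) convention of ftrace_set_filter_ip. */
static int toy_set_filter_ip(struct toy_ops *ops, unsigned long ip,
                             int remove, int reset)
{
    (void)reset;
    if (remove) {
        ops->has_filter = false;
        ops->filter_ip = 0;
    } else {
        ops->has_filter = true;
        ops->filter_ip = ip;
    }
    return 0;
}

static int toy_register(struct toy_ops *ops)
{
    if (toy_fail_register)
        return -1;
    ops->registered = true;
    return 0;
}

static int toy_unregister(struct toy_ops *ops)
{
    ops->registered = false;
    return 0;
}

/* Install: choose the function first, then enable the callback.
 * If enabling fails, drop the filter again so no state is left behind. */
static int toy_install_hook(struct toy_ops *ops, unsigned long ip)
{
    int err = toy_set_filter_ip(ops, ip, 0, 0);

    if (err)
        return err;
    err = toy_register(ops);
    if (err)
        toy_set_filter_ip(ops, ip, 1, 0);
    return err;
}

/* Remove: disable the callback first, then drop the filter. */
static void toy_remove_hook(struct toy_ops *ops)
{
    toy_unregister(ops);
    toy_set_filter_ip(ops, ops->filter_ip, 1, 0);
}
```

In a real module, the same two steps would be carried out with \cpp|ftrace_set_filter_ip| and \cpp|register_ftrace_function|, and undone with \cpp|unregister_ftrace_function| followed by \cpp|ftrace_set_filter_ip| with \cpp|remove| set to \cpp|1|.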
1632
+
1633
+
Alright, let's write some code.
1634
+
Below is the source code of the example from above, but rewritten using \verb|ftrace|.
1635
+
The main difference is the \cpp|install_hook| function,
1636
+
which prepares our tracee function (\cpp|sys_openat|),
1637
+
and the callback function (\cpp|ftrace_thunk|).
1638
+
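We need both \cpp|FTRACE_OPS_FL_SAVE_REGS| and \cpp|FTRACE_OPS_FL_IPMODIFY| because we're modifying \cpp|ip|: the former gives the callback access to the saved registers, and the latter permits changing the instruction pointer.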
1639
+
Inside \cpp|ftrace_thunk| is where the magic happens.
1640
+
We check whether it was called from within the module;
1641
+
if not, it redirects the instruction pointer to our ``spying'' function.
1642
+
The check tests whether \cpp|parent_ip| lies within this module.
1643
+
During the first call, \cpp|parent_ip| points to somewhere within the kernel,
1644
+
while during the second call it points to somewhere in our ``spying'' function, which is within the module.
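
The decision made inside the callback can be modeled in user space as a simple address-range test. This is a sketch under stated assumptions: \cpp|toy_module|, \cpp|within_toy_module| and \cpp|toy_thunk| are hypothetical names, and the module is reduced to a base address and a size. In a real module the range test would more likely use the kernel helper \cpp|within_module()| together with \cpp|THIS_MODULE|.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where "our module" lives in memory. */
struct toy_module {
    uintptr_t base; /* first address belonging to the module */
    size_t size;    /* number of bytes occupied by the module */
};

/* True if addr lies in [base, base + size). */
static bool within_toy_module(const struct toy_module *m, uintptr_t addr)
{
    return addr >= m->base && addr - m->base < m->size;
}

/* Model of the thunk's decision: returns the address at which execution
 * should continue.
 *  - Caller outside the module (first call, coming from the kernel):
 *    redirect to the spying function.
 *  - Caller inside the module (second call, issued by the spying
 *    function itself): leave the original function alone, which breaks
 *    the recursion. */
static uintptr_t toy_thunk(const struct toy_module *m, uintptr_t parent_ip,
                           uintptr_t original_fn, uintptr_t spy_fn)
{
    if (within_toy_module(m, parent_ip))
        return original_fn;
    return spy_fn;
}
```

For a module occupying \cpp|0x1000| to \cpp|0x10ff|, a caller at \cpp|0x8000| is redirected to the spying function, while a caller at \cpp|0x1020| (inside the spying function) falls through to the original.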