Commit 6626ce8

Complete ftrace part
1 parent 9b7be18 commit 6626ce8

4 files changed

+91
-0
lines changed

assets/syscall/flow.jpg


lkmpg.tex

+91
@@ -1554,6 +1554,97 @@ \section{System Calls}
\samplec{examples/syscall.c}

Another technique we can use to control the flow of execution of a syscall is \verb|ftrace|.
It is an internal tracer designed to help developers and system designers find out what is going on inside the kernel.
It can be used for debugging or for analyzing latencies and performance issues that take place outside of user space.
It is usually used as an event tracer by attaching callbacks to the beginning of functions in order to record and trace the flow of the kernel.

The \cpp|ftrace_ops| structure (simplified) and the prototype of the callback function are
\begin{verbatim}
struct ftrace_ops {
    ftrace_func_t func;   // callback function
    unsigned long flags;  // ftrace flags
    void *private;        // any private data
};

void callback_func(unsigned long ip, unsigned long parent_ip,
                   struct ftrace_ops *ops, struct pt_regs *regs);
\end{verbatim}
where

\begin{itemize}
  \item \cpp|ip|: The instruction pointer of the function being traced.
  \item \cpp|parent_ip|: The instruction pointer of the caller of the traced function.
  \item \cpp|ops|: A pointer to the \cpp|ftrace_ops| that was used to register the callback.
  \item \cpp|regs|: If \cpp|FTRACE_OPS_FL_SAVE_REGS| or \cpp|FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED| is set in the \cpp|ftrace_ops| structure, this points to the \cpp|pt_regs| structure, just as it would if a breakpoint had been placed at the start of the traced function, which gives the callback access to the CPU registers. Otherwise it either contains garbage or is \cpp|NULL|. Note that since kernel v5.11 this parameter is \cpp|struct ftrace_regs *fregs| instead, and the \cpp|pt_regs| should be obtained through \cpp|ftrace_get_regs(fregs)|.
\end{itemize}

Internally, on x86-64 there is a 5-byte \cpp|call| to \cpp|__fentry__| at the very beginning (before the function prologue) of every traceable kernel function. These calls are converted to \cpp|nop| instructions during boot to avoid overhead. When a trace is registered, the \cpp|nop| is patched back into a \cpp|call|, this time to an \verb|ftrace| trampoline, and the registered callback is executed from there.
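
To make the patching more concrete, the following user-space sketch builds the two 5-byte sequences involved on x86-64: the \cpp|nop| that sits at a disabled call site, and the \cpp|call| that replaces it. It is only an illustration and not kernel code. The helper names \cpp|encode_nop5| and \cpp|encode_call| are made up for this example, and the kernel rewrites its own text through dedicated patching routines rather than a plain \cpp|memcpy|.

```c
#include <stdint.h>
#include <string.h>

/* 5-byte NOP on x86-64: nopl 0x0(%rax,%rax,1). */
static const uint8_t nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Hypothetical helper: fill out[] with the disabled (NOP) form of a call site. */
void encode_nop5(uint8_t out[5])
{
    memcpy(out, nop5, sizeof(nop5));
}

/*
 * Hypothetical helper: fill out[] with "call target" for a call site located
 * at address site. The instruction is opcode 0xe8 followed by a signed 32-bit
 * displacement, measured from the end of the 5-byte instruction.
 */
void encode_call(uint8_t out[5], uint64_t site, uint64_t target)
{
    int32_t rel = (int32_t)(target - (site + 5));

    out[0] = 0xe8;
    /* Stored in host byte order, which is little-endian on x86. */
    memcpy(&out[1], &rel, sizeof(rel));
}
```

Because both forms are exactly five bytes long, a call site can be switched between them without moving any of the surrounding instructions.
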

But callbacks can do more.
Since a callback runs at the very start of the traced function
and has access to the CPU registers,
can we ``hijack'' the traced function by modifying the instruction pointer?
Yes, this is possible by setting the \cpp|FTRACE_OPS_FL_IPMODIFY| flag when registering a trace.
It allows the callback to modify the saved instruction pointer, so that execution jumps unconditionally to the new address once the \verb|ftrace| callback returns.
Note that while multiple tracers can be attached to one function, only one of them may modify \cpp|ip| at any given time.

Figure~\ref{img:ftrace-hooking-example} gives an example of auditing \cpp|sys_execve| by hooking it using \verb|ftrace|.
The callback function (\cpp|fh_ftrace_thunk|) checks whether the call comes from the kernel or from our module,
and passes control accordingly.
If the call comes from the kernel, our auditing function is called.
Otherwise, the callback does nothing and the original \cpp|sys_execve| runs.
The check is important because we are only ``decorating'' the original syscall.
Our auditing function contains a call to the original \cpp|sys_execve|,
which triggers the callback function again.
Without the check, this would recurse endlessly.
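
The redirect-unless-called-from-the-module logic can be modelled in plain user-space C, as sketched below. Everything here is a stand-in chosen for illustration: \cpp|struct fake_regs|, \cpp|struct fake_module|, \cpp|addr_in_module| and \cpp|thunk_model| are hypothetical names, and addresses are plain integers. In a real module the callback also receives \cpp|ip| and \cpp|ops|, and the range test would typically be done with the kernel's \cpp|within_module()| helper.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the saved registers; only the field we need. */
struct fake_regs {
    unsigned long ip;
};

/* Hypothetical stand-in for the address range occupied by our module. */
struct fake_module {
    unsigned long base;
    unsigned long size;
};

/* Returns true if addr lies inside the module's address range. */
bool addr_in_module(unsigned long addr, const struct fake_module *mod)
{
    return addr >= mod->base && addr - mod->base < mod->size;
}

/*
 * Model of the ftrace callback: if the traced function was called from
 * outside the module, redirect execution to the hook. If it was called from
 * inside the module (the hook calling the original), leave ip untouched so
 * the original function runs and the recursion stops.
 */
void thunk_model(unsigned long parent_ip, struct fake_regs *regs,
                 const struct fake_module *self, unsigned long hook_addr)
{
    if (!addr_in_module(parent_ip, self))
        regs->ip = hook_addr;
}
```

With a module occupying a given range, a caller address outside that range gets redirected to the hook, while a caller address inside it leaves \cpp|ip| unchanged.
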
\begin{figure}[h]
  \centering
  \includegraphics[width=\textwidth]{assets/syscall/flow.jpg}
  \caption{Linux kernel hooking with ftrace. \href{https://www.apriorit.com/dev-blog/546-hooking-linux-functions-2}{Source}}
  \label{img:ftrace-hooking-example}
\end{figure}

In fact, this is the mechanism that live kernel patching relies on.
By redirecting the flow of execution,
a running system can use patched functions in place of vulnerable ones without a reboot, as Figure~\ref{img:kernel-livepatching} shows.
\begin{figure}[h]
  \centering
  \includegraphics[width=\textwidth]{assets/syscall/kernel-livepatching1.png}\\
  \vspace{1cm}
  \includegraphics[width=\textwidth]{assets/syscall/kernel-livepatching2.png}
  \caption{How live kernel patching works. \href{https://ubuntu.com/blog/an-overview-of-live-kernel-patching}{Source}}
  \label{img:kernel-livepatching}
\end{figure}

For more information regarding \verb|ftrace|, check out \href{https://www.kernel.org/doc/html/latest/trace/ftrace.html}{the kernel documentation} and \href{https://youtu.be/93uE_kWWQjs}{this talk from Steven Rostedt}.

Before getting our hands dirty, here are some functions we need to know.
\begin{itemize}
  \item \cpp|register_ftrace_function(struct ftrace_ops *ops)|: Enables the tracing callback described by \cpp|ops|.
  \item \cpp|unregister_ftrace_function(struct ftrace_ops *ops)|: Disables the tracing callback described by \cpp|ops|.
  \item \cpp|ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, int len, int reset)|: Selects, by name, which functions should be enabled for tracing. If \cpp|buf| is \cpp|NULL| and \cpp|reset| is non-zero, all functions will be enabled.
  \item \cpp|ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip, int remove, int reset)|: Selects, by address, which function should be enabled for tracing. \cpp|remove| should be \cpp|0| when adding a trace, and \cpp|1| when removing one. Note that \cpp|ip| must be the address where the call to \cpp|__fentry__| is located in the function.
\end{itemize}

Alright, let's write some code.
Below is the source code of the example from above, but rewritten using \verb|ftrace|.
The main difference is the \cpp|install_hook| function,
which prepares our tracee function (\cpp|sys_openat|)
and the callback function (\cpp|ftrace_thunk|).
We need both \cpp|FTRACE_OPS_FL_SAVE_REGS| and \cpp|FTRACE_OPS_FL_IPMODIFY| because we are modifying \cpp|ip|.
Inside \cpp|ftrace_thunk| is where the magic happens.
It checks whether the traced function was called from within the module;
if not, it points the instruction pointer at our ``spying'' function.
The check is performed by testing whether \cpp|parent_ip| lies within this module.
During the first call, \cpp|parent_ip| points to somewhere within the kernel,
while during the second call it points to somewhere in our ``spying'' function, which is within the module.

\samplec{examples/syscall-ftrace.c}

\section{Blocking Processes and threads}
\label{sec:blocking_process_thread}
\subsection{Sleep}
