From 34c64f1cff8c4e14338f1b15fd0b93420559330e Mon Sep 17 00:00:00 2001 From: Ivona Stojanovic Date: Thu, 17 Apr 2025 16:41:47 +0200 Subject: [PATCH 1/4] gh-131591: Add remote debugging attachment protocol documentation Add a developer-facing document describing the protocol used by remote_exec(pid, script) to execute Python code in a running process. This is intended to guide debugger and tool authors in reimplementing the protocol. --- Doc/howto/index.rst | 2 + Doc/howto/remote_debugging.rst | 335 +++++++++++++++++++++++++++++++++ 2 files changed, 337 insertions(+) create mode 100644 Doc/howto/remote_debugging.rst diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst index c09f92c9528ee1..f350141004c2db 100644 --- a/Doc/howto/index.rst +++ b/Doc/howto/index.rst @@ -34,6 +34,7 @@ Python Library Reference. mro.rst free-threading-python.rst free-threading-extensions.rst + remote_debugging.rst General: @@ -66,3 +67,4 @@ Debugging and profiling: * :ref:`gdb` * :ref:`instrumentation` * :ref:`perf_profiling` +* :ref:`remote-debugging` diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst new file mode 100644 index 00000000000000..0fd95e1210d844 --- /dev/null +++ b/Doc/howto/remote_debugging.rst @@ -0,0 +1,335 @@ +.. _remote-debugging: + +Remote Debugging Attachment Protocol +==================================== + +This section explains the low-level protocol that allows external code to inject and execute +a Python script inside a running CPython process. + +This is the mechanism implemented by the :func:`sys.remote_exec` function, which +instructs a remote Python process to execute a ``.py`` file. This section is not about using that +function, instead, it explains how the underlying protocol works so that it can be +reimplemented in any language. + +The protocol assumes you already know the process you want to target and the code you want it to run. 
+That’s why it takes two pieces of information: + +- The process ID (``pid``) of the Python process you want to interact with. +- A path to a Python script file (``.py``) that contains the code to be executed. + +Once injected, the script is executed by the target process’s interpreter the next time it reaches +a safe evaluation point. This allows tools to trigger +code execution remotely without modifying the Python program itself. + +In the sections that follow, we’ll walk through each step of this protocol in detail: how to locate +the interpreter in memory, how to access internal structures safely, and how to trigger the execution +of your script. Where necessary, we’ll highlight differences across platforms (Linux, macOS, Windows), +and include example code to help clarify each part of the process. + +Locating the PyRuntime Structure +================================ + +The ``PyRuntime`` structure holds CPython's global interpreter state and serves as +the entry point to other internal data, including the list of interpreters, +thread states, and debugger support fields. + +To interact with a remote Python process, a debugger must first compute the memory +address of the ``PyRuntime`` structure inside the target process. This cannot be +hardcoded or inferred symbolically, since its location depends on how the binary was +mapped into memory by the operating system. + +The process for locating ``PyRuntime`` is platform-specific, but follows the same +high-level approach: + +1. Identify where the Python executable or shared library was loaded in the target process. +2. Parse the corresponding binary file on disk to find the offset of the + ``.PyRuntime`` section. +3. Compute the in-memory address of ``PyRuntime`` by relocating the section offset + to the base address found in step 1. + +Each subsection below explains what must be done and provides a short example of how this +can be implemented. + +.. 
rubric:: Linux (ELF) + +To locate the ``PyRuntime`` structure on Linux: + +1. Inspect the memory mappings of the target process (e.g. from ``/proc//maps``) + to find the memory region where the Python executable or shared ``libpython`` + library is loaded. Record its base address. +2. Load the binary file from disk and parse its ELF section headers. + Locate the ``.PyRuntime`` section and determine its file offset. +3. Add the section offset to the base address to compute the address of the + ``PyRuntime`` structure in memory. + +An example implementation might look like: + +.. code-block:: python + + def find_py_runtime_linux(pid): + # Step 1: Try to find the Python executable in memory + binary_path, base_address = find_mapped_binary(pid, name_contains="python") + # Step 2: Fallback to shared library if executable is not found + if binary_path is None: + binary_path, base_address = find_mapped_binary(pid, name_contains="libpython") + # Step 3: Parse ELF headers of the binary to get .PyRuntime section offset + section_offset = parse_elf_section_offset(binary_path, ".PyRuntime") + # Step 4: Compute PyRuntime address in memory + return base_address + section_offset + +.. rubric:: macOS (Mach-O) + +To locate the ``PyRuntime`` structure on macOS: + +1. Obtain a handle to the target process that allows memory inspection. +2. Walk the memory regions of the process to identify the one that contains the + Python binary or shared library. Record its base address and associated file path. +3. Load that binary file from disk and parse the Mach-O headers to find the + ``__DATA,__PyRuntime`` section. +4. Add the section's offset to the base address of the loaded binary to compute + the address of the ``PyRuntime`` structure. + +An example implementation might look like: + +.. 
code-block:: python + + def find_py_runtime_macos(pid): + # Step 1: Get access to the process's memory + handle = get_memory_access_handle(pid) + # Step 2: Try to find the Python executable in memory + binary_path, base_address = find_mapped_binary(handle, name_contains="python") + # Step 3: Fallback to libpython if executable is not found + if binary_path is None: + binary_path, base_address = find_mapped_binary(handle, name_contains="libpython") + # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset + section_offset = parse_macho_section_offset(binary_path, "__DATA", "__PyRuntime") + # Step 5: Compute PyRuntime address in memory + return base_address + section_offset + +.. rubric:: Windows (PE) + +To locate the ``PyRuntime`` structure on Windows: + +1. Enumerate all modules loaded in the target process. + Identify the module corresponding to ``python.exe`` or ``pythonXY.dll``, where X and Y + are the major and minor version numbers of the Python version, and record its base address. +2. Load the binary from disk and parse the PE section headers. + Locate the ``.PyRuntime`` section and determine its relative virtual address (RVA). +3. Add the RVA to the module’s base address to compute the full in-memory address + of the ``PyRuntime`` structure. + +An example implementation might look like: + +.. 
code-block:: python + + def find_py_runtime_windows(pid): + # Step 1: Try to find the Python executable in memory + binary_path, base_address = find_loaded_module(pid, name_contains="python") + # Step 2: Fallback to shared pythonXY.dll if executable is not found + if binary_path is None: + binary_path, base_address = find_loaded_module(pid, name_contains="python3") + # Step 3: Parse PE section headers to get .PyRuntime RVA + section_rva = parse_pe_section_offset(binary_path, ".PyRuntime") + # Step 4: Compute PyRuntime address in memory + return base_address + section_rva + +Reading _Py_DebugOffsets +========================= + +Once the address of the ``PyRuntime`` structure has been computed in the target +process, the next step is to read the ``_Py_DebugOffsets`` structure located at +its beginning. + +This structure contains version-specific field offsets needed to navigate +interpreter and thread state memory safely. + +To read and validate the debug offsets: + +1. Read the memory at the address of ``PyRuntime``, up to the size of + ``_Py_DebugOffsets``. This structure is located at the very start of the + ``PyRuntime`` block. + +2. Verify that the contents of the structure are valid. In particular: + + - The ``cookie`` field must match the expected debug marker. + - The ``version`` field must match the version of the Python interpreter + used by the calling process (i.e., the debugger or controlling runtime). + - If either the caller or the target process is running a pre-release version + (such as an alpha, beta, or release candidate), then the versions must match + exactly. + - The ``free_threaded`` flag must match between the caller and the target process. + +3. If the structure passes validation, the debugger may now safely use the + provided offsets to locate fields in interpreter and thread state structures. + +If any validation step fails, the debugger should abort rather than attempting to +access incompatible memory layouts. 
+ +An example of how a debugger might read and validate ``_Py_DebugOffsets``: + +.. code-block:: python + + def read_debug_offsets(pid, py_runtime_addr): + # Step 1: Read memory from the target process at the PyRuntime address + data = read_process_memory(pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE) + # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure + debug_offsets = parse_debug_offsets(data) + # Step 3: Validate compatibility + if debug_offsets.cookie != EXPECTED_COOKIE: + raise RuntimeError("Invalid or missing debug cookie") + if debug_offsets.version != LOCAL_PYTHON_VERSION: + raise RuntimeError("Mismatch between caller and target Python versions") + if debug_offsets.free_threaded != LOCAL_FREE_THREADED: + raise RuntimeError("Mismatch in free-threaded configuration") + return debug_offsets + +Locating the Interpreter and Thread State +========================================= + +After validating the ``_Py_DebugOffsets`` structure, the next step is to locate the +interpreter and thread state objects within the target process. These structures +hold essential runtime context and are required for writing debugger control +information. + +- The ``PyInterpreterState`` structure represents a Python interpreter instance. + Each interpreter holds its own module imports, built-in state, and thread list. + Most applications use only one interpreter, but CPython supports creating multiple + interpreters in the same process. + +- The ``PyThreadState`` structure represents a thread running within an interpreter. + This is where evaluation state and the control fields used by the debugger live. + +To inject and run code remotely, the debugger must locate a valid ``PyThreadState`` +to target. Typically, this is the main thread, but in some cases, the debugger may +want to attach to a specific thread by its native thread ID. + +To locate a thread: + +1. 
Use the offset ``runtime_state.interpreters_head`` to find the address of the + first interpreter in the ``PyRuntime`` structure. This is the entry point to + the list of active interpreters. + +2. Use the offset ``interpreter_state.threads_main`` to locate the main thread + of that interpreter. This is the simplest and most reliable thread to target. + +3. Optionally, use ``interpreter_state.threads_head`` to walk the linked list of + all threads. For each ``PyThreadState``, compare the ``native_thread_id`` + field (using ``thread_state.native_thread_id``) to find a specific thread. + + This is useful when the debugger allows the user to select which thread to inject into, + or when targeting a thread that's actively running. + +4. Once a valid ``PyThreadState`` is found, record its address. This will be used + in the next step to write debugger control fields and schedule execution. + +An example of locating the main thread: + +.. code-block:: python + + def find_main_thread_state(pid, py_runtime_addr, debug_offsets): + # Step 1: Read interpreters_head from PyRuntime + interp_head_ptr = py_runtime_addr + debug_offsets.runtime_state.interpreters_head + interp_addr = read_pointer(pid, interp_head_ptr) + if interp_addr == 0: + raise RuntimeError("No interpreter found in the target process") + # Step 2: Read the threads_main pointer from the interpreter + threads_main_ptr = interp_addr + debug_offsets.interpreter_state.threads_main + thread_state_addr = read_pointer(pid, threads_main_ptr) + if thread_state_addr == 0: + raise RuntimeError("Main thread state is not available") + return thread_state_addr + +To locate a specific thread by native thread ID: + +.. 
code-block:: python + + def find_thread_by_id(pid, interp_addr, debug_offsets, target_tid): + # Start at threads_head and walk the linked list + thread_ptr = read_pointer( + pid, interp_addr + debug_offsets.interpreter_state.threads_head + ) + while thread_ptr: + native_tid_ptr = thread_ptr + debug_offsets.thread_state.native_thread_id + native_tid = read_int(pid, native_tid_ptr) + if native_tid == target_tid: + return thread_ptr + thread_ptr = read_pointer(pid, thread_ptr + debug_offsets.thread_state.next) + raise RuntimeError("Thread with the given ID was not found") + +Once a valid thread state has been identified, the debugger can use it to modify +control fields and request execution in the next stage of the protocol. + +Writing Control Information +=========================== + +Once a valid thread state has been located, the debugger can write control fields +that instruct the target process to execute a script at the next safe opportunity. + +Each thread state contains a ``_PyRemoteDebuggerSupport`` structure, which is used +to coordinate communication between the debugger and the interpreter. The debugger +uses offsets from ``_Py_DebugOffsets`` to locate three key fields: + +- ``debugger_script_path``: A buffer where the debugger writes the full path to + a Python source file (``.py``). The file must exist and be readable by the + target process. + +- ``debugger_pending_call``: An integer flag. When set to ``1``, it signals + that a script is ready to be executed. + +- ``eval_breaker``: A field checked periodically by the evaluation loop. To + notify the interpreter of pending debugger activity, the debugger sets the + ``_PY_EVAL_PLEASE_STOP_BIT`` in this field. This causes the interpreter to pause + and check for debugger-related actions before continuing with normal execution. + +To safely modify these fields, most debuggers should suspend the process before +writing to memory. 
This avoids race conditions that may occur if the interpreter +is actively running. + +To perform the injection: + +1. Write the script path into the ``debugger_script_path`` buffer. +2. Set the ``debugger_pending_call`` flag to ``1``. +3. Read the value of ``eval_breaker``, set the stop bit, and write the updated + value back. + +An example implementation might look like: + +.. code-block:: python + + def inject_script(pid, thread_state_addr, debug_offsets, script_path): + # Base offset to the _PyRemoteDebuggerSupport struct + support_base = ( + thread_state_addr + + debug_offsets.debugger_support.remote_debugger_support + ) + # 1. Write script path + script_path_ptr = support_base + debug_offsets.debugger_support.debugger_script_path + write_string(pid, script_path_ptr, script_path) + # 2. Set debugger_pending_call = 1 + pending_ptr = support_base + debug_offsets.debugger_support.debugger_pending_call + write_int(pid, pending_ptr, 1) + # 3. Set _PY_EVAL_PLEASE_STOP_BIT in eval_breaker + eval_breaker_ptr = thread_state_addr + debug_offsets.debugger_support.eval_breaker + breaker = read_int(pid, eval_breaker_ptr) + # Set the least significant bit (this is _PY_EVAL_PLEASE_STOP_BIT) + breaker |= 1 + write_int(pid, eval_breaker_ptr, breaker) + +After these writes are complete, the debugger may resume the process (if it was paused). +The interpreter will check ``eval_breaker`` at the next evaluation checkpoint, +detect the pending call, and load and execute the specified Python file. The debugger is responsible +for ensuring that the file remains on disk and readable by the target interpreter +when it is accessed. + +Summary +======= + +To inject and execute a script in a remote Python process: + +1. Locate the ``PyRuntime`` structure in the target process's memory. +2. Read and validate the ``_Py_DebugOffsets`` structure at the start of ``PyRuntime``. +3. Use the offsets to locate a valid ``PyThreadState``. +4. 
Write the path to a Python script into ``debugger_script_path``. +5. Set ``debugger_pending_call = 1``. +6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in ``eval_breaker``. +7. Resume the process (if paused). The script will be executed at the next safe eval point. From bb9d85d63841f5ffb976bb7a459fd3e997f86e04 Mon Sep 17 00:00:00 2001 From: Ivona Stojanovic Date: Sun, 20 Apr 2025 12:55:25 +0200 Subject: [PATCH 2/4] fixup! gh-131591: Add remote debugging attachment protocol documentation --- Doc/howto/remote_debugging.rst | 562 ++++++++++++++++++++------------- 1 file changed, 351 insertions(+), 211 deletions(-) diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst index 0fd95e1210d844..2b776c6f823aa2 100644 --- a/Doc/howto/remote_debugging.rst +++ b/Doc/howto/remote_debugging.rst @@ -1,335 +1,475 @@ .. _remote-debugging: -Remote Debugging Attachment Protocol +Remote debugging attachment protocol ==================================== -This section explains the low-level protocol that allows external code to inject and execute -a Python script inside a running CPython process. +This section describes the low-level protocol that enables external tools to +inject and execute a Python script within a running CPython process. -This is the mechanism implemented by the :func:`sys.remote_exec` function, which -instructs a remote Python process to execute a ``.py`` file. This section is not about using that -function, instead, it explains how the underlying protocol works so that it can be -reimplemented in any language. +This mechanism forms the basis of the :func:`sys.remote_exec` function, which +instructs a remote Python process to execute a ``.py`` file. However, this +section does not document the usage of that function. Instead, it provides a +detailed explanation of the underlying protocol, which takes as input the +``pid`` of a target Python process and the path to a Python source file to be +executed. 
This information supports independent reimplementation of the +protocol, regardless of programming language. -The protocol assumes you already know the process you want to target and the code you want it to run. -That’s why it takes two pieces of information: -- The process ID (``pid``) of the Python process you want to interact with. -- A path to a Python script file (``.py``) that contains the code to be executed. +.. warning:: -Once injected, the script is executed by the target process’s interpreter the next time it reaches -a safe evaluation point. This allows tools to trigger -code execution remotely without modifying the Python program itself. + The execution of the injected script depends on the interpreter reaching a + safe evaluation point. As a result, execution may be delayed depending on + the runtime state of the target process. -In the sections that follow, we’ll walk through each step of this protocol in detail: how to locate -the interpreter in memory, how to access internal structures safely, and how to trigger the execution -of your script. Where necessary, we’ll highlight differences across platforms (Linux, macOS, Windows), -and include example code to help clarify each part of the process. +Once injected, the script is executed by the interpreter within the target +process the next time a safe evaluation point is reached. This approach enables +remote execution capabilities without modifying the behavior or structure of +the running Python application. -Locating the PyRuntime Structure +Subsequent sections provide a step-by-step description of the protocol, +including techniques for locating interpreter structures in memory, safely +accessing internal fields, and triggering code execution. Platform-specific +variations are noted where applicable, and example implementations are included +to clarify each operation. 
+ +Locating the PyRuntime structure ================================ -The ``PyRuntime`` structure holds CPython's global interpreter state and serves as -the entry point to other internal data, including the list of interpreters, +CPython places the ``PyRuntime`` structure in a dedicated binary section to +help external tools find it at runtime. The name and format of this section +vary by platform. For example, ``.PyRuntime`` is used on ELF systems, and +``__DATA,__PyRuntime`` is used on macOS. Tools can find the offset of this +structure by examining the binary on disk. + +The ``PyRuntime`` structure contains CPython’s global interpreter state and +provides access to other internal data, including the list of interpreters, thread states, and debugger support fields. -To interact with a remote Python process, a debugger must first compute the memory -address of the ``PyRuntime`` structure inside the target process. This cannot be -hardcoded or inferred symbolically, since its location depends on how the binary was -mapped into memory by the operating system. +To work with a remote Python process, a debugger must first find the memory +address of the ``PyRuntime`` structure in the target process. This address +can’t be hardcoded or calculated from a symbol name, because it depends on +where the operating system loaded the binary. -The process for locating ``PyRuntime`` is platform-specific, but follows the same -high-level approach: +The method for finding ``PyRuntime`` depends on the platform, but the steps are +the same in general: -1. Identify where the Python executable or shared library was loaded in the target process. -2. Parse the corresponding binary file on disk to find the offset of the - ``.PyRuntime`` section. -3. Compute the in-memory address of ``PyRuntime`` by relocating the section offset - to the base address found in step 1. +1. Find the base address where the Python binary or shared library was loaded + in the target process. +2. 
Use the on-disk binary to locate the offset of the ``.PyRuntime`` section. +3. Add the section offset to the base address to compute the address in memory. -Each subsection below explains what must be done and provides a short example of how this -can be implemented. +The sections below explain how to do this on each supported platform and +include example code. .. rubric:: Linux (ELF) -To locate the ``PyRuntime`` structure on Linux: +To find the ``PyRuntime`` structure on Linux: -1. Inspect the memory mappings of the target process (e.g. from ``/proc//maps``) - to find the memory region where the Python executable or shared ``libpython`` - library is loaded. Record its base address. -2. Load the binary file from disk and parse its ELF section headers. - Locate the ``.PyRuntime`` section and determine its file offset. -3. Add the section offset to the base address to compute the address of the - ``PyRuntime`` structure in memory. - -An example implementation might look like: +1. Read the process’s memory map (for example, ``/proc//maps``) to find + the address where the Python executable or ``libpython`` was loaded. +2. Parse the ELF section headers in the binary to get the offset of the + ``.PyRuntime`` section. +3. Add that offset to the base address from step 1 to get the memory address of + ``PyRuntime``. -.. 
code-block:: python +The following is an example implementation:: - def find_py_runtime_linux(pid): + def find_py_runtime_linux(pid: int) -> int: # Step 1: Try to find the Python executable in memory - binary_path, base_address = find_mapped_binary(pid, name_contains="python") + binary_path, base_address = find_mapped_binary( + pid, name_contains="python" + ) + # Step 2: Fallback to shared library if executable is not found if binary_path is None: - binary_path, base_address = find_mapped_binary(pid, name_contains="libpython") - # Step 3: Parse ELF headers of the binary to get .PyRuntime section offset - section_offset = parse_elf_section_offset(binary_path, ".PyRuntime") + binary_path, base_address = find_mapped_binary( + pid, name_contains="libpython" + ) + + # Step 3: Parse ELF headers to get .PyRuntime section offset + section_offset = parse_elf_section_offset( + binary_path, ".PyRuntime" + ) + # Step 4: Compute PyRuntime address in memory return base_address + section_offset -.. rubric:: macOS (Mach-O) -To locate the ``PyRuntime`` structure on macOS: +.. rubric:: macOS (Mach-O) -1. Obtain a handle to the target process that allows memory inspection. -2. Walk the memory regions of the process to identify the one that contains the - Python binary or shared library. Record its base address and associated file path. -3. Load that binary file from disk and parse the Mach-O headers to find the - ``__DATA,__PyRuntime`` section. -4. Add the section's offset to the base address of the loaded binary to compute - the address of the ``PyRuntime`` structure. +To find the ``PyRuntime`` structure on macOS: -An example implementation might look like: +1. Call ``task_for_pid()`` to get the ``mach_port_t`` task port for the target + process. This handle is needed to read memory using APIs like + ``mach_vm_read_overwrite`` and ``mach_vm_region``. +2. Scan the memory regions to find the one containing the Python executable or + ``libpython``. +3. 
Load the binary file from disk and parse the Mach-O headers to find the + section named ``PyRuntime`` in the ``__DATA`` segment. On macOS, symbol + names are automatically prefixed with an underscore, so the ``PyRuntime`` + symbol appears as ``_PyRuntime`` in the symbol table, but the section name + is not affected. -.. code-block:: python +The following is an example implementation:: - def find_py_runtime_macos(pid): + def find_py_runtime_macos(pid: int) -> int: # Step 1: Get access to the process's memory handle = get_memory_access_handle(pid) + # Step 2: Try to find the Python executable in memory - binary_path, base_address = find_mapped_binary(handle, name_contains="python") - # Step 3: Fallback to libpython if executable is not found + binary_path, base_address = find_mapped_binary( + handle, name_contains="python" + ) + + # Step 3: Fallback to libpython if the executable is not found if binary_path is None: - binary_path, base_address = find_mapped_binary(handle, name_contains="libpython") + binary_path, base_address = find_mapped_binary( + handle, name_contains="libpython" + ) + # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset - section_offset = parse_macho_section_offset(binary_path, "__DATA", "__PyRuntime") - # Step 5: Compute PyRuntime address in memory + section_offset = parse_macho_section_offset( + binary_path, "__DATA", "__PyRuntime" + ) + + # Step 5: Compute the PyRuntime address in memory return base_address + section_offset -.. rubric:: Windows (PE) -To locate the ``PyRuntime`` structure on Windows: +.. rubric:: Windows (PE) -1. Enumerate all modules loaded in the target process. - Identify the module corresponding to ``python.exe`` or ``pythonXY.dll``, where X and Y - are the major and minor version numbers of the Python version, and record its base address. -2. Load the binary from disk and parse the PE section headers. - Locate the ``.PyRuntime`` section and determine its relative virtual address (RVA). -3. 
Add the RVA to the module’s base address to compute the full in-memory address - of the ``PyRuntime`` structure. +To find the ``PyRuntime`` structure on Windows: + +1. Use the ToolHelp API to enumerate all modules loaded in the target process. + This is done using functions such as `CreateToolhelp32Snapshot + `_, + `Module32First + `_, + and `Module32Next + `_. +2. Identify the module corresponding to :file:`python.exe` or + :file:`python{XY}.dll`, where ``X`` and ``Y`` are the major and minor + version numbers of the Python version (for example, ``python311.dll``), and + record its base address. +3. Locate the ``PyRuntim`` section. Section names in the PE format are limited + to 8 characters. +4. Retrieve the section’s relative virtual address (RVA) and add it to the base + address of the module. + +The following is an example implementation:: + + def find_py_runtime_windows(pid: int) -> int: + # Step 1: Try to find the Python executable in memory + binary_path, base_address = find_loaded_module( + pid, name_contains="python" + ) -An example implementation might look like: + # Step 2: Fallback to shared pythonXY.dll if the executable is not + # found + if binary_path is None: + binary_path, base_address = find_loaded_module( + pid, name_contains="python3" + ) -.. 
code-block:: python + # Step 3: Parse PE section headers to get PyRuntim RVA + section_rva = parse_pe_section_offset(binary_path, "PyRuntim") - def find_py_runtime_windows(pid): - # Step 1: Try to find the Python executable in memory - binary_path, base_address = find_loaded_module(pid, name_contains="python") - # Step 2: Fallback to shared pythonXY.dll if executable is not found - if binary_path is None: - binary_path, base_address = find_loaded_module(pid, name_contains="python3") - # Step 3: Parse PE section headers to get .PyRuntime RVA - section_rva = parse_pe_section_offset(binary_path, ".PyRuntime") # Step 4: Compute PyRuntime address in memory return base_address + section_rva -Reading _Py_DebugOffsets + + +Reading _Py_DebugOffsets ========================= -Once the address of the ``PyRuntime`` structure has been computed in the target -process, the next step is to read the ``_Py_DebugOffsets`` structure located at -its beginning. +Once the address of the ``PyRuntime`` structure has been determined, the next +step is to read the ``_Py_DebugOffsets`` structure located at the beginning of +the ``PyRuntime`` block. -This structure contains version-specific field offsets needed to navigate -interpreter and thread state memory safely. +This structure provides version-specific field offsets that are needed to +safely read interpreter and thread state memory. These offsets vary between +CPython versions and must be checked before use to ensure they are compatible. -To read and validate the debug offsets: +To read and check the debug offsets, follow these steps: -1. Read the memory at the address of ``PyRuntime``, up to the size of - ``_Py_DebugOffsets``. This structure is located at the very start of the - ``PyRuntime`` block. +1. Read memory from the target process starting at the ``PyRuntime`` address, + covering the same number of bytes as the ``_Py_DebugOffsets`` structure. + This structure is located at the very start of the ``PyRuntime`` memory + block. 
Its layout is defined in CPython’s internal headers and stays the + same within a given minor version, but may change in major versions. -2. Verify that the contents of the structure are valid. In particular: +2. Check that the structure contains valid data: - The ``cookie`` field must match the expected debug marker. - The ``version`` field must match the version of the Python interpreter - used by the calling process (i.e., the debugger or controlling runtime). - - If either the caller or the target process is running a pre-release version - (such as an alpha, beta, or release candidate), then the versions must match - exactly. - - The ``free_threaded`` flag must match between the caller and the target process. - -3. If the structure passes validation, the debugger may now safely use the - provided offsets to locate fields in interpreter and thread state structures. + used by the debugger. + - If either the debugger or the target process is using a pre-release + version (for example, an alpha, beta, or release candidate), the versions + must match exactly. + - The ``free_threaded`` field must have the same value in both the debugger + and the target process. -If any validation step fails, the debugger should abort rather than attempting to -access incompatible memory layouts. +3. If the structure is valid, the offsets it contains can be used to locate + fields in memory. If any check fails, the debugger should stop the operation + to avoid reading memory in the wrong format. -An example of how a debugger might read and validate ``_Py_DebugOffsets``: +The following is an example implementation that reads and checks +``_Py_DebugOffsets``:: -.. 
code-block:: python - - def read_debug_offsets(pid, py_runtime_addr): + def read_debug_offsets(pid: int, py_runtime_addr: int) -> DebugOffsets: # Step 1: Read memory from the target process at the PyRuntime address - data = read_process_memory(pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE) + data = read_process_memory( + pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE + ) + # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure debug_offsets = parse_debug_offsets(data) - # Step 3: Validate compatibility + + # Step 3: Validate the contents of the structure if debug_offsets.cookie != EXPECTED_COOKIE: raise RuntimeError("Invalid or missing debug cookie") if debug_offsets.version != LOCAL_PYTHON_VERSION: - raise RuntimeError("Mismatch between caller and target Python versions") + raise RuntimeError( + "Mismatch between caller and target Python versions" + ) if debug_offsets.free_threaded != LOCAL_FREE_THREADED: raise RuntimeError("Mismatch in free-threaded configuration") + return debug_offsets -Locating the Interpreter and Thread State + + +.. warning:: + + **Process suspension recommended** + + To avoid race conditions and ensure memory consistency, it is strongly + recommended that the target process be suspended before performing any + operations that read or write internal interpreter state. The Python runtime + may concurrently mutate interpreter data structures—such as creating or + destroying threads—during normal execution. This can result in invalid + memory reads or writes. + + A debugger may suspend execution by attaching to the process with ``ptrace`` + or by sending a ``SIGSTOP`` signal. Execution should only be resumed after + debugger-side memory operations are complete. + + .. note:: + + Some tools, such as profilers or sampling-based debuggers, may operate on + a running process without suspension. In such cases, tools must be + explicitly designed to handle partially updated or inconsistent memory. 
+ For most debugger implementations, suspending the process remains the + safest and most robust approach. + + +Locating the interpreter and thread state ========================================= -After validating the ``_Py_DebugOffsets`` structure, the next step is to locate the -interpreter and thread state objects within the target process. These structures -hold essential runtime context and are required for writing debugger control -information. +Before code can be injected and executed in a remote Python process, the +debugger must choose a thread in which to schedule execution. This is necessary +because the control fields used to perform remote code injection are located in +the ``_PyRemoteDebuggerSupport`` structure, which is embedded in a +``PyThreadState`` object. These fields are modified by the debugger to request +execution of injected scripts. -- The ``PyInterpreterState`` structure represents a Python interpreter instance. - Each interpreter holds its own module imports, built-in state, and thread list. - Most applications use only one interpreter, but CPython supports creating multiple - interpreters in the same process. +The ``PyThreadState`` structure represents a thread running inside a Python +interpreter. It maintains the thread’s evaluation context and contains the +fields required for debugger coordination. Locating a valid ``PyThreadState`` +is therefore a key prerequisite for triggering execution remotely. -- The ``PyThreadState`` structure represents a thread running within an interpreter. - This is where evaluation state and the control fields used by the debugger live. +A thread is typically selected based on its role or ID. In most cases, the main +thread is used, but some tools may target a specific thread by its native +thread ID. Once the target thread is chosen, the debugger must locate both the +interpreter and the associated thread state structures in memory. 
-To inject and run code remotely, the debugger must locate a valid ``PyThreadState`` -to target. Typically, this is the main thread, but in some cases, the debugger may -want to attach to a specific thread by its native thread ID. +The relevant internal structures are defined as follows: -To locate a thread: +- ``PyInterpreterState`` represents an isolated Python interpreter instance. + Each interpreter maintains its own set of imported modules, built-in state, + and thread state list. Although most Python applications use a single + interpreter, CPython supports multiple interpreters in the same process. -1. Use the offset ``runtime_state.interpreters_head`` to find the address of the - first interpreter in the ``PyRuntime`` structure. This is the entry point to - the list of active interpreters. +- ``PyThreadState`` represents a thread running within an interpreter. It + contains execution state and the control fields used by the debugger. -2. Use the offset ``interpreter_state.threads_main`` to locate the main thread - of that interpreter. This is the simplest and most reliable thread to target. +To locate a thread: -3. Optionally, use ``interpreter_state.threads_head`` to walk the linked list of - all threads. For each ``PyThreadState``, compare the ``native_thread_id`` - field (using ``thread_state.native_thread_id``) to find a specific thread. +1. Use the offset ``runtime_state.interpreters_head`` to obtain the address of + the first interpreter in the ``PyRuntime`` structure. This is the entry point + to the linked list of active interpreters. - This is useful when the debugger allows the user to select which thread to inject into, - or when targeting a thread that's actively running. +2. Use the offset ``interpreter_state.threads_main`` to access the main thread + state associated with the selected interpreter. This is typically the most + reliable thread to target. -4. Once a valid ``PyThreadState`` is found, record its address. 
This will be used - in the next step to write debugger control fields and schedule execution. +3. Optionally, use the offset ``interpreter_state.threads_head`` to iterate +through the linked list of all thread states. Each ``PyThreadState`` structure +contains a ``native_thread_id`` field, which may be compared to a target thread +ID to find a specific thread. -An example of locating the main thread: +4. Once a valid ``PyThreadState`` has been found, its address can be used in +later steps of the protocol, such as writing debugger control fields and +scheduling execution. -.. code-block:: python +The following is an example implementation that locates the main thread state:: - def find_main_thread_state(pid, py_runtime_addr, debug_offsets): + def find_main_thread_state( + pid: int, py_runtime_addr: int, debug_offsets: DebugOffsets, + ) -> int: # Step 1: Read interpreters_head from PyRuntime - interp_head_ptr = py_runtime_addr + debug_offsets.runtime_state.interpreters_head + interp_head_ptr = ( + py_runtime_addr + debug_offsets.runtime_state.interpreters_head + ) interp_addr = read_pointer(pid, interp_head_ptr) if interp_addr == 0: raise RuntimeError("No interpreter found in the target process") + # Step 2: Read the threads_main pointer from the interpreter - threads_main_ptr = interp_addr + debug_offsets.interpreter_state.threads_main + threads_main_ptr = ( + interp_addr + debug_offsets.interpreter_state.threads_main + ) thread_state_addr = read_pointer(pid, threads_main_ptr) if thread_state_addr == 0: raise RuntimeError("Main thread state is not available") - return thread_state_addr -To locate a specific thread by native thread ID: + return thread_state_addr -.. 
code-block:: python +The following example demonstrates how to locate a thread by its native thread +ID:: - def find_thread_by_id(pid, interp_addr, debug_offsets, target_tid): + def find_thread_by_id( + pid: int, + interp_addr: int, + debug_offsets: DebugOffsets, + target_tid: int, + ) -> int: # Start at threads_head and walk the linked list thread_ptr = read_pointer( - pid, interp_addr + debug_offsets.interpreter_state.threads_head + pid, + interp_addr + debug_offsets.interpreter_state.threads_head ) + while thread_ptr: - native_tid_ptr = thread_ptr + debug_offsets.thread_state.native_thread_id + native_tid_ptr = ( + thread_ptr + debug_offsets.thread_state.native_thread_id + ) native_tid = read_int(pid, native_tid_ptr) if native_tid == target_tid: return thread_ptr - thread_ptr = read_pointer(pid, thread_ptr + debug_offsets.thread_state.next) - raise RuntimeError("Thread with the given ID was not found") - -Once a valid thread state has been identified, the debugger can use it to modify -control fields and request execution in the next stage of the protocol. - -Writing Control Information -=========================== - -Once a valid thread state has been located, the debugger can write control fields -that instruct the target process to execute a script at the next safe opportunity. - -Each thread state contains a ``_PyRemoteDebuggerSupport`` structure, which is used -to coordinate communication between the debugger and the interpreter. The debugger -uses offsets from ``_Py_DebugOffsets`` to locate three key fields: + thread_ptr = read_pointer( + pid, + thread_ptr + debug_offsets.thread_state.next + ) -- ``debugger_script_path``: A buffer where the debugger writes the full path to - a Python source file (``.py``). The file must exist and be readable by the - target process. - -- ``debugger_pending_call``: An integer flag. When set to ``1``, it signals - that a script is ready to be executed. 
- -- ``eval_breaker``: A field checked periodically by the evaluation loop. To - notify the interpreter of pending debugger activity, the debugger sets the - ``_PY_EVAL_PLEASE_STOP_BIT`` in this field. This causes the interpreter to pause - and check for debugger-related actions before continuing with normal execution. - -To safely modify these fields, most debuggers should suspend the process before -writing to memory. This avoids race conditions that may occur if the interpreter -is actively running. - -To perform the injection: + raise RuntimeError("Thread with the given ID was not found") -1. Write the script path into the ``debugger_script_path`` buffer. -2. Set the ``debugger_pending_call`` flag to ``1``. -3. Read the value of ``eval_breaker``, set the stop bit, and write the updated - value back. -An example implementation might look like: +Once a valid thread state has been located, the debugger can proceed with +modifying its control fields and scheduling execution, as described in the next +section. -.. code-block:: python +Writing control information +=========================== - def inject_script(pid, thread_state_addr, debug_offsets, script_path): - # Base offset to the _PyRemoteDebuggerSupport struct +Once a valid ``PyThreadState`` structure has been identified, the debugger may +modify control fields within it to schedule the execution of a specified Python +script. These control fields are checked periodically by the interpreter, and +when set correctly, they trigger the execution of remote code at a safe point +in the evaluation loop. + +Each ``PyThreadState`` contains a ``_PyRemoteDebuggerSupport`` structure used +for communication between the debugger and the interpreter. The locations of +its fields are defined by the ``_Py_DebugOffsets`` structure and include the +following: + +- ``debugger_script_path``: A fixed-size buffer that holds the full path to a + Python source file (``.py``). 
This file must be accessible and readable by + the target process when execution is triggered. + +- ``debugger_pending_call``: An integer flag. Setting this to ``1`` tells the + interpreter that a script is ready to be executed. + +- ``eval_breaker``: A field checked by the interpreter during execution. + Setting bit 5 (``_PY_EVAL_PLEASE_STOP_BIT``, value ``1U << 5``) in this + field causes the interpreter to pause and check for debugger activity. + +To complete the injection, the debugger must perform the following steps: + +1. Write the full script path into the ``debugger_script_path`` buffer. +2. Set ``debugger_pending_call`` to ``1``. +3. Read the current value of ``eval_breaker``, set bit 5 + (``_PY_EVAL_PLEASE_STOP_BIT``), and write the updated value back. This + signals the interpreter to check for debugger activity. + +The following is an example implementation:: + + def inject_script( + pid: int, + thread_state_addr: int, + debug_offsets: DebugOffsets, + script_path: str + ) -> None: + # Compute the base offset of _PyRemoteDebuggerSupport support_base = ( thread_state_addr + debug_offsets.debugger_support.remote_debugger_support ) - # 1. Write script path - script_path_ptr = support_base + debug_offsets.debugger_support.debugger_script_path + + # Step 1: Write the script path into debugger_script_path + script_path_ptr = ( + support_base + + debug_offsets.debugger_support.debugger_script_path + ) write_string(pid, script_path_ptr, script_path) - # 2. Set debugger_pending_call = 1 - pending_ptr = support_base + debug_offsets.debugger_support.debugger_pending_call + + # Step 2: Set debugger_pending_call to 1 + pending_ptr = ( + support_base + + debug_offsets.debugger_support.debugger_pending_call + ) write_int(pid, pending_ptr, 1) - # 3. 
Set _PY_EVAL_PLEASE_STOP_BIT in eval_breaker - eval_breaker_ptr = thread_state_addr + debug_offsets.debugger_support.eval_breaker + + # Step 3: Set _PY_EVAL_PLEASE_STOP_BIT (bit 5, value 1 << 5) in + # eval_breaker + eval_breaker_ptr = ( + thread_state_addr + + debug_offsets.debugger_support.eval_breaker + ) breaker = read_int(pid, eval_breaker_ptr) - # Set the least significant bit (this is _PY_EVAL_PLEASE_STOP_BIT) - breaker |= 1 + breaker |= (1 << 5) write_int(pid, eval_breaker_ptr, breaker) -After these writes are complete, the debugger may resume the process (if it was paused). -The interpreter will check ``eval_breaker`` at the next evaluation checkpoint, -detect the pending call, and load and execute the specified Python file. The debugger is responsible -for ensuring that the file remains on disk and readable by the target interpreter -when it is accessed. + +Once these fields are set, the debugger may resume the process (if it was +suspended). The interpreter will process the request at the next safe +evaluation point, load the script from disk, and execute it. + +It is the responsibility of the debugger to ensure that the script file remains +present and accessible to the target process during execution. + +.. note:: + + Script execution is asynchronous. The script file cannot be deleted + immediately after injection. The debugger should wait until the injected + script has produced an observable effect before removing the file. + This effect depends on what the script is designed to do. For example, + a debugger might wait until the remote process connects back to a socket + before removing the script. Once such an effect is observed, it is safe to + assume the file is no longer needed. Summary ======= -To inject and execute a script in a remote Python process: +To inject and execute a Python script in a remote process: -1. Locate the ``PyRuntime`` structure in the target process's memory. -2.
Read and validate the ``_Py_DebugOffsets`` structure at the start of ``PyRuntime``. +1. Locate the ``PyRuntime`` structure in the target process’s memory. +2. Read and validate the ``_Py_DebugOffsets`` structure at the beginning of + ``PyRuntime``. 3. Use the offsets to locate a valid ``PyThreadState``. 4. Write the path to a Python script into ``debugger_script_path``. -5. Set ``debugger_pending_call = 1``. -6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in ``eval_breaker``. -7. Resume the process (if paused). The script will be executed at the next safe eval point. +5. Set the ``debugger_pending_call`` flag to ``1``. +6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field. +7. Resume the process (if suspended). The script will execute at the next safe + evaluation point. + From 1730c05519a5810cb4ef6383015e896fc95b3207 Mon Sep 17 00:00:00 2001 From: Ivona Stojanovic Date: Sun, 20 Apr 2025 20:40:05 +0200 Subject: [PATCH 3/4] fixup! gh-131591: Add remote debugging attachment protocol documentation --- Doc/howto/remote_debugging.rst | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst index 2b776c6f823aa2..a2fafa391a7ac4 100644 --- a/Doc/howto/remote_debugging.rst +++ b/Doc/howto/remote_debugging.rst @@ -14,7 +14,6 @@ detailed explanation of the underlying protocol, which takes as input the executed. This information supports independent reimplementation of the protocol, regardless of programming language. - .. warning:: The execution of the injected script depends on the interpreter reaching a @@ -149,10 +148,11 @@ To find the ``PyRuntime`` structure on Windows: `_. 2. Identify the module corresponding to :file:`python.exe` or :file:`python{XY}.dll`, where ``X`` and ``Y`` are the major and minor - version numbers of the Python version (for example, ``python311.dll``), and - record its base address. -3. Locate the ``PyRuntim`` section. 
Section names in the PE format are limited - to 8 characters. + version numbers of the Python version, and record its base address. +3. Locate the ``PyRuntim`` section. Due to the PE format's 8-character limit + on section names (defined as ``IMAGE_SIZEOF_SHORT_NAME``), the original + name ``PyRuntime`` is truncated. This section contains the ``PyRuntime`` + structure. 4. Retrieve the section’s relative virtual address (RVA) and add it to the base address of the module. @@ -171,7 +171,9 @@ The following is an example implementation:: pid, name_contains="python3" ) - # Step 3: Parse PE section headers to get PyRuntim RVA + # Step 3: Parse PE section headers to get the RVA of the PyRuntime + # section. The section name appears as "PyRuntim" due to the + # 8-character limit defined by the PE format (IMAGE_SIZEOF_SHORT_NAME). section_rva = parse_pe_section_offset(binary_path, "PyRuntim") # Step 4: Compute PyRuntime address in memory @@ -179,8 +181,8 @@ The following is an example implementation:: -RReading _Py_DebugOffsets -========================= +Reading _Py_DebugOffsets +======================== Once the address of the ``PyRuntime`` structure has been determined, the next step is to read the ``_Py_DebugOffsets`` structure located at the beginning of From c8a9a61ed5fdff8fae0f3a2dc4a12732ee7a0b57 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Mon, 21 Apr 2025 21:08:27 +0100 Subject: [PATCH 4/4] Add details per arch Signed-off-by: Pablo Galindo --- Doc/howto/remote_debugging.rst | 68 ++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst index a2fafa391a7ac4..37c3572c8a3c31 100644 --- a/Doc/howto/remote_debugging.rst +++ b/Doc/howto/remote_debugging.rst @@ -94,6 +94,30 @@ The following is an example implementation:: return base_address + section_offset +On Linux systems, there are two main approaches to read memory from another +process. 
The first is through the ``/proc`` filesystem, specifically by reading from +``/proc/[pid]/mem`` which provides direct access to the process's memory. This +requires appropriate permissions - either being the same user as the target +process or having root access. The second approach is using the +``process_vm_readv()`` system call which provides a more efficient way to copy +memory between processes. While ptrace's ``PTRACE_PEEKTEXT`` operation can also be +used to read memory, it is significantly slower as it only reads one word at a +time and requires multiple context switches between the tracer and tracee +processes. + +For parsing ELF sections, the process involves reading and interpreting the ELF +file format structures from the binary file on disk. The ELF header contains a +pointer to the section header table. Each section header contains metadata about +a section including its name (stored in a separate string table), offset, and +size. To find a specific section like .PyRuntime, you need to walk through these +headers and match the section name. The section header then provides the offset +where that section exists in the file, which can be used to calculate its +runtime address when the binary is loaded into memory. + +You can read more about the ELF file format in the `ELF specification +`_. + + .. rubric:: macOS (Mach-O) To find the ``PyRuntime`` structure on macOS: @@ -134,6 +158,29 @@ The following is an example implementation:: # Step 5: Compute the PyRuntime address in memory return base_address + section_offset +On macOS, accessing another process's memory requires using Mach-O specific APIs +and file formats. The first step is obtaining a ``task_port`` handle via +``task_for_pid()``, which provides access to the target process's memory space. +This handle enables memory operations through APIs like +``mach_vm_read_overwrite()``.
+ +The process memory can be examined using ``mach_vm_region()`` to scan through the +virtual memory space, while ``proc_regionfilename()`` helps identify which binary +files are loaded at each memory region. When the Python binary or library is +found, its Mach-O headers need to be parsed to locate the ``PyRuntime`` structure. + +The Mach-O format organizes code and data into segments and sections. The +``PyRuntime`` structure lives in a section named ``__PyRuntime`` within the +``__DATA`` segment. The actual runtime address calculation involves finding the +``__TEXT`` segment which serves as the binary's base address, then locating the +``__DATA`` segment containing our target section. The final address is computed by +combining the base address with the appropriate section offsets from the Mach-O +headers. + +Note that accessing another process's memory on macOS typically requires +elevated privileges - either root access or special security entitlements +granted to the debugging process. + .. rubric:: Windows (PE) @@ -180,6 +227,27 @@ The following is an example implementation:: return base_address + section_rva +On Windows, accessing another process's memory requires using the Windows API +functions like ``CreateToolhelp32Snapshot()`` and ``Module32First()/Module32Next()`` +to enumerate loaded modules. The ``OpenProcess()`` function provides a handle to +access the target process's memory space, enabling memory operations through +``ReadProcessMemory()``. + +The process memory can be examined by enumerating loaded modules to find the +Python binary or DLL. When found, its PE headers need to be parsed to locate the +``PyRuntime`` structure. + +The PE format organizes code and data into sections. The ``PyRuntime`` structure +lives in a section named "PyRuntim" (truncated from "PyRuntime" due to PE's +8-character name limit). 
The actual runtime address calculation involves finding +the module's base address from the module entry, then locating our target +section in the PE headers. The final address is computed by combining the base +address with the section's virtual address from the PE section headers. + +Note that accessing another process's memory on Windows typically requires +appropriate privileges - either administrative access or the ``SeDebugPrivilege`` +privilege granted to the debugging process. + Reading _Py_DebugOffsets ========================
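
The Windows (PE) lookup described in the last hunk can be sketched as follows. This is a minimal sketch under stated assumptions, not CPython's implementation: ``find_pe_section_rva`` is a hypothetical helper (it does not appear in the patch) that works on the raw bytes of the image file on disk and assumes a well-formed PE header. It truncates the requested name to 8 bytes, which is why ``PyRuntime`` is matched as ``PyRuntim``.

```python
import struct

IMAGE_SIZEOF_SHORT_NAME = 8   # PE section names are stored in 8 bytes
SECTION_HEADER_SIZE = 40      # size of one section table entry


def find_pe_section_rva(image: bytes, name: str) -> int:
    """Return the relative virtual address (RVA) of a named PE section.

    Hypothetical helper for illustration; `image` is the file content
    of python.exe or pythonXY.dll read from disk.
    """
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")

    # The DOS header stores the file offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image: missing PE signature")

    # The 20-byte COFF file header follows the 4-byte signature.
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    (optional_header_size,) = struct.unpack_from("<H", image, coff + 16)

    # The section table starts right after the optional header.
    table = coff + 20 + optional_header_size

    # Names longer than 8 bytes are truncated: "PyRuntime" -> "PyRuntim".
    wanted = name.encode("ascii")[:IMAGE_SIZEOF_SHORT_NAME]

    for index in range(num_sections):
        entry = table + index * SECTION_HEADER_SIZE
        raw_name = image[entry:entry + IMAGE_SIZEOF_SHORT_NAME]
        if raw_name.rstrip(b"\x00") == wanted:
            # VirtualAddress sits after Name (8) and VirtualSize (4).
            (rva,) = struct.unpack_from("<I", image, entry + 12)
            return rva

    raise LookupError(f"section {name!r} not found")
```

Adding the returned RVA to the module base address obtained from module enumeration gives the in-memory address of ``PyRuntime``, matching step 4 of the example in the patch.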