From 34c64f1cff8c4e14338f1b15fd0b93420559330e Mon Sep 17 00:00:00 2001
From: Ivona Stojanovic <stojanovoic.i@hotmail.com>
Date: Thu, 17 Apr 2025 16:41:47 +0200
Subject: [PATCH 1/4] gh-131591: Add remote debugging attachment protocol
 documentation

Add a developer-facing document describing the protocol used by
remote_exec(pid, script) to execute Python code in a running process.
This is intended to guide debugger and tool authors in reimplementing
the protocol.
---
 Doc/howto/index.rst            |   2 +
 Doc/howto/remote_debugging.rst | 335 +++++++++++++++++++++++++++++++++
 2 files changed, 337 insertions(+)
 create mode 100644 Doc/howto/remote_debugging.rst
diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst
index c09f92c9528ee1..f350141004c2db 100644
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@@ -34,6 +34,7 @@ Python Library Reference.
    mro.rst
    free-threading-python.rst
    free-threading-extensions.rst
+   remote_debugging.rst
 
 General:
 
@@ -66,3 +67,4 @@ Debugging and profiling:
 * :ref:`gdb`
 * :ref:`instrumentation`
 * :ref:`perf_profiling`
+* :ref:`remote-debugging`
diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst
new file mode 100644
index 00000000000000..0fd95e1210d844
--- /dev/null
+++ b/Doc/howto/remote_debugging.rst
@@ -0,0 +1,335 @@
+.. _remote-debugging:
+
+Remote Debugging Attachment Protocol
+====================================
+
+This section explains the low-level protocol that allows external code to inject and execute
+a Python script inside a running CPython process.
+
+This is the mechanism implemented by the :func:`sys.remote_exec` function, which
+instructs a remote Python process to execute a ``.py`` file. This section is not about using that
+function, instead, it explains how the underlying protocol works so that it can be
+reimplemented in any language.
+
+The protocol assumes you already know the process you want to target and the code you want it to run.
+That’s why it takes two pieces of information:
+
+- The process ID (``pid``) of the Python process you want to interact with.
+- A path to a Python script file (``.py``) that contains the code to be executed.
+
+Once injected, the script is executed by the target process’s interpreter the next time it reaches
+a safe evaluation point. This allows tools to trigger
+code execution remotely without modifying the Python program itself.
+
+In the sections that follow, we’ll walk through each step of this protocol in detail: how to locate
+the interpreter in memory, how to access internal structures safely, and how to trigger the execution
+of your script. Where necessary, we’ll highlight differences across platforms (Linux, macOS, Windows),
+and include example code to help clarify each part of the process.
+
+Locating the PyRuntime Structure
+================================
+
+The ``PyRuntime`` structure holds CPython's global interpreter state and serves as
+the entry point to other internal data, including the list of interpreters,
+thread states, and debugger support fields.
+
+To interact with a remote Python process, a debugger must first compute the memory
+address of the ``PyRuntime`` structure inside the target process. This cannot be
+hardcoded or inferred symbolically, since its location depends on how the binary was
+mapped into memory by the operating system.
+
+The process for locating ``PyRuntime`` is platform-specific, but follows the same
+high-level approach:
+
+1. Identify where the Python executable or shared library was loaded in the target process.
+2. Parse the corresponding binary file on disk to find the offset of the
+   ``.PyRuntime`` section.
+3. Compute the in-memory address of ``PyRuntime`` by relocating the section offset
+   to the base address found in step 1.
+
+Each subsection below explains what must be done and provides a short example of how this
+can be implemented.
+
+.. rubric:: Linux (ELF)
+
+To locate the ``PyRuntime`` structure on Linux:
+
+1. Inspect the memory mappings of the target process (e.g. from ``/proc/<pid>/maps``)
+   to find the memory region where the Python executable or shared ``libpython``
+   library is loaded. Record its base address.
+2. Load the binary file from disk and parse its ELF section headers.
+   Locate the ``.PyRuntime`` section and determine its file offset.
+3. Add the section offset to the base address to compute the address of the
+   ``PyRuntime`` structure in memory.
+
+An example implementation might look like:
+
+.. code-block:: python
+
+    def find_py_runtime_linux(pid):
+        # Step 1: Try to find the Python executable in memory
+        binary_path, base_address = find_mapped_binary(pid, name_contains="python")
+        # Step 2: Fallback to shared library if executable is not found
+        if binary_path is None:
+            binary_path, base_address = find_mapped_binary(pid, name_contains="libpython")
+        # Step 3: Parse ELF headers of the binary to get .PyRuntime section offset
+        section_offset = parse_elf_section_offset(binary_path, ".PyRuntime")
+        # Step 4: Compute PyRuntime address in memory
+        return base_address + section_offset
+
+.. rubric:: macOS (Mach-O)
+
+To locate the ``PyRuntime`` structure on macOS:
+
+1. Obtain a handle to the target process that allows memory inspection.
+2. Walk the memory regions of the process to identify the one that contains the
+   Python binary or shared library. Record its base address and associated file path.
+3. Load that binary file from disk and parse the Mach-O headers to find the
+   ``__DATA,__PyRuntime`` section.
+4. Add the section's offset to the base address of the loaded binary to compute
+   the address of the ``PyRuntime`` structure.
+
+An example implementation might look like:
+
+.. code-block:: python
+
+    def find_py_runtime_macos(pid):
+        # Step 1: Get access to the process's memory
+        handle = get_memory_access_handle(pid)
+        # Step 2: Try to find the Python executable in memory
+        binary_path, base_address = find_mapped_binary(handle, name_contains="python")
+        # Step 3: Fallback to libpython if executable is not found
+        if binary_path is None:
+            binary_path, base_address = find_mapped_binary(handle, name_contains="libpython")
+        # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset
+        section_offset = parse_macho_section_offset(binary_path, "__DATA", "__PyRuntime")
+        # Step 5: Compute PyRuntime address in memory
+        return base_address + section_offset
+
+.. rubric:: Windows (PE)
+
+To locate the ``PyRuntime`` structure on Windows:
+
+1. Enumerate all modules loaded in the target process.
+   Identify the module corresponding to ``python.exe`` or ``pythonXY.dll``, where X and Y
+   are the major and minor version numbers of the Python version, and record its base address.
+2. Load the binary from disk and parse the PE section headers.
+   Locate the ``.PyRuntime`` section and determine its relative virtual address (RVA).
+3. Add the RVA to the module’s base address to compute the full in-memory address
+   of the ``PyRuntime`` structure.
+
+An example implementation might look like:
+
+.. code-block:: python
+
+    def find_py_runtime_windows(pid):
+        # Step 1: Try to find the Python executable in memory
+        binary_path, base_address = find_loaded_module(pid, name_contains="python")
+        # Step 2: Fallback to shared pythonXY.dll if executable is not found
+        if binary_path is None:
+            binary_path, base_address = find_loaded_module(pid, name_contains="python3")
+        # Step 3: Parse PE section headers to get .PyRuntime RVA
+        section_rva = parse_pe_section_offset(binary_path, ".PyRuntime")
+        # Step 4: Compute PyRuntime address in memory
+        return base_address + section_rva
+
+Reading _Py_DebugOffsets
+=========================
+
+Once the address of the ``PyRuntime`` structure has been computed in the target
+process, the next step is to read the ``_Py_DebugOffsets`` structure located at
+its beginning.
+
+This structure contains version-specific field offsets needed to navigate
+interpreter and thread state memory safely.
+
+To read and validate the debug offsets:
+
+1. Read the memory at the address of ``PyRuntime``, up to the size of
+   ``_Py_DebugOffsets``. This structure is located at the very start of the
+   ``PyRuntime`` block.
+
+2. Verify that the contents of the structure are valid. In particular:
+
+   - The ``cookie`` field must match the expected debug marker.
+   - The ``version`` field must match the version of the Python interpreter
+     used by the calling process (i.e., the debugger or controlling runtime).
+   - If either the caller or the target process is running a pre-release version
+     (such as an alpha, beta, or release candidate), then the versions must match
+     exactly.
+   - The ``free_threaded`` flag must match between the caller and the target process.
+
+3. If the structure passes validation, the debugger may now safely use the
+   provided offsets to locate fields in interpreter and thread state structures.
+
+If any validation step fails, the debugger should abort rather than attempting to
+access incompatible memory layouts.
+
+An example of how a debugger might read and validate ``_Py_DebugOffsets``:
+
+.. code-block:: python
+
+    def read_debug_offsets(pid, py_runtime_addr):
+        # Step 1: Read memory from the target process at the PyRuntime address
+        data = read_process_memory(pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE)
+        # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure
+        debug_offsets = parse_debug_offsets(data)
+        # Step 3: Validate compatibility
+        if debug_offsets.cookie != EXPECTED_COOKIE:
+            raise RuntimeError("Invalid or missing debug cookie")
+        if debug_offsets.version != LOCAL_PYTHON_VERSION:
+            raise RuntimeError("Mismatch between caller and target Python versions")
+        if debug_offsets.free_threaded != LOCAL_FREE_THREADED:
+            raise RuntimeError("Mismatch in free-threaded configuration")
+        return debug_offsets
+
+Locating the Interpreter and Thread State
+=========================================
+
+After validating the ``_Py_DebugOffsets`` structure, the next step is to locate the
+interpreter and thread state objects within the target process. These structures
+hold essential runtime context and are required for writing debugger control
+information.
+
+- The ``PyInterpreterState`` structure represents a Python interpreter instance.
+  Each interpreter holds its own module imports, built-in state, and thread list.
+  Most applications use only one interpreter, but CPython supports creating multiple
+  interpreters in the same process.
+
+- The ``PyThreadState`` structure represents a thread running within an interpreter.
+  This is where evaluation state and the control fields used by the debugger live.
+
+To inject and run code remotely, the debugger must locate a valid ``PyThreadState``
+to target. Typically, this is the main thread, but in some cases, the debugger may
+want to attach to a specific thread by its native thread ID.
+
+To locate a thread:
+
+1. Use the offset ``runtime_state.interpreters_head`` to find the address of the
+   first interpreter in the ``PyRuntime`` structure. This is the entry point to
+   the list of active interpreters.
+
+2. Use the offset ``interpreter_state.threads_main`` to locate the main thread
+   of that interpreter. This is the simplest and most reliable thread to target.
+
+3. Optionally, use ``interpreter_state.threads_head`` to walk the linked list of
+   all threads. For each ``PyThreadState``, compare the ``native_thread_id``
+   field (using ``thread_state.native_thread_id``) to find a specific thread.
+
+   This is useful when the debugger allows the user to select which thread to inject into,
+   or when targeting a thread that's actively running.
+
+4. Once a valid ``PyThreadState`` is found, record its address. This will be used
+   in the next step to write debugger control fields and schedule execution.
+
+An example of locating the main thread:
+
+.. code-block:: python
+
+    def find_main_thread_state(pid, py_runtime_addr, debug_offsets):
+        # Step 1: Read interpreters_head from PyRuntime
+        interp_head_ptr = py_runtime_addr + debug_offsets.runtime_state.interpreters_head
+        interp_addr = read_pointer(pid, interp_head_ptr)
+        if interp_addr == 0:
+            raise RuntimeError("No interpreter found in the target process")
+        # Step 2: Read the threads_main pointer from the interpreter
+        threads_main_ptr = interp_addr + debug_offsets.interpreter_state.threads_main
+        thread_state_addr = read_pointer(pid, threads_main_ptr)
+        if thread_state_addr == 0:
+            raise RuntimeError("Main thread state is not available")
+        return thread_state_addr
+
+To locate a specific thread by native thread ID:
+
+.. code-block:: python
+
+    def find_thread_by_id(pid, interp_addr, debug_offsets, target_tid):
+        # Start at threads_head and walk the linked list
+        thread_ptr = read_pointer(
+            pid, interp_addr + debug_offsets.interpreter_state.threads_head
+        )
+        while thread_ptr:
+            native_tid_ptr = thread_ptr + debug_offsets.thread_state.native_thread_id
+            native_tid = read_int(pid, native_tid_ptr)
+            if native_tid == target_tid:
+                return thread_ptr
+            thread_ptr = read_pointer(pid, thread_ptr + debug_offsets.thread_state.next)
+        raise RuntimeError("Thread with the given ID was not found")
+
+Once a valid thread state has been identified, the debugger can use it to modify
+control fields and request execution in the next stage of the protocol.
+
+Writing Control Information
+===========================
+
+Once a valid thread state has been located, the debugger can write control fields
+that instruct the target process to execute a script at the next safe opportunity.
+
+Each thread state contains a ``_PyRemoteDebuggerSupport`` structure, which is used
+to coordinate communication between the debugger and the interpreter. The debugger
+uses offsets from ``_Py_DebugOffsets`` to locate three key fields:
+
+- ``debugger_script_path``: A buffer where the debugger writes the full path to
+  a Python source file (``.py``). The file must exist and be readable by the
+  target process.
+
+- ``debugger_pending_call``: An integer flag. When set to ``1``, it signals
+  that a script is ready to be executed.
+
+- ``eval_breaker``: A field checked periodically by the evaluation loop. To
+  notify the interpreter of pending debugger activity, the debugger sets the
+  ``_PY_EVAL_PLEASE_STOP_BIT`` in this field. This causes the interpreter to pause
+  and check for debugger-related actions before continuing with normal execution.
+
+To safely modify these fields, most debuggers should suspend the process before
+writing to memory. This avoids race conditions that may occur if the interpreter
+is actively running.
+
+To perform the injection:
+
+1. Write the script path into the ``debugger_script_path`` buffer.
+2. Set the ``debugger_pending_call`` flag to ``1``.
+3. Read the value of ``eval_breaker``, set the stop bit, and write the updated
+   value back.
+
+An example implementation might look like:
+
+.. code-block:: python
+
+    def inject_script(pid, thread_state_addr, debug_offsets, script_path):
+        # Base offset to the _PyRemoteDebuggerSupport struct
+        support_base = (
+            thread_state_addr +
+            debug_offsets.debugger_support.remote_debugger_support
+        )
+        # 1. Write script path
+        script_path_ptr = support_base + debug_offsets.debugger_support.debugger_script_path
+        write_string(pid, script_path_ptr, script_path)
+        # 2. Set debugger_pending_call = 1
+        pending_ptr = support_base + debug_offsets.debugger_support.debugger_pending_call
+        write_int(pid, pending_ptr, 1)
+        # 3. Set _PY_EVAL_PLEASE_STOP_BIT in eval_breaker
+        eval_breaker_ptr = thread_state_addr + debug_offsets.debugger_support.eval_breaker
+        breaker = read_int(pid, eval_breaker_ptr)
+        # Set the least significant bit (this is _PY_EVAL_PLEASE_STOP_BIT)
+        breaker |= 1
+        write_int(pid, eval_breaker_ptr, breaker)
+
+After these writes are complete, the debugger may resume the process (if it was paused).
+The interpreter will check ``eval_breaker`` at the next evaluation checkpoint,
+detect the pending call, and load and execute the specified Python file. The debugger is responsible
+for ensuring that the file remains on disk and readable by the target interpreter
+when it is accessed.
+
+Summary
+=======
+
+To inject and execute a script in a remote Python process:
+
+1. Locate the ``PyRuntime`` structure in the target process's memory.
+2. Read and validate the ``_Py_DebugOffsets`` structure at the start of ``PyRuntime``.
+3. Use the offsets to locate a valid ``PyThreadState``.
+4. Write the path to a Python script into ``debugger_script_path``.
+5. Set ``debugger_pending_call = 1``.
+6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in ``eval_breaker``.
+7. Resume the process (if paused). The script will be executed at the next safe eval point.

From bb9d85d63841f5ffb976bb7a459fd3e997f86e04 Mon Sep 17 00:00:00 2001
From: Ivona Stojanovic <stojanovic.i@hotmail.com>
Date: Sun, 20 Apr 2025 12:55:25 +0200
Subject: [PATCH 2/4] fixup! gh-131591: Add remote debugging attachment
 protocol documentation

---
 Doc/howto/remote_debugging.rst | 562 ++++++++++++++++++++-------------
 1 file changed, 351 insertions(+), 211 deletions(-)

diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst
index 0fd95e1210d844..2b776c6f823aa2 100644
--- a/Doc/howto/remote_debugging.rst
+++ b/Doc/howto/remote_debugging.rst
@@ -1,335 +1,475 @@
 .. _remote-debugging:
 
-Remote Debugging Attachment Protocol
+Remote debugging attachment protocol
 ====================================
 
-This section explains the low-level protocol that allows external code to inject and execute
-a Python script inside a running CPython process.
+This section describes the low-level protocol that enables external tools to
+inject and execute a Python script within a running CPython process.
 
-This is the mechanism implemented by the :func:`sys.remote_exec` function, which
-instructs a remote Python process to execute a ``.py`` file. This section is not about using that
-function, instead, it explains how the underlying protocol works so that it can be
-reimplemented in any language.
+This mechanism forms the basis of the :func:`sys.remote_exec` function, which
+instructs a remote Python process to execute a ``.py`` file. However, this
+section does not document the usage of that function. Instead, it provides a
+detailed explanation of the underlying protocol, which takes as input the
+``pid`` of a target Python process and the path to a Python source file to be
+executed. This information supports independent reimplementation of the
+protocol, regardless of programming language.
 
-The protocol assumes you already know the process you want to target and the code you want it to run.
-That’s why it takes two pieces of information:
 
-- The process ID (``pid``) of the Python process you want to interact with.
-- A path to a Python script file (``.py``) that contains the code to be executed.
+.. warning::
 
-Once injected, the script is executed by the target process’s interpreter the next time it reaches
-a safe evaluation point. This allows tools to trigger
-code execution remotely without modifying the Python program itself.
+    The execution of the injected script depends on the interpreter reaching a
+    safe evaluation point. As a result, execution may be delayed depending on
+    the runtime state of the target process.
 
-In the sections that follow, we’ll walk through each step of this protocol in detail: how to locate
-the interpreter in memory, how to access internal structures safely, and how to trigger the execution
-of your script. Where necessary, we’ll highlight differences across platforms (Linux, macOS, Windows),
-and include example code to help clarify each part of the process.
+Once injected, the script is executed by the interpreter within the target
+process the next time a safe evaluation point is reached. This approach enables
+remote execution capabilities without modifying the behavior or structure of
+the running Python application.
 
-Locating the PyRuntime Structure
+Subsequent sections provide a step-by-step description of the protocol,
+including techniques for locating interpreter structures in memory, safely
+accessing internal fields, and triggering code execution. Platform-specific
+variations are noted where applicable, and example implementations are included
+to clarify each operation.
+
+Locating the PyRuntime structure
 ================================
 
-The ``PyRuntime`` structure holds CPython's global interpreter state and serves as
-the entry point to other internal data, including the list of interpreters,
+CPython places the ``PyRuntime`` structure in a dedicated binary section to
+help external tools find it at runtime. The name and format of this section
+vary by platform. For example, ``.PyRuntime`` is used on ELF systems, and
+``__DATA,__PyRuntime`` is used on macOS. Tools can find the offset of this
+structure by examining the binary on disk.
+
+The ``PyRuntime`` structure contains CPython’s global interpreter state and
+provides access to other internal data, including the list of interpreters,
 thread states, and debugger support fields.
 
-To interact with a remote Python process, a debugger must first compute the memory
-address of the ``PyRuntime`` structure inside the target process. This cannot be
-hardcoded or inferred symbolically, since its location depends on how the binary was
-mapped into memory by the operating system.
+To work with a remote Python process, a debugger must first find the memory
+address of the ``PyRuntime`` structure in the target process. This address
+can’t be hardcoded or calculated from a symbol name, because it depends on
+where the operating system loaded the binary.
 
-The process for locating ``PyRuntime`` is platform-specific, but follows the same
-high-level approach:
+The method for finding ``PyRuntime`` depends on the platform, but the steps are
+the same in general:
 
-1. Identify where the Python executable or shared library was loaded in the target process.
-2. Parse the corresponding binary file on disk to find the offset of the
-   ``.PyRuntime`` section.
-3. Compute the in-memory address of ``PyRuntime`` by relocating the section offset
-   to the base address found in step 1.
+1. Find the base address where the Python binary or shared library was loaded
+   in the target process.
+2. Find the offset of the section holding ``PyRuntime`` in the on-disk binary.
+3. Add the section offset to the base address to compute the address in memory.
 
-Each subsection below explains what must be done and provides a short example of how this
-can be implemented.
+The sections below explain how to do this on each supported platform and
+include example code.
 
 .. rubric:: Linux (ELF)
 
-To locate the ``PyRuntime`` structure on Linux:
+To find the ``PyRuntime`` structure on Linux:
 
-1. Inspect the memory mappings of the target process (e.g. from ``/proc/<pid>/maps``)
-   to find the memory region where the Python executable or shared ``libpython``
-   library is loaded. Record its base address.
-2. Load the binary file from disk and parse its ELF section headers.
-   Locate the ``.PyRuntime`` section and determine its file offset.
-3. Add the section offset to the base address to compute the address of the
-   ``PyRuntime`` structure in memory.
-
-An example implementation might look like:
+1. Read the process’s memory map (for example, ``/proc/<pid>/maps``) to find
+   the address where the Python executable or ``libpython`` was loaded.
+2. Parse the ELF section headers in the binary to get the offset of the
+   ``.PyRuntime`` section.
+3. Add that offset to the base address from step 1 to get the memory address of
+   ``PyRuntime``.
 
-.. code-block:: python
+The following is an example implementation::
 
-    def find_py_runtime_linux(pid):
+    def find_py_runtime_linux(pid: int) -> int:
         # Step 1: Try to find the Python executable in memory
-        binary_path, base_address = find_mapped_binary(pid, name_contains="python")
+        binary_path, base_address = find_mapped_binary(
+            pid, name_contains="python"
+        )
+
         # Step 2: Fallback to shared library if executable is not found
         if binary_path is None:
-            binary_path, base_address = find_mapped_binary(pid, name_contains="libpython")
-        # Step 3: Parse ELF headers of the binary to get .PyRuntime section offset
-        section_offset = parse_elf_section_offset(binary_path, ".PyRuntime")
+            binary_path, base_address = find_mapped_binary(
+                pid, name_contains="libpython"
+            )
+
+        # Step 3: Parse ELF headers to get .PyRuntime section offset
+        section_offset = parse_elf_section_offset(
+            binary_path, ".PyRuntime"
+        )
+
         # Step 4: Compute PyRuntime address in memory
         return base_address + section_offset
 
-.. rubric:: macOS (Mach-O)
 
-To locate the ``PyRuntime`` structure on macOS:
+.. rubric:: macOS (Mach-O)
 
-1. Obtain a handle to the target process that allows memory inspection.
-2. Walk the memory regions of the process to identify the one that contains the
-   Python binary or shared library. Record its base address and associated file path.
-3. Load that binary file from disk and parse the Mach-O headers to find the
-   ``__DATA,__PyRuntime`` section.
-4. Add the section's offset to the base address of the loaded binary to compute
-   the address of the ``PyRuntime`` structure.
+To find the ``PyRuntime`` structure on macOS:
 
-An example implementation might look like:
+1. Call ``task_for_pid()`` to get the ``mach_port_t`` task port for the target
+   process. This handle is needed to read memory using APIs like
+   ``mach_vm_read_overwrite`` and ``mach_vm_region``.
+2. Scan the memory regions to find the one containing the Python executable or
+   ``libpython``, and record its base address.
+3. Load the binary file from disk and parse the Mach-O headers to find the
+   section named ``PyRuntime`` in the ``__DATA`` segment. macOS prefixes symbol
+   names with an underscore, so the ``PyRuntime`` symbol appears as
+   ``_PyRuntime`` in the symbol table, but the section name is not affected.
+4. Add the section’s offset to the base address from step 2.
 
-.. code-block:: python
+The following is an example implementation::
 
-    def find_py_runtime_macos(pid):
+    def find_py_runtime_macos(pid: int) -> int:
         # Step 1: Get access to the process's memory
         handle = get_memory_access_handle(pid)
+
         # Step 2: Try to find the Python executable in memory
-        binary_path, base_address = find_mapped_binary(handle, name_contains="python")
-        # Step 3: Fallback to libpython if executable is not found
+        binary_path, base_address = find_mapped_binary(
+            handle, name_contains="python"
+        )
+
+        # Step 3: Fall back to libpython if the executable is not found
         if binary_path is None:
-            binary_path, base_address = find_mapped_binary(handle, name_contains="libpython")
+            binary_path, base_address = find_mapped_binary(
+                handle, name_contains="libpython"
+            )
+
         # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset
-        section_offset = parse_macho_section_offset(binary_path, "__DATA", "__PyRuntime")
-        # Step 5: Compute PyRuntime address in memory
+        section_offset = parse_macho_section_offset(
+            binary_path, "__DATA", "__PyRuntime"
+        )
+
+        # Step 5: Compute the PyRuntime address in memory
         return base_address + section_offset
 
-.. rubric:: Windows (PE)
 
-To locate the ``PyRuntime`` structure on Windows:
+.. rubric:: Windows (PE)
 
-1. Enumerate all modules loaded in the target process.
-   Identify the module corresponding to ``python.exe`` or ``pythonXY.dll``, where X and Y
-   are the major and minor version numbers of the Python version, and record its base address.
-2. Load the binary from disk and parse the PE section headers.
-   Locate the ``.PyRuntime`` section and determine its relative virtual address (RVA).
-3. Add the RVA to the module’s base address to compute the full in-memory address
-   of the ``PyRuntime`` structure.
+To find the ``PyRuntime`` structure on Windows:
+
+1. Use the ToolHelp API to enumerate all modules loaded in the target process.
+   This is done using functions such as `CreateToolhelp32Snapshot
+   <https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-createtoolhelp32snapshot>`_,
+   `Module32First
+   <https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-module32first>`_,
+   and `Module32Next
+   <https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-module32next>`_.
+2. Identify the module corresponding to :file:`python.exe` or
+   :file:`python{XY}.dll`, where ``X`` and ``Y`` are the major and minor
+   version numbers of the Python version (for example, ``python311.dll``), and
+   record its base address.
+3. Locate the ``PyRuntim`` section. The name is truncated because PE section
+   names are limited to 8 characters.
+4. Retrieve the section’s relative virtual address (RVA) and add it to the base
+   address of the module.
+
+The following is an example implementation::
+
+    def find_py_runtime_windows(pid: int) -> int:
+        # Step 1: Try to find the Python executable in memory
+        binary_path, base_address = find_loaded_module(
+            pid, name_contains="python"
+        )
 
-An example implementation might look like:
+        # Step 2: Fall back to the shared pythonXY.dll if the executable is
+        # not found
+        if binary_path is None:
+            binary_path, base_address = find_loaded_module(
+                pid, name_contains="python3"
+            )
 
-.. code-block:: python
+        # Step 3: Parse PE section headers to get PyRuntim RVA
+        section_rva = parse_pe_section_offset(binary_path, "PyRuntim")
 
-    def find_py_runtime_windows(pid):
-        # Step 1: Try to find the Python executable in memory
-        binary_path, base_address = find_loaded_module(pid, name_contains="python")
-        # Step 2: Fallback to shared pythonXY.dll if executable is not found
-        if binary_path is None:
-            binary_path, base_address = find_loaded_module(pid, name_contains="python3")
-        # Step 3: Parse PE section headers to get .PyRuntime RVA
-        section_rva = parse_pe_section_offset(binary_path, ".PyRuntime")
         # Step 4: Compute PyRuntime address in memory
         return base_address + section_rva
 
-Reading _Py_DebugOffsets
+
+
+Reading _Py_DebugOffsets
 =========================
 
-Once the address of the ``PyRuntime`` structure has been computed in the target
-process, the next step is to read the ``_Py_DebugOffsets`` structure located at
-its beginning.
+Once the address of the ``PyRuntime`` structure has been determined, the next
+step is to read the ``_Py_DebugOffsets`` structure located at the beginning of
+the ``PyRuntime`` block.
 
-This structure contains version-specific field offsets needed to navigate
-interpreter and thread state memory safely.
+This structure provides version-specific field offsets that are needed to
+safely read interpreter and thread state memory. These offsets vary between
+CPython versions and must be checked before use to ensure they are compatible.
 
-To read and validate the debug offsets:
+To read and check the debug offsets, follow these steps:
 
-1. Read the memory at the address of ``PyRuntime``, up to the size of
-   ``_Py_DebugOffsets``. This structure is located at the very start of the
-   ``PyRuntime`` block.
+1. Read memory from the target process starting at the ``PyRuntime`` address,
+   covering the same number of bytes as the ``_Py_DebugOffsets`` structure.
+   This structure is located at the very start of the ``PyRuntime`` memory
+   block. Its layout is defined in CPython’s internal headers and stays the
+   same within a given minor version, but may change in major versions.
 
-2. Verify that the contents of the structure are valid. In particular:
+2. Check that the structure contains valid data:
 
    - The ``cookie`` field must match the expected debug marker.
    - The ``version`` field must match the version of the Python interpreter
-     used by the calling process (i.e., the debugger or controlling runtime).
-   - If either the caller or the target process is running a pre-release version
-     (such as an alpha, beta, or release candidate), then the versions must match
-     exactly.
-   - The ``free_threaded`` flag must match between the caller and the target process.
-
-3. If the structure passes validation, the debugger may now safely use the
-   provided offsets to locate fields in interpreter and thread state structures.
+     used by the debugger.
+   - If either the debugger or the target process is using a pre-release
+     version (for example, an alpha, beta, or release candidate), the versions
+     must match exactly.
+   - The ``free_threaded`` field must have the same value in both the debugger
+     and the target process.
 
-If any validation step fails, the debugger should abort rather than attempting to
-access incompatible memory layouts.
+3. If the structure is valid, the offsets it contains can be used to locate
+   fields in memory. If any check fails, the debugger should stop the operation
+   to avoid reading memory in the wrong format.
 
-An example of how a debugger might read and validate ``_Py_DebugOffsets``:
+The following is an example implementation that reads and checks
+``_Py_DebugOffsets``::
 
-.. code-block:: python
-
-    def read_debug_offsets(pid, py_runtime_addr):
+    def read_debug_offsets(pid: int, py_runtime_addr: int) -> DebugOffsets:
         # Step 1: Read memory from the target process at the PyRuntime address
-        data = read_process_memory(pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE)
+        data = read_process_memory(
+            pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE
+        )
+
         # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure
         debug_offsets = parse_debug_offsets(data)
-        # Step 3: Validate compatibility
+
+        # Step 3: Validate the contents of the structure
         if debug_offsets.cookie != EXPECTED_COOKIE:
             raise RuntimeError("Invalid or missing debug cookie")
         if debug_offsets.version != LOCAL_PYTHON_VERSION:
-            raise RuntimeError("Mismatch between caller and target Python versions")
+            raise RuntimeError(
+                "Mismatch between caller and target Python versions"
+            )
         if debug_offsets.free_threaded != LOCAL_FREE_THREADED:
             raise RuntimeError("Mismatch in free-threaded configuration")
+
         return debug_offsets
 
-Locating the Interpreter and Thread State
+
+
+.. warning::
+
+   **Process suspension recommended**
+
+   To avoid race conditions and ensure memory consistency, it is strongly
+   recommended that the target process be suspended before performing any
+   operations that read or write internal interpreter state. The Python runtime
+   may concurrently mutate interpreter data structures—such as creating or
+   destroying threads—during normal execution. This can result in invalid
+   memory reads or writes.
+
+   A debugger may suspend execution by attaching to the process with ``ptrace``
+   or by sending a ``SIGSTOP`` signal. Execution should only be resumed after
+   debugger-side memory operations are complete.
+
+   .. note::
+
+      Some tools, such as profilers or sampling-based debuggers, may operate on
+      a running process without suspension. In such cases, tools must be
+      explicitly designed to handle partially updated or inconsistent memory.
+      For most debugger implementations, suspending the process remains the
+      safest and most robust approach.
+
+
+Locating the interpreter and thread state
 =========================================
 
-After validating the ``_Py_DebugOffsets`` structure, the next step is to locate the
-interpreter and thread state objects within the target process. These structures
-hold essential runtime context and are required for writing debugger control
-information.
+Before code can be injected and executed in a remote Python process, the
+debugger must choose a thread in which to schedule execution. This is necessary
+because the control fields used to perform remote code injection are located in
+the ``_PyRemoteDebuggerSupport`` structure, which is embedded in a
+``PyThreadState`` object. These fields are modified by the debugger to request
+execution of injected scripts.
 
-- The ``PyInterpreterState`` structure represents a Python interpreter instance.
-  Each interpreter holds its own module imports, built-in state, and thread list.
-  Most applications use only one interpreter, but CPython supports creating multiple
-  interpreters in the same process.
+The ``PyThreadState`` structure represents a thread running inside a Python
+interpreter.  It maintains the thread’s evaluation context and contains the
+fields required for debugger coordination.  Locating a valid ``PyThreadState``
+is therefore a key prerequisite for triggering execution remotely.
 
-- The ``PyThreadState`` structure represents a thread running within an interpreter.
-  This is where evaluation state and the control fields used by the debugger live.
+A thread is typically selected based on its role or ID. In most cases, the main
+thread is used, but some tools may target a specific thread by its native
+thread ID. Once the target thread is chosen, the debugger must locate both the
+interpreter and the associated thread state structures in memory.
 
-To inject and run code remotely, the debugger must locate a valid ``PyThreadState``
-to target. Typically, this is the main thread, but in some cases, the debugger may
-want to attach to a specific thread by its native thread ID.
+The relevant internal structures are defined as follows:
 
-To locate a thread:
+- ``PyInterpreterState`` represents an isolated Python interpreter instance.
+  Each interpreter maintains its own set of imported modules, built-in state,
+  and thread state list. Although most Python applications use a single
+  interpreter, CPython supports multiple interpreters in the same process.
 
-1. Use the offset ``runtime_state.interpreters_head`` to find the address of the
-   first interpreter in the ``PyRuntime`` structure. This is the entry point to
-   the list of active interpreters.
+- ``PyThreadState`` represents a thread running within an interpreter. It
+  contains execution state and the control fields used by the debugger.
 
-2. Use the offset ``interpreter_state.threads_main`` to locate the main thread
-   of that interpreter. This is the simplest and most reliable thread to target.
+To locate a thread:
 
-3. Optionally, use ``interpreter_state.threads_head`` to walk the linked list of
-   all threads. For each ``PyThreadState``, compare the ``native_thread_id``
-   field (using ``thread_state.native_thread_id``) to find a specific thread.
+1. Use the offset ``runtime_state.interpreters_head`` to obtain the address of
+   the first interpreter in the ``PyRuntime`` structure. This is the entry point
+   to the linked list of active interpreters.
 
-   This is useful when the debugger allows the user to select which thread to inject into,
-   or when targeting a thread that's actively running.
+2. Use the offset ``interpreter_state.threads_main`` to access the main thread
+   state associated with the selected interpreter. This is typically the most
+   reliable thread to target.
 
-4. Once a valid ``PyThreadState`` is found, record its address. This will be used
-   in the next step to write debugger control fields and schedule execution.
+3. Optionally, use the offset ``interpreter_state.threads_head`` to iterate
+   through the linked list of all thread states. Each ``PyThreadState``
+   structure contains a ``native_thread_id`` field, which may be compared to
+   a target thread ID to find a specific thread.
 
-An example of locating the main thread:
+4. Once a valid ``PyThreadState`` has been found, its address can be used in
+   later steps of the protocol, such as writing debugger control fields and
+   scheduling execution.
 
-.. code-block:: python
+The following is an example implementation that locates the main thread state::
 
-    def find_main_thread_state(pid, py_runtime_addr, debug_offsets):
+    def find_main_thread_state(
+        pid: int, py_runtime_addr: int, debug_offsets: DebugOffsets,
+    ) -> int:
         # Step 1: Read interpreters_head from PyRuntime
-        interp_head_ptr = py_runtime_addr + debug_offsets.runtime_state.interpreters_head
+        interp_head_ptr = (
+            py_runtime_addr + debug_offsets.runtime_state.interpreters_head
+        )
         interp_addr = read_pointer(pid, interp_head_ptr)
         if interp_addr == 0:
             raise RuntimeError("No interpreter found in the target process")
+
         # Step 2: Read the threads_main pointer from the interpreter
-        threads_main_ptr = interp_addr + debug_offsets.interpreter_state.threads_main
+        threads_main_ptr = (
+            interp_addr + debug_offsets.interpreter_state.threads_main
+        )
         thread_state_addr = read_pointer(pid, threads_main_ptr)
         if thread_state_addr == 0:
             raise RuntimeError("Main thread state is not available")
-        return thread_state_addr
 
-To locate a specific thread by native thread ID:
+        return thread_state_addr
 
-.. code-block:: python
+The following example demonstrates how to locate a thread by its native thread
+ID::
 
-    def find_thread_by_id(pid, interp_addr, debug_offsets, target_tid):
+    def find_thread_by_id(
+        pid: int,
+        interp_addr: int,
+        debug_offsets: DebugOffsets,
+        target_tid: int,
+    ) -> int:
         # Start at threads_head and walk the linked list
         thread_ptr = read_pointer(
-            pid, interp_addr + debug_offsets.interpreter_state.threads_head
+            pid,
+            interp_addr + debug_offsets.interpreter_state.threads_head
         )
+
         while thread_ptr:
-            native_tid_ptr = thread_ptr + debug_offsets.thread_state.native_thread_id
+            native_tid_ptr = (
+                thread_ptr + debug_offsets.thread_state.native_thread_id
+            )
             native_tid = read_int(pid, native_tid_ptr)
             if native_tid == target_tid:
                 return thread_ptr
-            thread_ptr = read_pointer(pid, thread_ptr + debug_offsets.thread_state.next)
-        raise RuntimeError("Thread with the given ID was not found")
-
-Once a valid thread state has been identified, the debugger can use it to modify
-control fields and request execution in the next stage of the protocol.
-
-Writing Control Information
-===========================
-
-Once a valid thread state has been located, the debugger can write control fields
-that instruct the target process to execute a script at the next safe opportunity.
-
-Each thread state contains a ``_PyRemoteDebuggerSupport`` structure, which is used
-to coordinate communication between the debugger and the interpreter. The debugger
-uses offsets from ``_Py_DebugOffsets`` to locate three key fields:
+            thread_ptr = read_pointer(
+                pid,
+                thread_ptr + debug_offsets.thread_state.next
+            )
 
-- ``debugger_script_path``: A buffer where the debugger writes the full path to
-  a Python source file (``.py``). The file must exist and be readable by the
-  target process.
-
-- ``debugger_pending_call``: An integer flag. When set to ``1``, it signals
-  that a script is ready to be executed.
-
-- ``eval_breaker``: A field checked periodically by the evaluation loop. To
-  notify the interpreter of pending debugger activity, the debugger sets the
-  ``_PY_EVAL_PLEASE_STOP_BIT`` in this field. This causes the interpreter to pause
-  and check for debugger-related actions before continuing with normal execution.
-
-To safely modify these fields, most debuggers should suspend the process before
-writing to memory. This avoids race conditions that may occur if the interpreter
-is actively running.
-
-To perform the injection:
+        raise RuntimeError("Thread with the given ID was not found")
 
-1. Write the script path into the ``debugger_script_path`` buffer.
-2. Set the ``debugger_pending_call`` flag to ``1``.
-3. Read the value of ``eval_breaker``, set the stop bit, and write the updated
-   value back.
 
-An example implementation might look like:
+Once a valid thread state has been located, the debugger can proceed with
+modifying its control fields and scheduling execution, as described in the next
+section.
 
-.. code-block:: python
+Writing control information
+===========================
 
-    def inject_script(pid, thread_state_addr, debug_offsets, script_path):
-        # Base offset to the _PyRemoteDebuggerSupport struct
+Once a valid ``PyThreadState`` structure has been identified, the debugger may
+modify control fields within it to schedule the execution of a specified Python
+script. These control fields are checked periodically by the interpreter, and
+when set correctly, they trigger the execution of remote code at a safe point
+in the evaluation loop.
+
+Each ``PyThreadState`` contains a ``_PyRemoteDebuggerSupport`` structure used
+for communication between the debugger and the interpreter. The locations of
+its fields are defined by the ``_Py_DebugOffsets`` structure and include the
+following:
+
+- ``debugger_script_path``: A fixed-size buffer that holds the full path to a
+  Python source file (``.py``).  This file must be accessible and readable by
+  the target process when execution is triggered.
+
+- ``debugger_pending_call``: An integer flag. Setting this to ``1`` tells the
+  interpreter that a script is ready to be executed.
+
+- ``eval_breaker``: A field checked by the interpreter during execution.
+  Setting bit 5 (``_PY_EVAL_PLEASE_STOP_BIT``, value ``1U << 5``) in this
+  field causes the interpreter to pause and check for debugger activity.
+
+To complete the injection, the debugger must perform the following steps:
+
+1. Write the full script path into the ``debugger_script_path`` buffer.
+2. Set ``debugger_pending_call`` to ``1``.
+3. Read the current value of ``eval_breaker``, set bit 5
+   (``_PY_EVAL_PLEASE_STOP_BIT``), and write the updated value back. This
+   signals the interpreter to check for debugger activity.
+
+The following is an example implementation::
+
+    def inject_script(
+        pid: int,
+        thread_state_addr: int,
+        debug_offsets: DebugOffsets,
+        script_path: str
+    ) -> None:
+        # Compute the base offset of _PyRemoteDebuggerSupport
         support_base = (
             thread_state_addr +
             debug_offsets.debugger_support.remote_debugger_support
         )
-        # 1. Write script path
-        script_path_ptr = support_base + debug_offsets.debugger_support.debugger_script_path
+
+        # Step 1: Write the script path into debugger_script_path
+        script_path_ptr = (
+            support_base +
+            debug_offsets.debugger_support.debugger_script_path
+        )
         write_string(pid, script_path_ptr, script_path)
-        # 2. Set debugger_pending_call = 1
-        pending_ptr = support_base + debug_offsets.debugger_support.debugger_pending_call
+
+        # Step 2: Set debugger_pending_call to 1
+        pending_ptr = (
+            support_base +
+            debug_offsets.debugger_support.debugger_pending_call
+        )
         write_int(pid, pending_ptr, 1)
-        # 3. Set _PY_EVAL_PLEASE_STOP_BIT in eval_breaker
-        eval_breaker_ptr = thread_state_addr + debug_offsets.debugger_support.eval_breaker
+
+        # Step 3: Set _PY_EVAL_PLEASE_STOP_BIT (bit 5, value 1 << 5) in
+        # eval_breaker
+        eval_breaker_ptr = (
+            thread_state_addr +
+            debug_offsets.debugger_support.eval_breaker
+        )
         breaker = read_int(pid, eval_breaker_ptr)
-        # Set the least significant bit (this is _PY_EVAL_PLEASE_STOP_BIT)
-        breaker |= 1
+        breaker |= (1 << 5)
         write_int(pid, eval_breaker_ptr, breaker)
 
-After these writes are complete, the debugger may resume the process (if it was paused).
-The interpreter will check ``eval_breaker`` at the next evaluation checkpoint,
-detect the pending call, and load and execute the specified Python file. The debugger is responsible
-for ensuring that the file remains on disk and readable by the target interpreter
-when it is accessed.
+
+Once these fields are set, the debugger may resume the process (if it was
+suspended).  The interpreter will process the request at the next safe
+evaluation point, load the script from disk, and execute it.
+
+It is the responsibility of the debugger to ensure that the script file remains
+present and accessible to the target process during execution.
+
+.. note::
+
+   Script execution is asynchronous. The script file cannot be deleted
+   immediately after injection. The debugger should wait until the injected
+   script has produced an observable effect before removing the file.
+   This effect depends on what the script is designed to do. For example,
+   a debugger might wait until the remote process connects back to a socket
+   before removing the script. Once such an effect is observed, it is safe to
+   assume the file is no longer needed.
 
 Summary
 =======
 
-To inject and execute a script in a remote Python process:
+To inject and execute a Python script in a remote process:
 
-1. Locate the ``PyRuntime`` structure in the target process's memory.
-2. Read and validate the ``_Py_DebugOffsets`` structure at the start of ``PyRuntime``.
+1. Locate the ``PyRuntime`` structure in the target process’s memory.
+2. Read and validate the ``_Py_DebugOffsets`` structure at the beginning of
+   ``PyRuntime``.
 3. Use the offsets to locate a valid ``PyThreadState``.
 4. Write the path to a Python script into ``debugger_script_path``.
-5. Set ``debugger_pending_call = 1``.
-6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in ``eval_breaker``.
-7. Resume the process (if paused). The script will be executed at the next safe eval point.
+5. Set the ``debugger_pending_call`` flag to ``1``.
+6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field.
+7. Resume the process (if suspended). The script will execute at the next safe
+   evaluation point.
+

From 1730c05519a5810cb4ef6383015e896fc95b3207 Mon Sep 17 00:00:00 2001
From: Ivona Stojanovic <stojanovic.i@hotmail.com>
Date: Sun, 20 Apr 2025 20:40:05 +0200
Subject: [PATCH 3/4] fixup! gh-131591: Add remote debugging attachment
 protocol documentation

---
 Doc/howto/remote_debugging.rst | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
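
Illustration (not part of the commit): step 3 of the Windows procedure relies
on a helper such as parse_pe_section_offset(), which the document leaves
undefined. A minimal sketch of that lookup, assuming the PE image has already
been read into a bytes object, might look like the following. The name
find_pe_section_rva is hypothetical, and the sketch skips the validation a real
tool would need.

```python
import struct

IMAGE_SIZEOF_SHORT_NAME = 8   # width of the Name field in a section header
SECTION_HEADER_SIZE = 40      # size of one section table entry


def find_pe_section_rva(image: bytes, section_name: str) -> int:
    """Return the RVA of the named section (hypothetical helper)."""
    # Offset 0x3C of the DOS header holds e_lfanew, the file offset of the
    # "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image")

    # The 20-byte COFF file header follows the signature. It stores
    # NumberOfSections at +2 and SizeOfOptionalHeader at +16.
    coff_offset = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", image, coff_offset + 2)
    (opt_header_size,) = struct.unpack_from("<H", image, coff_offset + 16)

    # The section table starts right after the optional header.
    table_offset = coff_offset + 20 + opt_header_size

    # Section names are cut to 8 bytes and NUL-padded when shorter, so
    # "PyRuntime" is stored as "PyRuntim".
    wanted = section_name.encode("ascii")[:IMAGE_SIZEOF_SHORT_NAME]
    wanted = wanted.ljust(IMAGE_SIZEOF_SHORT_NAME, b"\0")

    for index in range(num_sections):
        entry = table_offset + index * SECTION_HEADER_SIZE
        if image[entry:entry + IMAGE_SIZEOF_SHORT_NAME] == wanted:
            # VirtualAddress follows Name (8 bytes) and VirtualSize (4 bytes).
            (rva,) = struct.unpack_from("<I", image, entry + 12)
            return rva
    raise LookupError(f"section {section_name!r} not found")
```

Because the requested name is truncated the same way the section table stores
it, a caller may pass either "PyRuntime" or "PyRuntim".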

diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst
index 2b776c6f823aa2..a2fafa391a7ac4 100644
--- a/Doc/howto/remote_debugging.rst
+++ b/Doc/howto/remote_debugging.rst
@@ -14,7 +14,6 @@ detailed explanation of the underlying protocol, which takes as input the
 executed. This information supports independent reimplementation of the
 protocol, regardless of programming language.
 
-
 .. warning::
 
     The execution of the injected script depends on the interpreter reaching a
@@ -149,10 +148,11 @@ To find the ``PyRuntime`` structure on Windows:
    <https://learn.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-module32next>`_.
 2. Identify the module corresponding to :file:`python.exe` or
    :file:`python{XY}.dll`, where ``X`` and ``Y`` are the major and minor
-   version numbers of the Python version (for example, ``python311.dll``), and
-   record its base address.
-3. Locate the ``PyRuntim`` section. Section names in the PE format are limited
-   to 8 characters.
+   version numbers of the Python version, and record its base address.
+3. Locate the ``PyRuntim`` section. Due to the PE format's 8-character limit
+   on section names (defined as ``IMAGE_SIZEOF_SHORT_NAME``), the original
+   name ``PyRuntime`` is truncated. This section contains the ``PyRuntime``
+   structure.
 4. Retrieve the section’s relative virtual address (RVA) and add it to the base
    address of the module.
 
@@ -171,7 +171,9 @@ The following is an example implementation::
                 pid, name_contains="python3"
             )
 
-        # Step 3: Parse PE section headers to get PyRuntim RVA
+        # Step 3: Parse PE section headers to get the RVA of the PyRuntime
+        # section. The section name appears as "PyRuntim" due to the
+        # 8-character limit defined by the PE format (IMAGE_SIZEOF_SHORT_NAME).
         section_rva = parse_pe_section_offset(binary_path, "PyRuntim")
 
         # Step 4: Compute PyRuntime address in memory
@@ -179,8 +181,8 @@ The following is an example implementation::
 
 
 
-RReading _Py_DebugOffsets
-=========================
+Reading _Py_DebugOffsets
+========================
 
 Once the address of the ``PyRuntime`` structure has been determined, the next
 step is to read the ``_Py_DebugOffsets`` structure located at the beginning of

From c8a9a61ed5fdff8fae0f3a2dc4a12732ee7a0b57 Mon Sep 17 00:00:00 2001
From: Pablo Galindo <pablogsal@gmail.com>
Date: Mon, 21 Apr 2025 21:08:27 +0100
Subject: [PATCH 4/4] Add details per arch

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
---
 Doc/howto/remote_debugging.rst | 68 ++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
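
Illustration (not part of the commit): the ELF paragraph added below describes
walking the section header table and matching names through the section name
string table. A minimal sketch of that walk, assuming a 64-bit little-endian
ELF file already read into a bytes object, might look like the following. The
name find_elf_section is hypothetical. The sketch returns both the virtual
address and the file offset recorded in the header and leaves it to the caller
to decide which one to combine with the load address.

```python
import struct


def find_elf_section(image: bytes, section_name: str) -> tuple[int, int]:
    """Return (sh_addr, sh_offset) of the named section (hypothetical helper).

    Assumes ELF64, little-endian, and an ordinary header without the
    extended section-count encoding.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")

    # e_shoff is at offset 0x28; e_shentsize, e_shnum and e_shstrndx are
    # three consecutive 16-bit fields starting at offset 0x3A.
    (shoff,) = struct.unpack_from("<Q", image, 0x28)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", image, 0x3A)

    def section_header(index: int) -> tuple[int, ...]:
        # Leading fields of a 64-bit section header:
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", image, shoff + index * shentsize)

    # Section names are stored in the string table section e_shstrndx.
    _, _, _, _, str_offset, str_size = section_header(shstrndx)
    strtab = image[str_offset:str_offset + str_size]

    wanted = section_name.encode("ascii")
    for index in range(shnum):
        sh_name, _, _, sh_addr, sh_offset, _ = section_header(index)
        # sh_name is an offset into the string table; names end at a NUL.
        end = strtab.find(b"\0", sh_name)
        name = strtab[sh_name:end if end != -1 else len(strtab)]
        if name == wanted:
            return sh_addr, sh_offset
    raise LookupError(f"section {section_name!r} not found")
```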

diff --git a/Doc/howto/remote_debugging.rst b/Doc/howto/remote_debugging.rst
index a2fafa391a7ac4..37c3572c8a3c31 100644
--- a/Doc/howto/remote_debugging.rst
+++ b/Doc/howto/remote_debugging.rst
@@ -94,6 +94,30 @@ The following is an example implementation::
         return base_address + section_offset
 
 
+On Linux systems, there are two main approaches to read memory from another
+process. The first is through the ``/proc`` filesystem, specifically by reading
+from ``/proc/[pid]/mem``, which provides direct access to the process's memory.
+This requires appropriate permissions: either being the same user as the
+target process or having root access. The second approach is using the
+``process_vm_readv()`` system call, which provides a more efficient way to copy
+memory between processes. While ptrace's ``PTRACE_PEEKTEXT`` operation can also
+be used to read memory, it is significantly slower because it reads only one
+word at a time and requires multiple context switches between the tracer and
+tracee processes.
+
+For parsing ELF sections, the process involves reading and interpreting the ELF
+file format structures from the binary file on disk. The ELF header contains a
+pointer to the section header table. Each section header contains metadata about
+a section including its name (stored in a separate string table), offset, and
+size. To find a specific section like ``.PyRuntime``, you need to walk through
+these headers and match the section name. The section header then provides the
+offset where that section exists in the file, which can be used to calculate
+its runtime address when the binary is loaded into memory.
+
+You can read more about the ELF file format in the `Wikipedia article on ELF
+<https://en.wikipedia.org/wiki/Executable_and_Linkable_Format>`_.
+
+
 .. rubric:: macOS (Mach-O)
 
 To find the ``PyRuntime`` structure on macOS:
@@ -134,6 +158,29 @@ The following is an example implementation::
         # Step 5: Compute the PyRuntime address in memory
         return base_address + section_offset
 
+On macOS, accessing another process's memory requires using Mach kernel APIs
+and parsing the Mach-O file format. The first step is obtaining a ``task_port``
+handle via ``task_for_pid()``, which provides access to the target process's
+memory space. This handle enables memory operations through APIs like
+``mach_vm_read_overwrite()``.
+
+The process memory can be examined using ``mach_vm_region()`` to scan through the
+virtual memory space, while ``proc_regionfilename()`` helps identify which binary
+files are loaded at each memory region. When the Python binary or library is
+found, its Mach-O headers need to be parsed to locate the ``PyRuntime`` structure.
+
+The Mach-O format organizes code and data into segments and sections. The
+``PyRuntime`` structure lives in a section named ``__PyRuntime`` within the
+``__DATA`` segment. The actual runtime address calculation involves finding the
+``__TEXT`` segment which serves as the binary's base address, then locating the
+``__DATA`` segment containing our target section. The final address is computed by
+combining the base address with the appropriate section offsets from the Mach-O
+headers.
+
+Note that accessing another process's memory on macOS typically requires
+elevated privileges: either root access or special security entitlements
+granted to the debugging process.
+
 
 .. rubric:: Windows (PE)
 
@@ -180,6 +227,27 @@ The following is an example implementation::
         return base_address + section_rva
 
 
+On Windows, accessing another process's memory requires Windows API functions
+such as ``CreateToolhelp32Snapshot()`` and ``Module32First()/Module32Next()``
+to enumerate loaded modules. The ``OpenProcess()`` function provides a handle to
+access the target process's memory space, enabling memory operations through
+``ReadProcessMemory()``.
+
+The process memory can be examined by enumerating loaded modules to find the
+Python binary or DLL. When found, its PE headers need to be parsed to locate the
+``PyRuntime`` structure.
+
+The PE format organizes code and data into sections. The ``PyRuntime`` structure
+lives in a section named ``PyRuntim`` (truncated from ``PyRuntime`` due to PE's
+8-character name limit). The actual runtime address calculation involves finding
+the module's base address from the module entry, then locating the target
+section in the PE headers. The final address is computed by adding the section's
+relative virtual address from the PE section headers to the module base address.
+
+Note that accessing another process's memory on Windows typically requires
+appropriate privileges: either administrative access or the ``SeDebugPrivilege``
+privilege granted to the debugging process.
+
 
 Reading _Py_DebugOffsets
 ========================