From 695218cc710ed56b72676c86a24d042a2ea99c76 Mon Sep 17 00:00:00 2001 From: Sylvain Henry Date: Tue, 2 Jan 2024 14:13:50 +0100 Subject: [PATCH 1/2] Add section about Haskell execution model --- src/Measurement_Observation/haskell_model.rst | 129 ++++++++++++++++++ src/Measurement_Observation/index.rst | 1 + 2 files changed, 130 insertions(+) create mode 100644 src/Measurement_Observation/haskell_model.rst diff --git a/src/Measurement_Observation/haskell_model.rst b/src/Measurement_Observation/haskell_model.rst new file mode 100644 index 00000000..146363c5 --- /dev/null +++ b/src/Measurement_Observation/haskell_model.rst @@ -0,0 +1,129 @@ + +Haskell Compilation and Execution Model +======================================= + + +GHC doesn't use a VM but compiles to native code (when a native code generator +is used). +So we can use tools to profile native code execution: e.g. perf. +However the native code we execute doesn't look exactly like the one produced by +imperative languages (C, C++, Rust, etc.) and this confuses the tools which make +some assumptions. +For example: + +1. we don't have clear procedure boundaries with call/return instruction pairs + (we use tail calls, i.e. jump instructions) + +2. we don't use the so-called C stack in the usual way. I.e. we don't use usual + stack registers (e.g. rsp/rbp on x86-64) as stack registers. + +3. GHC uses its own calling convention + +In addition, lazy evaluation makes control-flow and memory usage difficult to +understand. + +In this chapter we describe the compilation pipeline and the execution model. +For each stage of the pipeline we list what can be measured at runtime. + + +Compilation pipeline overview +----------------------------- + +GHC compiles Haskell codes in several stages, each with their own intermediate +representation: + +Haskell: the functional language we love with its bazillion of extensions. + +Core: a simple and explicitly typed functional language. Types are basically +Haskell types. Type-class dictionaries are passed explicitly as records, +type-applications are used everywhere needed, coercions (proofs that we can +convert a type into another) are first class values, etc. + +STG: a functional language closer to the execution model. It's still typed but +with primitive types. E.g. it tracks if a value is a heap object or an +unboxed word (unboxed = doesn't have its own heap object) but not the Haskell +type of the heap object. Complex forms are lowered to simpler ones (e.g. unboxed +sums are lowered into unboxed tuples); this is called unarisation. + +Additionally every heap allocation is made explicit with a let-binding. For +example instead of ``foo (MkBar x y)`` you have ``let b = MkBar x y in foo b``. +This is called `A-normal form`_. +During code generation from STG we know that all functions are applied to +"simple" arguments only (variables, constants, etc.). + +Following the approach pioneered with "super-combinators", every top-level STG +binding is then compiled into imperative code. The idea is that executing this +imperative code has the same result as interpreting the functional code. For +example, ``let b = MkBar x y in foo b`` is basically compiled to: + +.. code-block:: C + + b <- allocate heap object for `MkBar x y` + evaluate foo + apply `foo` to `b` + +Different GHC backends use different imperative representations. The interpreter +uses ByteCode, the JavaScript backend uses some JavaScript-like form, all other +backends use Cmm. + +Cmm: an imperative language that looks like LLVM IR. It supports expressions, +statements, and it abstracts over machine primops, machine registers, stack +usage, and calling conventions. A pass performs register assignment and stack +layout for the target architecture. Then assembly code (textual) is generated +for the target architecture and an external assembler program generates machine +code (binary) for it. Finally an external linker is used to transform the +resulting code objects into a single executable or library. + +Executables are linked against a runtime system (RTS) that provides primitives +to manage: +- memory: allocation, garbage collection... +- scheduling: thread scheduling, blocking queues... +- IOs: primitives to interact with the operating system +- dynamic code loading: loading and unloading code objects at runtime. + +The RTS itself comes in different flavors: e.g. using multiple OS threads to execute +Haskell code or not. + +All these compilation stages and the RTS provide knobs to tweak the generated +code and the behavior of the runtime system. In particular, some probes can be +optionally inserted in the generated code at various stages to produce different +profiling information at runtime. + + +Execution model overview +------------------------ + +The heap of a Haskell program contain objects that reflect its current execution +state: suspended computations (thunks), partially applied functions (PAP), +values (datacon and their payload), threads (TSO) and their stack, etc. +When the computer executes the compiled code for a top-level STG binding +(starting from ``main``), it asks for more objects to be allocated in the heap, +for some thunks to be reduced to another heap object, or for existing objects to +be used as arguments for function applications. + +The garbage collector is responsible for freeing space in the heap. It runs when +there is not enough space left or depending on other heuristics. GHC provides +several knobs to configure the garbage collector strategies to use and to tweak +their properties. + +The garbage collector used has an impact on profiling. For an extreme example, +if the heap size is configured to be large enough than your program never has to +perfom garbage collection, you'll find that you spend 0% time doing garbage +collection and 100% executing your program ("mutator" time): all the garbage +collection occurs at once implicitly when your program exits. On the other hand, +if you configure the heap to be very small, most of the time can be spent doing +garbage collection even if you haven't changed anything in your program. +It means that **you must be very aware of the RTS options you use when profiling**. + +Similarly, some profiling options have an impact on the size of the heap +objects: e.g. the heap object that represents `10 :: Int64` uses 24 bytes +instead of 16 bytes without profiling. Because of this, you may find that your +code with this profiling enabled triggers more garbage collections than your +code without this profiling enabled. **You have to be aware of the compilation +options you use when you make some runtime measurements.** + +Consequences on Profiling +------------------------- + +As a consequence of the Haskell compilation pipeline and of the Haskell execution model +we can measure many different things at different levels. diff --git a/src/Measurement_Observation/index.rst b/src/Measurement_Observation/index.rst index 4acae535..2b3f00b0 100644 --- a/src/Measurement_Observation/index.rst +++ b/src/Measurement_Observation/index.rst @@ -6,6 +6,7 @@ Measurement, Profiling, and Observation :name: Heap_Ghc the_recipe + haskell_model Heap_Ghc/index Heap_Third/index Measurement_Ghc/index From 60d9d86d451f8f8e84997720b55ed986f5a483f0 Mon Sep 17 00:00:00 2001 From: Sylvain Henry Date: Fri, 15 Mar 2024 15:08:36 +0100 Subject: [PATCH 2/2] Apply suggestions from code review Co-authored-by: Jeffrey Young --- src/Measurement_Observation/haskell_model.rst | 30 +++++++++---------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/src/Measurement_Observation/haskell_model.rst b/src/Measurement_Observation/haskell_model.rst index 146363c5..bb8a08b9 100644 --- a/src/Measurement_Observation/haskell_model.rst +++ b/src/Measurement_Observation/haskell_model.rst @@ -12,25 +12,24 @@ some assumptions. For example: 1. we don't have clear procedure boundaries with call/return instruction pairs - (we use tail calls, i.e. jump instructions) + (we use tail calls, i.e. jump instructions). 2. we don't use the so-called C stack in the usual way. I.e. we don't use usual stack registers (e.g. rsp/rbp on x86-64) as stack registers. -3. GHC uses its own calling convention +3. GHC uses its own calling convention. In addition, lazy evaluation makes control-flow and memory usage difficult to understand. -In this chapter we describe the compilation pipeline and the execution model. +In this chapter we describe the compilation pipeline and the execution model of GHC Haskell. For each stage of the pipeline we list what can be measured at runtime. Compilation pipeline overview ----------------------------- -GHC compiles Haskell codes in several stages, each with their own intermediate -representation: +GHC's compilation pipeline is roughly composed of four stages and intermediate representations. They are: Haskell: the functional language we love with its bazillion of extensions. @@ -41,7 +40,7 @@ convert a type into another) are first class values, etc. STG: a functional language closer to the execution model. It's still typed but with primitive types. E.g. it tracks if a value is a heap object or an -unboxed word (unboxed = doesn't have its own heap object) but not the Haskell +:term:`unboxed` word but not the Haskell type of the heap object. Complex forms are lowered to simpler ones (e.g. unboxed sums are lowered into unboxed tuples); this is called unarisation. @@ -54,7 +53,7 @@ During code generation from STG we know that all functions are applied to Following the approach pioneered with "super-combinators", every top-level STG binding is then compiled into imperative code. The idea is that executing this imperative code has the same result as interpreting the functional code. For -example, ``let b = MkBar x y in foo b`` is basically compiled to: +example, ``let b = MkBar x y in foo b`` compiles to: .. code-block:: C @@ -62,9 +61,9 @@ example, ``let b = MkBar x y in foo b`` is basically compiled to: evaluate foo apply `foo` to `b` -Different GHC backends use different imperative representations. The interpreter -uses ByteCode, the JavaScript backend uses some JavaScript-like form, all other -backends use Cmm. +STG is the last stage in the pipeline that is shared between all of GHC's backends; different backends use different imperative representations for their exact needs. For example, the interpreter +uses ByteCode, the JavaScript backend uses a subset of JavaScript, all other +backends use Cmm (pronounced *C minus minus*). Cmm: an imperative language that looks like LLVM IR. It supports expressions, statements, and it abstracts over machine primops, machine registers, stack @@ -81,13 +80,12 @@ to manage: - IOs: primitives to interact with the operating system - dynamic code loading: loading and unloading code objects at runtime. -The RTS itself comes in different flavors: e.g. using multiple OS threads to execute -Haskell code or not. +Note that the RTS itself comes in different flavors. For example, a threaded RTS which uses multiple OS threads to execute Haskell code or not. All these compilation stages and the RTS provide knobs to tweak the generated code and the behavior of the runtime system. In particular, some probes can be optionally inserted in the generated code at various stages to produce different -profiling information at runtime. +profiling information at runtime. We refer interested those interested in these tweaks to the :userGuide:`User Guide `. Execution model overview @@ -103,17 +101,17 @@ be used as arguments for function applications. The garbage collector is responsible for freeing space in the heap. It runs when there is not enough space left or depending on other heuristics. GHC provides -several knobs to configure the garbage collector strategies to use and to tweak +several :userGuide:`knobs ` to configure the garbage collector strategies to use and to tweak their properties. The garbage collector used has an impact on profiling. For an extreme example, if the heap size is configured to be large enough than your program never has to perfom garbage collection, you'll find that you spend 0% time doing garbage -collection and 100% executing your program ("mutator" time): all the garbage +collection and 100% executing your program ("mutator" time): all garbage collection occurs at once implicitly when your program exits. On the other hand, if you configure the heap to be very small, most of the time can be spent doing garbage collection even if you haven't changed anything in your program. -It means that **you must be very aware of the RTS options you use when profiling**. +Thus, **you must be very aware of the RTS options you use when profiling**. Similarly, some profiling options have an impact on the size of the heap objects: e.g. the heap object that represents `10 :: Int64` uses 24 bytes