quarkslab
diff --git a/‎tutorials/graph-only-diffing.rst
Lines changed: 208 additions & 0 deletions b/‎tutorials/graph-only-diffing.rst
Lines changed: 208 additions & 0 deletions
diff --git a/‎tutorials/res/diff-cfg.png
59.4 KB b/‎tutorials/res/diff-cfg.png
59.4 KB
diff --git a/‎tutorials/res/diff-dispatcher.png
79.9 KB b/‎tutorials/res/diff-dispatcher.png
79.9 KB
diff --git a/‎tutorials/res/diff-signedness.png
19.5 KB b/‎tutorials/res/diff-signedness.png
19.5 KB
diff --git a/‎tutorials/res/fixed.evm
Lines changed: 15 additions & 0 deletions b/‎tutorials/res/fixed.evm
Lines changed: 15 additions & 0 deletions
diff --git a/‎tutorials/res/fixed.sol
Lines changed: 16 additions & 0 deletions b/‎tutorials/res/fixed.sol
Lines changed: 16 additions & 0 deletions
diff --git a/‎tutorials/res/stack-balance-weight-epsilon.png
37 KB b/‎tutorials/res/stack-balance-weight-epsilon.png
37 KB
diff --git a/‎tutorials/res/stack-balance-weight-tradeoff.png
36 KB b/‎tutorials/res/stack-balance-weight-tradeoff.png
36 KB
diff --git a/‎tutorials/res/tradeoff-epsilon.png
39.2 KB b/‎tutorials/res/tradeoff-epsilon.png
39.2 KB
diff --git a/‎tutorials/res/vulnerable.evm
Lines changed: 16 additions & 0 deletions b/‎tutorials/res/vulnerable.evm
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,208 @@
+Diffing Graphs only
+===================
+
+QBinDiff is able to perform graph matching, whether this graph is a Control Flow Graph (CFG),
+a Call Graph (CG), or something completely unrelated.
+We demonstrated this in our `introductory blog post <https://blog.quarkslab.com/qbindiff-a-modular-diffing-toolkit.html>`_
+with the protein-protein interaction (PPI) networks of different species in bioinformatics.
+
+In this tutorial, we will focus our attention on the CFG of EVM smart contracts.
+
+Motivation
+----------
+
+The Ethereum Virtual Machine (EVM) is a RISC stack-based architecture.
+It is used by Ethereum and other compatible chains to execute smart contracts,
+the core programs of decentralized applications on these platforms.
+
+In addition to the lack of native support from tools such as QBinDiff, BinExport, and Quokka,
+the EVM bytecode lacks an explicit structure to identify functions:
+all the control flow is performed with only conditional and unconditional jumps (``JUMPI``, ``JUMP``),
+that read the destination address on the stack.
+This design leads to particular patterns in the bytecode, for example:
+
+* the start of the bytecode contains a dispatcher, acting like a giant switch statement,
+  that is responsible for jumping into the called external/public function;
+* to call internal functions and other forms of duplicated code,
+  the caller pushes a return address onto the stack before executing a jump instruction.
+
+However, several tools let you recover the CFG of the smart contract
+(some even detect functions with varying degrees of success).
+From this information alone, this tutorial will guide you with diffing an example smart contract.
+
+How to diff
+-----------
+
+In this tutorial, we will diff two versions of an example smart contract.
+One allows its user's balance to go negative but not the other.
+
++---------------------------------------+----------------------------------+
+| ``vulnerable.sol``                    | ``fixed.sol``                    |
++=======================================+==================================+
+|.. literalinclude:: res/vulnerable.sol |.. literalinclude:: res/fixed.sol |
++---------------------------------------+----------------------------------+
+|.. literalinclude:: res/vulnerable.evm |.. literalinclude:: res/fixed.evm |
++---------------------------------------+----------------------------------+
+
+To this end, we will use QBinDiff's ``DiGraphDiffer`` to compare their CFG.
+
+Generate the graphs
+^^^^^^^^^^^^^^^^^^^
+
+The first step is to recover the CFG from the bytecode of the smart contracts.
+While not trivial, this has been solved by several tools already.
+We recommend `EtherSolve <https://github.com/SeUniVr/EtherSolve/>`_ (java)
+or `vandal <https://github.com/usyd-blockchain/vandal/>`_ (python).
+
+.. code-block:: sh
+
+  java -jar EtherSolve.jar -r -j vulnerable.evm
+
+The command above generates a file named ``Analysis_<datetime>.json``.
+In this file, the CFG can be found under the ``runtimeCfg`` field.
+Note that edges are stored under ``runtimeCfg.successors`` (and sometimes have duplicate entries).
+
+QBinDiff's differ expects ``networkx.DiGraph`` as inputs, so we will need to adapt the data a little bit:
+
+.. code-block:: python
+
+  def process_ethersolve(analysis: dict[str, Any]) -> networkx.DiGraph:
+      # Filter the desired attributes and set the node ID as the basic block's offset
+      nodes = [
+          {
+              "id": n["offset"],
+              "length": n["length"],
+              "type": n["type"],
+              "stack_balance": n["stackBalance"],
+              "bytecode": n["bytecodeHex"],
+              "opcodes": n["parsedOpcodes"],
+          } for n in analysis["runtimeCfg"]["nodes"]
+      ]
+
+      # Generate an edge for each successor of each node
+      links = [
+          [{"source": e["from"], "target": t} for t in e["to"]]
+          for e in analysis["runtimeCfg"]["successors"]
+      ]
+      # Flatten the list
+      links = [item for sublist in links for item in sublist]
+
+      # Create the networkx.DiGraph
+      nx_node_link = {"directed": True, "multigraph": False, "nodes": nodes, "links": links}
+      return networkx.node_link_graph(nx_node_link)
+ 
+
+(optional) Write heuristics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When matching nodes, we can help the differ by setting an initial similarity score between each pair of nodes.
+These scores are gathered in the similarity matrix, initialized to all 1,
+meaning every node is initially believed to be similar to all nodes.
+
+If you have access to some heuristic for similarity between nodes,
+you can add a prepass that will be executed before matching nodes to alter the similarity matrix.
+
+For example, we have access to the stack balance of each basic block.
+This value indicates how many words the basic block pushes or pops from the stack.
+Intuitively, similar blocks should have the same stack balance.
+
+You can find more information `here <../qbindiff/doc/source/api/differ.html#qbindiff.Differ.register_prepass>`_
+on how to create a prepass.
+
+In this example, we first arrange nodes by stack balance in each graph,
+then reduce the similarity of nodes that do not share the same stack balance.
+
+Note that the nodes' IDs are their offset, and do not correspond to the row or column in the similarity matrix.
+The correspondance is given by the ``primary_n2i`` and ``secondary_n2i`` mappings.
+
+.. code-block:: python
+
+  def prepass_stack_balance(
+      sim_matrix: SimMatrix,
+      primary: qbindiff.GenericGraph,
+      secondary: qbindiff.GenericGraph,
+      primary_n2i: dict[int, int],
+      secondary_n2i: dict[int, int],
+      **kwargs,
+  ) -> None:
+      # Arrange nodes indices by stack balance
+
+      ## Primary node indices by stack balance
+      primary_index: dict[int, list[int]] = {}
+      ## Secondary node indices by stack balance
+      secondary_index: dict[int, list[int]] = {}
+  
+      ## Populate primary_index and secondary_index
+      for graph, n2i, index in (
+          (primary, primary_n2i, primary_index),
+          (secondary, secondary_n2i, secondary_index),
+      ):
+          for node_id in graph.nodes():
+              node = graph.nodes[node_id]
+              balance = node["stack_balance"]
+              if balance not in index:
+                  index[balance] = []
+              index[balance].append(n2i[node_id])
+  
+      # Reduce the similarity of nodes that do not share the same stack balance by 60%
+      for primary_balance, primary_indices in primary_index.items():
+          for secondary_balance, secondary_indices in secondary_index.items():
+              if primary_balance == secondary_balance:
+                  continue
+              for i in primary_indices:
+                  sim_matrix[i, secondary_indices] *= 0.4
+
+Perform the match
+^^^^^^^^^^^^^^^^^
+
+Once you have the CFG in a ``networkx.DiGraph`` object,
+and have optionally written some prepasses, performing the mapping is simple:
+
+.. code-block:: python
+
+  differ = qbindiff.DiGraphDiffer(
+      primary_cfg,
+      secondary_cfg,
+      sparsity_ratio=0,
+      tradeoff=0.5,
+      epsilon=0.1,
+  )
+  differ.register_prepass(prepass_stack_balance)  # optional
+  mapping = differ.compute_matching()
+
+You can experiment with the tradeoff and epsilon values,
+depending on the nature of the diffing performed.
+As general guidelines:
+
+* ``tradeoff`` gives more weight to the topology when close to 0,
+  and more weight to the similarity when close to 1.
+  It should be set strictly between 0 and 1.
+  The better your heuristics, the higher its value.
+* ``epsilon`` controls the convergence speed.
+  It should not be set to 0, and be as close to 1 as you can afford to wait.
+  For this simple example, a conservative low value is not an issue.
+* you should adjust how much your prepasses affect the similarity matrix,
+  depending on the quality of your heuristics.
+
+For this example, we performed an exhaustive search of these parameters,
+when compared to a ground truth matching.
+In these maps, ``epsilon`` and ``tradeoff`` correspond to the above parameters,
+while ``stack balance weight`` controls how much the stack balance prepass impacts the similarity matrix.
+
++-------------------------------------------------+--------------------------------------------------+-------------------------------------+
+| .. image:: res/stack-balance-weight-epsilon.png | .. image:: res/stack-balance-weight-tradeoff.png | .. image:: res/tradeoff-epsilon.png |
++-------------------------------------------------+--------------------------------------------------+-------------------------------------+
+
+Process the result
+^^^^^^^^^^^^^^^^^^
+
+Now that you have a mapping between nodes of the primary and secondary graphs,
+you can process it however you like, for example to compute similarity score.
+
+Here we show a visualization of the resulting diff, revealing interesting aspects of the modification:
+
++------------------------------------+-----------------------------+------------------------------------+
+| From signed to unsigned operations | CFG rewiring                | Dispatcher update                  |
++====================================+=============================+====================================+
+| .. image:: res/diff-signedness.png | .. image:: res/diff-cfg.png | .. image:: res/diff-dispatcher.png |
++------------------------------------+-----------------------------+------------------------------------+
@@ -0,0 +1,15 @@
+608060405234801561001057600080fd5b50600436106100365760003560
+e01c806327e235e31461003b578063412664ae1461006d575b600080fd5b
+61005b6100493660046100f3565b60006020819052908152604090205481
+565b60405190815260200160405180910390f35b61008061007b36600461
+0115565b610082565b005b33600090815260208190526040812080548392
+906100a1908490610155565b90915550506001600160a01b038216600090
+815260208190526040812080548392906100ce90849061016e565b909155
+50505050565b80356001600160a01b03811681146100ee57600080fd5b91
+9050565b60006020828403121561010557600080fd5b61010e826100d756
+5b9392505050565b6000806040838503121561012857600080fd5b610131
+836100d7565b946020939093013593505050565b634e487b7160e01b6000
+52601160045260246000fd5b818103818111156101685761016861013f56
+5b92915050565b808201808211156101685761016861013f56fea2646970
+66735822122024ea090b39258ab97fbe93f3646eb22d1d2cc2aedc4b74e1
+c3861562ad940d0364736f6c63430008190033
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: MIT
+
+pragma solidity 0.8.25;
+
+contract SimpleToken {
+    mapping(address => uint) public balances;
+    
+    constructor() {
+        balances[msg.sender] += 1000e18;
+    }
+    
+    function sendToken(address _recipient, uint _amount) public {
+        balances[msg.sender] -= _amount;
+        balances[_recipient] += _amount;
+    }   
+}
@@ -0,0 +1,16 @@
+608060405234801561001057600080fd5b50600436106100365760003560
+e01c80630f7428b91461003b57806327e235e314610050575b600080fd5b
+61004e6100493660046100f3565b610082565b005b61007061005e366004
+61011d565b60006020819052908152604090205481565b60405190815260
+200160405180910390f35b33600090815260208190526040812080548392
+906100a1908490610155565b90915550506001600160a01b038216600090
+815260208190526040812080548392906100ce90849061017c565b909155
+50505050565b80356001600160a01b03811681146100ee57600080fd5b91
+9050565b6000806040838503121561010657600080fd5b61010f836100d7
+565b946020939093013593505050565b60006020828403121561012f5760
+0080fd5b610138826100d7565b9392505050565b634e487b7160e01b6000
+52601160045260246000fd5b818103600083128015838313168383128216
+17156101755761017561013f565b5092915050565b808201828112600083
+128015821682158216171561019c5761019c61013f565b50509291505056
+fea264697066735822122078995ea6ebd8bd5d1333cde4da1d4ade3eee62
+af048e42232dbc3ed6cd682aba64736f6c63430008190033