14 changes: 7 additions & 7 deletions data/module-4/part-3/analysis.md
@@ -4,9 +4,9 @@ title: 'Cryptanalysis'
hidden: false
---

_Cryptanalysis_ is the study of cryptographic algorithms and methods. The basic goal is to _break_ cryptographic algorithms in one way or another. For example, the attacker may have acquired knowledge of some cryptotexts and the corresponding plaintexts. Then he tries to use this information for finding the used key. In another setting, we assume that the attacker has only some cryptotexts based on which the key should be found. It is always assumed that the attacker knows all details of the encryption and decryption processes, only the secret key is not known. If this cannot be assumed then the encryption method is considered to be very weak.
_Cryptanalysis_ is the study of cryptographic algorithms and methods. The basic goal is to _break_ cryptographic algorithms in one way or another. For example, the attacker may have acquired some ciphertexts and the corresponding plaintexts, and then tries to use this information to find the key. In another setting, we assume that the attacker has only some ciphertexts, from which the key should be found. It is always assumed that the attacker knows all details of the encryption and decryption processes; only the secret key is unknown. If a method is secure only when these details are kept secret, it is considered very weak.

Let us take a look at the substitution cipher. Now every occurrence of the letter ’e’ in the plaintext is encrypted as the same letter in the cryptotext, lets say ’Å’. Because ’e’ is the most common letter in the English language, ’Å’ should be one of the most common letters of the cryptotext. So we may assume that the common letters in the cryptotext correspond to common letters in the language of the plaintext and use this information for making educated guesses about how each letter is encrypted. This cryptanalytic approach is called _frequency analysis_.
Let us take a look at the substitution cipher. Every occurrence of the letter 'e' in the plaintext is encrypted as the same letter in the ciphertext, let's say 'Å'. Because 'e' is the most common letter in the English language, 'Å' should be one of the most common letters of the ciphertext. So we may assume that the common letters in the ciphertext correspond to common letters in the language of the plaintext, and use this information to make educated guesses about how each letter is encrypted. This cryptanalytic approach is called _frequency analysis_.
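The counting step behind frequency analysis can be sketched in a few lines of Python. This is only an illustration, not the exercise solution: the function name `letter_frequencies` and the sample ciphertext (an English sentence with every letter shifted three positions forward) are made up for this example.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the lowercase ASCII letters of `text`, most common first."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return [letter for letter, _ in counts.most_common()]


# Hypothetical ciphertext: an English sentence shifted three positions forward.
ciphertext = "wkh vhfuhw phhwlqj lv khog dw wkh vdph sodfh hyhub zhhn"

ranking = letter_frequencies(ciphertext)
# The most common ciphertext letter is a natural first guess for plaintext 'e'.
print(ranking[0])
```

On a text this short only the top of the ranking is reliable; longer ciphertexts give frequencies that match the language statistics more closely.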

<programming-exercise name="Frequency attack" tmcname="part3-04.frequency" course="Advanced Topics">

@@ -27,17 +27,17 @@ Hint: `string.ascii_lowercase` and `islower()` can be handy.

</programming-exercise>

The same approach cannot be used against OTP. If the key is chosen randomly, both 0 and 1 appear as often in the cryptotext, in average. This happens regardless of how common 1 is in the plaintext. In fact, any cryptotext could result from any plaintext with a suitable key. Assume the known cryptotext bit is C. Now the corresponding plaintext bit could be either C (which happens if the key bit is zero) or 1-C (which happens if the key bit is one). Therefore, knowing the cryptotext does not provide any new information about the plaintext to the attacker. This means the OTP is _unconditionally_ secure.
The same approach cannot be used against OTP. If the key is chosen randomly, 0 and 1 appear equally often in the ciphertext, on average. This happens regardless of how common 1 is in the plaintext. In fact, any ciphertext could result from any plaintext with a suitable key. Assume the known ciphertext bit is C. Now the corresponding plaintext bit could be either C (which happens if the key bit is zero) or 1-C (which happens if the key bit is one). Therefore, knowing the ciphertext does not provide any new information about the plaintext to the attacker. This means the OTP is _unconditionally_ secure.

For the OTP, we applied the setting where the attacker knows only the cryptotext. If the attacker knows both the cryptotext and the corresponding plaintext, then he can easily recover the used key. However, breaking OTP in this setting is not relevant because the recovered key is not used to encrypt anything else than the plaintext that the attacker already knows.
For the OTP, we applied the setting where the attacker knows only the ciphertext. If the attacker knows both the ciphertext and the corresponding plaintext, then they can easily recover the key. However, breaking OTP in this setting is not relevant because the recovered key is not used to encrypt anything other than the plaintext that the attacker already knows.
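Both settings can be illustrated with a small byte-level sketch. The helper `xor_bytes` and the two sample messages are assumptions made for this example; the course exercises may use a different helper.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh random key, used only once
ciphertext = xor_bytes(message, key)

# Ciphertext-only setting: for ANY candidate plaintext of the same length
# there is a key that maps it to the observed ciphertext.
candidate = b"retreat at six"
candidate_key = xor_bytes(ciphertext, candidate)
assert xor_bytes(candidate, candidate_key) == ciphertext

# Known-plaintext setting: the key is simply ciphertext XOR plaintext.
recovered_key = xor_bytes(ciphertext, message)
assert recovered_key == key
```

Since every same-length candidate is consistent with the ciphertext, the ciphertext alone gives the attacker no reason to prefer one over another.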

## Modes of operation

A block cipher, like AES, is used to encrypt blocks of a certain size. What should be done if the message is longer than that size?

The simplest way to encrypt a long message is to take the first block, encrypt it using the key to produce the first cryptotext block, then take the second block, encrypt it using the same key to produce the second cryptotext block etc. This approach is one of the [block cipher modes of operation](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation), called _Electronic Codebook_ (ECB). It is the simplest way but often could be broken by frequency analysis. The attacker notices if you encrypt the same plaintext block twice because the two cryptotexts are also the same. This happens, for example, if plaintext contains some commonly used short pattern, like 'OK'.
The simplest way to encrypt a long message is to take the first block, encrypt it using the key to produce the first ciphertext block, then take the second block, encrypt it using the same key to produce the second ciphertext block, and so on. This approach is one of the [block cipher modes of operation](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation), called _Electronic Codebook_ (ECB). It is the simplest mode but can often be broken by frequency analysis. The attacker notices if you encrypt the same plaintext block twice because the two ciphertext blocks are also the same. This happens, for example, if the plaintext contains some commonly used short pattern, like 'OK'.
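What the attacker observes in ECB mode can be sketched without any cipher at all: it is enough to look for repeated blocks in the ciphertext. The function name `repeated_blocks`, the 16-byte block size (the AES block size), and the made-up sample bytes are assumptions for illustration.

```python
from collections import Counter


def repeated_blocks(ciphertext, block_size=16):
    """Return the ciphertext blocks that occur more than once."""
    blocks = [
        ciphertext[i:i + block_size]
        for i in range(0, len(ciphertext), block_size)
    ]
    return [block for block, count in Counter(blocks).items() if count > 1]


# Made-up "ciphertext": three 16-byte blocks, the first and third identical,
# as would happen in ECB if the first and third plaintext blocks were equal.
a = bytes(range(16))
b = bytes(range(16, 32))
sample = a + b + a

print(repeated_blocks(sample) == [a])
```

A repeated block does not reveal the plaintext by itself, but it tells the attacker that two plaintext blocks are equal, which is exactly the kind of pattern frequency analysis builds on.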

Other modes of operation avoid this problem by using extra input in addition to plaintext and key. For instance, previously computed cryptotext blocks or _counters_ could be used for this purpose. Our HTTPS example (`TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384_256 bit keys,TLS 1.2`) makes use of AES algorithm in _Galois/Counter Mode_. The key length is 256 bits.
Other modes of operation avoid this problem by using extra input in addition to the plaintext and the key. For instance, previously computed ciphertext blocks or _counters_ can be used for this purpose. Our HTTPS example (`TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384_256 bit keys,TLS 1.2`) uses the AES algorithm in _Galois/Counter Mode_. The key length is 256 bits.

The next 3 exercises should be done in order.

@@ -139,7 +139,7 @@ Hint: you will probably find the xor helper function helpful. Do not forget add

The padding oracle attack shows that a tiny amount of additional information can be enough to break the cipher.

Earlier versions of CBC decipher implementations would return an error message to the sender if the padding of the sent message was correct.
Earlier CBC decryption implementations would return an error message to the sender if the padding of the sent message was incorrect.
This information, assuming that we can submit our own messages for decryption, is enough to break CBC encryption!
Furthermore, the attack does not depend on the underlying block cipher.
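The one bit of information that such an implementation leaks is the result of a padding check. The text above does not name the padding scheme, so the sketch below assumes PKCS#7, a common choice with CBC; the function name `has_valid_padding` is likewise hypothetical.

```python
def has_valid_padding(data, block_size=16):
    """PKCS#7 check: the last byte n is in 1..block_size and the last n bytes all equal n."""
    if not data or len(data) % block_size != 0:
        return False
    n = data[-1]
    if n < 1 or n > block_size:
        return False
    return data[-n:] == bytes([n]) * n


# Decrypted data ending in three bytes of value 3 is correctly padded.
print(has_valid_padding(b"A" * 13 + b"\x03" * 3))
# Inconsistent padding bytes are rejected.
print(has_valid_padding(b"A" * 14 + b"\x01\x02"))
```

A decryptor that reports the outcome of this check to whoever submitted the message acts as the _oracle_ in the attack.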

2 changes: 1 addition & 1 deletion data/module-4/part-3/asymmetric.md
@@ -93,7 +93,7 @@ and $q$ to anybody.
Let us assume that Alice has given her public key to Bob. Bob can now encrypt
any "message" $x$ that is (encoded as) an integer between 0 and
$n$ by calculating $x^e \mod n = y$. Bob would send the
result to Alice. Alice can decrypt the cryptotext $y$ because she knows
result to Alice. Alice can decrypt the ciphertext $y$ because she knows
the secret decrypting exponent $d$. She calculates $y^d \mod n
= x$.
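The two calculations can be tried out with Python's built-in `pow`. The numbers below are a textbook-sized toy example chosen for this sketch ($p = 61$, $q = 53$, $e = 17$); real keys use primes that are hundreds of digits long.

```python
# Toy parameters, far too small for real use.
p, q = 61, 53
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public encryption exponent
d = pow(e, -1, phi)      # secret decryption exponent (modular inverse, Python 3.8+)

x = 65                   # Bob's message, encoded as an integer below n
y = pow(x, e, n)         # Bob encrypts: y = x^e mod n
decrypted = pow(y, d, n) # Alice decrypts: x = y^d mod n

print(y, decrypted)
```

The three-argument form of `pow` performs modular exponentiation without ever building the huge intermediate number $x^e$.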

22 changes: 11 additions & 11 deletions data/module-4/part-3/symmetric.md
@@ -35,7 +35,7 @@ This is not a false impression but there is quite much more in cryptography as w

But let us start with the basic setting and define some basic terms.

We call the original message that is supposed to be kept hidden the _plaintext_. The idea of ciphering is to use some _encryption function_ and some _encryption key_ to get the _cryptotext._ That is the message in the hidden form. For the reverse direction, we need a _decrypting function_, a method that is used together with a _decryption key_ in order to retrieve the original plaintext.
We call the original message that is supposed to be kept hidden the _plaintext_. The idea of ciphering is to use some _encryption function_ and some _encryption key_ to get the _ciphertext_, that is, the message in hidden form. For the reverse direction, we need a _decryption function_, a method that is used together with a _decryption key_ to retrieve the original plaintext.

Encryption is used to protect _confidentiality_ of the data.

@@ -70,10 +70,10 @@ Our HTTPS example (`TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384_256 bit keys,TLS 1.2`)

<programming-exercise name="Break the hash" tmcname="part3-01.password" course="Advanced Topics">

In this exercise you are given a hash and a list of candidate passwords, and
your task is to write a password guesser that finds the password in the candidates that was used to generate the hash.
In this exercise you are given a hash and a list of candidate passwords.
Your task is to write a password guesser that finds the candidate password that was used to generate the hash.

The hash follows a common format used for storing hashed password
The hash follows a common format used for storing hashed passwords:
```
protocol$salt$hash
```
@@ -99,7 +99,7 @@ There are nowadays also encryption methods that are _not_ symmetric, i.e., being

Symmetric encryption methods are still useful in many situations, and they are typically much faster than asymmetric ones. Therefore, symmetric encryption is in wide use, and new symmetric methods are still being developed. One-time pad (OTP) and AES are examples of symmetric encryption schemes.

One of the oldest encryption methods is substituting every instance of a letter with some other letter. A cipher like this is called a _substitution cipher_. An example is the _CAESAR_ cipher which is an encryption algorithm where you get the cryptotext by rotating every letter in the plaintext three positions forward in the alphabet. Decryption is done by rotating every letter of the cryptotext three positions backwards.
One of the oldest encryption methods is substituting every instance of a letter with some other letter. A cipher like this is called a _substitution cipher_. An example is the _CAESAR_ cipher, an encryption algorithm where you get the ciphertext by rotating every letter in the plaintext three positions forward in the alphabet. Decryption is done by rotating every letter of the ciphertext three positions backwards.
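As a minimal sketch, the rotation can be written with `str.translate`. The function name `caesar` and the restriction to lowercase ASCII letters are choices made for this example; it works on strings, not on the byte arrays used in the exercise below.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, shift=3):
    """Rotate lowercase ASCII letters by `shift`; leave other characters unchanged."""
    k = shift % len(ALPHABET)
    rotated = ALPHABET[k:] + ALPHABET[:k]
    return text.translate(str.maketrans(ALPHABET, rotated))


ciphertext = caesar("hello world")   # "khoor zruog"
plaintext = caesar(ciphertext, -3)   # rotating backwards restores the message
print(ciphertext, plaintext)
```

Because there are only 26 possible rotations, such a cipher can also be broken by simply trying every shift.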

<quiz id="70a22e3e-d564-5771-b769-3e5f49eaed60"></quiz>

@@ -109,19 +109,18 @@ One of the oldest encryption methods is substituting every instance of a letter

Implement a substitution cipher: complete `encrypt` and `decrypt` functions in `src/substitution.py`.

Both functions are byte arrays and should output an byte array as an output.
The key is an array such that `key[c]` is equal to the _encrypted_ value of `c`.
Both functions take two byte array parameters and each should also return a byte array. The key is an array such that `key[c]` is equal to the _encrypted_ value of `c`.

</programming-exercise>


One-time pad (OTP) is one of the simplest encryption methods. To encrypt a message of, say, 140 bits you need a secret key of 140 bits. You compute the XOR of each message bit with the corresponding key bit, and you get 140 cryptotext bits,
One-time pad (OTP) is one of the simplest encryption methods. To encrypt a message of, say, 140 bits you need a secret key of 140 bits. You compute the XOR of each message bit with the corresponding key bit, and you get 140 ciphertext bits,

$$
C = M \oplus K.
$$

The decrypting process is exactly the same as the encryption process. If you know the secret key you can decrypt the message by XORing bit-by-bit the cryptotext message and the key,
The decrypting process is exactly the same as the encryption process. If you know the secret key you can decrypt the message by XORing bit-by-bit the ciphertext message and the key,

$$
M = C \oplus K.
@@ -144,7 +143,8 @@ The only downside of the OTP is that the key must be as long as the actual plain
<programming-exercise name="Repeating pads" tmcname="part3-03.xorpad" course="Advanced Topics">

Implement a xorpad cipher: complete `encrypt` and `decrypt` functions in `src/xorpad.py`.
Both functions are byte arrays and should output an byte array as an output.

Both functions take two byte array parameters and each should also return a byte array.

The pad can be significantly shorter than the message. In such a case you should repeat the pad as long as needed.

@@ -153,6 +153,6 @@ deduce the pad if a short part of message is known in advance (can you figure ou

</programming-exercise>

Advanced Encryption Standard ([AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard)) is a family of modern _block ciphers_. AES-256 has key size of 256 bits and it turns 128-bit plaintext blocks into 128-bit cryptotext blocks, and vice versa. This algorithm is quite fast even when implemented in software and considered secure enough for almost all uses. Many modern processors provide hardware support for AES.
Advanced Encryption Standard ([AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard)) is a family of modern _block ciphers_. AES-256 has a key size of 256 bits and turns 128-bit plaintext blocks into 128-bit ciphertext blocks, and vice versa. This algorithm is quite fast even when implemented in software and is considered secure enough for almost all uses. Many modern processors provide hardware support for AES.

<quiz id="ae028b69-e979-57f7-8aa6-c8c5d3822000"></quiz>