\documentclass[psamsfonts]{amsart}
%-------Packages---------
\usepackage{amssymb,amsfonts}
\usepackage{enumerate}
\usepackage[margin=1in]{geometry}
\bibliographystyle{plain}
\voffset = -10pt
\headheight = 0pt
\topmargin = -20pt
\textheight = 690pt
%--------Meta Data: Fill in your info------
\title{6.857 \\
Network and Computer Security \\
Lectures 1-17}
\author{Lecturer: Ronald Rivest\\
Scribe: John Wang}
\begin{document}
\maketitle
\newpage
{\Large Lecture 1}
\section{Introduction to Cryptography}
\medskip \noindent \textbf{Definition.} \emph{Security} is computing or communicating in the presence of adversaries.
The presence of adversaries makes security interesting, because you are working against the cleverness of other people: you are in the worst-case scenario. Note that this is different from error-correcting codes, for instance, where there is no adversary.
\section{Security Policies}
\medskip \noindent \textbf{Definition.} A \emph{security policy} describes what is being protected and which activities or events should be protected.
If you don't have a policy, then you don't have security, because nothing is defined yet. Security policy is usually in terms of:
\begin{itemize}
\item Principals (actors or participants).
\item Permissible (or impermissible) actions or operations.
\item Classes of objects.
\end{itemize}
\subsection{Examples}
Security Policy: ``Each registered voter may vote at most once.'' Principals are the voters and permissible actions are voting at most once.
Security Policy: ``Only an administrator can modify file $x$.''
Security Policy: ``The recipient of an email should be able to authenticate the sender.''
\subsection{Types of Policies}
\begin{itemize}
\item C - confidentiality policies. Prevents unauthorized disclosure.
\item I - integrity policies. Information should not be modifiable in an unauthorized manner.
\item A - availability policies. Systems should remain available.
\end{itemize}
\section{Security Mechanisms}
\medskip \noindent \textbf{Definition.} \emph{Security mechanisms} are means for achieving security policies.
Examples: smart card for voters, password for sysadmin, digital signature for email, physical security.
Security mechanisms are usually one of two forms:
\begin{itemize}
\item Prevention: keeps policy from being violated.
\item Detection: discovers if the policy has been violated.
\end{itemize}
If the detection mechanism goes off, then what? You must have a \emph{recovery mechanism} for getting the system back to a good state. Notice that prevention and detection are not entirely unrelated: a detection system may provide deterrence, which helps prevent attacks.
\section{Adversaries}
The adversary may be an outsider or an insider (e.g., a voter may want to be able to vote twice). You need to figure out who the adversary is, and note that there can be many adversaries.
What do the adversaries know? Usually, you assume that the adversary knows the engineering of the system and the security mechanisms. Security analysis is usually scenario-based, and differs depending on the assumptions one makes about the adversary and what he or she knows.
What resources does the adversary have? Does the adversary have a supercomputer or the ability to corrupt insiders or mathematical knowledge?
What are his motivations? Is the adversary economically motivated or is he just evil? Sometimes it is useful to assume that adversaries are rational economic players.
The best systems are those which are robust even in the worst-case scenarios. You want to have a system that is secure even when you have a perfect adversary.
\section{Vulnerability}
\medskip \noindent \textbf{Definition.} A \emph{vulnerability} is a weakness in the system that can be exploited by the adversary.
Examples: poor password, buffer overflows, etc.
There is a distinction between the system as designed and the system as implemented. Implementations tend to have bugs, which could potentially introduce security vulnerabilities. Even if the design is perfect, the implemented system could weaken security.
There are \emph{threats} to exploit vulnerability and \emph{risk} that the vulnerability will be exploited.
\newpage
{\Large Lecture 2}
\section{Introduction to Cryptography}
\subsection{Security Mechanisms}
\begin{itemize}
\item \emph{Identification} of principals (username).
\item \emph{Authentication} of principals (password, biometric). Means by which one can prove identity.
\item \emph{Authorization}. Does the individual have permission to perform particular actions?
\item \emph{Physical Protection}.
\item \emph{Cryptography}. Using number theory for protection.
\item \emph{Economics}. Sometimes can assume that attackers are influenced by economic motivations.
\item \emph{Deception}. Try to deceive adversary. For example: honeypots are fake machines.
\item \emph{Randomness and Unpredictability}. The adversary is at a disadvantage because the principals have done something unpredictable. Not having a good source of randomness has turned out to be very bad.
\end{itemize}
\subsection{Principles}
\begin{itemize}
\item \emph{Be skeptical}. People make mistakes.
\item \emph{Be paranoid}. Things are likely not to be secure unless someone really convinces you.
\item \emph{Think adversarially}. What can an adversary do against this system? What could possibly go wrong with this system?
\item \emph{Think about user-supplied input}. User supplied input can be anything, so make sure to handle it properly.
\item \emph{Don't aim for perfection}. Bugs are a fact of life, so don't aim to try to be perfect.
\item \emph{Tradeoffs between cost and security}. To halve risk, does it double the cost?
\item \emph{Be prepared for losses}. Try to have the ability to rollback.
\item \emph{KISS}. Keep it simple. Complicated systems have more vulnerabilities and it is harder to tell what is going on.
\item \emph{Ease of use is important}. Don't have extremely long, convoluted process.
\item \emph{Separation of privilege}. For example: Two people need to sign off on a transaction.
\item \emph{Least privilege}. Don't give access to more than what someone needs.
\item \emph{Defense in depth}.
\item \emph{Complete mediation}.
\item \emph{Education}. Make sure people understand what is happening.
\item \emph{Transparency}. People should talk about exactly what they're doing.
\end{itemize}
\section{Growth of Cryptography}
\subsection{Early Cryptography}
Greeks: knew there were infinitely many primes, and that the GCD is easily computed by Euclid's algorithm. The scytale wrapped paper around a rod; sender and receiver used rods of the same diameter. If the attacker didn't have a rod of the correct diameter, he couldn't read the message.
Fermat: Fermat's little theorem, $a^{p-1} \equiv 1 \pmod p$ for any prime $p$ and $1 \leq a < p$.
Euler: generalized FLT to Euler's theorem: if $\gcd(a,n) = 1$ then $a^{\phi(n)} \equiv 1 \pmod n$, where $\phi(n)$ is Euler's totient function.
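Both theorems are easy to check numerically. A small Python sketch (illustration only, not part of the lecture; the \texttt{phi} helper is a naive totient, fine for small $n$):

```python
from math import gcd

# Fermat's little theorem: a^(p-1) ≡ 1 (mod p) for prime p and 1 <= a < p.
p = 101  # a small prime, chosen for illustration
for a in range(1, p):
    assert pow(a, p - 1, p) == 1

# Euler's theorem: a^phi(n) ≡ 1 (mod n) whenever gcd(a, n) = 1.
def phi(n):
    """Euler's totient, by direct count (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

n = 15  # phi(15) = 8: coprime to 15 are 1,2,4,7,8,11,13,14
assert phi(n) == 8
for a in range(1, n):
    if gcd(a, n) == 1:
        assert pow(a, phi(n), n) == 1
```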
\subsection{World War I}
The radio was a marvelous new communication technology -- it enabled instantaneous communication with remote ships and forces, but it also made encryption absolutely necessary. Decipherment of the Zimmermann telegram from Germany to Mexico helped bring America into World War I.
\subsection{Alan Turing}
Developed the theory of computability, but also worked extensively on cryptography. He and others deciphered the Axis Enigma machines, which had a great impact on the war. The cryptanalytic effort involved the development and use of early computers.
\subsection{Claude Shannon}
First post-war analysis of security systems.
\subsection{DES - U.S. Data Encryption Standard}
IBM designed DES; Horst Feistel was a key architect of the standard. The NSA helped in return for keeping the key size at 56 bits.
Scheme was in use for a long time (but it was also controversial).
\subsection{Computational Complexity}
People started asking how much time it takes to perform a certain operation. Key notions were polynomial-time reductions and NP-completeness. Things that we know are computable in principle may still be infeasible because of the amount of time they take to run.
\section{Public Key Cryptography}
Ralph Merkle, Marty Hellman, and Whit Diffie invented the notion of public-key cryptography. In 1976, Diffie and Hellman proclaim ``We are at the brink of a revolution in cryptography.''
Each party $A$ has a public key $PK_A$ others can use to encrypt messages to A: $C = PK_A(M)$. Each party $A$ also has a secret key $SK_A$ for decrypting a received ciphertext $C$: $M = SK_A(C)$.
It is easy to compute matching public/secret key pairs but publishing $PK_A$ should not compromise $SK_A$. Idea: sign with $SK_A$ and verify signature with $PK_A$. $A$ produces signature $\sigma$ for message $M$ with $\sigma = SK_A(M)$. Given $PK_A, M,$ and $\sigma$, anyone can verify validity of signature $\sigma$ by checking $M = PK_A(\sigma)$.
However, Diffie and Hellman didn't know how to implement these ideas.
\subsection{RSA (Rivest, Shamir, Adleman 1977)}
Security relies on the inability to factor the product $n$ of two large primes $p, q$. So we set $PK = (n,e)$ where $n = pq$ and $\gcd(e, \phi(n)) = 1$.
$SK = d$ where $de \equiv 1 \pmod{\phi(n)}$.
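A toy worked example in Python with artificially small primes (a sketch only -- real RSA uses primes hundreds of digits long; \texttt{pow(e, -1, m)} computes the modular inverse and needs Python 3.8+):

```python
# Toy RSA with tiny primes -- illustration only, not secure.
p, q = 61, 53
n = p * q                    # 3233
phi_n = (p - 1) * (q - 1)    # 3120
e = 17                       # gcd(17, 3120) = 1
d = pow(e, -1, phi_n)        # d = 2753, since 17 * 2753 ≡ 1 (mod 3120)

m = 65                       # message, encoded as an integer < n
c = pow(m, e, n)             # encrypt with the public key (n, e)
assert pow(c, d, n) == m     # decrypt with the secret key d
```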
\subsection{Digital Certificates}
Loren Kohnfelder proposed the notion of a digital certificate: a digitally signed message attesting to another party's public key.
\subsection{RC4 Stream Cipher}
The most widely used software stream cipher. It is not public-key: it exclusive-ors a stream of pseudorandom bytes with the plaintext to derive the ciphertext. Extremely simple and fast; it uses an array $S[0..255]$ holding a permutation of $0..255$, initialized using the secret key, plus two pointers into $S$. Created by Rivest.
\subsection{MD5 Hash}
Proposed as a pseudorandom function mapping files to fingerprints. Collision resistance was a design goal: it should be infeasible to find two files with the same fingerprint. A model for many later hash function designs.
\subsection{World Wide Web}
Just as radio did, this new communication medium drove demand for cryptography to new heights. Cemented transition of cryptography from primarily military to primarily commercial.
\newpage
{\Large Lecture 3}
\section{Growth of Cryptography - Continued}
\subsection{Zero Knowledge Proofs}
I can convince you that I know a solution to a hard problem while telling you nothing about the solution itself.
Important for say, logging into a computer where you want to convince the computer that you know the secret key, but don't want to give the computer any information about the secret key.
\subsection{Micro Payments}
Micali and Rivest (2001). If you want to pay small amounts of money, how do you create a system for this?
Paying ten cents is equivalent to paying \$1 with probability $1/10$. Uses pseudorandom digital signatures as verifiable fair dice. Neither the buyer nor the seller arranges the coin flips; there must be a cryptographically fair way of flipping the coin.
\subsection{Voting Systems}
There are new end-to-end cryptographic voting systems. All ballots are posted on the web and are encrypted, and the voters verify their votes are correct. Anyone can see the final tally.
Uses cryptography for increasing transparency and verifiability. You don't want people selling their own vote.
\subsection{Fully Homomorphic Encryption}
Can you compute on encrypted data while keeping it encrypted? Given two encrypted values, are there ways to perform operations on them to create a new encrypted answer?
Craig Gentry showed in 2009 that you can do arbitrary computations on encrypted data based on the use of lattices. Potential applications include cloud computing.
\section{Encryption and One Time Pads}
How do we get messages from Alice to Bob in a way that Eve can't listen in? There is some communication channel that Eve can listen to. Alice and Bob share some key $K$, which Eve does not know.
Alice encrypts message $m$ and sends the encrypted message $c \leftarrow enc(K, m)$ over the channel.
Algorithms:
\begin{itemize}
\item Key generation $K \leftarrow gen(1^\lambda)$, where $\lambda$ is the key length (given in unary). Here $\leftarrow$ denotes a randomized computation.
\item Encryption $c \leftarrow enc(K, m)$, where $c$ is the ciphertext that is sent over the channel.
\item Decryption $m = dec(K, c)$. Note that decryption is deterministic.
\end{itemize}
\subsection{Notion of Encryption}
Ideas of what encryption is:
\begin{itemize}
\item Eve should not be able to produce $m$ from $c$.
\item Eve shouldn't be able to identify any information about the message, and the receiver should be able to authenticate that the message is complete.
\item Shouldn't see patterns in messages.
\end{itemize}
The encryption game: Eve can't distinguish $enc(K,m_1)$ from $enc(K, m_2)$, where $m_1 \neq m_2$ and $|m_1| = |m_2|$, if $K$ is chosen randomly and Eve doesn't know $K$. Eve chooses $m_1, m_2$. Alice picks one message $m_b$ and encrypts it, $c \leftarrow enc(K, m_b)$. Eve tries to guess $b$ given ciphertext $c$.
\subsection{One Time Pad (Vernam 1917)}
Message $m$, ciphertext $c$, key $k$ are all $\lambda$ bit quantities. The key $k$ is also called the pad.
Generation: Flip coin $\lambda$ times to get a pad $k$.
Encryption: Bitwise exclusive-or, $c = m \oplus k$.
Decryption: Exclusive-or, $m = c \oplus k$. This works because $c \oplus k = m \oplus k \oplus k = m$.
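The three algorithms fit in a few lines of Python (a sketch, not lecture material; the function names are mine, and \texttt{secrets.token\_bytes} stands in for the coin flips):

```python
import secrets

def otp_gen(lam):
    """Key generation: lam random bytes (the coin flips)."""
    return secrets.token_bytes(lam)

def otp_xor(key, data):
    """Encryption and decryption are the same bitwise XOR."""
    assert len(key) == len(data)
    return bytes(k ^ b for k, b in zip(key, data))

m = b"attack at dawn"
k = otp_gen(len(m))       # the pad, as long as the message
c = otp_xor(k, m)         # encrypt: c = m XOR k
assert otp_xor(k, c) == m # decrypt: m = c XOR k, since m XOR k XOR k = m
```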
\subsection{Proof of Security}
Theorem: The One Time Pad is unconditionally/information-theoretically secure (Eve may have infinite computing power).
Proof: Let $P(m)$ be Eve's a priori probability of message $m$, or Eve's state of knowledge. Need to show $P(m | c) = P(m)$.
We know that $|m| = |c| = |k| = \lambda$. We assume that $k$ is chosen uniformly at random, so the probability of any particular key is $P(k) = 2^{-\lambda}$. Then $P(c|m)$, the probability of $c$ given the message, is the probability that $k = m \oplus c$, which is $2^{-\lambda}$.
\begin{eqnarray}
P(c) = \sum_{m} P(c|m) P(m) = \sum_{m} 2^{-\lambda} P(m) = 2^{-\lambda}
\end{eqnarray}
where the last equality follows because the prior probabilities sum to one, $\sum_{m} P(m) = 1$.
Now let us look at $P(m|c)$ which is the probability that Eve thinks message is $m$ having seen $c$.
\begin{eqnarray}
P(m|c) &=& \frac{P(c | m) P(m)}{P(c)} \\
&=& \frac{2^{-\lambda} P(m)}{2^{-\lambda}} \\
&=& P(m).
\end{eqnarray}
Thus, the ciphertext does not give Eve any information.
\newpage
{\Large Lecture 4}
\section{One Time Pad}
It is surprising that the one time pad is not used more often. However, you need large secrets, which you must share securely and keep secret; these requirements hurt usability. Most cryptography instead uses small secrets that are easier to share.
Another caveat: you can't reuse the one time pad.
\begin{eqnarray}
C_1 \oplus C_2 &=& (M_1 \oplus P) \oplus (M_2 \oplus P) \\
&=& M_1 \oplus M_2 \oplus (P \oplus P) \\
&=& M_1 \oplus M_2
\end{eqnarray}
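The cancellation above can be seen concretely (a sketch, not from the lecture; wherever the two plaintexts agree, their XOR is zero, which is visible to Eve):

```python
import secrets

m1 = b"attack at dawn"
m2 = b"defend at dusk"
pad = secrets.token_bytes(len(m1))   # one pad, reused -- the mistake

c1 = bytes(p ^ b for p, b in zip(pad, m1))
c2 = bytes(p ^ b for p, b in zip(pad, m2))

# The pad cancels: C1 XOR C2 = M1 XOR M2, leaking plaintext structure.
leak = bytes(a ^ b for a, b in zip(c1, c2))
assert leak == bytes(a ^ b for a, b in zip(m1, m2))

# Positions 6..10 (" at d") agree in both messages, so the XOR is zero there.
assert leak[6:11] == bytes(5)
```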
A famous attack exploiting pad reuse was the NSA's Venona project against the Russians.
Note that the One Time Pad (OTP) is malleable, which means it doesn't provide any message authentication. If the adversary attempts to change the bits over the channel, then the receiver of the message can't tell. The OTP provides superb confidentiality, but provides zero authentication. So we need to layer on top of it more techniques to achieve both of these.
It is a common mistake to think that just because something is encrypted, it gets to the receiver intact.
\section{Generating Randomness}
\begin{itemize}
\item Flipping coins
\item Environmental noise (microphone, keyboard, camera)
\item Radioactive decay (bits in memory changing)
\item Hard disk drive speed variations
\item Intel's new instructions (using a metastable circuit state)
\item Quantum randomness.
\end{itemize}
\section{Cryptographic Hash Functions}
A \emph{cryptographic hash function} maps strings of arbitrary finite length to strings of fixed length ($d$ bits): $h: \{0, 1 \}^* \rightarrow \{0,1\}^d$. We sometimes call $h(x)$ the hash of $x$ or the digest of $x$.
These functions should be efficiently computable, but security should not be sacrificed for efficiency. There is no secret key; the function is completely public.
\subsection{Examples}
Rivest's functions: MD4, MD5.
NIST standards: SHA-1, SHA-2: (SHA-256, SHA-512), SHA-3 (Keccak).
\subsection{Random Oracle Model (ROM)}
Specification of what the ideal cryptographic hash function would be. The user sends the oracle $x$, and the oracle sends the user back $h(x)$. If a second user sends the oracle $x$, the oracle will send back the same $h(x)$.
Oracle's Process: Receives $x$. If $x$ is in the book, look up $h(x)$ and return it. Otherwise, flip a coin $d$ times and call that $h(x)$. Write this $h(x)$ in the book and return it.
This is both consistent and random (the ideal of a cryptographic hash function). In the random oracle model, if $x \neq y$:
\begin{eqnarray}
P[h(x) = h(y)] = \frac{1}{2^d}
\end{eqnarray}
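The oracle's book-keeping process can be sketched directly (illustration only; the class name is mine):

```python
import secrets

class RandomOracle:
    """Lazy random oracle: d fresh random bits per new query, consistent replies."""
    def __init__(self, d=16):
        self.d = d
        self.book = {}      # the oracle's book of past answers

    def h(self, x):
        if x not in self.book:                    # first query of x: flip d coins
            self.book[x] = secrets.randbits(self.d)
        return self.book[x]                       # repeat queries: same answer

oracle = RandomOracle(d=16)
assert oracle.h(b"hello") == oracle.h(b"hello")  # consistent
assert 0 <= oracle.h(b"world") < 2 ** 16         # d random bits
```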
\subsection{Properties}
\begin{itemize}
\item One-wayness (OW) - preimage resistance. If $h(x) = y$ then $x$ is the preimage of $y$ and $y$ is the image of $x$. It should be hard to go from $y$ back to $x$.
Infeasible for anyone given $y \in_{r} \{0, 1\}^d$ (where $\in_r$ denotes randomly chosen) to find any $x$ such that $h(x) = y$. Infeasible means that work is proportional to $2^d$, which is just brute forcing every possible $x$ and checking if it matches a $y$. Maybe take $d \geq 90$ to make this hard.
\item Collision resistance (strong collision resistance).
Infeasible for anyone to find $x$ and $x'$ such that $x \neq x'$ and $h(x) = h(x')$.
In the Random Oracle Model, the difficulty of finding any collision is $\Theta(2^{d/2})$. The work then exceeds $2^{90}$ if $d > 180$. You lose a factor of 2 in the exponent because of the birthday paradox.
\begin{eqnarray}
x_1 &\to& y_1 \\
x_2 &\to& y_2 \\
&\vdots& \\
x_n &\to& y_n
\end{eqnarray}
\begin{eqnarray}
E[\textrm{collisions}] &=& \sum_{i \neq j} Pr[h(x_i) = h(x_j)] \\
&=& \sum_{i \neq j} \frac{1}{2^d} \\
&=& { {n \choose 2 } } 2^{-d} \\
&=& \frac{n (n-1)}{2} \frac{1}{2^{d}}
\end{eqnarray}
This is roughly $n^2 2^{-d}$, which means that if $n > 2^{d/2}$, the expected number of collisions is greater than 1.
\item Weak collision resistance (target collision resistance) (WCR).
Infeasible given $x \in_r \{0, 1\}^*$ to find $x' \neq x$ such that $h(x) = h(x')$. As with one-wayness, the work is $\Theta(2^d)$ in the ROM.
\item Pseudorandomness. Indistinguishable from a random oracle. Hard to define well.
\item Non-malleability.
Infeasible given $h(x)$ to produce $h(x')$ where $x$ and $x'$ are related.
\end{itemize}
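The birthday bound can be checked empirically by truncating a real hash to a small $d$ (a sketch, not lecture material; the 20-bit truncation is my choice so the search runs in milliseconds, finding a collision after roughly $2^{d/2} \approx 1000$ tries rather than $2^{20}$):

```python
import hashlib

d = 20  # tiny digest, so ~2^(d/2) ≈ 1024 tries should suffice

def h(x: bytes) -> int:
    """SHA-256 truncated to d bits -- a stand-in for a d-bit hash."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % (1 << d)

seen = {}
i = 0
while True:
    x = str(i).encode()
    y = h(x)
    if y in seen:           # birthday collision found
        x_prime = seen[y]
        break
    seen[y] = x
    i += 1

assert x != x_prime and h(x) == h(x_prime)
# i is typically on the order of 2^(d/2), far below the 2^d brute-force bound.
```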
\subsection{Applications}
\begin{itemize}
\item Password storage: store $h(p)$ instead of string $p$. System compares $h(p)$ to $h(t)$ where $t$ is the typed in password attempt. For a given user, this depends on the one-wayness property.
\end{itemize}
\newpage
{\Large Lecture 5}
\section{Hash Function Applications}
\subsection{Password Storage}
System stores $h(pw)$ rather than $pw$ itself. System might also store username, salt, etc.
\subsection{File Modification Detector}
You want to monitor to detect when files have been changed. For each file, store $h(F)$ securely. You can check to see if the files have been modified by recomputing the hash. Provides detection (not prevention).
\subsection{Digital Signatures (hash and sign)}
\begin{itemize}
\item $PK_A$ is Alice's public key (for signature verification).
\item $SK_A$ is Alice's secret key (for signing).
\item Signing: $\sigma = sign(SK_A, m)$ and $\sigma$ is Alice's signature on message.
\item Verification: $verify(M, \sigma, PK_A) \in \{true, false\}$.
\end{itemize}
Idea: computing $h(m)$ is fast, so sign $h(m)$ instead of signing $m$. Then $sign(SK_A, m)$ and $sign(SK_A, m')$ coincide whenever $h(m) = h(m')$.
The problem is that if $h(m) = h(m')$, then by asking Alice to sign $m$, one obtains a signature $\sigma$ that is also a valid signature for $m'$.
\subsection{Commitments}
Alice has value $x$ which is her bid. She computes $C(x)$ and gives auctioneer $C(x)$, which is her sealed bid. When bidding is over, Alice should be able to open $C(x)$ to reveal $x$.
Want these properties:
\begin{itemize}
\item Binding: Alice should not be able to open $C(x)$ in more than one way.
\item Secrecy: Anyone seeing $C(x)$ should have no information about $x$.
\item Non-malleability: Anyone seeing $C(x)$ shouldn't be able to come up with a related bid, e.g. $C(x+1)$.
\end{itemize}
Let's try $C(x) = h(\textrm{username} || x || r)$ where $r$ is a random value kept secret by Alice. For the hash function to satisfy all of the above properties, we need collision resistance (for binding), one-wayness (for secrecy -- and a little more, since we don't want to leak any information about $x$), and non-malleability.
To open the bid, everyone sends in their bids once the auction is over, and you check to make sure the messages hash to their commitment value.
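A sketch of this commit-and-open protocol (not from the lecture; the separator characters and the 128-bit $r$ are my choices, made so the opening is unambiguous):

```python
import hashlib
import secrets

def commit(username: str, bid: int):
    """C(x) = h(username || x || r); returns the commitment and the opening."""
    r = secrets.token_hex(16)                # Alice's secret randomness
    msg = f"{username}|{bid}|{r}".encode()   # '|' separators avoid ambiguity
    return hashlib.sha256(msg).hexdigest(), (bid, r)

def open_commitment(username: str, c: str, bid: int, r: str) -> bool:
    """Auctioneer recomputes the hash to check the claimed opening."""
    msg = f"{username}|{bid}|{r}".encode()
    return hashlib.sha256(msg).hexdigest() == c

c, (bid, r) = commit("alice", 130)
assert open_commitment("alice", c, 130, r)      # honest opening verifies
assert not open_commitment("alice", c, 131, r)  # binding: can't reopen differently
```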
\subsection{Merkle Tree}
Authenticate a collection of objects $x_1, x_2, \ldots, x_n$. You go up the tree, hashing the two children below each node; the hash value at the top of the tree is the root value. To check whether some $x$ is a member of the collection, you need $x$'s sibling and the values of the other nodes along the path, from which you can compute your way up the tree.
We need target collision resistance (so that no one can find another value that leads to the same root hash), and also full collision resistance, because the party building the tree might have constructed a second, conflicting tree as well.
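A minimal sketch of building a Merkle root and verifying a membership path (not from the lecture; it assumes the number of leaves is a power of two for simplicity):

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Hash leaves, then repeatedly hash pairs of children up to the root."""
    level = [H(x) for x in leaves]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, idx):
    """Sibling hashes needed to recompute the root from leaves[idx]."""
    level, path = [H(x) for x in leaves], []
    while len(level) > 1:
        path.append(level[idx ^ 1])   # the sibling ('brother') node
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return path

def verify(leaf, idx, path, root):
    """Recompute up the tree from the leaf; membership holds iff we hit root."""
    node = H(leaf)
    for sib in path:
        node = H(node + sib) if idx % 2 == 0 else H(sib + node)
        idx //= 2
    return node == root

xs = [b"x1", b"x2", b"x3", b"x4"]
root = merkle_root(xs)
assert verify(b"x3", 2, merkle_path(xs, 2), root)
assert not verify(b"bogus", 2, merkle_path(xs, 2), root)
```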
\section{Merkle-Damgard Construction}
Start the machine with a state of all zeros. Concatenate $m_1$ with the current state and hash this, giving a new state $c_1$; then concatenate $m_2$ with that and hash to get $c_2$. Keep iterating over the message blocks until you get $h(m)$, which is taken from the final state $c_n$.
A common design pattern for the compression function $f$ is $f(c_{i-1}, m_i) = c_{i-1} \oplus E(m_i, c_{i-1})$, where $m_i$ is used as the key and $c_{i-1}$ as the message of a block cipher $E$.
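The chaining structure can be sketched in a few lines (an illustration only; SHA-256 is used here purely as a stand-in compression function, and the zero-plus-length padding is my simplification of the real Merkle-Damgard strengthening):

```python
import hashlib

def md_hash(message: bytes, block_size: int = 16, d: int = 16) -> bytes:
    """Merkle-Damgard: iterate a compression function f over fixed-size blocks."""
    def f(state: bytes, block: bytes) -> bytes:
        # Stand-in compression function: hash (state || block), truncate to d.
        return hashlib.sha256(state + block).digest()[:d]

    state = bytes(d)                       # start with the all-zero state c_0
    # Pad with zero bytes plus a final length block (length padding keeps
    # distinct messages distinct after padding).
    padded = message + bytes(-len(message) % block_size)
    padded += len(message).to_bytes(block_size, "big")
    for i in range(0, len(padded), block_size):
        state = f(state, padded[i:i + block_size])  # c_i from c_{i-1} and m_i
    return state                                    # h(m): the final state c_n

assert md_hash(b"hello") == md_hash(b"hello")  # deterministic
assert md_hash(b"hello") != md_hash(b"hellp")  # small change, different digest
```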
\section{Keccak}
This is an iterative algorithm, and the state has two components: the width is $r + c$ bits, a rate part of $r$ bits and a capacity part of $c$ bits. You first XOR $m_1$ into the $r$ part and send that, along with the $c$ part copied over, through $f$, where $f$ is a public permutation $\{0,1\}^{c + r} \rightarrow \{0,1\}^{c + r}$ designed to look random. Once we've finished with all the message parts, $h(m)$ is just the $r$ part of the output state.
The compression is happening at the xor.
\newpage
{\Large Lecture 6}
\section{The Web}
The web works with http requests and responses.
\subsection{HTTP Request}
You have a method, path, and http version. The url specifies protocol, domain, port, etc using the DNS lookup system.
\subsection{HTTP Response}
Contains status code with reason text (200 is OK, 404 is not found, etc). You also have headers which contain server information, as well as possibly a cookie in the header. Then after the headers, there is the data content.
However, note that HTTP is stateless: server and client don't (supposedly) know about each other. You can maintain state using cookies sent in the header of the HTTP response, or a database on the server.
Cookies are files stored on the client; the database stores information about the users.
\subsection{Data Content}
The data content is the HTML file and its references, where references are things like CSS, JavaScript, or Flash plugins. The data content is structured as HTML in a tree.
\section{Web security}
\subsection{Authentication}
The goal of web security: safely browse the web. Users should be able to visit a variety of websites without incurring harm. Hackers shouldn't be able to steal, change, or read user's information.
Server authenticates and checks to make sure that a user $U$ is indeed $U$. The most common method is passwords.
\subsection{Passwords}
The goal is to make guessing passwords the attacker's best strategy. A user has a password and a username. In a naive implementation, the server has a database of usernames and passwords; the user sends the username and password, and the server checks that these match. But stealing the database means all the usernames and passwords are given up.
Round 2: try storing the hash of the password. If the attacker steals the hashed password, you want one-wayness for the hash.
\subsection{Dictionary Attacks}
The attacker chooses a dictionary and picks passwords from the dictionary. In an online attack, the attacker has no way of checking whether a password is correct other than sending a request to the server; to stop this, you can rate limit. In an offline attack, the attacker has the hash, so finding a password matching $h(pw)$ requires $O(|dict|)$ work.
But the attacker can run a batch offline dictionary attack: for every word in the dictionary, compute the hash of the word, giving a list $L: \{w, h(w) \}$. Then intersect this list with the database of password hashes. This requires $O(|dict| + |T|)$ work, where $|T|$ is the size of the password table.
We can prevent these attacks by salting. The server stores a username, hashed password, and a random salt, and computes the hash of the salt together with the password. The batch offline dictionary attack would now have to redo its work for every salt, so the attacker's best strategy is to go through all possible dictionary passwords for each hash: $O(|dict||T|)$.
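A sketch of salted storage and checking (not from the lecture; SHA-256 only illustrates the salting idea -- a real system should use a slow, dedicated password hash such as bcrypt, scrypt, or argon2):

```python
import hashlib
import secrets

def store_password(pw: str):
    """Server stores (salt, h(salt || pw)) instead of pw itself."""
    salt = secrets.token_hex(16)
    return salt, hashlib.sha256((salt + pw).encode()).hexdigest()

def check_password(attempt: str, salt: str, stored_hash: str) -> bool:
    """Recompute h(salt || attempt) and compare against the stored hash."""
    return hashlib.sha256((salt + attempt).encode()).hexdigest() == stored_hash

salt, stored = store_password("hunter2")
assert check_password("hunter2", salt, stored)
assert not check_password("hunter3", salt, stored)

# Two users with the same password get different hashes, so a precomputed
# dictionary no longer applies to the whole table at once:
salt2, stored2 = store_password("hunter2")
assert stored != stored2
```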
\subsection{Generating Multiple Client Passwords}
A problem with passwords is that people reuse the same password across sites. What we want is client software that creates different passwords for different sites, accessed with a single password $pw$. We can hash $h(pw, server\_id)$ for each server. The hash needs one-wayness (given the hash, you shouldn't be able to figure out the password) and non-malleability (given the hash for one website, you shouldn't be able to figure out the hash for another website).
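A minimal sketch of the derivation (not from the lecture; the separator and 16-character truncation are arbitrary choices for illustration):

```python
import hashlib

def site_password(master_pw: str, server_id: str) -> str:
    """Derive a per-site password as h(pw, server_id)."""
    digest = hashlib.sha256(f"{master_pw}|{server_id}".encode()).hexdigest()
    return digest[:16]   # truncated for use as a password

a = site_password("correct horse", "bank.example.com")
b = site_password("correct horse", "mail.example.com")
assert a != b                                                    # per-site
assert a == site_password("correct horse", "bank.example.com")   # reproducible
```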
\subsection{Cookies}
Cookies are files stored by the server at the client. They help maintain state, but they're also useful for authentication: they avoid sending the password over the network many times, and you don't want to have to type in a password on every request. However, if the adversary gets access to the cookie, he has complete access to your account.
Cookies contain:
\begin{itemize}
\item name
\item value (like user id, number of visits)
\item domain (mit.edu)
\item path (/courses/2013)
\item expiration
\end{itemize}
Each browser contains a ``cookie jar,'' these cookies are obtained in the following manner: user logs in with username and password. Server sends back an http response where the header contains a cookie. The next http request by the user uses the cookie that it obtained from the last time. The server now sends back an http response with an updated cookie.
These are dangerous, however, because an adversary could just fake a cookie. We want the browser to be unable to forge a cookie. To do this, we set the cookie value to a hash of the username, expiration date, and a secret key known only to the server.
The server can then check that the cookie value equals the hash of the username, expiration date, and secret key; this is how the server verifies that cookies haven't been faked.
The hash needs one-wayness, non-malleability, and also unforgeability. It would suffice if the hash behaved like a random oracle.
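A sketch of such an unforgeable cookie (my instantiation, not the lecture's: it uses HMAC-SHA256 as the keyed hash, since a naive hash of secret-prefixed data can be vulnerable to length-extension attacks with Merkle-Damgard hashes):

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)   # known only to the server

def make_cookie(username: str, expires: str) -> str:
    """Cookie value: username, expiration, and a keyed hash over both."""
    tag = hmac.new(SERVER_SECRET, f"{username}|{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{username}|{expires}|{tag}"

def check_cookie(cookie: str) -> bool:
    """Server recomputes the tag; forged cookies fail the comparison."""
    username, expires, tag = cookie.split("|")
    expected = hmac.new(SERVER_SECRET, f"{username}|{expires}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)   # constant-time comparison

c = make_cookie("alice", "2013-12-31")
assert check_cookie(c)
assert not check_cookie(c.replace("alice", "admin"))  # forgery detected
```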
\section{Attacks on Web Applications}
\subsection{SQL Injection}
The attacker sends malicious input to the server, and the server's poor input checking lets it flow into an actual SQL query. For example, the user sends a username and password. If the inputs go directly into the query without checking, the attacker can send something like ``' or 1=1 --'', which always evaluates to true and comments out the rest of the query. The authentication will then always return true.
CardSystems had a SQL injection which allowed attackers to steal 263,000 credit card numbers. To fix these types of attacks, you need to sanitize inputs and make sure SQL arguments are properly escaped.
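The attack and the standard fix (parameterized queries) can be sketched side by side (illustration only; the in-memory table and column names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, pw_hash TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'abc123')")

def login_unsafe(username, pw_hash):
    # String interpolation: an input like ' OR 1=1 -- subverts the query.
    q = f"SELECT * FROM users WHERE username='{username}' AND pw_hash='{pw_hash}'"
    return db.execute(q).fetchall()

def login_safe(username, pw_hash):
    # Parameterized query: the driver escapes inputs, so they stay data.
    q = "SELECT * FROM users WHERE username=? AND pw_hash=?"
    return db.execute(q, (username, pw_hash)).fetchall()

evil = "' OR 1=1 --"
assert login_unsafe(evil, "anything")        # injection: rows returned, auth bypassed
assert login_safe(evil, "anything") == []    # parameterized: no match
assert login_safe("alice", "abc123")         # legitimate login still works
```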
\subsection{CSRF: Cross Site Request Forgery}
A bad website sends a request to a good website, pretending to be the browser of an innocent user and using the credentials of that innocent victim. Say the user has already authenticated with a good website. Then the user visits a malicious webpage, which returns content asking the browser to use its cookie to authenticate with the good website.
Countermeasure: Good server needs to distinguish between good user and attacker. The user actually fetches a page, fills in the form for the request, and sends the request. The attacker never fetches a page, sends the request directly. Include some random token in the fetched page. When good user fetches a page, server embeds a random token in the forms and server stores it in the database.
When user sends the form, the token is sent to server along with the user cookie. The server checks to see that the token is correct. The attacker never knows the token because he never went to the page, he just tried to post a request using a url.
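The token countermeasure can be sketched as follows (the server-side store and helper names are hypothetical):

```python
import hmac
import secrets

# Hypothetical server-side store mapping session id -> issued CSRF token.
issued_tokens = {}

def render_form(session_id: str) -> str:
    # A real page fetch embeds a fresh random token in the form; the
    # attacker, who never fetches the page, never sees this value.
    token = secrets.token_hex(16)
    issued_tokens[session_id] = token
    return f'<input type="hidden" name="csrf_token" value="{token}">'

def accept_post(session_id: str, submitted_token: str) -> bool:
    # The form submission must echo back the token issued with the page.
    expected = issued_tokens.get(session_id)
    return expected is not None and hmac.compare_digest(expected, submitted_token)
```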
\subsection{XSS: Cross Site Scripting}
Attackers send data containing a script to the server. The server stores it, thinking it is data, and serves it to other users. When other users visit the website, it pulls the information out of the database; but when the browser renders the page, it executes the script. The script can send all of the user's cookies or other credentials to someone else, or change the rest of the webpage to ask for a credit card number.
Difficult to prevent, must employ a set of fixes. One way to fix is to escape special characters in any user-provided data before sending it to others.
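The escaping fix can be sketched with the standard library's html.escape (the wrapper function is made up for illustration):

```python
import html

def render_comment(user_data: str) -> str:
    # Escape <, >, &, and quotes so user-provided data is displayed as
    # text and never interpreted by the browser as markup or script.
    return f"<p>{html.escape(user_data)}</p>"
```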
\newpage
\Large{Lecture 7}
\maketitle
\section{Buffer Overflow Overview}
\emph{Buffer overflows} are a code injection attack. They occur at a lower abstraction level than, say, a SQL injection. They overwrite existing data in memory, exploiting the layout of programs in memory and taking advantage of how C deals with arrays.
\subsection{Contents of Memory}
The contents of memory are as follows (from top to bottom of the memory stack):
\begin{enumerate}
\item Program code
\item Data segments for the global variables
\item Heap. Grows downward. Dynamically allocated memory (e.g. malloc)
\item Stack. Grows upward. Stores program execution context (e.g. function calls, local variables)
\end{enumerate}
\subsection{Stack Frames}
Here's an example C program:
\begin{verbatim}
void test(int a, int b) {
int flag;
char buffer[10];
}
\end{verbatim}
On the stack, you have the return address, the arguments a and b, and the saved stack frame pointer. Then the function makes space for flag and buffer on the stack.
A C string like ``hello world'' is just an array of bytes that ends with a null byte. C functions like strcpy copy bytes from the string until they hit a null byte. If we strcpy ``hello world'' (12 bytes including the null) into the 10-byte buffer, it fills the entire buffer, but since there are still more bytes it keeps going and overwrites the bytes of flag.
If user input overflows the buffer far enough, you get a segmentation fault, because the program ends up accessing invalid memory locations (you've probably overwritten the return address with something invalid).
\subsection{Consequences of Buffer Overflows}
Buffer overflows can:
\begin{itemize}
\item Crash programs
\item Overwrite variables with new values
\item Execute arbitrary code. You could change the return address to a new valid value. If you get the return address correct to point to a spot where you have put in arbitrary code, then that code will just start executing. One common location is the buffer itself.
\end{itemize}
\subsection{Shell Code}
One of the most useful pieces of code to execute would be one that opens up a command shell. Shell code is code that opens up a command shell (\emph{exec()}). However, it's not so simple, because you need to write code which avoids null bytes (or else strcpy will stop copying into the buffer).
For x86, you can get shell code that is 31 bytes and which is composed of ASCII characters.
To get the attack to work, you can place copies of the same return address at the end of the buffer, and a run of NOPs (no-operation instructions) at the beginning of the buffer before the shell code. Even if you don't know the exact location in memory, you can still get the program to jump somewhere into the NOPs and then slide into the shell code. You don't need to know the exact location in memory.
\subsection{Return to libc Attack}
Pretty much every stack will have libc, which includes \emph{printf()}, \emph{exec()}. Set up stack to look like function calls to functions in libc.
\section{Buffer Overflow Prevention}
\subsection{Canary Values}
Insert a random value on the stack between the local variables and the control information. If that random value is changed to something else, then we probably have a buffer overflow. The value needs to be random so that the attacker can't predict it and just write it in on the buffer overflow.
You must check the canary value before returning. Typically, the value is stored in a register so that it will always be there. If it matches, then continue and return. However, if the value has been changed, you should just crash the program. Many major compilers use the canary idea.
\subsection{Safe Functions}
Have functions which make sure that everything you are handling is safe. For example \emph{strncpy} which takes in a length as a parameter and copies until it hits a null byte or it reaches the length passed in.
If you only use safe functions, then you don't have to worry about buffer overflows. However, using safe functions requires that you always use them and always pass a valid, correct length.
\subsection{Non-executable Stack}
You should never be executing code on the stack; execution should happen only in the text part of the program. This is often done in hardware, called NX. There are also software emulations (ExecShield, W\^X). These make sure that you don't have the ability to execute code when you're in the stack, and you only have execution inside the actual program code text.
\subsection{Address Space Layout Randomization (ASLR)}
Make guessing the address space very difficult. The text, stack, global variables, and heap all live at a random point in the address space. If you have a large address space, it probably doesn't matter that these are in a random place. This makes it more difficult for the attacker to supply a correct return address.
\newpage
\Large{Lecture 8}
\maketitle
\section{Overview of Block Ciphers}
Take in some plaintext $p$ and a key $k$, then encrypt the plaintext to obtain ciphertext $c$. Each of $p$, $c$, $k$ depend on the standard.
\begin{itemize}
\item Data Encryption Standard (DES): $|p| = |c| = 64$ bits, $|k| = 56$ bits.
\item Advanced Encryption Standard (AES): $|p| = |c| = 128$ bits, $|k| = 128, 192, 256$ bits.
\end{itemize}
The above is the framework for a block cipher, which is usually the building block of more advanced cryptographic techniques.
\section{Data Encryption Standard (DES)}
Feistel cipher: a structure which you can undo. Feistel proposed dividing the input block into two parts $L_0$ and $R_0$, with multiple rounds of encryption. In the first round, you take $R_0$, compute $f(R_0, K_0)$, and XOR the result with $L_0$. The result becomes $R_1$, and the old $R_0$ becomes $L_1$. That is a single round of the algorithm; DES does this 16 times.
Notice that this is an invertible operation, even though $f$ doesn't need to be invertible.
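A toy sketch of one round and its inverse, using a keyed hash as a stand-in for the (non-invertible) round function $f$, to show that the round is invertible anyway:

```python
import hmac
import hashlib

def f(right: bytes, round_key: bytes) -> bytes:
    # Non-invertible round function: a keyed hash truncated to half-block size.
    return hmac.new(round_key, right, hashlib.sha256).digest()[:len(right)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def feistel_round(left: bytes, right: bytes, round_key: bytes):
    # L_{i+1} = R_i,  R_{i+1} = L_i XOR f(R_i, K_i)
    return right, xor(left, f(right, round_key))

def feistel_unround(left: bytes, right: bytes, round_key: bytes):
    # Inverse: recompute f on the carried-over half and XOR it back out.
    return xor(right, f(left, round_key)), left
```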
\section{Types of Attacks}
\subsection{Differential Analysis}
Invented in the public domain by Shamir and Biham. Let's take a message $M$ and run it through the encryption box, which returns $c$. Now take $M \oplus \Delta$ where $\Delta$ is a small change to the message. This produces a result $c \oplus \delta$. What happens when you change some bits of the message? You can track the changes of the bits down the entire structure.
It improves the number of things you need to run. Instead of $2^{56}$ brute force attacks, a differential attack could lead to only $2^{47}$ chosen messages. This is strictly better than what you could do if you didn't have the differential attack.
\subsection{Linear Attacks (Matsui)}
The idea is that maybe $f$ is non-linear but can be approximated by something linear. Maybe most of the time it is linear. For example, suppose that the equation below was satisfied with probability $1/2 + \epsilon$:
\begin{eqnarray}
M_3 \oplus M_{15} \oplus C_{2} \oplus K_{14} = 0
\end{eqnarray}
Run a bunch of message pairs, get the cipher text and solve for the key bit. Since this is true with probability $1/2 + \epsilon$, you can just take the average and it will give you the correct answer using the law of large numbers.
This actually works even better than differential attacks, only $2^{43}$ pairs needed.
\section{Advanced Encryption Standard}
Rijndael was the winner. It has a fixed-length input and output block of 128 bits. The key size is either 128, 192, or 256 bits, with 10, 12, or 14 rounds respectively.
Byte-oriented design. Can think of these as an element in $GF(2^8)$. You can think of 128 bits as a 4 by 4 array of bytes. Derive round keys which are each 128 bits (the keys are different for each round). Now each round has 4 steps:
\begin{enumerate}
\item XOR the round-key into the message to derive the 4 by 4 table.
\item Substitute bytes from a lookup table. Takes an 8 bit input and gives an 8 bit output. This is the only nonlinear operation in the standard.
\item Rotate rows, each by different amounts. If bytes were in the same column, they are rotated so they are no longer in the same column.
\item Mix each column to become $C \leftarrow A C$ where $A$ is some matrix.
\end{enumerate}
The above steps are run for each round and the final state is outputted from the algorithm.
\section{Ideal Block Cipher}
There are now two inputs, a key and a message. Encryption only needs to be invertible for each key. The ideal block cipher: \emph{each key independently selects a random permutation of the message space}. Thus, for each key we have a different random permutation of the message space, and information about one key's permutation gives no information about another's.
\section{Confidentiality}
We want confidentiality, aside from revealing the length of the message. Variable-length inputs come from some message space $\mathrm{M} = \{0,1\}^{*}$. How do we make sure we get confidentiality?
\subsection{Electronic Code Book (ECB)}
We take our message and divide it into blocks $M_1, M_2, \ldots, M_n$. Get ciphertext for each block using some key to get $C_1, C_2, \ldots, C_n$. The concatenated result is the full ciphertext.
We need to make sure that the message length is an exact multiple of $128$ bits so we can use the block cipher. To make sure this is the case, we can just pad our messages. We want the padding to be invertible (because we'll have to remove the padding). A typical approach is to always append a 1 and then as many 0s as necessary. If we always append a 1 no matter what, then unpadding is unambiguous no matter what message was inside.
This isn't very good, though, because repeated patterns in the message usually end up as repeated pattern in the cipher text, since the same key is being used the entire way through.
Because of this issue, ECB is pretty much deprecated except for very short messages or random data.
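The pad-with-a-1-then-0s scheme above can be sketched at byte granularity, where appending the byte 0x80 plays the role of the 1 bit:

```python
def pad(msg: bytes, block: int = 16) -> bytes:
    # Always append the marker byte 0x80 (a 1 bit followed by zeros),
    # then zero bytes up to the next block boundary.
    padded = msg + b"\x80"
    return padded + b"\x00" * (-len(padded) % block)

def unpad(padded: bytes) -> bytes:
    # Invertible: strip trailing zeros, then the single 0x80 marker.
    return padded.rstrip(b"\x00")[:-1]
```

Because the marker is always appended, even a message that itself ends in zero bytes unpads correctly.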
\subsection{CTR Mode}
We start a $128$-bit counter at some value $i$. We encrypt the counter $i$ with some secret key $K$ and get $x_i$. Then, we XOR this random-looking quantity $x_i$ with our first message block $M_1$ to get $C_1$. To get more ciphertext, add one to the counter, encrypt it to get $x_{i+1}$, and XOR $M_2$ with $x_{i+1}$ to get $C_2$. Repeat until you get to the end.
The set of $x_{i}, x_{i+1}, \ldots, x_{i+n-1}$ is equivalent to the pad. Once you have this, then you send $i, C_1, C_2, \ldots, C_n$. The legitimate decrypter just needs to encrypt $i, i+1, \ldots, i+n-1$ with $K$, which allows you to XOR $x_{i}$ with $C_1$ to get $M_1$.
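A sketch of CTR mode, with an HMAC-based PRF standing in for the block cipher $E_K$ (the mode's structure, not the cipher, is the point here):

```python
import hmac
import hashlib

def prf(key: bytes, counter: int) -> bytes:
    # Stand-in for E_K(counter); any good PRF suffices for CTR mode.
    return hmac.new(key, counter.to_bytes(16, "big"), hashlib.sha256).digest()[:16]

def ctr_xor(key: bytes, start: int, data: bytes) -> bytes:
    # Encryption and decryption are the same operation:
    # XOR each block with the pad x_i = E_K(i), i = start, start+1, ...
    out = bytearray()
    for offset in range(0, len(data), 16):
        pad = prf(key, start + offset // 16)
        chunk = data[offset:offset + 16]
        out.extend(x ^ p for x, p in zip(chunk, pad))
    return bytes(out)
```

The decrypter, given $i$ and the ciphertext, recomputes the same pads and XORs them back out.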
\newpage
\Large{Lecture 9}
\maketitle
\section{Cipher Block Schemes}
\subsection{Cipher Block Chaining (CBC)}
There exists some initialization vector (IV) which is chosen at random by the sender. This is a starting point, but it's not a secret key. We compute $IV \oplus M_1$, then encrypt the result to get ciphertext $C_1$. For the next block of the message, we could XOR with the IV like before, but that's not as good: equal message blocks would produce equal ciphertext blocks and give information away. Therefore, we use the previous ciphertext: compute $C_1 \oplus M_2$ and encrypt that to get $C_2$.
The basic pattern is then $C_{i-1} \oplus M_i$, which is encrypted to obtain $C_i$. Think of $C_{0} = IV$. Thus, even if you change one bit, it changes the entire pattern down the line.
Ciphertext stealing is a cute trick for dealing with messages that are not exactly 128 bits in length. We transmit $IV, C_1, C_2, \ldots, C_n$. Decryption steps can be processed by decrypting $C_1$ and XORing it with IV to obtain $M_1$. Continue onwards until you have all $M_{i}$.
Interesting thing about CBC is that a single bit change will ripple down through the chain.
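CBC can be sketched with a toy invertible block cipher (a 4-round Feistel network standing in for a real cipher like AES; illustration only, not for real use):

```python
import hmac
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy 16-byte block cipher: 4 Feistel rounds with an HMAC round function.
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, xor(l, hmac.new(key + bytes([i]), r, hashlib.sha256).digest()[:8])
    return l + r

def dec_block(key: bytes, block: bytes) -> bytes:
    # Run the rounds backwards; f itself never needs to be inverted.
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = xor(r, hmac.new(key + bytes([i]), l, hashlib.sha256).digest()[:8]), l
    return l + r

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # C_i = E_K(C_{i-1} XOR M_i), with C_0 = IV.
    prev, out = iv, []
    for m in blocks:
        prev = enc_block(key, xor(prev, m))
        out.append(prev)
    return out

def cbc_decrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # M_i = D_K(C_i) XOR C_{i-1}.
    prev, out = iv, []
    for c in blocks:
        out.append(xor(prev, dec_block(key, c)))
        prev = c
    return out
```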
\subsection{Cipher Feedback Mode (CFB)}
You have an initialization vector (IV) and a secret key $K$. You encrypt the IV using the secret key and XOR the resulting value with $M_1$. This results in $C_1$. Then, encrypt $C_1$ and XOR the resulting value with $M_2$ to get $C_2$. Continue XORing the encrypted block of $C_{i-1}$ with $M_i$ to get $C_i$.
To decrypt we start off encrypting IV and XOR with ciphertext to get the message. Continue onwards in this manner.
Very similar to Cipher Block Chaining, and has many of the same properties like single bit rippling.
\section{Ciphertext Indistinguishability}
To prove that our cipher block schemes are secure, we need to define what it means to be secure. Let's define a game with the adversary. Our mode is secure if the adversary can't win the IND-CCA (Indistinguishability under Chosen Ciphertext Attack) game with probability significantly more than half the time.
Let's set up the game. The adversary is going to be asked, what is the message behind the ciphertext? The adversary will supply the plaintext messages to be encrypted, and he needs to guess which one is being encrypted. The key is chosen ahead of time in a random fashion.
Phase 1: ("Find"). Adversary makes up two messages $m_0 \neq m_1$ such that $|m_0| = |m_1|$. Adversary is given access to encryption oracle $E_k(.)$ and decryption oracle $D_k(.)$.
Phase 2: ("Guess"). The examiner picks a random bit $d \leftarrow \{0, 1\}$. Then the examiner encrypts $m_d$ and obtains $y = E_k (m_d)$. The adversary is given $y$ and any state information $s$. Remember, the adversary has access to $E_k(.), D_k(.)$, except that it may not query $D_k$ on $y$. The adversary then outputs $\hat{d}$, its guess for $d$.
The adversary's advantage is $|Pr[\hat{d} = d] - \frac{1}{2} |$. The scheme is secure against IND-CCA if the adversary's advantage is negligible (for adversaries running in probabilistic polynomial time).
To be secure in IND-CCA, encryption must be randomized, otherwise the adversary could just pick two messages beforehand and compute the ciphertext. Notice that under IND-CCA, the CBC and CFB modes are not secure. You could just send the decrypter the first half of the cipher text, and you will get the first half of the message.
This is the strongest definition of security that we know about.
\section{Unbalanced Feistel Encryption Mode (UFE)}
Let's say that we have a message $M$ and we're going to use CTR mode with some key $k_1$. We run the counter starting at $r$ and XOR the result with $M$. This results in ciphertext which is then sent through a CBC-MAC with IV = 0 and a new secret key $k_2$.
We have $k = (k_1, k_2, k_3)$ and $r \leftarrow \{0, 1\}^b$ randomly chosen. The pad is $P = P_1, P_2, \ldots, P_n$ where $P_i = E_{k_1} ( r + i)$. The resulting ciphertext is $C = c_1, c_2, \ldots, c_n$ where $c_i = m_i \oplus P_i$. Then in the CBC-MAC we have $x_0 = 0^b$, we set $x_i = E_{k_2} (x_{i-1} \oplus c_i)$ for $i < n$, and $x_n = E_{k_3} (x_{n-1} \oplus c_n)$.
The result is $\sigma = r \oplus x_n$ and $c_1, c_2, \ldots, c_n$.
The idea is that you send $r$ implicitly through $\sigma$. The receiver can recover $r$ by sending $C$ through the CBC-MAC to obtain $x_n$, then XORing $x_n$ with $\sigma$. This allows the receiver to compute $P$, which gives the message via $P \oplus C$.
\section{Message Authentication Codes (MAC)}
Alice wants to send some message to Bob, and Bob wants to be able to know if Eve has intercepted the message and changed the message. Alice sends $M$ and $MAC_{k}(M)$ which is a message authentication code. Bob can check if $MAC_{k}(M)$ is correct by recomputing it. It can't be recomputed by Eve because she doesn't know the secret key $k$. If Bob doesn't get the correct message authentication code, then he knows Eve has changed the message.
If $MAC_{k}(M)$ is correct, then Bob proceeds, otherwise Bob rejects. Remember that Eve can replay old messages and $MAC_{k}(M)$.
\subsection{MAC Game}
Alice and Bob share $K$. Eve wins if Bob accepts ($M', MAC_{k}(M')$) where $M'$ is different from anything that Eve has heard on the line. We give Eve oracle access to the $MAC_{k}$ function, but these queries don't count as wins for Eve.
Eve wants to forge a new message authentication code. A good scheme should not allow an adversary to create valid MACs which have not already been seen.
\newpage
\Large{Lecture 10}
\maketitle
\section{Authentication}
Eve should not be able to fool Bob into accepting a message that Alice didn't actually create. This is the core of authentication. Eve can always forge with probability $2^{-|T|}$ where $|T|$ is the length of the MAC tag. Eve wins the game if she can forge with higher probability than that.
\subsection{CBC-MAC}
Start with message $M_1$ and a vector of zeros. Take $0 \oplus M_1$ and encrypt the result. The result of this encryption is sent to be XORed with $M_2$. Continue encrypting the result and sending this encrypted result to XOR with $M_i$ until you get to the last block. The encryption in the first $n-1$ blocks all use the same key $K$. The last block, however, uses a different key $K'$. The resulting output of the last block is the MAC that you send over.
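A sketch of this CBC-MAC construction, with a keyed hash standing in for the block cipher (a PRF suffices to show the chaining structure):

```python
import hmac
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, block: bytes) -> bytes:
    # Stand-in for a 16-byte block cipher E_K.
    return hmac.new(key, block, hashlib.sha256).digest()[:16]

def cbc_mac(k: bytes, k_last: bytes, blocks: list) -> bytes:
    # Chain through E with key k, starting from a block of zeros;
    # the final block is processed under the different key k_last.
    state = b"\x00" * 16
    for m in blocks[:-1]:
        state = E(k, xor(state, m))
    return E(k_last, xor(state, blocks[-1]))
```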
\subsection{PRF-MAC Hash Function}
Use the following MAC: $MAC_{k}(M) = H(K || M)$, truncated to $\tau$ bits. This is actually defective in practice because hash functions are computed sequentially, which allows a length extension attack: an adversary who sees the hash of $K || M$ can often compute the hash of $K || M'$ for an extension $M'$ of $M$ without knowing $K$, so the MAC is forgeable.
Isn't necessarily used very often in practice because of this. Better idea is:
\begin{eqnarray}
HMAC_{k}(M) = H(K_1 || H(K_0 || M))
\end{eqnarray}
Where $K_0 \leftarrow K \oplus (pad)_0$ and $K_1 \leftarrow K \oplus (pad)_1$. HMAC is used much more frequently in practice.
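HMAC's two-level structure can be reproduced directly from the definition and checked against Python's hmac module (assuming a key no longer than the hash's 64-byte block):

```python
import hashlib

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    # HMAC_K(M) = H(K_1 || H(K_0 || M)), where K_0 and K_1 are the padded
    # key XORed with the fixed ipad/opad constants.
    block = 64  # SHA-256 block size in bytes; assumes len(key) <= block
    key = key.ljust(block, b"\x00")
    k0 = bytes(b ^ 0x36 for b in key)  # inner key
    k1 = bytes(b ^ 0x5C for b in key)  # outer key
    inner = hashlib.sha256(k0 + message).digest()
    return hashlib.sha256(k1 + inner).digest()
```

The outer hash call destroys the internal state an attacker would need for a length extension, which is why HMAC avoids the defect above.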
\subsection{Combining MAC and Encryption}
People really want something that does encryption and mac at the same time. We encrypt then MAC.
\begin{eqnarray}
ENC_{k_1}(M) \rightarrow C \\
MAC_{k_2}(C) \rightarrow T
\end{eqnarray}
Then, you send $\{C, T\}$. The receiver can check $T$ to decide whether to decrypt the message. Make sure the ordering is: encrypt, then compute the MAC; on the other side, check the MAC, then decrypt. You can get into difficulties if you don't do it in the right order.
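An encrypt-then-MAC sketch (the encryption half is a toy hash-based stream cipher; the names and key split are illustrative):

```python
import hmac
import hashlib
import secrets

def keystream(k_enc: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: iterate a hash; a stand-in for a real cipher's pad.
    stream = hashlib.sha256(k_enc + nonce).digest()
    while len(stream) < n:
        stream += hashlib.sha256(k_enc + stream[-32:]).digest()
    return stream[:n]

def encrypt_then_mac(k_enc: bytes, k_mac: bytes, msg: bytes):
    # Encrypt first, then MAC the ciphertext -- in that order.
    nonce = secrets.token_bytes(16)
    c = nonce + bytes(m ^ s for m, s in zip(msg, keystream(k_enc, nonce, len(msg))))
    t = hmac.new(k_mac, c, hashlib.sha256).digest()
    return c, t

def mac_then_decrypt(k_enc: bytes, k_mac: bytes, c: bytes, t: bytes):
    # Check the tag first; only decrypt if the ciphertext is authentic.
    if not hmac.compare_digest(t, hmac.new(k_mac, c, hashlib.sha256).digest()):
        return None
    nonce, body = c[:16], c[16:]
    return bytes(x ^ s for x, s in zip(body, keystream(k_enc, nonce, len(body))))
```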
\section{EAX Mode: Authenticated Encryption}
Bellare, Rogaway, Wagner created EAX mode. \emph{Nonce} is a value only used once, never repeats, could be a counter. $E_{k}(N)$ looks random. Associated data should be authenticated but doesn't need to be encrypted. We will call this the header $H$. Message $M$ to be encrypted.
EAX uses one key and is online -- no need to know length ahead of time.
\subsection{Workings of EAX}
Takes as input variable $\tau$, variable length nonce, arbitrary block cipher. Take the message $M$, and send it through counter mode with key $K$, results in ciphertext $C$. The starting value of the counter mode comes from the nonce $N$. The MAC of the nonce creates the starting counter value. The ciphertext is sent through a MAC of key $K$, as is the header, and the MACs of the nonce, message, and header are all XORed together and the result is $T$.
Use $MAC_{k}^i (m) \approx CBC-MAC(i || m)$ as the MAC, using $i=0$ for the nonce, $i=1$ for the header, and $i=2$ for the ciphertext.
The following is transmitted: $N, C, H, T$.
\section{Finite Fields}
Set $S$ and two binary operations $\times, +$.
\begin{itemize}
\item $S$ is finite containing 0 and 1.
\item ($S$, +) is an abelian (commutative) group.
\begin{itemize}
\item Identity 0 : $a + 0 = a$
\item Associativity: $((a+b) + c ) = (a + (b+ c))$
\item Inverses: $\forall a$ $\exists b$ s.t. $a + b = 0$
\end{itemize}
\item ($S^*$, $\times)$ is an abelian group with identity 1. $a \times 1 = 1 \times a = a$, $a \times b = b \times a$. $\forall a \in S^*$ $\exists b \in S^{*}$ s.t. $ a \times b = 1$.
\end{itemize}
Cryptography likes finite sets. Take $Z_p = GF(p)$; how do we solve $0 = ax + b \mod p$? We just compute $x = a^{-1} (-b) \mod p$.
$GF(q)$ exists for $q = p^k$ for any $k \geq 1$. For example $GF(2^8)$: polynomials in $x$ of degree less than $8$ with coefficients in $GF(2)$, such as $1 + x + x^5 + x^7$, reduced modulo some fixed irreducible polynomial $f(x)$.
Repeated squaring to compute $a^b$ where $a$ is in the field and $b \geq 0$ is an integer.
\begin{eqnarray}
a^b = \left\{ \begin{array}{l l}
1 & \text{if } b = 0 \\
(a^{b/2})^2 & \text{if } b > 0,\ b \equiv 0 \pmod{2} \\
a \cdot a^{b-1} & \text{if } b > 0,\ b \equiv 1 \pmod{2}
\end{array}\right.
\end{eqnarray}
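The recurrence above translates directly into code (Python's built-in pow(a, b, p) does the same thing internally):

```python
def field_pow(a: int, b: int, p: int) -> int:
    # Repeated squaring in GF(p): O(log b) multiplications.
    if b == 0:
        return 1
    if b % 2 == 0:
        half = field_pow(a, b // 2, p)
        return (half * half) % p
    return (a * field_pow(a, b - 1, p)) % p
```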
Theorem: In $GF(q)$ for all $a \in GF(q)^{*}$, we have $a^{q-1} = 1$. Thus, $a^{q} = a$ for all $a \in GF(q)$. This means that $a a^{q-2} = 1$. This means that $a^{-1} = a^{q-2}$.
\newpage
\Large{Lecture 11}
\maketitle
\section{Finite Fields}
Finite fields $(S, +, \times)$ where $S$ is a finite set. $GF(q)$ is the finite fields with $q$ elements.
\emph{Theorem:} $GF(q)$ exists if and only if $q = p^k$ for some prime $p$ and integer $k > 0$.
Usually, if $p$ is prime then $GF(p) = Z_p$: addition is modulo $p$ and the field is $Z_{p} = \{0, 1, \ldots, p-1\}$. But consider $GF(2^2)$: we can't just take $Z_{4}$, because 2 does not have an inverse there. Instead, elements are polynomials in $x$ of degree less than $k$ with coefficients in $GF(p)$. So for $GF(2^2)$ we have $\{0, 1, x, x+1 \}$; notice that coefficients are modulo 2. We just need to define addition and multiplication for these. Addition is easy: add modulo 2 in each position, for example $x + (x+1) = 2x + 1 = 0x + 1 = 1$. Multiplication is a little trickier, but we do something very similar to $GF(p)$: work modulo a polynomial of degree $k$ which is irreducible. $x^2$, $x^2 + 1$, and $x^2 + x$ are all reducible, so we use $x^2 + x + 1$. All multiplication is done modulo this polynomial.
We can build up a multiplication table:
\begin{eqnarray}
\begin{array}{l c | c | c | c}
& 0 & 1 & x & x+1 \\
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & x & x + 1 \\
x & 0 & x & x+1 & 1 \\
x+1 & 0 & x +1 & 1 & x
\end{array}
\end{eqnarray}
What is $x^2 \mod {x^2 + x + 1}$? The quotient is $1$ with remainder $-x -1$, but we know that $-x = x$ and $-1 = 1$ modulo 2. So the remainder is $x+1$, and $x^2 = x+1$.
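Multiplication in $GF(2^k)$ can be sketched with polynomials encoded as bitmasks, and checked against the $GF(2^2)$ table above:

```python
def gf_mul(a: int, b: int, mod_poly: int, k: int) -> int:
    # Polynomials over GF(2) are encoded as ints (bit i = coefficient of x^i).
    # First multiply carry-lessly (XOR instead of addition) ...
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    # ... then reduce modulo the irreducible degree-k polynomial by
    # XORing in shifted copies of it, from the top bit down.
    for shift in range(product.bit_length() - 1, k - 1, -1):
        if (product >> shift) & 1:
            product ^= mod_poly << (shift - k)
    return product
```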
\subsection{Computing Powers}
Given the ability to compute products, we can compute $a^b$ in time $O(\log b)$. One of the reasons this is useful is Euler's Theorem: in $GF(q)$, $\forall a \in GF(q)^*$ we have $a^{q-1} = 1$. Note that $GF(q)^*$ is all elements in $GF(q)$ which are non-zero. We want to work in $GF(p)$, but we need to find a large prime number.
Finding large prime numbers: turns out to be easy. This is because primes are relatively common.
\subsection{Generate and Test Primes}
Generate a number $p$ of the desired size at random and test it for primality. This is good as long as there are a sufficient number of primes and the test for primality is cheap.
Prime number theorem: the number of primes less than $x$ is approximately $x / \ln(x)$. For us, $x = 2^k$, so we have $\approx 2^k / k$ primes. There are $2^k$ total numbers of $k$ bits, so a random one is prime with probability roughly $1/k$. This is pretty good.
\subsection{Testing for Primality}
If $p$ is prime then $\forall a \in Z_{p}^*$ we have $a^{p-1} \equiv 1 \pmod{p}$. The converse doesn't always hold (Carmichael numbers), but it works with high probability. It almost suffices to test $2^{p-1} \equiv 1 \pmod{p}$ for large $p$. In good implementations, you also want to make sure you haven't hit a Carmichael number.
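A generate-and-test sketch using the Fermat test (as noted, Carmichael numbers can fool this test, so real implementations use Miller-Rabin instead):

```python
import random

def probably_prime(p: int, rounds: int = 20) -> bool:
    # Fermat test: check a^(p-1) = 1 (mod p) for several random bases a.
    if p < 4:
        return p in (2, 3)
    return all(pow(random.randrange(2, p - 1), p - 1, p) == 1
               for _ in range(rounds))

def random_prime(bits: int) -> int:
    # Generate and test: roughly 1 in `bits` random k-bit numbers is prime
    # (prime number theorem), so this loop terminates quickly.
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if probably_prime(candidate):
            return candidate
```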
\section{One-time MAC}
Recall the basic setup of Alice sending messages to Bob where both of them have a secret key $K$. Eve tries to listen in and has infinite computational power, but doesn't have the key. She tries to intercept the message and the MAC tag $(M,T)$ and send a changed message $(M', T')$. Eve wins the game if Bob accepts the altered message with probability greater than chance.
$T = MAC_{k}(M) = a M + b \pmod{p}$ where $p$ is a prime and the key is $K=(a,b)$ with $0 \leq a, b < p$. The message satisfies $0 \leq M < p$.
\emph{Theorem:} Eve can only achieve success with probability $1/p$ for the One-Time MAC.
\emph{Proof:} Given $M'$. For each $T'$ that she could guess, there exists a unique key $K=(a,b)$ such that both $T = aM + b \pmod{p}$ and $T' = aM' + b \pmod{p}$. Both of these conditions must hold in order for Eve to succeed. Need to find $a = (T - T')/(M - M')$, and we can do this because $M \neq M'$. Then we can solve for $b = T - a M$. There is exactly one key that makes this work. Therefore, Eve's chances of making $M', T'$ work is the chance that Alice and Bob have picked the key. The probability of picking that particular key is $1/p$ because Alice and Bob already have the equation $T = aM +b$, so picking $T' = a M' +b$ requires probability $1/p$.
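The one-time MAC is only a few lines (the prime $p$ below is an arbitrary public choice for illustration):

```python
import secrets

P = 2**61 - 1  # a public prime; keys, messages, and tags live in Z_p

def keygen():
    # Shared one-time key K = (a, b), chosen uniformly at random.
    return secrets.randbelow(P), secrets.randbelow(P)

def one_time_mac(key, message: int) -> int:
    # T = a*M + b (mod p)
    a, b = key
    return (a * message + b) % P

def verify(key, message: int, tag: int) -> bool:
    return one_time_mac(key, message) == tag
```

The security argument in the proof depends on the key being used for exactly one message; two tags under the same key let Eve solve for $(a, b)$.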
\section{Number Theory}
$d | a$ means that $d$ divides $a$ so that there exists a $k$ such that $dk = a$. We know that $d$ is a divisor of $a$ if $d \geq 0$ and $d | a$. For all $d$, $d | 0$, and for all $a$, $1 | a$.
If $d$ is a divisor of $a$, and $d$ is a divisor of $b$, then $d$ is a common divisor of $a$ and $b$. The greatest common divisor is the largest of the common divisors. We specify $gcd(0,0) = 0$ by definition and $gcd(0, 5) = 5$. We can compute $gcd(24, 30) = 6$, etc.
Definition: two numbers $a$ and $b$ are relatively prime if $gcd(a,b) = 1$.
\subsection{Euclid's Algorithm for GCDs}
For $a,b \geq 0$, we have:
\begin{eqnarray}
gcd(a,b) = \left\{\begin{array}{c c}
a & \text{if } b = 0 \\
gcd(b, a \mod b) & \text{else}
\end{array}\right.
\end{eqnarray}
We see that $a \mod b$ is strictly less than $b$, so we know this will terminate.
One of the most important implications of this is to compute multiplicative inverses when two numbers are relatively prime. Suppose $a \in Z_{p}^*$ and we want to compute $a^{-1} \pmod{p} = a^{p-2} \pmod{p}$. This works when $p$ is prime, but if $p$ is not prime, and $a \in Z_{n}^*$, then we need to find $ax + ny = 1$ so that $ax = 1 \pmod{n}$ and $x = a^{-1} \pmod{n}$.
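The extended form of Euclid's algorithm finds the $x, y$ with $ax + ny = 1$, which gives inverses modulo a composite $n$:

```python
def extended_gcd(a: int, b: int):
    # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Back-substitute: g = b*x + (a mod b)*y = a*y + b*(x - (a//b)*y)
    return g, y, x - (a // b) * y

def inverse_mod(a: int, n: int) -> int:
    # Solve a*x + n*y = 1; then x = a^(-1) (mod n). Requires gcd(a, n) = 1.
    g, x, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a is not invertible modulo n")
    return x % n
```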
\newpage
\Large{Lecture 12}
\maketitle
\section{Group Theory}
\subsection{Orders of Elements}
The most common group we see in cryptography is $Z_{p}^* = \{1,2, \ldots, p-1 \}$. This is a pretty group, very simple, and nice to work with. We say that $order_{n}(a)$ is the order of $a$ modulo $n$, where $a \in Z_{n}^*$. The order is the smallest $t > 0$ such that $a^t \equiv 1 \mod n$.
Fermat's Little Theorem: If $p$ is prime, then $\forall a \in Z_{p}^*$ we have $a^{p-1} \equiv 1 \pmod{p}$.
Euler's Theorem: For all $n$ and for all $a \in Z_{n}^*$, we have $a^{\phi(n)} \equiv 1 \pmod{n}$, where $\phi(n)$ is the cardinality of the group so $\phi(n) = |\{a; 1 \leq a \leq n; gcd(a,n) = 1 \} |$.
For example: $Z_{10}^* = \{1, 3, 7, 9\}$. This is a multiplicative group, and $\phi(10) = 4$. Just to check, we have $3^{4} \equiv 1 \pmod{10}$.
Notice that the order of $a$ mod $n$ is the length of the periodicity of $a^{i}$ for $i = \{1,2,3,\ldots\}$. We notice that $order_{n}(a) | |Z_{n}^*|$ so that the order always divides the size of the group.
\subsection{Generators}
Notation: define $<a> = \{a^i, i \geq 1 \}$. We have $order_{n}(a) = |<a>|$.
Definition: If $order_{n}(g) = |Z_{n}^*|$, then $g$ is called the generator of $Z_{n}^*$. In other words $<g> = Z_{n}^*$.
Theorem: $Z_{n}^*$ has a generator (i.e. $Z_{n}^*$ is cyclic) if and only if $n = 2,4, p^m, 2p^m$ for prime $p$ and $m \geq 1$.
Theorem: If $p$ is prime, then the number of generators mod $p$ is $\phi(p-1)$. For example $p = 11$ then $|Z_{p}^*| = 10$, and we have $\phi(10) = 4$.
Theorem: If $p$ is prime and $g$ is a generator modulo $p$, then $g^{x} \equiv y \pmod{p}$ has a unique solution for $0 \leq x < p -1$ for each $y \in Z_p^*$. We call $x$ the discrete logarithm of $y$, base $g$ modulo $p$.
The discrete logarithm problem is the problem of computing $x$ from $y$ given some $p$ and $g$. It is assumed that this is hard, no one has found an algorithm which solves this problem quickly.
\subsection{Generate and Test}
Randomly pick $g$ in $Z_{p}^*$ and test its order. If the order is $p-1$, then $g$ is a generator. By Fermat's Little Theorem $g^{p-1} \equiv 1 \pmod{p}$; we also need $g^{d} \not \equiv 1 \pmod{p}$ for every proper divisor $d$ of $p-1$.
Assume we factored $p-1 = q_1^{e_1} q_2^{e_2} \ldots q_k^{e_k}$. Check that $g^{(p-1)/q} \not \equiv 1 \pmod{p}$ for each $q | (p-1)$.
\section{Public Keys}
Idea: Pick a large ``random'' $p$ such that we know the factorization of $p-1$, since factoring a general $p-1$ is hard.
If $p$ and $q$ are both primes and $p = 2q+1$, then $p$ is called a safe prime and $q$ is called a Sophie Germain prime. The factorization of $p-1$ is just $2q+1-1$ so that factorization is $p-1 = 2 \times q$. To get pairing, we find $q$ first which is a prime, then let $p = 2q + 1$ and we test $p$ for primality.
Theorem: If $p$ is a safe prime $p = 2q+1$ (so $p-1 = 2q$) then for all $a \in Z_p^*$ we have $order(a) \in \{1, 2, q, 2q\}$.
To test if $g$ is a generator, we check that $g^{p-1} \equiv 1 \pmod{p}$, $g^2 \not \equiv 1 \pmod{p}$, and $g^q \not \equiv 1 \pmod{p}$, which implies that $order_p(g) = p-1$ and so $g$ is a generator.
If $p = 2q + 1$ and so $p$ is a safe prime then the number of generators mod $p$ is $\phi(p-1) = q-1$.
Find large random prime $q$. Let $p = 2q + 1$ and test $p$ for primality. If not, loop and pick new $q$, otherwise output $p,q,g$. We can find $g$ by picking $g \in Z_{p}^*$ and test for $g$ being a generator. Since there are a lot of generators, we've got high probability of getting a generator.
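The safe-prime generator test follows directly from the order theorem above (the example modulus $p = 23$, $q = 11$ in the usage below is for illustration):

```python
import random

def is_generator(g: int, p: int, q: int) -> bool:
    # For a safe prime p = 2q + 1, every order divides p - 1 = 2q, so g
    # generates Z_p^* exactly when g^2 != 1 and g^q != 1 (mod p).
    return pow(g, 2, p) != 1 and pow(g, q, p) != 1

def find_generator(p: int, q: int) -> int:
    # Generate and test: phi(p-1) = q - 1 of the p - 1 elements are
    # generators, so about half of random picks succeed.
    while True:
        g = random.randrange(2, p - 1)
        if is_generator(g, p, q):
            return g
```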