INFORMATION THEORY
CIS 400/628 — Spring 2005
Introduction to Cryptography

This is based on Chapter 15 of Trappe and Washington.

SHANNON'S INFORMATION THEORY

• Late 1940s.
• Concerned with the amount of information, not whether it is informative.
• Typical problem: how much can we compress a message and still be able to reconstruct it from the compressed version?
• Focus is on collections of messages and the probabilities on them, so that common messages get short encodings and uncommon ones get longer encodings.

— 1 —

PROBABILITY REVIEW, CONTINUED

DEFINITION. Suppose (X, pX) is a probability space and S : X → Y. Then S is called a Y-valued random variable on X, and for y ∈ Y,

    pS(y) =def pX({ x ∈ X : S(x) = y }) = Prob[S = y].

DEFINITION. Suppose (X, pX) is a probability space and S : X → Y and T : X → Z are random variables. Then

    pS,T(y, z) =def pX({ x ∈ X : S(x) = y and T(x) = z }) = Prob[S = y, T = z].

EXAMPLE. X = { 1, . . . , 6 }, S : X → { 0, 1 }, T : X → { 0, 1 }, where

    S(x) = 1, if x is even;  0, if x is odd.
    T(x) = 1, if x < 3;      0, if x ≥ 3.

— 4 —

STILL MORE PROBABILITY

DEFINITION. S : X → Y and T : X → Z are independent iff for all y ∈ Y and z ∈ Z:

    Prob[S = y, T = z] = Prob[S = y] · Prob[T = z].

EXAMPLE.
• S : { 1, . . . , 6 } → { 0, 1 }, S(x) = 1 ⟺ x is even.
• T : { 1, . . . , 6 } → { 0, 1 }, T(x) = 1 ⟺ x < 3.
• U : { 1, . . . , 6 } → { 0, 1 }, U(x) = 1 ⟺ x is prime.

S and T are independent; S and U are not independent.

DEFINITION. Suppose S : X → Y, T : X → Z, and Prob[T = z] > 0. Then the conditional probability of y given z is

    Prob[S = y | T = z] =def Prob[S = y, T = z] / Prob[T = z].

Sometimes Prob[S = y | T = z] is written pY(y | z).

— 5 —

BAYES'S THEOREM

Note: if S and T are independent, then Prob[S = y | T = z] = Prob[S = y].
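As a quick numerical sanity check (my own illustration, not part of the slides), the fair-die example can be run through the independence definition Prob[S = y, T = z] = Prob[S = y] · Prob[T = z] directly:

```python
from fractions import Fraction

# The slides' example: X = {1,...,6} with the uniform distribution (a fair die).
X = range(1, 7)
p = {x: Fraction(1, 6) for x in X}

S = lambda x: 1 if x % 2 == 0 else 0      # S(x) = 1 iff x is even
T = lambda x: 1 if x < 3 else 0           # T(x) = 1 iff x < 3
U = lambda x: 1 if x in (2, 3, 5) else 0  # U(x) = 1 iff x is prime

def prob(event):
    """Probability of the event { x in X : event(x) }."""
    return sum(p[x] for x in X if event(x))

def independent(A, B):
    """Check Prob[A = y, B = z] = Prob[A = y] * Prob[B = z] for all y, z."""
    return all(
        prob(lambda x: A(x) == y and B(x) == z)
        == prob(lambda x: A(x) == y) * prob(lambda x: B(x) == z)
        for y in (0, 1) for z in (0, 1)
    )

print(independent(S, T))  # True:  S and T are independent
print(independent(S, U))  # False: S and U are not
```

Using Fraction rather than floats keeps every probability exact, so the independence test is genuine equality rather than a round-off tolerance.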
Bayes's Theorem: if Prob[S = y] > 0 and Prob[T = z] > 0, then

    Prob[S = y | T = z] = Prob[S = y] · Prob[T = z | S = y] / Prob[T = z].

(Proof on board.)

— 6 —

EXAMPLE APPLICATIONS

EXAMPLE: a fair coin. X = { heads, tails } with p(heads) = p(tails) = 1/2.

    H(X) = −(1/2 · log2 1/2 + 1/2 · log2 1/2) = −(−1/2 − 1/2) = 1.

It takes 1 bit to describe the outcome.

EXAMPLE: an unfair coin. Suppose 0 < p < 1, with Prob[heads] = p and Prob[tails] = 1 − p. Then

    H(unfair coin toss) = −p · log2 p − (1 − p) · log2(1 − p).

EXAMPLE: a fair n-sided die.

    H(a roll) = −(1/n) · log2(1/n) − · · · − (1/n) · log2(1/n) = log2 n.

EXAMPLE: flipping two fair coins. Heads: no points. Tails: 1 point. Two flips: sum the points. The outcomes 0, 1, 2 occur with probabilities 1/4, 1/2, 1/4, so

    H(two coin flips) = −(1/4) · log2(1/4) − (1/2) · log2(1/2) − (1/4) · log2(1/4) = 3/2

= the average number of yes/no questions needed to tell the result: Is there exactly one head? Are there two heads?

— 9 —

JOINT AND CONDITIONAL ENTROPY

Suppose S : X → Y, T : X → Z, and U : X → Y × Z, where U(x) = (S(x), T(x)). Define

    H(S, T) =def − Σ_{y ∈ Y} Σ_{z ∈ Z} pS,T(y, z) · log2 pS,T(y, z).

This is just the entropy of U.

We define the conditional entropy of T given S by:

    H(T | S) =def Σ_y pS(y) · H(T | S = y)
              = − Σ_y pS(y) · ( Σ_z pT(z | y) · log2 pT(z | y) )
              = − Σ_y Σ_z pS,T(y, z) · log2 pT(z | y)    (since pS,T(y, z) = pT(z | y) · pS(y))

= the uncertainty of T given S.

— 10 —

JOINT AND CONDITIONAL ENTROPY, CONTINUED

CHAIN RULE THEOREM. H(X, Y) = H(X) + H(Y | X).
The uncertainty of (X, Y) = the uncertainty of X + the uncertainty of Y, given that X happened.

THEOREM.
a. H(X) ≤ log2 |X|, with equality iff all elements of X are equally likely.
   (You are most uncertain under uniform distributions.)
b. H(X, Y) ≤ H(X) + H(Y).
   (The info in (X, Y) is at most the info of X + the info of Y.)
c. H(Y | X) ≤ H(Y), with equality iff X and Y are independent.
   (Knowing X cannot make you less certain about Y.)

Proof of c. By the Chain Rule, H(X) + H(Y | X) = H(X, Y). By b, H(X, Y) ≤ H(X) + H(Y). So H(X) + H(Y | X) ≤ H(X) + H(Y), and hence H(Y | X) ≤ H(Y).

— 11 —

PERFECT SECRECY

GOAL: Use information
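The coin and die calculations, and the Chain Rule, can be checked with a short Python sketch. The function H below implements the usual entropy formula H = −Σ p · log2 p applied in the examples above; the variable names are my own:

```python
from math import log2

def H(probs):
    """Shannon entropy (in bits) of a distribution, given its probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(H([0.5, 0.5]))         # fair coin: 1.0 bit
print(H([0.25, 0.5, 0.25]))  # two-coin point total: 1.5 bits
print(H([1 / 6] * 6))        # fair six-sided die: log2(6) bits

# Chain Rule check, H(S, T) = H(S) + H(T | S), for the earlier example:
# a fair die roll x, with S = (x is even) and T = (x < 3).
joint = {}
for x in range(1, 7):
    y, z = x % 2 == 0, x < 3
    joint[(y, z)] = joint.get((y, z), 0) + 1 / 6

pS = {y: sum(v for (yy, _), v in joint.items() if yy == y) for y in (False, True)}
H_S = H(pS.values())
H_ST = H(joint.values())
# H(T | S) = sum over y of pS(y) * H(T | S = y)
H_T_given_S = sum(
    pS[y] * H([joint.get((y, z), 0) / pS[y] for z in (False, True)])
    for y in (False, True)
)
print(abs(H_ST - (H_S + H_T_given_S)) < 1e-12)  # True
```

Since S and T in this example are independent, the run also illustrates the equality case of theorem c: H(T | S) comes out equal to H(T).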
theory to explain how one-time pads provide "perfect secrecy".

    P: plaintexts, each with a certain probability
    C: ciphertexts, with the induced probabilities
    K: keys, assumed independent of the choice of plaintext

EXAMPLE.
P = { a, b, c } with Prob[a] = 0.5, Prob[b] = 0.3, Prob[c] = 0.2.
K = { k1, k2 } with Prob[k1] = 0.5, Prob[k2] = 0.5.
C = { U, V, W }.

    eK(x)   a   b   c
    k1      U   V   W
    k2      U   W   V

Prob[U] = 0.5, Prob[V] = 0.25, Prob[W] = 0.25.

What can Eve learn from an intercepted ciphertext?

— 14 —

PERFECT SECRECY, CONTINUED

DEFINITION. A cryptosystem has perfect secrecy iff H(P | C) = H(P).

THEOREM. The one-time pad has perfect secrecy.

Proof setup:
• z = size of the alphabet, e.g., 2, 26, 256, etc.
• P = strings of length L (z^L many).
• K = the set of shift vectors (s1, . . . , sL); for each key k, pK(k) = z^(−L).
• C = P.
• For c ∈ C,
      pC(c) = Σ { ProbP(x) · ProbK(k) : x ∈ P, k ∈ K, ek(x) = c }.
  (Since P and K are independent, Prob[P = x, K = k] = ProbP(x) · ProbK(k).)

— 15 —

PROOF, CONTINUED

    pC(c) = Σ { ProbP(x) · ProbK(k) : x ∈ P, k ∈ K, ek(x) = c }
          = z^(−L) · Σ { ProbP(x) : x ∈ P, k ∈ K, ek(x) = c }

Observation: given x and c, there is exactly one k such that ek(x) = c. So

    Σ { ProbP(x) : x ∈ P, k ∈ K, ek(x) = c } = Σ_{x ∈ P} ProbP(x) = 1.

Therefore pC(c) = z^(−L), and so H(K) = H(C) = log2(z^L).

    H(P, K, C) = H(P, K) = H(P) + H(K)       (C is determined by P and K; P and K are independent)
    H(P, K, C) = H(P, C) = H(P | C) + H(C)   (K is determined by P and C)

Hence H(P) + H(K) = H(P | C) + H(C), and since H(K) = H(C),

    H(P) = H(P | C).   QED

(For RSA, H(P | C) = 0. Why?)

— 16 —

THE ENTROPY OF ENGLISH, III

How to compute HEnglish = lim_{n → ∞} H(L^n)/n?

Shannon's idea:
• First suppose you had an optimal "next-letter guesser": given a prefix, it ranks the 26 letters (from 1 to 26) by how likely each is to be the next letter.

      i t i s s u n n y t o d a y
      2 1 1 1 4 3 2 1 4 1 1 1 1 1

• Run a text through it and record, for each letter, which guess number that letter corresponds to.
• From the predictor plus "21114321411111" we can recover the text.
• Use a native English speaker as the "next-letter predictor" and gather statistics (assume determinism).
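The perfect-secrecy proof can also be checked numerically. The sketch below is my own illustration, not from the slides: it builds a tiny one-time pad with alphabet size z = 3 and length L = 2, gives the plaintexts an arbitrary non-uniform distribution, and verifies both that pC is uniform and that H(P | C) = H(P).

```python
from itertools import product
from math import log2

z, L = 3, 2  # tiny alphabet and message length, to keep the check small
msgs = list(product(range(z), repeat=L))  # plaintext space = ciphertext space
keys = msgs                               # keys = all length-L shift vectors

# An arbitrary non-uniform plaintext distribution (weights chosen freely).
w = list(range(1, len(msgs) + 1))
pP = {x: wi / sum(w) for x, wi in zip(msgs, w)}
pK = {k: z ** -L for k in keys}           # one-time pad: keys are uniform

def enc(x, k):
    """Per-position shift mod z, as in the proof setup."""
    return tuple((xi + ki) % z for xi, ki in zip(x, k))

# Joint distribution of (plaintext, ciphertext); P and K are independent.
joint = {}
for x in msgs:
    for k in keys:
        c = enc(x, k)
        joint[(x, c)] = joint.get((x, c), 0) + pP[x] * pK[k]

pC = {}
for (x, c), pr in joint.items():
    pC[c] = pC.get(c, 0) + pr

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

H_P = H(pP.values())
H_P_given_C = sum(pC[c] * H([joint.get((x, c), 0) / pC[c] for x in msgs])
                  for c in pC)

print(all(abs(p - z ** -L) < 1e-12 for p in pC.values()))  # True: pC is uniform
print(abs(H_P - H_P_given_C) < 1e-12)                      # True: H(P|C) = H(P)
```

The first printed check is exactly the "Observation" step: for each plaintext x, the z^L keys hit each ciphertext exactly once, so every c gets probability z^(−L) regardless of the plaintext distribution.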
— 19 —

THE ENTROPY OF ENGLISH, IV

• Given a text plus the sequence of guesses, let qi = the frequency with which the correct letter was guess number i.
• Shannon's bounds:

      0.72 ≈ Σ_{i=1}^{26} i · (qi − q_{i+1}) · log2 i ≤ HEnglish ≤ − Σ_{i=1}^{26} qi · log2 qi ≈ 1.42.

• Since H(random text) = 4.18, (info in English) : (info in random text) ≈ 1 : 4.
• So English is about 75% redundant.

— 20 —
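The two bounds can be computed mechanically from the guess-number frequencies. The q values in the sketch below are hypothetical, chosen only to illustrate the formulas; they are not Shannon's measured frequencies (which gave roughly 0.72 ≤ HEnglish ≤ 1.42).

```python
from math import log2

# Hypothetical guess-number frequencies q1..q26 (illustration only; NOT
# Shannon's measured values). q[0] is q1: the fraction of letters the
# predictor got on its first guess.
q = [0.79, 0.08, 0.03, 0.02, 0.02] + [0.01] * 6 + [0.0] * 15
assert abs(sum(q) - 1.0) < 1e-9  # frequencies must sum to 1

# Upper bound: entropy of the guess-number distribution, -sum of qi * log2 qi.
upper = -sum(qi * log2(qi) for qi in q if qi > 0)

# Lower bound: sum over i of i * (qi - q_{i+1}) * log2 i, taking q27 = 0.
qq = q + [0.0]
lower = sum(i * (qq[i - 1] - qq[i]) * log2(i) for i in range(1, 27))

print(f"{lower:.2f} <= H_English <= {upper:.2f}")
```

With any valid frequency vector the lower bound stays below the upper bound, and both stay below log2(26), the entropy of uniformly random letters.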