Symmetric Channels - Information Theory - Lecture Slides

These lecture slides cover: symmetric channels, probability of error, the joint AEP, the channel coding theorem, sequences of codes, feedback codes for a DMC, and a first inequality.

EE514a – Information Theory I
Fall Quarter 2013
Prof. Jeff Bilmes
University of Washington, Seattle
Department of Electrical Engineering
http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/

Lecture 15 - Nov 19th, 2013

Class Road Map - IT-I
- L1 (9/26): Overview, Communications, Information, Entropy
- L2 (10/1): Properties of Entropy, Mutual Information
- L3 (10/3): KL-Divergence, Convexity, Jensen, and properties
- L4 (10/8): Data Processing Inequality, Thermodynamics, Statistics, Fano, M. of Conv.
- L5 (10/10): AEP, Compression
- L6 (10/15): Compression, Method of Types
- L7 (10/17): Types, Universal Coding, Stochastic Processes, Entropy Rates
- L8 (10/22): Entropy Rates, HMMs, Coding, Kraft
- L9 (10/24): Kraft, Shannon Codes, Huffman, Shannon/Fano/Elias
- L10 (10/28): Huffman, Shannon/Fano/Elias
- L11 (10/29): Shannon Games
- LXX (10/31): Midterm, in class
- L12 (11/7): Arithmetic Coding, Channel Capacity
- L13 (11/12): Channel Capacity
- L14 (11/14): Channel Capacity
- L15 through L19: to be announced
- Finals Week: December 12th–16th

Announcements
Office hours are every week, now Thursdays 4:30-5:30pm; you can also reach me at that time via a Canvas conference. For the final assignment, you will need to upload the PDF scan we send you.

Review: Symmetric Channels

Definition 15.2.1: A channel is symmetric if the rows of the channel transition matrix p(y|x) are permutations of each other, and the columns of this matrix are permutations of each other. A channel is weakly symmetric if every row of the matrix is a permutation of every other row, and all column sums ∑_x p(y|x) are equal.

Theorem 15.2.2: For weakly symmetric channels, we have that
  C = log|Y| − H(r)    (15.1)
where r is any row of the transition matrix. This follows immediately since
  I(X;Y) = H(Y) − H(Y|X) = H(Y) − H(r) ≤ log|Y| − H(r),
with equality when p(x) is uniform, which makes Y uniform for a weakly symmetric channel.
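
To make Theorem 15.2.2 concrete, here is a minimal Python sketch (not part of the slides) that computes C = log|Y| − H(r) directly from a channel transition matrix. The binary symmetric channel with crossover probability 0.1 used as the test case is a hypothetical example.

```python
import numpy as np

def weakly_symmetric_capacity(P):
    """Capacity C = log|Y| - H(r) of a weakly symmetric channel.

    P[x, y] = p(y|x); rows must be permutations of one another and all
    column sums must be equal (not checked here).
    """
    r = P[0]                                  # any row works: rows are permutations
    r = r[r > 0]
    H_r = -np.sum(r * np.log2(r))             # entropy of a row, in bits
    return np.log2(P.shape[1]) - H_r

# Hypothetical example: a binary symmetric channel with crossover 0.1
# (symmetric, hence weakly symmetric).
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(weakly_symmetric_capacity(bsc))         # ~0.531 bits per channel use
```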

Review: Properties of (Information) Channel Capacity C

- C ≥ 0, since I(X;Y) ≥ 0.
- C ≤ log|X|, since C = max_{p(x)} I(X;Y) ≤ max H(X) = log|X|.
- C ≤ log|Y|, for the same reason. Thus, the alphabet sizes can limit the transmission rate.
- I(X;Y) = I_{p(x)}(X;Y) is a continuous function of p(x).
- Recall that I(X;Y) is a concave function of p(x) for fixed p(y|x). Thus, I_{λp_1+(1−λ)p_2}(X;Y) ≥ λ I_{p_1}(X;Y) + (1−λ) I_{p_2}(X;Y).
- Interestingly, since it is concave, this makes computing something like the capacity easier: a local maximum is a global maximum, and computing the capacity of a general channel model is a convex optimization procedure (see the sketch after this list).
- Recall also that I(X;Y) is a convex function of p(y|x) for fixed p(x).
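
One standard iterative method for that convex optimization is the Blahut-Arimoto algorithm, which is not covered in these slides; the sketch below is a minimal illustration under my own assumptions (the 3-input/2-output channel matrix is made up), not the course's prescribed procedure.

```python
import numpy as np

def blahut_arimoto(P, iters=200):
    """Estimate C = max_{p(x)} I(X;Y) for a channel matrix P[x, y] = p(y|x)."""
    nx, ny = P.shape
    r = np.full(nx, 1.0 / nx)                  # current guess for the optimal input p(x)
    for _ in range(iters):
        q = r[:, None] * P                     # proportional to r(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)      # posterior q(x|y)
        logr = np.sum(P * np.log(q + 1e-300), axis=1)
        r = np.exp(logr - logr.max())
        r /= r.sum()                           # updated input distribution
    py = r @ P                                 # output distribution under r
    I = np.sum(r[:, None] * P * np.log2((P + 1e-300) / (py + 1e-300)))
    return I, r                                # I(X;Y) in bits at the returned input

# Sanity check on a BSC(0.1): should match 1 - H_2(0.1) ~ 0.531 bits.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(blahut_arimoto(bsc)[0])

# Hypothetical 3-input / 2-output channel.
P = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.1, 0.9]])
C, p_star = blahut_arimoto(P)
print(C, p_star)
```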

Review: (M, n) code

Definition 15.2.2 ((M, n) code): An (M, n) code for the channel (X, p(y|x), Y) consists of:
1. An index set {1, 2, ..., M}.
2. An encoding function X^n : {1, 2, ..., M} → X^n, yielding codewords X^n(1), X^n(2), X^n(3), ..., X^n(M). Each source message has a codeword, and each codeword is n code symbols long.
3. A decoding function g : Y^n → {1, 2, ..., M}, which makes a "guess" about the original message given the channel output.

In an (M, n) code, M is the number of possible messages to be sent, and n is the number of channel uses made by the codewords of the code.

Review: Error

Definition 15.2.2 (Probability of error λ_i for message i ∈ {1, ..., M}):
  λ_i ≜ Pr(g(Y^n) ≠ i | X^n = X^n(i)) = ∑_{y^n ∈ Y^n} p(y^n | X^n(i)) 1(g(y^n) ≠ i)    (15.3)

Definition 15.2.3 (Maximum probability of error λ^{(n)} for an (M, n) code):
  λ^{(n)} ≜ max_{i ∈ {1, 2, ..., M}} λ_i    (15.4)

Definition 15.2.2 (Average probability of error P_e^{(n)}):
  P_e^{(n)} = (1/M) ∑_{i=1}^{M} λ_i = Pr(I ≠ g(Y^n))    (15.3)
where I is a random variable with Pr(I = i) given by a uniform source distribution, so that
  P_e^{(n)} = E[1(I ≠ g(Y^n))] = ∑_{i=1}^{M} Pr(g(Y^n) ≠ i | X^n = X^n(i)) p(i)    (15.4)
with p(i) = 1/M.

A key result of Shannon's is that a small average probability of error implies that we can also achieve a small maximum probability of error!
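
As an illustration of these definitions (not taken from the slides), the following sketch estimates λ_i, λ^{(n)}, and P_e^{(n)} by simulation for a hypothetical (M, n) = (2, 3) repetition code used over a binary symmetric channel with crossover probability 0.1; majority vote plays the role of the decoding function g.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.1                                     # BSC crossover probability (assumed)
M, n = 2, 3                                 # a tiny (M, n) = (2, 3) repetition code
codebook = {1: np.array([0, 0, 0]),         # X^n(1)
            2: np.array([1, 1, 1])}         # X^n(2)

def g(yn):
    """Decoding function g : Y^n -> {1, ..., M}; here, majority vote."""
    return 2 if yn.sum() >= 2 else 1

def estimate_lambda(i, trials=20_000):
    """Monte Carlo estimate of lambda_i = Pr(g(Y^n) != i | X^n = X^n(i))."""
    noise = rng.random((trials, n)) < p     # BSC bit flips
    yn = codebook[i] ^ noise
    return float(np.mean([g(row) != i for row in yn]))

lam = {i: estimate_lambda(i) for i in codebook}
print("lambda_i            :", lam)
print("max error lambda^(n):", max(lam.values()))
print("avg error P_e^(n)   :", sum(lam.values()) / M)
# Exact value for comparison: 3 p^2 (1 - p) + p^3 = 0.028 for each codeword.
```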

Review: Joint Typicality

Definition 15.2.2 (Joint typicality of a set of sequences): The set A_ε^{(n)} of jointly typical pairs of sequences {(x^{1:n}, y^{1:n})} with respect to p(x, y) is defined as
  A_ε^{(n)} = { (x^n, y^n) ∈ X^n × Y^n :    (15.3)
    (a) |−(1/n) log p(x^n) − H(X)| < ε   (x-typical),    (15.4)
    (b) |−(1/n) log p(y^n) − H(Y)| < ε   (y-typical),    (15.5)
    and (c) |−(1/n) log p(x^n, y^n) − H(X, Y)| < ε   ((x, y)-typical) }    (15.6)
with p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i).

Review: Joint AEP

Theorem 15.2.2: Let (X^n, Y^n) ~ p(x^n, y^n) = ∏_{i=1}^{n} p(x_i, y_i). Then:
1. Pr((X^n, Y^n) ∈ A_ε^{(n)}) → 1 as n → ∞.
2. |A_ε^{(n)}| ≤ 2^{n(H(X,Y)+ε)}, and (1 − ε) 2^{n(H(X,Y)−ε)} ≤ |A_ε^{(n)}| for sufficiently large n.
3. If (X̃^n, Ỹ^n) ~ p(x^n) p(y^n) are drawn independently, then
     Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≤ 2^{−n(I(X;Y)−3ε)}    (15.6)
   and, for sufficiently large n,
     Pr((X̃^n, Ỹ^n) ∈ A_ε^{(n)}) ≥ (1 − ε) 2^{−n(I(X;Y)+3ε)}.    (15.7)

Key property: we have a bound on the probability that independently drawn sequences are jointly typical, and it falls off exponentially fast with n whenever I(X;Y) > 0.
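
A small Monte Carlo sketch (not from the slides) of both sides of the joint AEP, assuming a hypothetical joint pmf p(x, y) on binary alphabets: pairs drawn jointly from p(x, y) land in A_ε^{(n)} with high probability, while pairs drawn independently from the marginals essentially never do, consistent with the 2^{−n(I(X;Y)−3ε)} bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint pmf p(x, y) over binary X and Y (rows = x, cols = y)
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

HX, HY, HXY = H(px), H(py), H(pxy.ravel())
I = HX + HY - HXY                            # I(X;Y), about 0.278 bits here

def jointly_typical(xn, yn, eps):
    """Membership test for A_eps^(n): conditions (a), (b), (c) of the definition."""
    n = len(xn)
    a = abs(-np.log2(px[xn]).sum() / n - HX) < eps
    b = abs(-np.log2(py[yn]).sum() / n - HY) < eps
    c = abs(-np.log2(pxy[xn, yn]).sum() / n - HXY) < eps
    return bool(a and b and c)

n, eps, trials = 1000, 0.05, 2000

# Part 1: pairs drawn *jointly* from p(x, y) are typical with high probability.
idx = rng.choice(4, size=(trials, n), p=pxy.ravel())    # index 2*x + y
joint_hits = sum(jointly_typical(row // 2, row % 2, eps) for row in idx)

# Part 3: pairs drawn *independently* from the marginals are almost never typical.
indep_hits = sum(jointly_typical(rng.choice(2, size=n, p=px),
                                 rng.choice(2, size=n, p=py), eps)
                 for _ in range(trials))

print("jointly drawn      :", joint_hits / trials)      # close to 1
print("independently drawn:", indep_hits / trials)      # essentially 0
print("AEP upper bound    :", 2 ** (-n * (I - 3 * eps)))
```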

Review: Channel Coding Theorem (Shannon 1948), more formally

Theorem 15.2.2: All rates below C ≜ max_{p(x)} I(X;Y) are achievable. Specifically, for every R < C there exists a sequence of (2^{nR}, n) codes with maximum probability of error λ^{(n)} → 0 as n → ∞. Conversely, any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 as n → ∞ must have R ≤ C.

Implications: as long as we do not code above capacity, we can, for all intents and purposes, code with zero error. This is true for all noisy channels representable under this model. We are talking about discrete channels now, but we will generalize to continuous channels in the coming weeks.
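
The following rough simulation is my own sketch, not the proof's construction: random codebooks at a rate below capacity, used over a binary symmetric channel with crossover 0.11 (so C ≈ 0.5 bit), with minimum-Hamming-distance decoding standing in for the joint-typicality decoder of the proof. At these short block lengths the error decays only slowly, but the downward trend with n is visible.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.11                                   # BSC crossover; C = 1 - H_2(0.11) ~ 0.5 bit
R = 0.3                                    # target rate, chosen below capacity

def avg_error(n, messages=300):
    """Draw one random (2^{nR}, n) codebook; estimate its average block error rate."""
    M = 2 ** int(np.ceil(n * R))                         # number of codewords
    codebook = rng.integers(0, 2, size=(M, n))           # codewords i.i.d. Bernoulli(1/2)
    w = rng.integers(0, M, size=messages)                # messages to send
    y = codebook[w] ^ (rng.random((messages, n)) < p)    # pass through the BSC
    # Minimum-Hamming-distance decoding, a practical stand-in for the
    # joint-typicality decoder used in the proof.
    dists = (y[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return float(np.mean(dists.argmin(axis=1) != w))

for n in (8, 16, 24, 32, 40):
    print(f"n={n:2d}  estimated average error {avg_error(n):.3f}")
```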

Lecture outline (new material): Shannon's 2nd Theorem; Zero Error Codes; 2nd Theorem Converse; Zero Error and R = C; Feedback.

All rates R < C are achievable: proof

Example: intuition as to how the average error probability becomes β.

[Figure: a message ω is mapped to one of 4 possible codewords x_1, x_2, x_3, x_4, which produce channel outputs y_1, y_2, y_3, y_4. These are the possible associations between ω and one of the codewords.]

Considering all such associations (averaging over the randomly generated codebooks), we have the same average error for each ω, so we may just choose ω = 1. The error is then equal to:
  prob. of choosing x_1 for ω and not decoding it correctly from y_1
  + prob. of choosing x_2 for ω and not decoding it correctly from y_2 + ...    (15.1)
and this is the same for all ω ∈ {1, 2, ..., M}, so we may just pick ω = 1.

So we get
  Pr(E) = ∑_C Pr(C) P_e^{(n)}(C) = (1/2^{nR}) ∑_{ω=1}^{2^{nR}} β = ∑_C Pr(C) λ_1(C) = β    (15.2)
with β = Pr(E | W = 1).

Next, define the random events (again considering ω = 1):
  E_i ≜ { (X^n(i), Y^n) ∈ A_ε^{(n)} }  for i = 1, ..., 2^{nR}.    (15.3)
Assume that the input is X^n(1) (i.e., the first message was sent). Then the no-error event is the same as E_1 ∩ ¬(E_2 ∪ E_3 ∪ ··· ∪ E_M).

Various flavors of error:
- E_1^c means that the transmitted and received codewords are not jointly typical (this is error type B from before).
- E_2 ∪ E_3 ∪ ··· ∪ E_{2^{nR}}: this is either
  - type C: a wrong codeword is jointly typical with the received sequence, or
  - type A: more than one codeword is jointly typical with the received sequence,
  so this covers both types C and A.
Our goal is to bound the probability of error, but let's look at some figures first.

[Figure: the product space X^n × Y^n, contrasting the set of all pairs of marginally typical sequences (about 2^{nH(X)} · 2^{nH(Y)} of them) with the set of all jointly typical pairs of sequences (about 2^{nH(X,Y)} of them); for each typical x^n there are about 2^{nH(Y|X)} jointly typical y^n, and for each typical y^n about 2^{nH(X|Y)} jointly typical x^n.]

[Figure: subset selection of the 2^{nR} random X^n codewords x(1), ..., x(M) (chosen by the random selection procedure) for i = 1, 2, ..., M; here 2^{nR} = M = 4. Dots are the jointly typical sequences; the vertical axis is the lexicographic order of possible codewords.]

[Figure: decoding examples for codewords x(1), ..., x(4) and received sequences y(a), y(b), y(c), y(d).]
- y(a) (received on sending some codeword x(∗)) is not jointly typical with any of the sent codewords, so g(y(a)) = 0: error type B (the event E_1^c).
- y(b) (received on sending x(4)) is jointly typical only with x(4), so g(y(b)) = 4: no error.
- y(c) (received on sending x(1)) is jointly typical with both x(1) and x(3), so g(y(c)) = 0: error type A (part of E_2 ∪ E_3 ∪ ···).
- y(d) (received on sending x(1)) should not be jointly typical with x(4) but it is, so g(y(d)) = 4: a wrong jointly typical codeword, error type C (also part of E_2 ∪ E_3 ∪ ···).

Also, because of the random code generation process (and recalling that ω = 1),
  X^n(1) ⊥ X^n(i)  ⇒  Y^n ⊥ X^n(i),  for i ≠ 1.    (15.8)
This gives, for i ≠ 1,
  Pr( (X^n(i), Y^n) ∈ A_ε^{(n)} ) ≤ 2^{−n(I(X;Y)−3ε)}    (15.9)
by the joint AEP, since the pair is drawn from independent distributions. This will allow us to bound the error, as long as I(X;Y) > 3ε.

So we get:
  Pr(E) = Pr(E | W = 1) ≤ Pr(E_1^c) + ∑_{i=2}^{2^{nR}} Pr(E_i)    (15.10)
        ≤ ε + ∑_{i=2}^{2^{nR}} 2^{−n(I(X;Y)−3ε)}    (15.11)
        = ε + (2^{nR} − 1) 2^{−n(I(X;Y)−3ε)}    (15.12)
        ≤ ε + 2^{3nε} 2^{−n(I(X;Y)−R)}    (15.13)
        = ε + 2^{−n(I(X;Y)−3ε−R)}    (15.14)
        ≤ 2ε   for large enough n.    (15.15)
Here (15.10) is the union bound on the error event E_1^c ∪ E_2 ∪ ··· ∪ E_{2^{nR}}, and (15.11) uses Pr(E_1^c) ≤ ε for large n (part 1 of the joint AEP) together with (15.9). The last statement is true only if I(X;Y) − 3ε > R.
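
A quick numeric look (not from the slides) at the final bound ε + 2^{−n(I(X;Y)−3ε−R)} from (15.14), using hypothetical values I(X;Y) = 0.5 bit, R = 0.4, and ε = 0.01, shows how large n must be before the bound drops below 2ε.

```python
# Hypothetical numbers: I(X;Y) = 0.5 bit, rate R = 0.4 < I - 3*eps, eps = 0.01
I, R, eps = 0.5, 0.4, 0.01

def union_bound(n):
    """The bound eps + 2^{-n(I - 3 eps - R)} from (15.14)."""
    return eps + 2 ** (-n * (I - 3 * eps - R))

for n in (10, 50, 100, 200, 500):
    b = union_bound(n)
    print(f"n={n:4d}  bound={b:.6f}  below 2*eps: {b <= 2 * eps}")
```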

So if we choose R < I(X;Y) (strictly), we can find an ε and an n so that the average probability of error Pr(E) ≤ 2ε can be made as small as we want. But we need to get from an average to a maximum probability of error, and bound that.

First, choose p*(x) = argmax_{p(x)} I(X;Y) rather than, say, a uniform p(x), to change the condition from R < I(X;Y) to R < C; this gives us the highest rate limit.

If Pr(E) ≤ 2ε, the bound on the average error (averaged over codebooks) is small, so there must exist some specific code, say C*, such that
  P_e^{(n)}(C*) ≤ 2ε.    (15.16)

Let's break apart this error probability:
  P_e^{(n)}(C*) = (1/2^{nR}) ∑_{i=1}^{2^{nR}} λ_i(C*)    (15.17)
              = (1/2^{nR}) ∑_{i: λ_i < 4ε} λ_i(C*) + (1/2^{nR}) ∑_{i: λ_i ≥ 4ε} λ_i(C*)    (15.18)
              ≤ 2ε.    (15.19)

Now suppose more than half of the indices had error ≥ 4ε, i.e., suppose |{i : λ_i ≥ 4ε}| > 2^{nR}/2. Under this assumption:
  (1/2^{nR}) ∑_{i: λ_i ≥ 4ε} λ_i ≥ (1/2^{nR}) ∑_{i: λ_i ≥ 4ε} 4ε = (1/2^{nR}) |{i : λ_i ≥ 4ε}| · 4ε > (1/2) · 4ε = 2ε.
This can't be, since these terms alone would exceed our 2ε upper bound. Hence, at most half the codewords can have error ≥ 4ε, and we get
  |{i : λ_i ≥ 4ε}| ≤ 2^{nR}/2  ⇒  |{i : λ_i < 4ε}| ≥ 2^{nR}/2.    (15.20)

Create a new codebook that eliminates all bad codewords (i.e., those with index in {i : λ_i ≥ 4ε}); there are at most half of them. At least half of the codewords remain, i.e., at least 2^{nR}/2 = 2^{nR−1} = 2^{n(R−1/n)} of them, and they all have error probability ≤ 4ε. We now code with rate R' = R − 1/n → R as n → ∞, but for this new sequence of codes the maximum error probability satisfies λ^{(n)} ≤ 4ε, which can be made as small as we wish.
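
A toy sketch of the expurgation step (not from the slides), using made-up per-codeword error probabilities λ_i whose average is at most 2ε: dropping the worst half leaves a codebook whose maximum error is below 4ε.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.01
M = 1024                                    # 2^{nR} codewords (hypothetical)

# Hypothetical per-codeword error probabilities, rescaled so their average
# is 1.8*eps <= 2*eps, mimicking the code C* guaranteed by (15.16).
lam = rng.exponential(scale=1.0, size=M)
lam *= (1.8 * eps) / lam.mean()

# Expurgation: throw away the worst half of the codewords.
order = np.argsort(lam)
kept = lam[order[: M // 2]]

print("average error, original code:", lam.mean())
print("max error, original code    :", lam.max())
print("max error after expurgation :", kept.max(), "(must be < 4*eps =", 4 * eps, ")")
# Keeping 2^{nR}/2 = 2^{n(R - 1/n)} codewords reduces the rate by only 1/n.
```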

Discussion

To summarize, random coding is the method of proof used to show that if R < C, there exists a sequence of (2^{nR}, n) codes with λ^{(n)} → 0 as n → ∞. The random code might not be the best code, but it is sufficient: this is an existence proof.

There is a huge literature on coding theory. We'll discuss Hamming codes, but many good codes exist today: Turbo codes, Gallager (or low-density parity-check) codes, and new ones are being proposed often. Perhaps, if there is enough demand, we'll have a quarter class just on coding theory.

But we have yet to prove the converse. We next need to show that any sequence of (2^{nR}, n) codes with λ^{(n)} → 0 must have R ≤ C. First, let's consider the case P_e^{(n)} = 0; in this case it is easy to show that R ≤ C.

Zero Error Codes

If P_e^{(n)} = 0, then H(W | Y^n) = 0 (no uncertainty about the message given the channel output). For simplicity, assume H(W) = nR = log M, i.e., a uniform distribution over {1, 2, ..., M}; this is sufficient since it is the maximum rate for M messages. Then we get:
  nR = H(W) = H(W | Y^n) + I(W; Y^n) = I(W; Y^n)    (15.21)
     ≤ I(X^n; Y^n)   // since W → X^n → Y^n and the data processing inequality    (15.22)
     = H(Y^n) − H(Y^n | X^n) = H(Y^n) − ∑_{i=1}^{n} H(Y_i | Y_{1:i−1}, X^n).    (15.23)
But Y_i ⊥ {Y_{1:i−1}, X_{1:i−1}, X_{i+1:n}} | X_i (the channel is memoryless and used without feedback), so we can continue as
     = H(Y^n) − ∑_{i=1}^{n} H(Y_i | X_i) ≤ ∑_i [ H(Y_i) − H(Y_i | X_i) ]   (using H(Y^n) ≤ ∑_i H(Y_i))    (15.24)
     = ∑_{i=1}^{n} I(Y_i; X_i) ≤ nC.    (15.25)
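
A small numeric check (my own, not from the slides) of the key consequence of this chain, I(X^n; Y^n) ≤ nC, for n = 2 uses of a hypothetical binary DMC with a correlated input distribution p(x_1, x_2); the single-letter capacity C is found here by brute-force search over input distributions.

```python
import numpy as np
from itertools import product

# Hypothetical DMC: a binary-input, binary-output channel, P[x, y] = p(y|x)
P = np.array([[0.85, 0.15],
              [0.25, 0.75]])

def mutual_information(pxy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))

# Single-letter capacity C by brute force over input distributions p(x) = (q, 1-q)
qs = np.linspace(0, 1, 2001)
C = max(mutual_information(np.array([[q], [1 - q]]) * P) for q in qs)

# n = 2 channel uses with a *correlated* input distribution p(x1, x2)
rng = np.random.default_rng(4)
px2 = rng.random((2, 2)); px2 /= px2.sum()

# Memoryless channel without feedback: p(y1, y2 | x1, x2) = p(y1|x1) p(y2|x2)
joint = np.zeros((4, 4))                      # rows index (x1, x2), cols index (y1, y2)
for x1, x2, y1, y2 in product(range(2), repeat=4):
    joint[2 * x1 + x2, 2 * y1 + y2] = px2[x1, x2] * P[x1, y1] * P[x2, y2]

print(f"I(X^2; Y^2) = {mutual_information(joint):.4f}  <=  2C = {2 * C:.4f}")
```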

Thus, nR ≤ nC and R ≤ C when P_e^{(n)} = 0. In fact, the proof shows H(W) ≤ nC, which means that max_p H_p(W) ≤ nC, implying
  H(W) ≤ max_p H_p(W) = nR ≤ nC,    (15.26)
so we get R ≤ C regardless of the source distribution.

The proof also establishes a sub-lemma, namely that I(X^n; Y^n) ≤ nC, which we'll use later. Let's name it:

Lemma 15.4.1:
  I(X^n; Y^n) ≤ nC.    (15.27)

We also need Fano's inequality. Recall that it earlier took the form
  H(X | Y) ≤ 1 + P_e log|X|.    (15.28)

Fano's Lemma (needed for the proof)

Theorem 15.4.2 (Fano): For a DMC with codebook C, uniformly distributed input messages (so H(W) = nR), and P_e^{(n)} = Pr(W ≠ g(Y^n)), we have
  H(X^n | Y^n) ≤ 1 + P_e^{(n)} nR.    (15.29)

Proof. Let E ≜ 1{W ≠ Ŵ}. Expanding H(E, W | Y^n) with the chain rule in two ways, we get:
  H(E, W | Y^n) = H(W | Y^n) + H(E | Y^n, W)   [the second term is 0, since E is determined by W and Y^n]    (15.30)
                = H(E | Y^n) + H(W | Y^n, E)   [the first term is ≤ 1, since E is binary]    (15.31)
  ...
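
As a sanity check (not from the slides), the sketch below verifies (15.29) exactly for the hypothetical (M, n) = (2, 3) repetition code over a BSC with crossover 0.1 used earlier: W is uniform over two messages (so nR = 1 bit), g is majority vote, and both H(W | Y^n) and P_e^{(n)} are computed by exhaustive enumeration. Since the two codewords are distinct, H(X^n | Y^n) = H(W | Y^n) here.

```python
import numpy as np
from itertools import product

p = 0.1                                     # BSC crossover (assumed)
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}     # the (M, n) = (2, 3) repetition code
n, M = 3, 2
nR = np.log2(M)                             # = 1 bit

def p_y_given_x(y, x):
    return np.prod([p if yi != xi else 1 - p for yi, xi in zip(y, x)])

def g(y):
    """Majority-vote decoder."""
    return 2 if sum(y) >= 2 else 1

# Exact joint distribution p(w, y^n) with W uniform over {1, 2}
pwy = {(w, y): 0.5 * p_y_given_x(y, codebook[w])
       for w in (1, 2) for y in product((0, 1), repeat=n)}

def H(probs):
    probs = [q for q in probs if q > 0]
    return -sum(q * np.log2(q) for q in probs)

py = {}
for (w, y), q in pwy.items():
    py[y] = py.get(y, 0.0) + q
H_W_given_Yn = H(pwy.values()) - H(py.values())          # H(W, Y^n) - H(Y^n)

Pe = sum(q for (w, y), q in pwy.items() if g(y) != w)    # P_e^{(n)} = 0.028
print(f"H(W|Y^n) = {H_W_given_Yn:.4f}  <=  1 + Pe*nR = {1 + Pe * nR:.4f}")
```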