Zero Error Codes - Information Theory - Lecture Slides

EE514a – Information Theory I
Fall Quarter 2013
Prof. Jeff Bilmes
University of Washington, Seattle
Department of Electrical Engineering
http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/
Lecture 16 - Nov 21st, 2013

Class Road Map - IT-I

L1 (9/26): Overview, Communications, Information, Entropy
L2 (10/1): Properties of Entropy, Mutual Information
L3 (10/3): KL-Divergence, Convexity, Jensen, and properties
L4 (10/8): Data Processing Inequality, Thermodynamics, Statistics, Fano, M. of Conv.
L5 (10/10): AEP, Compression
L6 (10/15): Compression, Method of Types
L7 (10/17): Types, Universal Coding, Stochastic Processes, Entropy Rates
L8 (10/22): Entropy Rates, HMMs, Coding, Kraft
L9 (10/24): Kraft, Shannon Codes, Huffman, Shannon/Fano/Elias
L10 (10/28): Huffman, Shannon/Fano/Elias
L11 (10/29): Shannon Games
LXX (10/31): Midterm, in class
L12 (11/7): Arithmetic Coding, Channel Capacity
L13 (11/12): Channel Capacity
L14 (11/14): Channel Capacity, Shannon's 2nd Theorem
L15 (11/19): Shannon's 2nd Theorem, Zero-Error Codes, Feedback
L16 (11/21): Joint Theorem, Coding, Hamming, Differential Entropy
L17, L18, L19
Finals Week: December 12th–16th.

Announcements

Office hours every week, now Thursdays 4:30-5:30pm; you can also reach me at that time via a Canvas conference.
Final assignment: you will need to upload the PDF scan we send you.

Channel Coding Theorem (Shannon 1948): more formally

Theorem 16.2.2. All rates below $C \triangleq \max_{p(x)} I(X;Y)$ are achievable. Specifically, for every $R < C$ there exists a sequence of $(2^{nR}, n)$ codes with maximum probability of error $\lambda^{(n)} \to 0$ as $n \to \infty$. Conversely, any sequence of $(2^{nR}, n)$ codes with $\lambda^{(n)} \to 0$ as $n \to \infty$ must have $R \le C$.

Implications: as long as we do not code above capacity we can, for all intents and purposes, code with zero error. This is true for all noisy channels representable under this model. We are talking about discrete channels for now, but we will generalize this to continuous channels in the coming weeks.

Zero Error Codes

If $P_e^{(n)} = 0$, then $H(W \mid Y^n) = 0$ (no uncertainty).
For simplicity, assume $H(W) = nR = \log M$ (i.e., a uniform distribution over $\{1, 2, \ldots, M\}$); this suffices since it is the maximum rate with $M$ messages.
First let's consider the case $P_e^{(n)} = 0$; in this case it is easy to show that $R \le C$. We get

$nR = H(W) = H(W \mid Y^n) + I(W; Y^n) = I(W; Y^n)$   (16.21)
$\le I(X^n; Y^n)$   // since $W \to X^n \to Y^n$ and the data processing inequality   (16.22)
$= H(Y^n) - H(Y^n \mid X^n) = H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_{1:i-1}, X^n)$   (16.23)

But $Y_i \perp \{Y_{1:i-1}, X_{1:i-1}, X_{i+1:n}\} \mid X_i$, so we can continue as

$= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i) \le \sum_i \big[ H(Y_i) - H(Y_i \mid X_i) \big]$   (16.24)
$= \sum_{i=1}^{n} I(Y_i; X_i) \le nC.$   (16.25)
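As a concrete check of Theorem 16.2.2's $C = \max_{p(x)} I(X;Y)$ (a sketch added here, not part of the original slides): for a binary symmetric channel BSC($p$), $I(X;Y) = H(Y) - H_2(p)$, the maximum over input distributions is attained by the uniform input, and $C = 1 - H_2(p)$. The short script below, with illustrative helper names, sweeps the input distribution numerically and compares against the closed form.

```python
import numpy as np

def h2(q):
    """Binary entropy in bits (clipped so h2(0) = h2(1) = 0 numerically)."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def mi_bsc(a, p):
    """I(X;Y) for a BSC(p) with input distribution Pr(X=1) = a.
    Y = X xor Z with Z ~ Bernoulli(p), so Pr(Y=1) = a(1-p) + (1-a)p and H(Y|X) = H2(p)."""
    return h2(a * (1 - p) + (1 - a) * p) - h2(p)

p = 0.1                                     # example crossover probability
grid = np.linspace(0.0, 1.0, 1001)          # sweep over input distributions Pr(X=1)
print(max(mi_bsc(a, p) for a in grid))      # numerical maximum of I(X;Y), about 0.531
print(1 - h2(p))                            # closed form C = 1 - H2(p); should agree
```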
Zero-error capacity

What if we insist on $R = C$ and $P_e = 0$? In that case, what are the requirements on any such code?

$nR = H(W) = H(X^n(W))$   // if codewords are distinct   (16.33)
$= \underbrace{H(X^n \mid Y^n)}_{=0 \text{ since } P_e = 0} + I(X^n; Y^n) = I(X^n; Y^n)$   (16.34)
$= H(Y^n) - H(Y^n \mid X^n)$   (16.35)
$= H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i)$   (16.36)
$= \sum_i H(Y_i) - \sum_i H(Y_i \mid X_i)$   // if all the $Y_i$'s are independent   (16.37)
$= \sum_i I(X_i; Y_i)$   (16.38)
$= nC$   // if we choose $p^*(x) \in \operatorname{argmax}_{p(x)} I(X;Y)$   (16.39)

Feedback for DMC

Definition 16.2.3 ($(2^{nR}, n)$ feedback code). Such a code consists of an encoder $X_i(W, Y_{1:i-1})$, a decoder $g: \mathcal{Y}^n \to \{1, 2, \ldots, 2^{nR}\}$, and $P_e^{(n)} = \Pr(g(Y^n) \ne W)$ with $H(W) = nR$ (uniform $W$).

Definition 16.2.4 (Capacity). The capacity with feedback, $C_{FB}$, of a DMC is the maximum of all rates achievable by feedback codes.

Theorem 16.2.5. For a DMC, $C_{FB} = C = \max_{p(x)} I(X;Y)$.

Joint Source/Channel Theorem

Data compression: we now know that it is possible to achieve error-free compression if our average rate of compression $R$, measured in bits per source symbol, satisfies $R > H$, where $H$ is the entropy of the generating source distribution.
Data transmission: we now know that it is possible to achieve error-free communication and transmission of information if $R < C$, where $R$ is the average rate of information sent (in bits per channel use) and $C$ is the capacity of the channel.
Q: Does this mean that if $H < C$, we can reliably send a source of entropy $H$ over a channel of capacity $C$? This seems intuitively reasonable.

Joint Source/Channel Theorem: process

The process would go something as follows (a toy sketch follows this list):
1. Compress a source down to its entropy, using Huffman, LZ, arithmetic coding, etc.
2. Transmit it over a channel.
3. If all sources could share the same channel, that would be very useful.
4. I.e., perhaps the same channel coding scheme could be used regardless of the source, if the source is first compressed down to its entropy. The channel encoder/decoder need not know anything about the original source (or how to encode it).
5. Joint source/channel encoding and decoding, as in the following block diagram: source → source coder → channel encoder → channel (with noise) → channel decoder → source decoder → receiver.
6. Maybe obvious now, but at the time (1940s) it was a revolutionary idea!
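To make the separation in the list above concrete (a sketch added here, not from the original slides): the source is compressed by a general-purpose source coder (zlib stands in for Huffman/LZ/arithmetic coding), and the resulting bits are then protected by a channel code that knows nothing about the source; a 3x repetition code (revisited later in these slides) over a simulated BSC serves as the stand-in channel code. All function names are illustrative.

```python
import random
import zlib

def channel_encode(bits, k=3):
    """Toy channel code: repeat each bit k times; it knows nothing about the source."""
    return [b for b in bits for _ in range(k)]

def channel_decode(bits, k=3):
    """Majority vote over each block of k received bits."""
    return [int(sum(bits[i:i + k]) > k // 2) for i in range(0, len(bits), k)]

def bsc(bits, p, rng):
    """Simulated binary symmetric channel: flip each bit with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def to_bits(data):
    return [(byte >> j) & 1 for byte in data for j in range(8)]

def to_bytes(bits):
    return bytes(sum(b << j for j, b in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

rng = random.Random(0)
source = b"information theory " * 50                  # a redundant source
compressed = zlib.compress(source)                    # source coding: squeeze out redundancy
received = channel_decode(bsc(channel_encode(to_bits(compressed)), p=0.01, rng=rng))
try:
    ok = zlib.decompress(to_bytes(received)) == source
except zlib.error:
    ok = False                                        # the toy channel code can still fail
print(ok)                                             # True with high probability for small p
```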
Joint Source/Channel Theorem

Source: $V \in \mathcal{V}$ that satisfies the AEP (e.g., stationary ergodic). We send $V_{1:n} = V_1, V_2, \ldots, V_n$ over the channel; the stochastic process has entropy rate $H(\mathcal{V})$ (if i.i.d., $H(\mathcal{V}) = H(V_i)$ for all $i$).

$V_{1:n}$ → Encoder → $X^n$ → Channel → $Y^n$ → Decoder → $\hat{V}_{1:n}$

Error probability and setup:

$P_e^{(n)} = P(V_{1:n} \ne \hat{V}_{1:n})$   (16.1)
$= \sum_{y_{1:n}, v_{1:n}} \Pr(v_{1:n}) \Pr(y_{1:n} \mid X^n(v_{1:n})) \, \mathbf{1}\{g(y_{1:n}) \ne v_{1:n}\}$   (16.2)
Theorem 16.3.1 (Source/Channel Coding Theorem). If $V_{1:n}$ satisfies the AEP, then there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$ if $H(\mathcal{V}) < C$. Conversely, if $H(\mathcal{V}) > C$, then $P_e^{(n)} > 0$ for all $n$ and we cannot send with arbitrarily low probability of error.

Proof. If $V$ satisfies the AEP, then there exists a typical set $A_\epsilon^{(n)}$ with $|A_\epsilon^{(n)}| \le 2^{n(H(\mathcal{V}) + \epsilon)}$ that carries essentially all of the probability. We encode only the typical set and signal an error otherwise; this contributes at most $\epsilon$ to $P_e$. We index the elements of $A_\epsilon^{(n)}$ as $\{1, 2, \ldots, 2^{n(H + \epsilon)}\}$, so we need $n(H + \epsilon)$ bits. This gives a rate of $R = H(\mathcal{V}) + \epsilon$. If $R < C$, then the decoding error is $< \epsilon$, which we can make as small as we wish. ...
... proof continued. Then

$P_e^{(n)} = \Pr(V_{1:n} \ne \hat{V}_{1:n})$   (16.3)
$\le \Pr(V_{1:n} \notin A_\epsilon^{(n)}) + \underbrace{\Pr(g(Y^n) \ne V^n \mid V^n \in A_\epsilon^{(n)})}_{< \epsilon \text{ since } R < C}$   (16.4)
$\le \epsilon + \epsilon = 2\epsilon,$   (16.5)

and the first part of the theorem is proved. To show the converse, we show that $P_e^{(n)} \to 0 \Rightarrow H(\mathcal{V}) \le C$ for source/channel codes.

... proof continued. Define:

$X^n(V^n): \mathcal{V}^n \to \mathcal{X}^n$   // encoder   (16.6)
$g_n(Y^n): \mathcal{Y}^n \to \mathcal{V}^n$   // decoder   (16.7)

Now recall that the original Fano inequality says $H(X \mid Y) \le 1 + P_e \log|\mathcal{X}|$. Here we have

$H(V^n \mid \hat{V}^n) \le 1 + P_e^{(n)} \log|\mathcal{V}^n| = 1 + n P_e^{(n)} \log|\mathcal{V}|$   (16.8)
... proof continued. We get the following derivation:

$H(\mathcal{V}) \le \dfrac{H(V_1, V_2, \ldots, V_n)}{n} = \dfrac{H(V_{1:n})}{n}$   (16.9)
$= \dfrac{1}{n} H(V_{1:n} \mid \hat{V}_{1:n}) + \dfrac{1}{n} I(V_{1:n}; \hat{V}_{1:n})$   (16.10)
$\le \dfrac{1}{n}\big(1 + P_e^{(n)} n \log|\mathcal{V}|\big) + \dfrac{1}{n} I(V_{1:n}; \hat{V}_{1:n})$   // by Fano   (16.11)
$\le \dfrac{1}{n}\big(1 + P_e^{(n)} n \log|\mathcal{V}|\big) + \dfrac{1}{n} I(X_{1:n}; Y_{1:n})$   // $V \to X \to Y \to \hat{V}$ and the data processing inequality   (16.12)
$\le \dfrac{1}{n} + P_e^{(n)} \log|\mathcal{V}| + C$   // memoryless channel   (16.13)

Letting $n \to \infty$, the $1/n$ term vanishes and $P_e^{(n)} \to 0$, which leaves us with $H(\mathcal{V}) \le C$.
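A small empirical illustration of the typical-set encoding used in the achievability argument above (a sketch added here, not part of the original slides): for an i.i.d. Bernoulli($q$) source, almost every length-$n$ sequence is $\epsilon$-typical, so indexing the typical set with about $n(H + \epsilon)$ bits suffices. The sampling check below estimates the probability of the typical set; the function name and parameter values are illustrative.

```python
import math
import random

def typical_set_coverage(q=0.2, n=2000, eps=0.05, trials=2000, seed=0):
    """Estimate Pr{V_{1:n} is eps-typical} for an i.i.d. Bernoulli(q) source.
    A sequence is eps-typical if |-(1/n) log2 p(v_{1:n}) - H| <= eps."""
    rng = random.Random(seed)
    H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)   # source entropy, bits/symbol
    hits = 0
    for _ in range(trials):
        ones = sum(rng.random() < q for _ in range(n))
        # -(1/n) log2 p(v_{1:n}) depends only on the number of ones in the sequence
        per_symbol = -(ones * math.log2(q) + (n - ones) * math.log2(1 - q)) / n
        hits += abs(per_symbol - H) <= eps
    print(f"H = {H:.3f} bits/symbol; indexing the typical set needs about "
          f"n(H + eps) = {n * (H + eps):.0f} bits instead of n = {n} raw bits")
    print(f"estimated Pr(typical) = {hits / trials:.3f}")   # close to 1 for large n

typical_set_coverage()
```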
Coding and Codes

Shannon's theorem says that there exists a sequence of codes such that if $R < C$ the error goes to zero. It doesn't provide such a code, nor does it offer much insight on how to find one.
Typical set coding is not practical. Why? Exponentially large block sizes.
In all cases, we add enough redundancy to a message so that the original message can be decoded unambiguously.

Physical Solution to Improve Coding

It is possible to communicate more reliably by changing physical properties to decrease the noise (e.g., decrease $p$ in a BSC):
- use more reliable and expensive circuitry;
- improve the environment (e.g., control thermal conditions, remove dust particles or even air molecules);
- in compression, use more physical area/volume for each bit;
- in communication, use a higher-power transmitter, i.e., use more energy, thereby making noise less of a problem.
These are not information-theoretic solutions, which is what we want.
Repetition Code

Rather than sending the message $x_1 x_2 x_3 \ldots$, we repeat each symbol $k$ times redundantly. (Recall our example of repeating each word over a noisy analog radio connection.) The message becomes

$\underbrace{x_1 x_1 \ldots x_1}_{k\times} \; \underbrace{x_2 x_2 \ldots x_2}_{k\times} \; \ldots$

For many channels (e.g., BSC($p < 1/2$)), the error goes to zero as $k \to \infty$.
Easy decoding: when $k$ is odd, take a majority vote (which is optimal for a BSC).
On the other hand, $R \propto 1/k \to 0$ as $k \to \infty$.
This is really a pre-1948 way of thinking about codes. Thus, this is not a good code.
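A minimal simulation of this trade-off (a sketch added here, not from the original slides): repetition encoding over a simulated BSC with majority-vote decoding. The bit error rate falls as $k$ grows while the rate $1/k$ shrinks toward zero.

```python
import random

def repetition_encode(bits, k):
    return [b for b in bits for _ in range(k)]

def bsc(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def majority_decode(bits, k):
    return [int(sum(bits[i:i + k]) > k // 2) for i in range(0, len(bits), k)]

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(20000)]
p = 0.1                                    # BSC crossover probability
for k in (1, 3, 5, 7, 9):                  # odd k, so the majority vote is unambiguous
    received = majority_decode(bsc(repetition_encode(message, k), p, rng), k)
    ber = sum(m != r for m, r in zip(message, received)) / len(message)
    print(f"k = {k}: rate = 1/{k}, bit error rate ≈ {ber:.4f}")
```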
Repetition Code Example (from D. MacKay)

Consider sending the message s = 0 0 1 0 1 1 0 with the threefold repetition code. One scenario:

s:  0    0    1    0    1    1    0
t: 000  000  111  000  111  111  000    (transmitted sequence)
n: 000  001  000  000  101  000  000    (channel noise)
r: 000  001  111  000  010  111  000    (received sequence r = t + n)
ŝ:  0    0    1    0    0    1    0     (majority-vote decoding)

Blocks hit by a single bit flip are corrected by the majority vote; the block hit by two flips (n = 1 0 1) is decoded incorrectly. Thus, the code can correct only one bit error per block, not two.
Simple Parity Check Code

Binary input/output alphabets $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. Blocks of $n-1$ data bits: $x_{1:n-1}$. The $n$th bit is an indicator of an odd number of 1 bits in $x_{1:n-1}$, i.e., $x_n \leftarrow \big(\sum_{i=1}^{n-1} x_i\big) \bmod 2$.
Thus a necessary condition for a valid codeword is $\big(\sum_{i=1}^{n} x_i\big) \bmod 2 = 0$.
Any instance of an odd number of errors (bit flips) fails this condition, so such an error is detected, although an even number of errors passes the condition (the error goes undetected).
The code cannot correct any errors, and moreover it detects only some kinds of errors (an odd number of flips).
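A tiny sketch of the even-parity scheme just described (added here, not from the original slides; helper names are illustrative): append one parity bit, then check the received block. An odd number of flips is detected; an even number slips through.

```python
def add_parity(data_bits):
    """Append an even-parity bit so the full block sums to 0 mod 2."""
    return data_bits + [sum(data_bits) % 2]

def parity_ok(block):
    """Necessary condition for a valid codeword: the bits sum to 0 mod 2."""
    return sum(block) % 2 == 0

codeword = add_parity([1, 0, 1, 1, 0, 0, 1])   # 7 data bits plus 1 parity bit

one_error = codeword.copy()
one_error[2] ^= 1                              # one flip: detected
two_errors = codeword.copy()
two_errors[2] ^= 1
two_errors[5] ^= 1                             # two flips: goes undetected
print(parity_ok(codeword), parity_ok(one_error), parity_ok(two_errors))  # True False True
```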
On the other hand, parity checks form the basis for many sophisticated coding schemes (e.g., low-density parity-check (LDPC) codes, Hamming codes, etc.). We study Hamming codes next.

(7, 4, 3) Hamming Codes

Best illustrated by an example. Let $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. Fix the desired rate at $R = 4/7$ bits per channel use; thus, in order to send 4 data bits, we need to use the channel 7 times.
Let the four data bits be denoted $x_0, x_1, x_2, x_3 \in \{0, 1\}$. When we send these 4 bits, we are also going to send 3 additional parity (redundancy) bits, named $x_4, x_5, x_6$.
Note: all arithmetic in the following is mod 2, i.e., $1 + 1 = 0$, $1 + 0 = 1$, $1 = 0 - 1 = -1$, etc.
(7, 4, 3) Hamming Codes

The parity bits are determined by the following equations:

$x_4 \equiv x_1 + x_2 + x_3 \pmod 2$   (16.14)
$x_5 \equiv x_0 + x_2 + x_3 \pmod 2$   (16.15)
$x_6 \equiv x_0 + x_1 + x_3 \pmod 2$   (16.16)

I.e., if $(x_0, x_1, x_2, x_3) = (0110)$ then $(x_4, x_5, x_6) = (011)$, and the complete 7-bit codeword sent over the channel would be $(0110011)$.

We can also describe this using linear equalities (all mod 2):

$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 0 \\ x_0 + x_2 + x_3 + x_5 &= 0 \\ x_0 + x_1 + x_3 + x_6 &= 0 \end{aligned}$   (16.17)

Hamming Codes

Or alternatively, as $Hx = 0$, where $x^\top = (x_1, x_2, \ldots, x_7)$ (note the switch to 1-based indexing here: $x_i$ corresponds to $x_{i-1}$ above) and

$H = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}$   (16.18)

Codewords lie in the null space of $H$. Notice that $H$ is a column permutation of all seven non-zero length-3 column vectors. Thus the codewords are defined by the null space of $H$, i.e., $\{x : Hx = 0\}$. Since the rank of $H$ is 3, the null space has dimension 4, and we expect there to be $16 = 2^4$ binary vectors in this null space.
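A short sketch of this (7, 4) construction (added here, not from the original slides; it assumes the parity equations above, with 0-based bit labels $x_0, \ldots, x_6$): compute the parity bits, verify $Hx = 0$, and use the syndrome to correct a single flipped bit.

```python
import numpy as np

# Parity-check matrix from the slides; columns correspond to x0..x6 here.
H = np.array([[0, 1, 1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, 1]])

def hamming_encode(d):
    """d = (x0, x1, x2, x3); append parity bits per eqs. (16.14)-(16.16)."""
    x0, x1, x2, x3 = d
    return np.array([x0, x1, x2, x3,
                     (x1 + x2 + x3) % 2,
                     (x0 + x2 + x3) % 2,
                     (x0 + x1 + x3) % 2])

def hamming_correct(r):
    """Single-error correction: the syndrome Hr (mod 2) equals the column of H
    at the flipped position; an all-zero syndrome means no error detected."""
    syndrome = H @ r % 2
    if syndrome.any():
        for j in range(7):
            if np.array_equal(H[:, j], syndrome):
                r = r.copy()
                r[j] ^= 1
                break
    return r

x = hamming_encode([0, 1, 1, 0])
print(x, H @ x % 2)              # codeword 0110011 with all-zero syndrome
r = x.copy()
r[2] ^= 1                        # the channel flips one bit
print(hamming_correct(r))        # the original codeword is recovered
```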