A tutorial review of the use of ergodic theory in the analysis of information sources and in source coding, covering the asymptotic equipartition property (AEP) for ergodic finite-alphabet sources, the ergodic decomposition of a stationary source, and existence theorems for noiseless variable-length codes. It also discusses attempts to obtain general results for nonstationary sources.

Compression of Information Sources with Memory: A Tutorial Review
By Sundeep Venkatraman
Department of Electrical Engineering, University of Notre Dame

Introduction:
In the early days of the development of information theory, sources were often conveniently modeled as memoryless processes and the analysis was based on the law of large numbers. It soon became apparent that this model was not realistic enough, so the more general model of an information source as a stationary random process was adopted, with the ergodic theorem used in place of the law of large numbers. The use of the ergodic theorem to prove the entropy rate theorem, better known as the Asymptotic Equipartition Property (AEP), was one of the key results obtained from this new point of view [1]. The importance of ergodic theory, however, extends beyond the use of the ergodic theorem in proving the AEP; certain results from ergodic theory can be applied to the theory of universal source coding as well ([2], [3]). Some of these results are presented in the sections to come. While the ergodic theorem is a powerful tool for extending the basic source coding results of information theory, there have also been attempts to obtain general results for nonstationary sources ([4], [5]). Some of the pertinent results from these references are presented in later sections as well.

Application of the ergodic theorem to the analysis of source processes:
The following is the definition of ergodicity [6]. Let $f \in L^1(m, \mu)$, where $m$ is a measurable space and $\mu$ is the finite measure with respect to which $f$ is integrable, and let $T : m \to m$ be a measurable transformation. Then $T$ is said to be ergodic if the following relation holds with probability 1:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \hat{f}(x) = \int_m f(x)\, d\mu(x) \quad \text{almost everywhere.} \qquad (1)$$

(Note: often $T^k$ is a left shift by $k$ positions.)

In the theory of probability, similar assertions are referred to as laws of large numbers, and since the convergence here takes place almost everywhere, the ergodic theorem is actually a stronger result than the (weak) law of large numbers. One of the important results obtained by using the ergodic theorem in place of the law of large numbers is in [1], where a proof is given of the Asymptotic Equipartition Property (AEP) for ergodic finite-alphabet sources in the convergence-in-probability sense. The statement of the property as given in that paper is "every ergodic source has the AEP," i.e.,

$$-\frac{1}{n} \log \mu[0, n-1; X] \to H(X) \quad \text{(in probability) as } n \to \infty,$$

where $H(X)$ is the entropy rate of the source and $\mu[0, n-1; X]$ denotes the probability of the vector of random variables $(X_0, \ldots, X_{n-1})$. In the special case where $X_1, X_2, \ldots, X_n$ are i.i.d. random variables, the entropy rate equals the entropy, and we recover the well-known result for the i.i.d. case.
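The convergence asserted by the AEP can be illustrated numerically for a simple ergodic source with memory. The sketch below is not taken from any of the cited papers; the two-state transition matrix and all variable names are chosen arbitrarily for illustration. It simulates a Markov chain and compares $-\frac{1}{n}\log\mu[0, n-1; X]$, evaluated on the observed sample path, with the analytical entropy rate $H(X)$.

```python
# Numerical illustration of the AEP for an ergodic finite-alphabet source.
# A two-state Markov chain is simulated, and the normalized negative
# log-probability of the observed prefix is compared with the entropy rate.
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of an ergodic two-state Markov chain (chosen arbitrarily).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution pi, i.e. the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Entropy rate H(X) = sum_i pi_i * H(P[i, :]), in nats.
H = -np.sum(pi[:, None] * P * np.log(P))

# Simulate a sample path started from the stationary distribution.
n = 100_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=pi)
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])

# -(1/n) log mu[0, n-1; X] for the observed prefix.
log_prob = np.log(pi[x[0]]) + np.sum(np.log(P[x[:-1], x[1:]]))
print("empirical -(1/n) log mu :", -log_prob / n)
print("entropy rate H(X)       :", H)
```

The two printed values should agree closely, with a gap that shrinks roughly like $1/\sqrt{n}$, in line with the convergence in probability stated above.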
In the adaptive block-to-variable-length schemes of [4], the source output is parsed into blocks of length $N$, and each block $X_i$ is encoded via a one-to-one mapping into a random codeword $Y_i$. The entire scheme is adaptive because the mapping from $X_i$ to $Y_i$ depends on the previously observed values $X_0, X_1, \ldots, X_{i-1}$. The rate $r_X(\tau)$ at which a scheme $\tau \in C_N$ codes the given source $X$ is defined by

$$r_X(\tau) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \frac{E[L(Y_i)]}{N},$$

and the optimum rate is then

$$\inf_N \inf\{ r_X(\tau) : \tau \in C_N \}.$$

The stationary hull $\Lambda(X)$ of the source is the class of stationary processes $Z = (Z_0, Z_1, \ldots)$ with alphabet $A$ such that, for some sequence of positive numbers $n_0 < n_1 < \cdots$,

$$\lim_{j \to \infty} \frac{1}{n_j} \sum_{i=0}^{n_j - 1} E[f(X_i, X_{i+1}, \ldots)] = E[f(Z)].$$

Every process in the stationary hull is necessarily stationary (hence the name). The result proved in [4] for the optimal rate is

$$r_X = \sup\{ H(Z) : Z \in \Lambda(X) \},$$

where $H(Z)$ is the entropy rate of $Z$. This result can be justified by noting that the stationary hull is a kind of equivalent representation of the nonstationary process in terms of a class of stationary processes. In case $X$ is itself stationary, $\Lambda(X)$ consists of $X$ alone, and the result reduces to the familiar result for stationary processes.

Another approach to the general case of nonstationary sources has been explored in [5]. The following definitions are pertinent.

Def 1. A channel $W$ with input alphabet $A$ and output alphabet $B$ is a sequence of conditional distributions

$$W = \{ W^n(y^n \mid x^n) = P_{Y^n \mid X^n}(y^n \mid x^n) : (x^n, y^n) \in A^n \times B^n \}_{n=1}^{\infty}.$$

Def 2. Given a joint distribution $P_{X^n Y^n}(x^n, y^n)$, the information density is the function defined on $A^n \times B^n$ by

$$i_{X^n W^n}(a^n; b^n) = \log \frac{W^n(b^n \mid a^n)}{P_{Y^n}(b^n)}.$$

The distribution of $\tfrac{1}{n} i_{X^n W^n}(X^n; Y^n)$ is referred to as the information spectrum; its expected value is the normalized mutual information $\tfrac{1}{n} I(X^n; Y^n)$.

Def 3. The sup (resp. inf) information rate, denoted $\bar{I}(X; Y)$ (resp. $\underline{I}(X; Y)$), is defined as the limsup (resp. liminf) in probability of the sequence of random variables $\{\tfrac{1}{n} i_{X^n W^n}(X^n; Y^n)\}$. (Hereafter, $X$ and $Y$ denote the corresponding sequences of random variables.) Also, $I(X; Y) = \lim_{n \to \infty} \tfrac{1}{n} I(X^n; Y^n)$; therefore, for a channel with a finite alphabet, if $\bar{I}(X; Y) = \underline{I}(X; Y)$, then $I(X; Y) = \underline{I}(X; Y) = \bar{I}(X; Y)$.

Def 4. For any positive integer $M$, a probability distribution $P$ is said to be $M$-type if $P(\omega) \in \{0, 1/M, 2/M, \ldots, 1\}$ for every $\omega$.

Def 5. The resolution $R(P)$ of a probability distribution is the minimum $\log M$ such that $P$ is $M$-type. Note that $H(P) \le R(P)$.

Def 6. Let $\varepsilon \ge 0$. $R$ is an $\varepsilon$-achievable resolution rate for channel $W$ if for every input process $X$ and for all $\gamma > 0$ there exists an $\tilde{X}$ whose resolution satisfies $\tfrac{1}{n} R(\tilde{X}^n) < R + \gamma$ and $d(Y^n, \tilde{Y}^n) < \varepsilon$. The minimum $\varepsilon$-achievable resolution rate (resp. resolution rate) is called the $\varepsilon$-resolvability (resp. resolvability) of the channel and is denoted $S_\varepsilon$ (resp. $S$). If the resolution rates are defined for a particular input process $X$, we write $S_\varepsilon(X)$ (resp. $S(X)$). If resolution is replaced by entropy in the above definition, we obtain the mean resolvability $\bar{S}_\varepsilon(X)$ (resp. $\bar{S}(X)$).

Def 7. $R$ is an $\varepsilon$-achievable (fixed-length) source coding rate for $X$ if for all $\gamma > 0$ and sufficiently large $n$ there exists a collection of $M$ $n$-tuples $\{x_1^n, \ldots, x_M^n\}$ such that $\tfrac{1}{n} \log M < R + \gamma$ and $P[X^n \notin \{x_1^n, \ldots, x_M^n\}] \le \varepsilon$. $R$ is an achievable rate if it is $\varepsilon$-achievable for all $0 < \varepsilon < 1$. $T(X)$ denotes the minimum achievable source coding rate for $X$.

Def 8. Fix an integer $r \ge 2$. $R$ is an achievable variable-length source coding rate for $X$ if for all $\gamma > 0$ and all sufficiently large $n$ there exists an $r$-ary prefix code whose average codeword length $L_n$ satisfies $\tfrac{1}{n} L_n \log r < R + \gamma$. The minimum achievable variable-length source coding rate for $X$ is denoted $\bar{T}(X)$.
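Defs. 4-6 can be made concrete with a small numerical sketch. The code below is illustrative only: the helper names and the rounding rule are not taken from [5]. It rounds an arbitrary distribution $P$ to an $M$-type distribution $Q$ for several values of $M$ and reports the corresponding resolution $\log M$ together with the variational distance between $P$ and $Q$, showing how a larger resolution buys a closer approximation.

```python
# Sketch of the M-type / resolution definitions (Defs. 4-6): quantize a
# distribution P to an M-type distribution Q (all masses multiples of 1/M)
# and report log M alongside the variational distance d(P, Q).
import numpy as np

def m_type_quantize(p, M):
    """Round p to an M-type distribution, i.e. masses in {0, 1/M, ..., 1}."""
    counts = np.floor(p * M).astype(int)
    # Hand the leftover probability mass to the entries with the largest
    # rounding remainders so that the result still sums to one.
    leftover = M - counts.sum()
    order = np.argsort(p * M - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts / M

def variational_distance(p, q):
    # One common convention: half the L1 distance between the two distributions.
    return 0.5 * np.sum(np.abs(p - q))

p = np.array([0.42, 0.30, 0.17, 0.08, 0.03])

for M in (4, 16, 256):
    q = m_type_quantize(p, M)
    print(f"M = {M:4d}   log2(M) = {np.log2(M):5.2f} bits   "
          f"d(P, Q) = {variational_distance(p, q):.4f}")
```

This mirrors the trade-off behind Def. 6: a higher resolution rate permits a smaller variational distance between the target statistics and those of the finitely described approximating distribution.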
The following theorems of relevance to source coding are proved in [5].

1. For any $X$ and the identity channel, $S(X) = T(X)$. The approach here is to consider a channel that transmits its input unchanged (hence the "identity channel") and thereby obtain a source coding result involving resolvability, a quantity originally defined for a channel $W$. This and the subsequent theorems are proved by showing both $S(X) \ge T(X)$ and $S(X) \le T(X)$.

2. For any $X$ and the identity channel, $\bar{S}(X) = \bar{T}(X)$.

3. For any $X$ and the identity channel, $S(X) = T(X) = \bar{I}(X; X)$. Since for the identity channel $\bar{I}(X; X)$ is the sup entropy rate, this result relates resolvability and the source coding rate to the sup entropy rate, tying together two different information-theoretic quantities and relating them to the variable-length source coding rate.

4. For every channel $W$ and input process $X$, $S_\varepsilon(X) \le \bar{I}(X; Y)$. This theorem relates the $\varepsilon$-resolvability (which is related to the source coding rate) to the sup information rate, analogous to the joint source-channel coding theorem.

5. For any channel with a finite input alphabet, $S_\varepsilon \ge \sup_X \bar{I}(X; Y)$ for all $\varepsilon > 0$.

6. Combining the two previous theorems, $S_\varepsilon = \sup_X \bar{I}(X; Y)$ for channels with a finite input alphabet.

7. $\bar{S}_\varepsilon \le \bar{S} \le \bar{I}(X; Y)$. This result is identical to Theorem 4 except that resolvability has been replaced by mean resolvability.

For all of the above proofs, the approximation criterion used was vanishing variational distance. Replacing this criterion with the normalized divergence (relative entropy) does not make a significant difference, as neither criterion is stronger than the other. In the most general cases, only bounds on the source coding rate in terms of resolvability are obtained; equality of the $\varepsilon$-resolvability and the sup information rate holds only in the finite-input-alphabet case, because only then is it possible to lower-bound the distance between the two outputs produced by any two distinct inputs (see Section IV, Lemma 6 in [5]). Viewed together, these results constitute a concise and consistent set of characterizations of source coding rates in terms of resolvability and related quantities that had previously seen little use in information theory.

References:
[1] B. McMillan, "The Basic Theorems of Information Theory," Ann. Math. Stat., vol. 24, pp. 196-219, 1953.
[2] P. C. Shields, "The Interactions Between Ergodic Theory and Information Theory," IEEE Trans. Inform. Theory, vol. 44, pp. 2079-2093, 1998.
[3] R. M. Gray and L. D. Davisson, "The Ergodic Decomposition of Stationary Discrete Random Processes," IEEE Trans. Inform. Theory, vol. IT-20, pp. 625-636, 1974.
[4] J. C. Kieffer, "Finite-State Adaptive Block to Variable-Length Noiseless Coding of a Nonstationary Information Source," IEEE Trans. Inform. Theory, vol. 35, pp. 1259-1263, 1989.
[5] T. S. Han and S. Verdú, "Approximation Theory of Output Statistics," IEEE Trans. Inform. Theory, vol. 39, pp. 752-772, 1993.
[6] Y. G. Sinai, Introduction to Ergodic Theory, Princeton University Press, 1976.