Information Theory and Statistical Physics

Rahul Singh

December 9, 2009

1 Introduction

Relationships between information theory and statistical physics have been recognized over the last few decades. One such aspect is identifying the structures of optimization problems arising in certain information-theoretic settings, drawing an analogy to parallel structures arising in statistical physics, and then borrowing statistical-mechanical insights, as well as powerful analysis techniques (such as the replica method), from statistical physics to the dual information-theoretic setting of interest.

Another aspect is the application of the maximum entropy principle, which emerged in statistical mechanics, as a general guiding principle for problems in information theory, e.g., signal processing, speech coding, and spectrum estimation. In the reverse direction, we can consider statistical mechanics as a form of statistical inference. Information theory gives us a constructive criterion for setting up probability distributions on the basis of partial knowledge, leading to the maximum-entropy estimate. The usual rules of statistical physics are then an immediate consequence of the maximum-entropy principle. The facts about the maximization of entropy were stated by Gibbs much earlier, but this property was treated as a side remark, not providing any justification for the methods of statistical mechanics [2]. This missing "feature" has been supplied by information theory. So entropy can now be taken as the starting concept, and the fact that a probability distribution maximizes the entropy subject to certain constraints becomes the essential fact that justifies the use of that distribution for inference.

Landauer's erasure principle [6] provides a powerful link between information theory and physics. Information processing, or even the storage of information, leads to an increase in entropy. As per the principle, the erasure of every bit of information increases the thermodynamic entropy of the world by k \ln 2, where k is Boltzmann's constant (1.38 \times 10^{-23} J/K), and so this suggests a very strong link between the two areas.

This report looks at some of the parallels between the two areas. First, it discusses the results of [4], which draw an analogy between the information inequality and the Data Processing Theorem (DPT) on the one hand and the second law of thermodynamics on the other. The DPT is used in many proofs of converse theorems in information theory; hence the roots of the fundamental limits of information theory can be attributed to the laws of physics. Then it discusses a result of [3] about Chernoff bounds, relates it to statistical physics, and describes its applications in information theory.

2 Physics of Shannon Limits

This section provides an interpretation of the log-sum inequality (information inequality) and the data processing theorem in terms of the second law of thermodynamics. The non-negativity of the relative entropy is related to the non-negativity of the entropy change of an isolated physical system. First, some physics concepts needed to understand this are discussed.

Let there be n particles in the system. There is an energy function (Hamiltonian) \varepsilon(x) associated with every microstate x, where x is a vector x = (x_1, \ldots, x_n) and each x_i can itself be a vector containing all relevant physical state variables, e.g., position, momentum, etc.
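As a brief illustration of the maximum-entropy principle mentioned in the introduction, the minimal sketch below (not part of the original report; the energy levels, the target mean energy, and the use of SciPy's constrained optimizer are illustrative assumptions) numerically maximizes the entropy of a distribution over a handful of microstates subject to a mean-energy constraint, and compares the result with the Boltzmann form stated next.

```python
# Minimal sketch (illustrative assumptions): among all distributions over a small
# set of microstates with a prescribed mean energy, the entropy maximizer found by
# a generic constrained optimizer coincides with the Boltzmann/Gibbs form
# e^{-beta*eps(x)} / Z(beta).
import numpy as np
from scipy.optimize import brentq, minimize

eps = np.array([0.0, 1.0, 2.0, 5.0])   # energies eps(x) of four microstates (assumed)
E_target = 1.2                          # prescribed mean energy (assumed)

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

# Direct constrained maximization of the entropy (minimize its negative).
constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
               {"type": "eq", "fun": lambda p: np.dot(p, eps) - E_target}]
p0 = np.full(eps.size, 1.0 / eps.size)
res = minimize(lambda p: -entropy(p), p0, method="SLSQP",
               bounds=[(0.0, 1.0)] * eps.size, constraints=constraints)
p_maxent = res.x

# Boltzmann form: choose beta so that the Gibbs mean energy equals E_target.
def gibbs(beta):
    w = np.exp(-beta * eps)
    return w / w.sum()

beta_star = brentq(lambda b: np.dot(gibbs(b), eps) - E_target, -10.0, 10.0)

print("max-entropy solution:", np.round(p_maxent, 4))
print("Gibbs distribution  :", np.round(gibbs(beta_star), 4))
print("beta*               :", round(beta_star, 4))
```

The two printed distributions agree (up to numerical tolerance), which is exactly the point made in the introduction: the Boltzmann distribution is the maximum-entropy distribution under an average-energy constraint.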
The probability of finding the system at state x is given by

P(x) = \frac{e^{-\beta \varepsilon(x)}}{Z(\beta)}

where \beta = 1/kT, k is the Boltzmann constant, and T is the temperature. Here

Z(\beta) = \sum_x e^{-\beta \varepsilon(x)}    (1)

or

Z(\beta) = \int e^{-\beta \varepsilon(x)} \, dx    (2)

depending on whether x is discrete or continuous. Z(\beta) is called the partition function. The partition function, besides acting as a normalization constant, is very useful for deriving many macroscopic physical quantities such as the average internal energy, the entropy, or the free energy. It encodes the statistical properties of a system in thermodynamic equilibrium. It is a function of the temperature and of other parameters, such as the volume enclosing a gas. Most of the aggregate thermodynamic variables of the system, such as the total energy, free energy, entropy, and pressure, can be expressed in terms of the partition function or its derivatives.

The free energy (as a function of temperature) is given by

F(\beta) = -\frac{\ln Z(\beta)}{\beta}    (3)

The free energy represents the amount of work that can be extracted from a system. The Helmholtz free energy is defined as

A = U - TS    (4)

where U is the internal energy, T is the absolute temperature, and S is the entropy. The Gibbs free energy is given by

G = H - TS    (5)

where H is the enthalpy.

[...]

Let q(u|v), u \in U, v \in V, be a matrix of conditional probabilities from V to U. For a given function f and a constant E, the focus will be on the probability of events of the type (large-deviations analysis)

\sum_{i=1}^n f(U_i, v_i) \le nE.

Next, let us define, for each v \in V, the (weighted) partition function

Z_v(\beta) = \sum_{u \in U} q(u|v) \, e^{-\beta f(u,v)}, \quad \beta > 0    (15)

and, for a given E_v in the range

\min_{u \in U} f(u,v) \le E_v \le \sum_{u \in U} q(u|v) f(u,v)    (16)

let

S_v(E_v) = \min_{\beta \ge 0} \left[\beta E_v + \ln Z_v(\beta)\right]    (17)

For a given constant E in the range

\sum_{v \in V} p(v) \min_{u \in U} f(u,v) \le E \le \sum_{v \in V} \sum_{u \in U} p(v) q(u|v) f(u,v)    (18)

let

\bar{S}(E) = \min_{\beta \ge 0} \Big[\beta E + \sum_{v \in V} p(v) \ln Z_v(\beta)\Big]    (19)

Let H(E) denote the set of all |V|-dimensional vectors \bar{E} = \{E_v, v \in V\}, where each component E_v satisfies (16) and where \sum_{v \in V} p(v) E_v \le E. The identity is

\max_{\bar{E} \in H(E)} \sum_{v \in V} p(v) S_v(E_v) = \bar{S}(E)    (20)

Note that \bar{S}(E) is easier to work with, as it involves just one parameter, while the left-hand side involves optimizing over \bar{E} as well as over \beta. See [3] for a detailed proof of the identity.
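As a sanity check (not taken from [3] or from the report), the sketch below verifies identity (20) numerically on a small randomly generated example; the alphabets, p(v), q(u|v), f(u, v), and the grid-based minimization over \beta are all illustrative assumptions.

```python
# Numerical check of identity (20) on a random toy example (illustrative assumptions).
# The best allocation of per-particle energies {E_v} is found by brute-force search
# and compared with the single-parameter expression S_bar(E).
import numpy as np

rng = np.random.default_rng(0)
nU, nV = 4, 2
p = np.array([0.3, 0.7])                                     # p(v)
q = rng.random((nV, nU)); q /= q.sum(axis=1, keepdims=True)  # q(u|v)
f = 3.0 * rng.random((nV, nU))                               # f(u, v), playing the role of energy

betas = np.linspace(0.0, 50.0, 20001)                        # grid for the min over beta >= 0
# ln Z_v(beta) on the grid, eq. (15)
logZ = [np.log(np.exp(-np.outer(betas, f[v])) @ q[v]) for v in range(nV)]

def S_v(v, E_v):
    # S_v(E_v) = min_{beta>=0}[beta*E_v + ln Z_v(beta)], eq. (17)
    return np.min(betas * E_v + logZ[v])

def S_bar(E):
    # S_bar(E) = min_{beta>=0}[beta*E + sum_v p(v) ln Z_v(beta)], eq. (19)
    return np.min(betas * E + p[0] * logZ[0] + p[1] * logZ[1])

E_lo = np.array([f[v].min() for v in range(nV)])             # lower ends of range (16)
E_hi = np.array([q[v] @ f[v] for v in range(nV)])            # upper ends of range (16)
E = 0.5 * (p @ E_lo + p @ E_hi)                              # a value satisfying (18)

# Left-hand side of (20): since each S_v is nondecreasing, the best allocation uses
# the full budget, so sweep E_0 and set E_1 from p(0)E_0 + p(1)E_1 = E.
best = -np.inf
for E0 in np.linspace(E_lo[0], E_hi[0], 2001):
    E1 = (E - p[0] * E0) / p[1]
    if E_lo[1] <= E1 <= E_hi[1]:
        best = max(best, p[0] * S_v(0, E0) + p[1] * S_v(1, E1))

print("max over allocations (lhs):", best)      # the two numbers should agree
print("S_bar(E)             (rhs):", S_bar(E))  # up to the grid resolution
```

The two printed values agree up to the resolution of the grids, illustrating why the right-hand side of (20) is the convenient form: it requires a single one-dimensional minimization instead of an optimization over the whole allocation \bar{E}.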
3.2 Application in Information Theory

Referring back to the section on problem formulation, we can see that the expressions for the rate-distortion function and the channel capacity can be derived by a suitable choice of the variables u, v and of the function f(u, v), according to the problem of interest. Letting U be the reproduction alphabet, V the source alphabet, and f(u, v) the distortion function d(x, \hat{x}), we see that the problem considers the probability of events of the type \sum_{i=1}^n f(U_i, v_i) \le nE. Similarly, letting U be the input alphabet X and V the output alphabet Y of a discrete memoryless channel (DMC), f(u, v) be -\log W(y|x), and E = H(Y|X), and using random coding with typicality decoding, we again arrive at an event of the same type. These cases are considered in detail one by one.

3.2.1 Rate Distortion

Consider a discrete memoryless source (DMS) with letter probabilities P = \{p(x), x \in X\}, and, for a given reproduction alphabet \hat{X}, let d: X \times \hat{X} \to \mathbb{R}_+ denote a single-letter distortion measure. Let R(D) denote the rate-distortion function of the DMS P. Consider random coding: let (\hat{X}_1, \ldots, \hat{X}_n) be drawn i.i.d. according to the optimum random-coding distribution q^*(\hat{x}_1, \ldots, \hat{x}_n) = \prod_{i=1}^n q^*(\hat{x}_i). The event of interest is \sum_{i=1}^n d(x_i, \hat{X}_i) \le nD, where x^n is a given source vector, typical to P, i.e., the composition of x^n consists of n_x = n p(x) occurrences of each x \in X.

This event is of the type \sum_{i=1}^n f(U_i, v_i) \le nE with U_i = \hat{X}_i, v_i = x_i, q(u|v) = q(\hat{x}|x) = q^*(\hat{x}), f(u, v) = d(x, \hat{x}), E = D, and \varepsilon_x(\hat{x}) = \varepsilon_0 \, d(x, \hat{x}). Now, if the probability of this event is of the exponential order of e^{-n I(D)}, then it takes about M = e^{n(I(D)+\epsilon)} (\epsilon > 0) independent trials to succeed at least once in getting some realization of \hat{X}^n within distance nD from x^n. This achievability argument leads to I(D) = R(D). Thus, the large-deviations rate function of interest matches exactly the rate-distortion function ([8]), which is

R(D) = -\min_{\beta \ge 0}\Big[\beta \varepsilon_0 D + \sum_{x \in X} p(x) \ln\Big(\sum_{\hat{x} \in \hat{X}} q^*(\hat{x}) \, e^{-\beta \varepsilon_0 d(x,\hat{x})}\Big)\Big]    (21)

The argument can be applied to any i.i.d. coding distribution, not necessarily the optimal one.

3.2.2 Channel Capacity

Similar arguments can be given for the channel capacity. Let a discrete memoryless channel (DMC) with finite input alphabet X and finite output alphabet Y be given. The channel transition probabilities are denoted by W(y|x), x \in X, y \in Y. Random coding with typicality decoding is considered. First, the optimal coding distribution is found, denoted by q^*(x), x \in X. The output sequence is denoted by y^n. For typicality decoding, the decoder decides on the basis of the 'likelihood measure' \sum_{i=1}^n \log \frac{1}{W(y_i|X_i)}, comparing it with n H(Y|X), where

H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} q(x) W(y|x) \log W(y|x)    (22)

If more than one codeword, or not even a single codeword, meets the criterion, then an error occurs. So, once again, the event of interest is

\sum_{i=1}^n \log \frac{1}{W(y_i|X_i)} \le n H(Y|X)    (23)

Let I be the rate function of the large-deviations event in (23). If the number of codewords is less than e^{nI} in the exponential order, then, taking a union bound over all codewords, the error probability goes to 0 as n \to \infty. Hence I = C and achievability is shown, where C is the channel capacity. Using the identity stated at the beginning of this section, the capacity can be rewritten as

C = -\min_{\beta \ge 0}\Big[\beta \varepsilon_0 H(Y|X) + \sum_{y \in Y} P(y) \ln\Big(\sum_{x \in X} q^*(x) \, e^{-\beta \varepsilon_0 [-\log W(y|x)]}\Big)\Big]    (24)

Note that \varepsilon_0 is used here so that there is no inconsistency with the units.

3.3 Physical Interpretation

Consider a physical system consisting of |V| subsystems of particles. The total number of particles in the system is n and the total amount of energy is nE. For each v \in V, the subsystem indexed by v contains n_v = n p(v) particles, each of which can lie in any microstate within a finite set of microstates U, and it is characterized by an additive Hamiltonian \sum_{i: v_i = v} \varepsilon_v(u_i), where \varepsilon_v(u) = f(u, v). The total amount of energy possessed by subsystem v is n_v E_v. As long as the subsystems are in thermal isolation from each other, each one of them may have its own temperature T_v = 1/(k \beta_v), where \beta_v is the achiever of the normalized (per-particle) entropy associated with an average per-particle energy E_v, i.e., S_v(E_v) = \min_{\beta \ge 0}[\beta E_v + \ln Z_v(\beta)]. So the rate function I(E) of the event \sum_{i=1}^n f(U_i, v_i) \le nE is given by the negative of the maximum total per-particle entropy \sum_v p(v) S_v(E_v), where the maximum is over all energy allocations \{E_v\} such that the total energy is conserved. It can also be computed using the right-hand side of (20), and the temperature obtained by optimizing the right-hand side is the equilibrium temperature of the whole system.
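As a concrete closing check (not part of the report), the sketch below evaluates formula (21) for a symmetric binary source with Hamming distortion, for which the optimal reproduction distribution is uniform and the rate-distortion function is known in closed form, R(D) = \ln 2 - h(D) nats; setting \varepsilon_0 = 1 and minimizing over \beta on a simple grid are assumptions made for the sketch.

```python
# Check of formula (21) for the symmetric binary source with Hamming distortion
# (illustrative assumptions: epsilon_0 = 1, grid-based minimization over beta,
# and the known fact that q* is uniform for this source).
import numpy as np

p = np.array([0.5, 0.5])          # source probabilities p(x)
q_star = np.array([0.5, 0.5])     # optimal reproduction distribution q*(x_hat)
d = 1.0 - np.eye(2)               # Hamming distortion d(x, x_hat)
betas = np.linspace(0.0, 60.0, 60001)

def R_via_eq21(D):
    # R(D) = -min_{beta>=0}[beta*D + sum_x p(x) ln(sum_xhat q*(xhat) e^{-beta d(x,xhat)})]
    inner = sum(p[x] * np.log(np.exp(-np.outer(betas, d[x])) @ q_star) for x in range(2))
    return -np.min(betas * D + inner)

def R_closed_form(D):
    # R(D) = ln 2 - h(D) (in nats) for the binary symmetric source, D < 1/2
    return np.log(2.0) + D * np.log(D) + (1.0 - D) * np.log(1.0 - D)

for D in (0.05, 0.10, 0.25):
    print(f"D = {D:.2f}:  eq. (21) -> {R_via_eq21(D):.4f},  closed form -> {R_closed_form(D):.4f}")
```

The two columns agree up to the grid resolution, and the \beta achieving the minimum plays the role, up to the constants \varepsilon_0 and k, of the inverse equilibrium temperature discussed in Section 3.3.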
References

[1] R. S. Ellis. The theory of large deviations and applications to statistical mechanics. Lectures for the International Seminar on Extreme Events in Complex Dynamics, October 2006.

[2] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4), 1957.

[3] N. Merhav. An identity of Chernoff bounds with an interpretation in statistical physics and applications in information theory. IEEE Trans. Inform. Theory, 54:3710-3721, August 2008.

[4] N. Merhav. Physics of the Shannon limits. IEEE Trans. Inform. Theory (submitted), March 2009.

[5] R. J. Baxter. Exactly Solved Models in Statistical Mechanics. 1982.

[6] R. Landauer. Irreversibility and heat generation in the computing process. IBM J. Res. Dev., 5:183-191, 1961.

[7] R. S. Ellis. Entropy, Large Deviations, and Statistical Mechanics.