
Typology: Study notes

2010/2011

Uploaded on 10/26/2011

thecoral
thecoral 🇺🇸

4.4

(28)

133 documents

1 / 8

Toggle sidebar

Related documents


Partial preview of the text

CS225: Pseudorandomness                                        Prof. Salil Vadhan

Lecture 10: Randomness Extractors

March 13, 2007

Based on scribe notes by Vitaly Feldman, Andrei Jorza, and Pavlo Pylyavskyy.

Having spent several lectures on expander graphs, the first major pseudorandom object in this course, we now move on to the second: randomness extractors. We begin by discussing the original motivation for extractors, which was to simulate randomized algorithms with sources of biased and correlated bits. This motivation is still compelling, but extractors have taken on a much wider significance in the years since they were introduced. They have found numerous applications in theoretical computer science beyond this initial motivating one, in areas ranging from cryptography to distributed algorithms to metric embeddings. More importantly from the perspective of this course, they have played a major unifying role in the theory of pseudorandomness. Indeed, the links between the various pseudorandom objects we will study in this course (expander graphs, randomness extractors, list-decodable codes, pseudorandom generators, samplers) were all discovered through extractors and are still best understood through extractors.

1  Weak Random Sources and Deterministic Extractors

Typically, when we design randomized algorithms or protocols, we assume that all algorithms/parties have access to sources of perfect randomness, i.e. bits that are unbiased and completely independent. However, when we implement these algorithms, the physical sources of randomness to which we have access may contain biases and correlations. For example, we may use low-order bits of the system clock, the user's mouse movements, or a noisy diode based on quantum effects. While these sources may have some randomness in them, the assumption that the source is perfect is a strong one, and thus it is of interest to try to relax it.

Ideally, what we would like is a compiler that takes any algorithm A that works correctly when fed perfectly random bits U_m, and produces a new algorithm A′ that will work even if it is fed random bits X ∈ {0,1}^n that come from a "weak" random source. For example, if A is a BPP algorithm, then we would like A′ to also run in probabilistic polynomial time. One way to design such compilers is to design a randomness extractor Ext : {0,1}^n → {0,1}^m such that Ext(X) is (close to) uniformly distributed, and then run A on the extracted bits.

Von Neumann Sources.  A simple version of this question was already considered by von Neumann. He looked at sources consisting of identically distributed boolean random variables X_1, X_2, ..., X_n ∈ {0,1} which are independent but biased. That is, for every i, Pr[X_i = 1] = δ for some unknown δ. How can such a source be converted into a source of independent, unbiased bits?
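The classical answer to von Neumann's question is to read the source in disjoint pairs of bits, output 0 on the pattern 01 and 1 on the pattern 10, and discard 00 and 11; since both surviving patterns occur with probability δ(1 − δ), the output bits are unbiased and independent. Below is a minimal Python sketch of this trick (the function name and list-based interface are our own choices, purely for illustration).

```python
def von_neumann_extract(bits):
    """von Neumann's trick for i.i.d. bits with unknown bias delta.

    Scan disjoint pairs (X_{2i}, X_{2i+1}); output 0 for the pattern 01,
    output 1 for the pattern 10, and output nothing for 00 or 11.
    Both surviving patterns occur with probability delta*(1 - delta),
    so each output bit is unbiased, and distinct pairs are independent.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # a = 0 on pattern 01, a = 1 on pattern 10
    return out


# Example: a source with bias 0.7 still yields unbiased output bits,
# at a rate of roughly delta*(1 - delta) output bits per input pair.
# import random
# src = [int(random.random() < 0.7) for _ in range(10**5)]
# out = von_neumann_extract(src)
# print(sum(out) / len(out))   # close to 0.5
```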
Sources of Independent Bits.  Let's now look at a slightly more interesting source in which all the variables are still independent but the bias is no longer the same. Specifically, for every i, Pr[X_i = 1] = δ_i where 0 < δ ≤ δ_i ≤ 1 − δ. How can we deal with such a source?

Let's be more precise about the problems we are studying. A source on {0,1}^n is simply a random variable X taking values in {0,1}^n. In each of the above examples, there is an implicit class of sources being studied. For example, IndBits_{n,δ} is the class of sources X on {0,1}^n where the bits X_i are independent and satisfy δ ≤ Pr[X_i = 1] ≤ 1 − δ. We could define VN_{n,δ} to be the same with the further restriction that all of the X_i's are identically distributed, i.e. Pr[X_i = 1] = Pr[X_j = 1] for all i, j.

Definition 1 (deterministic extractors)  Let C be a class of sources on {0,1}^n. An ε-extractor for C is a function Ext : {0,1}^n → {0,1}^m such that for every X ∈ C, Ext(X) is "ε-close" to U_m. (Such extractors are called deterministic or seedless to contrast with the probabilistic or seeded randomness extractors we will see later.)

Note that we want a single function Ext that works for all sources in the class. This captures the idea that we do not want to assume we know the exact distribution of the physical source we are using, but only that it comes from some class. For example, for IndBits_{n,δ}, we know that the bits are independent and none are too biased, but not the specific bias of each bit. Note also that we only allow the extractor one sample from the source X. If we want to allow multiple independent samples, then this should be modelled explicitly in our class of sources; ideally we would like to minimize the independence assumptions used.

We still need to define what we mean for the output to be ε-close to U_m.

Definition 2  For random variables X and Y taking values in U, their statistical difference (also known as variation distance) is ∆(X,Y) = max_{T ⊆ U} |Pr[X ∈ T] − Pr[Y ∈ T]|. We say that X and Y are ε-close if ∆(X,Y) ≤ ε.

Intuitively, X and Y are ε-close if every event occurs under X with the same probability as under Y, up to ±ε. This is really the most natural measure of distance for probability distributions (much more so than the ℓ_2 distance we used in the study of random walks). In particular, it satisfies the following natural properties.

1. 0 ≤ ∆(X,Y) ≤ 1, with equality to 0 iff X and Y are identically distributed and equality to 1 iff they have disjoint supports.

2. ∆(X,Y) is symmetric.

3. ∆(X,Z) ≤ ∆(X,Y) + ∆(Y,Z) (the triangle inequality).

4. For every function f, we have ∆(f(X), f(Y)) ≤ ∆(X,Y).

5. ∆((X_1, X_2), (Y_1, Y_2)) ≤ ∆(X_1, Y_1) + ∆(X_2, Y_2) if X_1 and X_2, as well as Y_1 and Y_2, are independent.

6. ∆(X,Y) = (1/2) · |X − Y|_1, where |·|_1 is the ℓ_1 distance. Thus, X is ε-close to Y iff we can transform X into Y by "shifting" at most an ε fraction of probability mass.
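As a concrete illustration of Definition 2 and property 6, here is a small Python sketch (not from the notes; the names are ours) that computes the statistical difference of two finite distributions, represented as dicts mapping outcomes to probabilities, and tests ε-closeness.

```python
def statistical_difference(p, q):
    """Statistical (variation) distance between two distributions over a
    finite set, given as dicts mapping outcomes to probabilities.

    Uses property 6: Delta(X, Y) = (1/2) * sum_x |p(x) - q(x)|, which equals
    max_T |Pr[X in T] - Pr[Y in T]| (the maximum is attained by the set
    T = {x : p(x) > q(x)}).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def is_eps_close(p, q, eps):
    return statistical_difference(p, q) <= eps


# Example: a bit with bias 0.6 is 0.1-close to a uniform bit.
# print(statistical_difference({0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}))  # 0.1
```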
We now observe that extractors according to this definition give us the "compilers" we want.

2.3  k-sources

Definition 6  X is a k-source if H_∞(X) ≥ k, i.e., if Pr[X = x] ≤ 2^{−k} for every x.

A typical setting of parameters is k = δn for some fixed δ, e.g. δ = 0.01. We call δ the min-entropy rate. Some different ranges that are commonly studied (and are useful for different applications): k = polylog(n), k = n^γ for a constant γ ∈ (0,1), k = δn for a constant δ ∈ (0,1), and k = n − O(1). The middle two (k = n^γ and k = δn) are the most natural for simulating randomized algorithms with weak random sources.

Examples of k-sources:

• k random and independent bits, together with n − k fixed bits. These are called oblivious bit-fixing sources.

• k random and independent bits, and n − k bits that depend arbitrarily on the first k bits. These are called adaptive bit-fixing sources.

• Santha–Vazirani δ-sources, with k = log(1/(1 − δ)) · n = Θ(δn).

• The uniform distribution on a set S ⊆ {0,1}^n with |S| = 2^k. These are called flat k-sources.

It turns out that flat k-sources are really representative of general k-sources.

Proposition 7  Every k-source is a convex combination of flat k-sources (provided that 2^k ∈ ℕ), i.e., X = Σ_i p_i X_i with 0 ≤ p_i ≤ 1, Σ_i p_i = 1, and all the X_i flat k-sources.

Proof Sketch:  View each source on [N] (recall that N = 2^n) as a vector X ∈ ℝ^N. Then X is a k-source if and only if X(i) ∈ [0,1] for every i with Σ_i X(i) = 1 (the condition for being a probability distribution) and X(i) ≤ 2^{−k} for every i (the condition for being a k-source). Since all of these conditions are linear, the set of k-sources is a convex polytope; more precisely, it is the intersection of the hypercube [0, 2^{−k}]^N with the hyperplane Σ_i X(i) = 1. Hence any k-source is a convex combination of the vertices of this polytope. The vertices are the points that make a maximal subset of the inequalities tight; since Σ_i X(i) = 1, these are precisely the points with X(i) = 2^{−k} for 2^k values of i and X(i) = 0 for the remaining values of i. Therefore the vertices are exactly the flat k-sources. □

Thus, we can think of any k-source as being obtained by first selecting a flat k-source X_i according to some distribution (given by the p_i's) and then selecting a random sample from X_i. This means that if we can compile probabilistic algorithms to work with flat k-sources, then we can compile them to work with any k-source.

3  Seeded Extractors

Proposition 5 tells us that it is impossible to have deterministic extractors for Santha–Vazirani sources. Here we consider k-sources, which are more general than Santha–Vazirani sources, so deterministic extraction is impossible for them as well. The impossibility result for k-sources is stronger and simpler to prove.

Proposition 8  For any Ext : {0,1}^n → {0,1} there exists an (n − 1)-source X such that Ext(X) is constant.

Proof:  There exists b ∈ {0,1} such that |Ext^{−1}(b)| ≥ 2^n/2 = 2^{n−1}. Let X be the uniform distribution on Ext^{−1}(b). □

On the other hand, if we reverse the order of quantifiers, allowing the extractor to depend on the source, it is easy to see that good extractors exist; in fact a randomly chosen function will be a good extractor with high probability.

Proposition 9  For every n, k, m ∈ ℕ, every ε > 0, and every flat k-source X, if we choose a random function Ext : {0,1}^n → {0,1}^m with m = k − 2 log(1/ε) − O(1), then Ext(X) will be ε-close to U_m with probability 1 − 2^{−Ω(Kε²)}, where K = 2^k. (We will commonly use the convention that capital variables are 2 raised to the power of the corresponding lowercase variable, such as K = 2^k above.)

Proof:  Choose the extractor Ext at random. We want it to have the following property: for every T ⊆ [M], |Pr[Ext(X) ∈ T] − Pr[U_m ∈ T]| ≤ ε. Equivalently, |{x ∈ Supp(X) : Ext(x) ∈ T}|/K should differ from the density µ(T) = |T|/M by at most ε. For each point x ∈ Supp(X), the probability that Ext(x) lands in T is µ(T). A Chernoff bound then says that for each fixed T the condition holds with probability at least 1 − 2^{−Ω(Kε²)}. By a union bound, the probability that the condition is violated for at least one T is at most 2^M · 2^{−Ω(Kε²)}, which is less than 1 for m = k − 2 log(1/ε) − O(1). □
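To make Proposition 9 concrete, here is a small Python experiment (the toy parameters, names, and interface are our own, chosen only for illustration): it picks a uniformly random function Ext : {0,1}^n → {0,1}^m and a random flat k-source, then measures the statistical distance of Ext(X) from U_m.

```python
import random
from collections import Counter

def random_extractor_experiment(n=12, k=8, m=4, trials=3, seed=0):
    """Toy illustration of Proposition 9: a random function applied to a
    random flat k-source is typically close to uniform on m bits when m is
    somewhat smaller than k."""
    rng = random.Random(seed)
    N, K, M = 2 ** n, 2 ** k, 2 ** m
    for t in range(trials):
        ext = [rng.randrange(M) for _ in range(N)]   # random Ext: [N] -> [M]
        support = rng.sample(range(N), K)             # random flat k-source X
        counts = Counter(ext[x] for x in support)     # distribution of Ext(X)
        # Statistical distance of Ext(X) from the uniform distribution U_m.
        dist = 0.5 * sum(abs(counts.get(z, 0) / K - 1 / M) for z in range(M))
        print(f"trial {t}: Delta(Ext(X), U_m) = {dist:.4f}")

# random_extractor_experiment()   # typically prints small distances
```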
Note that the failure probability is doubly-exponentially small in k. Naively, one might hope that we could get an extractor that is good for all flat k-sources by a union bound. But the number of flat k-sources is (N choose K) ≈ N^K (where N = 2^n), which is unfortunately a larger double-exponential in k. We can overcome this gap by allowing the extractor to be "slightly" probabilistic, i.e. allowing the extractor a seed consisting of a small number of truly random bits in addition to the weak random source. We can think of this seed of truly random bits as a random choice of an extractor from a family of extractors. This leads to the following crucial definition:

Definition 10 (seeded extractors)  A function Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε)-extractor if for every k-source X on {0,1}^n, Ext(X, U_d) is ε-close to U_m.

We want constructions that minimize d and maximize m. We prove the following theorem.

Theorem 11  For every n and k with k ≤ n and every ε > 0, there exists a (k, ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with m = k + d − 2 log(1/ε) − O(1) and d = log(n − k) + 2 log(1/ε) + O(1).

One setting of parameters to keep in mind (for our application of simulating randomized algorithms with a weak source) is k = δn with δ a fixed constant (e.g. δ = 0.01), and ε a fixed constant (e.g. ε = 0.01).

Proof:  We use the probabilistic method. It suffices for Ext to work for flat k-sources. Choose the extractor Ext at random. Then the probability that the extractor fails is at most the number of flat k-sources times the probability that Ext fails for a fixed flat k-source. By the above proposition, the probability of failure for a fixed flat k-source is at most 2^{−Ω(KDε²)} (since (X, U_d) is a flat (k + d)-source and m = k + d − 2 log(1/ε) − O(1)). Thus the total failure probability is at most

(N choose K) · 2^{−Ω(KDε²)} ≤ (Ne/K)^K · 2^{−Ω(KDε²)}.

The latter expression is less than 1 if Dε² ≥ c · log(Ne/K) = c(n − k) + c′ for constants c, c′, which is equivalent to d = log(n − k) + 2 log(1/ε) + O(1). □

It turns out that both bounds (on m and d) are individually tight up to the O(1) terms.

4  Simulating Randomized Algorithms

We now study how to simulate randomized algorithms given only a weak random source. A usual randomized algorithm takes an input string w and m random bits, and outputs the correct answer with probability at least 1 − γ. Assume now that we do not have a source of perfectly random bits. Instead we have a k-source and an extractor, which takes an input from our weak source. We also allow the extractor a small seed of truly random bits, which, as mentioned above, can be viewed as choosing a random extractor from some family. The output of the extractor is fed into our randomized algorithm A in place of the truly random bits it used before. Since the seed has only logarithmic length, we can actually eliminate it by running through all possible values it can take and ruling by majority vote.

Proposition 12  Let A(w; r) be a randomized algorithm such that A(w; U_m) has error probability at most γ, and let Ext : {0,1}^n × {0,1}^d → {0,1}^m be a (k, ε)-extractor. Define

A′(w; x) = maj_{y ∈ {0,1}^d} A(w; Ext(x, y)).

Then for every k-source X on {0,1}^n, A′(w; X) has error probability at most 2(γ + ε).

Proof:  The probability that A(w; Ext(X, U_d)) is incorrect is at most the probability that A(w; U_m) is incorrect plus ε, i.e. at most γ + ε, by the defining property of statistical difference. Whenever maj_y A(w; Ext(X, y)) is incorrect, at least half of the seeds y must yield an incorrect answer, so the probability that the majority vote is incorrect is at most 2(γ + ε). □
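As a sketch of how Proposition 12 would be used (the function names and interfaces below are placeholders of our own, not from the notes), the compiled algorithm enumerates all 2^d seeds, runs A on each extracted string, and outputs the majority answer; when d = O(log n) this costs only a polynomial factor in running time.

```python
def compile_with_extractor(A, ext, d):
    """Build A'(w; x) = maj over y in {0,1}^d of A(w; Ext(x, y)),
    as in Proposition 12.

    Assumed interfaces (hypothetical, for illustration only):
      A(w, r)    -- the original randomized algorithm; returns its answer
                    given input w and a string r of m random bits.
      ext(x, y)  -- a (k, eps)-extractor; returns an m-bit string given a
                    sample x from the weak source and a d-bit seed y.
    """
    def A_prime(w, x):
        answers = [A(w, ext(x, y)) for y in range(2 ** d)]  # try every seed
        # Majority vote over all 2^d seeds; by Proposition 12 the error
        # probability over a k-source X is at most 2*(gamma + eps).
        return max(set(answers), key=answers.count)
    return A_prime
```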