1. Tracking a Small Set of Experts by Mixing Past Posteriors

Olivier Bousquet
Ecole Polytechnique, France and BIOwulf Technologies, New-York

Manfred K. Warmuth
UC Santa Cruz

2. Outline

• Motivate on-line learning, relative loss bounds
• Comparator on-line as well
• Shifting back
• Mixing Update
• Experimental Results
• Future work

O. Bousquet & M.K. Warmuth: Tracking a Small Set of Experts by Mixing Past Posteriors COLT, July 19, 2001

5. What kind of performance can we expect?

- Let $L_{1..T,A}$ be the total loss of algorithm $A$
- Let $L_{1..T,i}$ be the total loss of the $i$-th expert $E_i$

• Form of bounds: for all sequences $(x_1, y_1), \ldots, (x_T, y_T)$

  $$L_{1..T,A} \le \min_i \left( L_{1..T,i} + c \log n \right)$$

  where $c$ is a constant and $n$ is the number of experts
• Bounds the loss of the algorithm relative to the loss of the best expert

6. General Expert Algorithm

• Master algorithm predicts with the weighted average $\hat{y}_t = v_t \cdot x_t$
• The weights are updated according to the Loss Update

  $$v_{t+1,i} := \frac{v_{t,i}\, e^{-\eta L_{t,i}}}{\text{normaliz.}}$$

  where $L_{t,i}$ is the loss of expert $i$ in trial $t$

→ Weighted Majority Algorithm [LW89]
→ Generalized by Vovk [Vovk90]

7. Comparator Changes with Time

[Figure: sequence of trials; expert labels as extracted: 7 20 4 51]

• Off-line algorithm partitions the sequence into sections and chooses the best expert in each section
• Goal: do well compared to the best off-line partition
• Problem: the Loss Update learns too well and does not recover fast enough

10. Weights of Fixed Share Alg.

• Tracks the best expert

[Figure: Weight (0 to 1) plotted against Best Expert (1 to 7); legend: experts 1 to 7]
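The Loss Update and the weighted-average prediction from the General Expert Algorithm slide can be sketched in a few lines. This is only an illustrative sketch: the function names, the toy loss values, and the choice $\eta = 1$ are assumptions made here, not taken from the slides.

```python
import math


def predict(weights, expert_predictions):
    """Master prediction: the weighted average v_t . x_t."""
    return sum(v * x for v, x in zip(weights, expert_predictions))


def loss_update(weights, losses, eta):
    """Loss Update: v_{t+1,i} = v_{t,i} * exp(-eta * L_{t,i}) / normalization."""
    unnormalized = [v * math.exp(-eta * loss) for v, loss in zip(weights, losses)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical toy run: 3 experts, uniform start vector,
# expert 0 has the smallest loss in every trial.
weights = [1.0 / 3] * 3
for _ in range(20):
    weights = loss_update(weights, losses=[0.0, 0.5, 1.0], eta=1.0)
```

In this toy run the weights of experts 1 and 2 shrink exponentially with the number of trials. That is the "learns too well" problem noted on slide 7: if a different expert became best later, it would need many trials to regain its weight.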

11. Shifting Bounds

• Recall the Static Expert bound

  $$L_{1..T,A} \le \min_i \left( L_{1..T,i} + O(\log n) \right)$$

  – Comparison class: set of experts
• Bounds for Share Algorithms [HW98]

  $$L_{1..T,A} \le \min_P \left( L_{1..T,P} + O(\#\text{ of bits for } P) \right)$$

  – Comparison class: set of partitions
  – # of bits for partitions with $k$ shifts: $k \log n + \log \binom{T}{k}$

12. Freund's Problem

• Number of possible experts $n$ is large: $n \approx 10^6$
• Experts in partition chosen from a small subset of size $m$: $m \approx 10$
• # of bits for partitions with $k$ shifts: $\log \binom{n}{m} + k \log m + \log \binom{T}{k}$
• Naive algorithm runs the Fixed Share to Start Vector alg. for every subset of $m$ out of $n$ experts

15. Weights of Fixed Share to Start Vector Alg.

[Figure: Weight (0 to 1) plotted against Best Expert (sections 1 2 1 2 3 1 2); legend: experts 1 to 3]

16. Weights of Fixed Share to Decaying Past Alg.

• Improved recovery when expert used before

[Figure: Weight (0 to 1) plotted against Best Expert (sections 1 2 1 2 3 1 2); legend: experts 1 to 3]

17. Fixed Share to Decaying Past - Log Weights

• Past good experts remain at higher level

[Figure: Log Weight (-14 to 0) plotted against Best Expert (sections 1 2 1 2 3 1 2); legend: experts 1 to 3, Typical, Max others]

20. Fixed Share to Start Vector - Log Weights

• No memory

[Figure: Log Weight (-14 to 0) plotted against Best Expert (sections 1 to 10, repeated three times); legend: experts 1 to 10, Max others]
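To get a feel for the two bit counts on the Shifting Bounds and Freund's Problem slides, the sketch below evaluates both expressions numerically. The values $n = 10^6$ and $m = 10$ come from the slides. The values of $T$ and $k$, the function names, and the reading of $\log$ as base 2 are assumptions made here for illustration.

```python
import math


def log2_binom(n, k):
    """Base-2 logarithm of the binomial coefficient C(n, k)."""
    return math.log2(math.comb(n, k))


def bits_all_experts(n, T, k):
    """k log n + log C(T, k): each of the k shifts may pick any of the n experts."""
    return k * math.log2(n) + log2_binom(T, k)


def bits_small_subset(n, m, T, k):
    """log C(n, m) + k log m + log C(T, k): name the subset once, then shift within it."""
    return log2_binom(n, m) + k * math.log2(m) + log2_binom(T, k)


n, m = 10**6, 10      # sizes quoted on the Freund's Problem slide
T, k = 10_000, 100    # hypothetical sequence length and number of shifts

full = bits_all_experts(n, T, k)
small = bits_small_subset(n, m, T, k)
print(f"all experts:  {full:.1f} bits")
print(f"small subset: {small:.1f} bits")
```

The $\log \binom{T}{k}$ term for the shift positions is the same in both expressions. The difference is that the small-subset encoding pays for the $m$ experts once and then only $\log m$ per shift, so it should win whenever the number of shifts is large compared with $m$.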

21. Bounds Again

• Bounds still have the form

  $$L_{1..T,A} \le \min_P \left( L_{1..T,P} + O(\#\text{ of bits for } P) \right)$$

→ Boundaries are encoded twice
→ Off-line problem NP-complete

22. Alternates to Mixing

• What we need for bounds:

  $$v_{t+1} \ge \beta_{t+1,q}\, v^m_q, \quad \text{for } 0 \le q \le t \qquad (*)$$

Mixing Update

  $$v_{t+1} = \sum_{q=0}^{t} \beta_{t+1,q}\, v^m_q$$

Max Update

  $$v_{t+1} = \frac{1}{\text{normaliz.}} \max_{q=0,\ldots,t} \beta_{t+1,q}\, v^m_q$$

Projection Update

  $$v_{t+1} = \arg\min_{v \in (*)} \Delta(v, v^m_t)$$
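The Mixing Update and Max Update above can be sketched as follows, with $v^m_q$ taken to be the posterior produced by the Loss Update in trial $q$ and $v^m_0$ the start vector. The slides shown here do not give the coefficients $\beta_{t+1,q}$, so the two schemes below are assumptions: one plausible reading of "Fixed Share to Start Vector" (weight $1-\alpha$ on the latest posterior, $\alpha$ on the start vector) and of "Fixed Share to Decaying Past" ($\alpha$ spread over earlier posteriors in proportion to $1/(t-q)^\gamma$). All function names, the driver `run`, and the toy losses are hypothetical.

```python
import math


def loss_update(weights, losses, eta):
    """Loss Update: multiply each weight by exp(-eta * loss), then normalize."""
    unnormalized = [v * math.exp(-eta * loss) for v, loss in zip(weights, losses)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def beta_start_vector(t, alpha):
    """Assumed scheme (t >= 1): alpha on v^m_0, 1 - alpha on v^m_t, zero elsewhere."""
    beta = [0.0] * (t + 1)
    beta[0] = alpha
    beta[t] = 1.0 - alpha
    return beta


def beta_decaying_past(t, alpha, gamma=1.0):
    """Assumed scheme (t >= 1): alpha shared over q < t in proportion to 1/(t-q)^gamma."""
    raw = [1.0 / (t - q) ** gamma for q in range(t)]
    z = sum(raw)
    return [alpha * r / z for r in raw] + [1.0 - alpha]


def mixing_update(posteriors, beta):
    """Mixing Update: v_{t+1} = sum over q of beta[q] * posteriors[q]."""
    n = len(posteriors[0])
    return [sum(b * p[i] for b, p in zip(beta, posteriors)) for i in range(n)]


def max_update(posteriors, beta):
    """Max Update: componentwise max over q of beta[q] * posteriors[q], normalized."""
    n = len(posteriors[0])
    unnormalized = [max(b * p[i] for b, p in zip(beta, posteriors)) for i in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run(beta_fn, loss_rows, eta=2.0, alpha=0.05):
    """Hypothetical driver: alternate Loss Update and Mixing Update over all trials."""
    n = len(loss_rows[0])
    v = [1.0 / n] * n
    posteriors = [v]  # v^m_0 is the start vector
    for t, losses in enumerate(loss_rows, start=1):
        posteriors.append(loss_update(v, losses, eta))      # v^m_t
        v = mixing_update(posteriors, beta_fn(t, alpha))    # v_{t+1}
    return v


# Toy sequence: expert 0 is best, then expert 1, then expert 0 again.
loss_rows = [[0, 1, 1]] * 10 + [[1, 0, 1]] * 10 + [[0, 1, 1]] * 10
final_start = run(beta_start_vector, loss_rows)
final_decay = run(beta_decaying_past, loss_rows)
```

Both coefficient schemes sum to one, so the Mixing Update stays a probability vector and satisfies condition (*) directly. The Max Update satisfies (*) only after normalization, which works here because the unnormalized vector sums to at most one. This sketch keeps every past posterior in memory, which is the simplest choice rather than an efficient one.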