Power Issues with Embedded Systems
Rabi Mahapatra
Computer Science
CPSC 689 Lecture Slides
Mahapatra-Texas A&M-Spring’02

Plan for today
• Some power models
• Become familiar with techniques to reduce power consumption
• Reading assignment: Bill Moyer, “Low-Power Design for Embedded Processors,” Proceedings of the IEEE, Nov. 2001

Next Generation Computing: Watts metrics?
[Figure: power ranges across computing tiers. Labels: Server/Data Processing (megawatts); Router; Base Station; Wireless Networks (microwatts); Laptops, PDAs, Cellphones, GPS (0.1 to 10 W).]

Power Aware
• Increase in prominence of portable devices
• SoC complexity: heat generation
• Traditionally, the design axes were speed (performance) and area (cost)
• Now, add power as the new axis

Empirical
• Based on the chip estimation system [Glaser ICCAD91]:
  $P = \alpha G (E_r + C_L V_{dd}^2) f$
  $G$ = number of equivalent gates
  $E_r$ = energy consumed by an equivalent gate
  $C_L$ = average loading per gate, including fanout
  $\alpha$ = activity factor
  $f$ = clock frequency
• Demerit: does not account for different logic styles

Information Theoretic
• Reference: [Najm95]
• Based on activity estimation:
  $P = k \, C_L \, \alpha = k \, A \, h$
  $A$ = area, $h$ = entropy factor (a function of the entropy $H$)
• Limited accuracy; does not include the possibility of encoding

Signal Model Based
• Reference: [Landman TCAD96]
  – Properties of 2’s complement encoded data streams
  – Arithmetic blocks are regular
• Analytical method: [Ramprasad TCAD97]
  – Word-level statistics
  – Auto-Regressive Moving Average signal generation model
  – 2’s complement and sign-magnitude signal encoding

Software Power
• Power consumed by a processor ($P$), ref. [Tiwari TVLSI94]:
  $P = V_{dd} \cdot I$
• Energy ($E$):
  $E = P \cdot T_p$, where $T_p$ is the program execution time
• Program execution time ($T_p$), for $N$ clock cycles of period $T_{clk}$:
  $T_p = N \cdot T_{clk}$
  $E = P \cdot T_p = V_{dd} \cdot I \cdot N \cdot T_{clk}$
• If $V_{dd}$ and $T_{clk}$ are assumed to be constant, energy is measured by measuring the current $I$.
• Low-power software: small $N$ or fast execution time
• What happens when $V_{dd}$ and $T_{clk}$ vary? Current measurements?

Instruction Level Power Modeling
• Reference: [Tiwari TVLSI97]
• Current consumption of a program with no loops but $M$ instructions:
  $I = \dfrac{\sum_{k=0}^{M-1} \left( B_k N_k + O_{k,(k+1) \bmod M} \right)}{\sum_{k=0}^{M-1} N_k}$
  $B_k$ = base current of the $k$th instruction in the program
  $N_k$ = number of clocks required to complete the $k$th instruction
  $O_{i,j}$ = overhead of executing instructions $i$ and $j$ in succession

Power Dissipation in CMOS
• Three sources:
  – $P_{\text{switching}}$: switching (capacitive) power, dominant today
  – $P_{\text{leakage}}$: leakage power, will dominate at 0.13 micron and below
  – $P_{\text{shortcircuit}}$: short-circuit component
[Figure: circuit with load capacitance $C_L$.]

Static Power
• Not a factor in pure CMOS designs
• Sense amplifiers, voltage references, and constant current sources contribute to static power
• Consumed regardless of device state changes
• Total power: $P_{\text{total}} = P_{\text{switching}} + P_{\text{shortcircuit}} + P_{\text{static}} + P_{\text{leakage}}$

Power-Delay Leverage
• Power and delay trade off
• Delay is proportional to $C_L V_{dd} / (V_{dd} - V_t)^{1.5}$
• Trend: reduce $V_{dd}$ and $V_t$ to improve speed
• Energy-delay product is minimized when $V_{dd} = 2 V_t$
• Reducing $V_{dd}$ from $3 V_t$ to $2 V_t$ results in an approximately 50% decrease in performance while using only 44% of the power.

Algorithmic Technique PR
• Focus on minimizing the number of operations, weighted by their cost: the first-order goal.
  – Underlying implementation: arithmetic or logical
• Recomputation of intermediate results may be cheaper than memory use
• Loop unrolling: reduces loop overhead
• Number representation:
  – Fixed point or floating point
  – Sign-magnitude is preferred over 2’s complement in certain DSP applications when input samples are uncorrelated and the dynamic range is minimal
  – Bit length (trading off accuracy)
  – Adaptive bit truncation in a portable video encoder reduces power by 70% relative to full bit width

Architectural Technique PR
• Instruction set design and exploiting parallelism and pipelining are important
• Architecture-driven voltage scaling method [Chandrakasan, IEEE J. Solid-State Circuits 92]
  – Lower the voltage to save power, but apply parallelism/pipelining to recover speed
  – Possible if the application has parallelism; trades off against latency (due to pipelining and data dependencies) and area
  – Speculative logic is allowed if its overhead is low; otherwise it is detrimental
• Meeting the required performance without overdesigning the solution is a fundamental optimization
  – The power of the extra logic is not controllable, and it is still consumed even when parallelism is absent.

Logic and Circuit Level PR
• Focus on reducing switched capacitance and/or signal swing
• Signal probabilities may favor either static or dynamic CMOS logic
  – Example: for a two-input NAND gate with uniformly distributed inputs, the probability of the output being 0 is $p_0 = 0.25$, and $p_1 = 0.75$
  – For a static gate, the probability of a power-consuming $0 \to 1$ transition is $p_0 \cdot p_1 = 0.1875$
  – For a dynamic gate with the output precharged to logic 1, power is consumed whenever the output was previously 0. Its power-consuming transition probability is therefore $p_0 = 0.25$, higher than the static gate's 0.1875.
  – However, the dynamic circuit has lower input capacitance, by a factor of 2 to 3.
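
The two-input NAND figures above can be checked by brute-force enumeration. The following is a minimal sketch, assuming independent, uniformly distributed inputs and statistically independent successive output values; the function names are illustrative and do not come from the slides.

```python
from itertools import product


def nand_output_zero_probability(num_inputs: int) -> float:
    """Probability that an n-input NAND outputs 0 (assumes uniform, independent inputs)."""
    combos = list(product((0, 1), repeat=num_inputs))
    # A NAND gate outputs 0 only when every input is 1.
    zeros = sum(1 for bits in combos if all(bits))
    return zeros / len(combos)


def static_transition_probability(p0: float) -> float:
    """Static CMOS: power is drawn on a 0 -> 1 output transition.

    Assumes the previous and current outputs are independent, so the
    probability is p0 * p1 with p1 = 1 - p0.
    """
    return p0 * (1.0 - p0)


def dynamic_transition_probability(p0: float) -> float:
    """Dynamic CMOS precharged to 1: power is drawn whenever the last output was 0."""
    return p0


p0 = nand_output_zero_probability(2)
print(p0, static_transition_probability(p0), dynamic_transition_probability(p0))
# prints: 0.25 0.1875 0.25
```

Under these assumptions the dynamic gate switches about 1.33 times as often as the static one, which is the activity cost to weigh against its lower input capacitance.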

Logic Circuit PR
• For a wider-input static gate, say a four-input NAND, $p_0 = 0.0625$ and $p_{0 \to 1} = 0.0586$
• For the dynamic version, as above, $p_0 = p_{0 \to 1} = 0.0625$
• Static logic suffers from glitches, which add more than 20% to power, and needs restructuring to avoid them
[Figure: logic with a hazard (signals X, Y) and the restructured logic (signals X, Y).]
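
The four-input figures follow from the same reasoning as the two-input case. As a sketch, assuming independent, uniformly distributed inputs and independent successive evaluations, and writing $n$ for the number of NAND inputs (the symbols $n$, $p^{\text{static}}_{0 \to 1}$, and $p^{\text{dynamic}}_{0 \to 1}$ are introduced here for illustration):

```latex
\begin{align*}
p_0(n) &= 2^{-n} \\
p^{\text{static}}_{0 \to 1} &= p_0 \, (1 - p_0) \\
p^{\text{dynamic}}_{0 \to 1} &= p_0 \\
n = 4:\quad p_0 &= 0.0625, \qquad
p^{\text{static}}_{0 \to 1} = 0.0625 \times 0.9375 \approx 0.0586, \qquad
p^{\text{dynamic}}_{0 \to 1} = 0.0625
\end{align*}
```

The ratio $p^{\text{dynamic}}_{0 \to 1} / p^{\text{static}}_{0 \to 1} = 1 / (1 - p_0)$ falls from about 1.33 at $n = 2$ to about 1.07 at $n = 4$, so under these assumptions the activity penalty of the dynamic gate shrinks as the gate gets wider.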