Docsity

Addition and Subtraction - Computer Arithmetic - Lecture Slides, Slides of Computer Science

Lecture slides on computer arithmetic covering addition and subtraction: speedup methods, basic addition and counting, ripple-carry adders, analysis of carry propagation, carry-lookahead adders, variations in fast adders, and multioperand addition.

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhimant

Part II Addition / Subtraction

Parts and chapters

I. Number Representation
    1. Numbers and Arithmetic
    2. Representing Signed Numbers
    3. Redundant Number Systems
    4. Residue Number Systems
II. Addition / Subtraction
    5. Basic Addition and Counting
    6. Carry-Lookahead Adders
    7. Variations in Fast Adders
    8. Multioperand Addition
III. Multiplication
    9. Basic Multiplication Schemes
    10. High-Radix Multipliers
    11. Tree and Array Multipliers
    12. Variations in Multipliers
IV. Division
    13. Basic Division Schemes
    14. High-Radix Dividers
    15. Variations in Dividers
    16. Division by Convergence
V. Real Arithmetic
    17. Floating-Point Representations
    18. Floating-Point Operations
    19. Errors and Error Control
    20. Precise and Certifiable Arithmetic
VI. Function Evaluation
    21. Square-Rooting Methods
    22. The CORDIC Algorithms
    23. Variations in Function Evaluation
    24. Arithmetic by Table Lookup
VII. Implementation Topics
    25. High-Throughput Arithmetic
    26. Low-Power Arithmetic
    27. Fault-Tolerant Arithmetic
    28. Reconfigurable Arithmetic
Appendix: Past, Present, and Future

(Side label in the outline: Elementary Operations)

II Addition / Subtraction

Review addition schemes and various speedup methods
• Addition is a key op (in itself, and as a building block)
• Subtraction = negation + addition
• Carry propagation speedup: lookahead, skip, select, ...
• Two-operand versus multioperand addition

Topics in This Part
Chapter 5 Basic Addition and Counting
Chapter 6 Carry-Lookahead Adders
Chapter 7 Variations in Fast Adders
Chapter 8 Multioperand Addition

Basic Addition and Counting: Topics

Topics in This Chapter
5.1 Bit-Serial and Ripple-Carry Adders
5.2 Conditions and Exceptions
5.3 Analysis of Carry Propagation
5.4 Carry Completion Detection
5.5 Addition of a Constant
5.6 Manchester Carry Chains and Adders

5.1 Bit-Serial and Ripple-Carry Adders

Half-adder (HA): truth table and block diagram

Inputs    Outputs
x  y      c  s
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0

Full-adder (FA): truth table and block diagram

Inputs         Outputs
x  y  c_in     c_out  s
0  0  0        0      0
0  0  1        0      1
0  1  0        0      1
0  1  1        1      0
1  0  0        0      1
1  0  1        1      0
1  1  0        1      0
1  1  1        1      1

Half-Adder Implementations

Fig. 5.1 Three implementations of a half-adder: (a) AND/XOR half-adder; (b) NOR-gate half-adder; (c) NAND-gate half-adder with complemented carry.

Some Full-Adder Details

CMOS transmission gate and its use in a 2-to-1 mux: (a) CMOS transmission gate, circuit and symbol; (b) two-input mux built of two transmission gates.

Logic equations for a full-adder:

$s = x \oplus y \oplus c_{\text{in}}$ (odd parity function)
$\;\; = x y c_{\text{in}} \lor \bar{x}\,\bar{y}\,c_{\text{in}} \lor \bar{x}\,y\,\bar{c}_{\text{in}} \lor x\,\bar{y}\,\bar{c}_{\text{in}}$
$c_{\text{out}} = x y \lor x c_{\text{in}} \lor y c_{\text{in}}$ (majority function)

Simple Adders Built of Full-Adders

Fig. 5.3 Using full-adders in building bit-serial and ripple-carry adders: (a) bit-serial adder, built of one FA, a carry flip-flop, and shift registers for x, y, and s; (b) ripple-carry adder, shown for 32 bits with carries $c_0$ through $c_{32}$.

VLSI Layout of a Ripple-Carry Adder

Fig. 5.4 The layout of a 4-bit ripple-carry adder in CMOS implementation [Puck94]. [Figure: layout with 7 inverters and two 4-to-1 muxes between the VDD and VSS rails; marked dimensions 150 and 760.]

5.2 Conditions and Exceptions

Fig. 5.7 Two's-complement adder with provisions for detecting conditions and exceptions (outputs: Overflow, Negative, Zero).

$\text{overflow}_{\text{2's-compl}} = x_{k-1}\, y_{k-1}\, \bar{s}_{k-1} \lor \bar{x}_{k-1}\, \bar{y}_{k-1}\, s_{k-1}$
$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k \bar{c}_{k-1} \lor \bar{c}_k c_{k-1}$

Saturating Adders

Saturating (saturation) arithmetic: When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format.

Example: In 8-bit 2's-complement format, 120 + 26 = 146 does not fit, so we have:
120 + 26 → -110 (wraparound); 120 +_sat 26 → 127 (saturating)

Saturating arithmetic is desirable in many DSP applications.

Designing saturating adders
• Unsigned (quite easy)
• Signed (only slightly harder)

[Figure: a mux controlled by the adder's overflow signal selects between the adder output (0) and the saturation value (1).]

5.3 Analysis of Carry Propagation

Fig. 5.8 Example addition and its carry propagation chains.

Bit positions:   15 14 13 12   11 10  9  8    7  6  5  4    3  2  1  0
First operand:    1  0  1  1    0  1  1  0    0  1  1  0    1  1  1  0
Second operand:   0  1  0  1    1  0  0  1    1  1  0  0    0  0  1  1

Carry chains and their lengths (from left to right): 4, 6, 3, 2. The carry-out is at the left end and the carry-in at the right end.

5.5 Addition of a Constant: Counters

Fig. 5.10 An up (down) counter built of a register, an incrementer (decrementer), and a multiplexer. [Figure: the mux selects between Data in (Load) and the incrementer output x + 1 (or x - 1 for the decrementer) under the Count / Initialize control; the count register has Reset, Clear, Enable, and Clock inputs and drives Data out; the incrementer's $c_{\text{out}}$ signals counter overflow.]

Implementing a Simple Up Counter

Fig. 5.11 Four-bit asynchronous up counter built only of negative-edge-triggered T flip-flops.

(Fm arch text) Ripple-carry incrementer for use in an up counter.

Faster and Constant-Time Counters

Any fast adder design can be specialized and optimized to yield a fast counter (carry-lookahead, carry-skip, etc.).

Fig. 5.12 Fast (constant-time) three-stage up counter. [Figure: count register divided into three stages, with two incrementers and Load, Increment, Control 1, and Control 2 signals.]

One can use redundant representation to build a constant-time counter, but a conversion penalty must be paid during read-out.

Details of a 5-Bit Manchester Carry Network

Carry chain of a 5-bit Manchester adder.

Dynamic logic, with 2-phase operation
• Clock low: Precharge ($c_i = 0$)
• Clock high: Pull-down (if $g_i = 1$)

The transistors must be sized appropriately for maximum speed.

[Figure: one stage of the chain, shown as (a) a conceptual representation with switches controlled by $p_i$, $g_i$, and $a_i$ between Logic 1, Logic 0, $c_i$, and $c_{i+1}$, and (b) a possible CMOS realization with complemented carries $c'_i$ and $c'_{i+1}$, Clock, VDD, and VSS. The full chain has stages i = 4 down to i = 0, carries $c_0$ through $c_5$, and labels "Smaller transistors" and "Larger transistors".]

Carry Network is the Essence of a Fast Adder

Fig. 5.14 Generic structure of a binary adder, highlighting its carry network.

$g_i = x_i y_i$
$p_i = x_i \oplus y_i$

g_i  p_i   Carry is:
0    0     annihilated or killed
0    1     propagated
1    0     generated
1    1     (impossible)

Carry network types: Ripple; Skip; Lookahead; Parallel-prefix

Ripple-Carry Adder Revisited

Fig. 5.15 Alternate view of a ripple-carry network in connection with the generic adder structure shown in Fig. 5.14.

The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$

Latency of a k-bit adder is roughly 2k gate delays: 1 gate delay for production of the p and g signals, plus 2(k - 1) gate delays for carry propagation, plus 1 XOR gate delay for generation of the sum bits.

Carry-Lookahead Adders: Topics

Topics in This Chapter
6.1 Unrolling the Carry Recurrence
6.2 Carry-Lookahead Adder Design
6.3 Ling Adder and Related Designs
6.4 Carry Determination as Prefix Computation
6.5 Alternative Parallel Prefix Networks
6.6 VLSI Implementation Aspects

6.1 Unrolling the Carry Recurrence

Recall the generate, propagate, annihilate (absorb), and transfer signals:

Signal   Radix r                              Binary
g_i      is 1 iff $x_i + y_i \ge r$           $x_i y_i$
p_i      is 1 iff $x_i + y_i = r - 1$         $x_i \oplus y_i$
a_i      is 1 iff $x_i + y_i < r - 1$         $\bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$
t_i      is 1 iff $x_i + y_i \ge r - 1$       $x_i \lor y_i$
s_i      $(x_i + y_i + c_i) \bmod r$          $x_i \oplus y_i \oplus c_i$

The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation:

$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$\;\; = g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2})\, p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\;\; = \cdots$
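The unrolling above can be checked with a short sketch. The Python below is an illustration only, not taken from the slides: the function names (`carries_ripple`, `carry_unrolled`, `add_with_unrolled_carries`) are hypothetical, and operands are assumed to be nonnegative integers whose bit i is $x_i$ or $y_i$. It computes each carry both from the recurrence $c_{i+1} = g_i \lor p_i c_i$ and from the fully unrolled sum-of-products form.

```python
def carries_ripple(x, y, c0, k):
    """Carries c_0..c_k from the recurrence c_{i+1} = g_i OR (p_i AND c_i)."""
    c = [c0]
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p = xi & yi, xi ^ yi
        c.append(g | (p & c[i]))
    return c


def carry_unrolled(x, y, c0, i):
    """Carry c_i computed directly from the inputs (fully unrolled form).

    c_i = g_{i-1} OR g_{i-2} p_{i-1} OR ... OR g_0 p_1..p_{i-1} OR c_0 p_0..p_{i-1}
    """
    g = [((x >> j) & 1) & ((y >> j) & 1) for j in range(i)]
    p = [((x >> j) & 1) ^ ((y >> j) & 1) for j in range(i)]
    result = 0
    for j in range(i - 1, -1, -1):       # product term that starts with g_j
        term = g[j]
        for m in range(j + 1, i):
            term &= p[m]
        result |= term
    term = c0                            # last term: c_0 AND all propagate signals
    for m in range(i):
        term &= p[m]
    return result | term


def add_with_unrolled_carries(x, y, c0, k):
    """k-bit sum, with the carry-out as bit k, using only unrolled carries."""
    total = 0
    for i in range(k):
        p_i = ((x >> i) & 1) ^ ((y >> i) & 1)
        total |= (p_i ^ carry_unrolled(x, y, c0, i)) << i   # s_i = p_i XOR c_i
    return total | (carry_unrolled(x, y, c0, k) << k)
```

The number of product terms for $c_i$ grows with i, which is the fan-in problem that practical lookahead designs have to work around.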
Note: Addition symbol vs. logical OR

Full Carry Lookahead

Theoretically, it is possible to derive each sum digit directly from the inputs that affect it.

Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits.

[Figure: sum outputs $s_0$ to $s_3$, each derived directly from the inputs $x_0$ to $x_3$, $y_0$ to $y_3$, and $c_{\text{in}}$ that affect it.]

Two Solutions to the Fan-in Problem

High-radix addition (i.e., radix $2^h$)
Increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)

Multilevel lookahead

Example: 16-bit addition
• Radix-16 (four digits)
• Two-level carry lookahead (four 4-bit blocks)

Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first.

[Figure: carries $c_{16}$ ($c_{\text{out}}$) down to $c_0$ ($c_{\text{in}}$), with $c_{12}$, $c_8$, and $c_4$ marked as the unknowns to be found first.]

6.2 Carry-Lookahead Adder Design

Block generate and propagate signals

$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$

Fig. 6.2b Schematic diagram of a 4-bit lookahead carry generator. [Figure: inputs $c_i$ and the pairs $(g_i, p_i)$ to $(g_{i+3}, p_{i+3})$; outputs $c_{i+1}$, $c_{i+2}$, $c_{i+3}$, $g_{[i,i+3]}$, and $p_{[i,i+3]}$.]

A Building Block for Carry-Lookahead Addition

Fig. 6.1 A 4-bit carry network.

Fig. 6.2a A 4-bit lookahead carry generator. [Figure: two sections, labeled "Block Signal Generation" and "Intermediate Carries".]

Latency of a Multilevel Carry-Lookahead Adder

Latency through the 16-bit CLA adder consists of finding:

g and p for individual bit positions                1 gate level
g and p signals for 4-bit blocks                    2 gate levels
Block carry-in signals $c_4$, $c_8$, and $c_{12}$   2 gate levels
Internal carries within 4-bit blocks                2 gate levels
Sum bits                                            2 gate levels
Total latency for the 16-bit adder                  9 gate levels

(Compare to 32 gate levels for a 16-bit ripple-carry adder.)

Each additional lookahead level adds 4 gate levels of latency.

Latency for a k-bit CLA adder: $T_{\text{lookahead-add}} = 4 \log_4 k + 1$ gate levels

6.3 Ling Adder and Related Designs

Consider the carry recurrence and its unrolling by 4 steps:

$c_i = g_{i-1} \lor c_{i-1} t_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} t_{i-1} \lor g_{i-3} t_{i-2} t_{i-1} \lor g_{i-4} t_{i-3} t_{i-2} t_{i-1} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$

Ling's modification: Propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$

$h_i = g_{i-1} \lor h_{i-1} t_{i-2}$
$\;\; = g_{i-1} \lor g_{i-2} \lor g_{i-3} t_{i-2} \lor g_{i-4} t_{i-3} t_{i-2} \lor h_{i-4} t_{i-4} t_{i-3} t_{i-2}$

CLA:   5 gates   max 5 inputs   19 gate inputs
Ling:  4 gates   max 5 inputs   14 gate inputs

The advantage of $h_i$ over $c_i$ is even greater with wired-OR:

CLA:   4 gates   max 5 inputs   14 gate inputs
Ling:  3 gates   max 4 inputs   9 gate inputs

Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:

$s_i = p_i \oplus h_i t_{i-1}$

Propagate harry, not carry!

6.4 Carry Determination as Prefix Computation

Fig. 6.5 Combining of g and p signals of two (contiguous or overlapping) blocks B' and B" of arbitrary widths into the g and p signals for block B. [Figure: block B' spans positions $i_0$ to $i_1$ and block B" spans positions $j_0$ to $j_1$.]

$(g, p) = (g', p') \;¢\; (g'', p'')$
$g = g'' \lor g' p''$
$p = p' p''$

6.5 Alternative Parallel Prefix Networks

Fig. 6.7 Ladner-Fischer parallel prefix sums network built of two k/2-input networks and k/2 adders.

Delay recurrence: $D(k) = D(k/2) + 1 = \log_2 k$
Cost recurrence: $C(k) = 2C(k/2) + k/2 = (k/2) \log_2 k$

The Brent-Kung Recursive Construction

Fig. 6.8 Parallel prefix sums network built of one k/2-input network and k - 1 adders.

Delay recurrence: $D(k) = D(k/2) + 2 = 2 \log_2 k - 1$ (-2 really)
Cost recurrence: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$

Brent-Kung Carry Network (8-Bit Adder)

[Figure: Brent-Kung carry network taking the single-position signals [7, 7] down to [0, 0] and producing the prefix signals [0, 7] down to [0, 0], using the intermediate blocks [6, 7], [4, 5], [2, 3], [0, 1], [4, 7], and [0, 3]. A detail shows the cell that combines $(g_{[1,1]}, p_{[1,1]})$ and $(g_{[0,0]}, p_{[0,0]})$ into $(g_{[0,1]}, p_{[0,1]})$.]

Speed-Cost Tradeoffs in Carry Networks

Method           Delay               Cost
Ladner-Fischer   $\log_2 k$          $(k/2) \log_2 k$
Kogge-Stone      $\log_2 k$          $k \log_2 k - k + 1$
Brent-Kung       $2 \log_2 k - 2$    $2k - 2 - \log_2 k$

Improving the Ladner/Fischer design: Some outputs of the network of Fig. 6.7 (marked in the figure) can be produced one time unit later without increasing the overall latency. This strategy saves enough to make the overall cost linear (best possible).

Hybrid B-K/K-S Carry Network (16-Bit Adder)

[Figure: three 16-input parallel prefix graphs (Brent-Kung, Kogge-Stone, and hybrid) with inputs $x_0$ to $x_{15}$, outputs $s_0$ to $s_{15}$, and levels numbered 1 to 6. The hybrid graph uses Brent-Kung levels at the top and bottom and Kogge-Stone levels in the middle.]

Fig. 6.11 A hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs.
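The delay and cost formulas in the speed-cost comparison above can be checked on small cases. The sketch below is an illustration only, written in Python with hypothetical names (`carry_op`, `kogge_stone`, `brent_kung`, `add_with_prefix_network`), and it assumes k is a power of 2 and the carry-in is 0. It applies the carry operator ¢ along a Kogge-Stone or a Brent-Kung pattern, counts the cells used, and tracks depth by data dependency rather than by sweep number.

```python
def carry_op(hi, lo):
    """Carry operator: merge (g, p) of block `hi` with the adjacent lower block `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (g_lo & p_hi), p_hi & p_lo)


def kogge_stone(items):
    """Return (prefixes, cell_count, depth) for a Kogge-Stone pattern."""
    k = len(items)
    cur, depth = list(items), [0] * k
    cells, d = 0, 1
    while d < k:
        nxt, ndepth = list(cur), list(depth)   # every level reads the previous level
        for i in range(d, k):
            nxt[i] = carry_op(cur[i], cur[i - d])
            ndepth[i] = max(depth[i], depth[i - d]) + 1
            cells += 1
        cur, depth, d = nxt, ndepth, 2 * d
    return cur, cells, max(depth)


def brent_kung(items):
    """Return (prefixes, cell_count, depth) for a Brent-Kung pattern (k a power of 2)."""
    k = len(items)
    a, depth = list(items), [0] * k
    cells = 0

    def cell(i, j):
        nonlocal cells
        a[i] = carry_op(a[i], a[j])
        depth[i] = max(depth[i], depth[j]) + 1
        cells += 1

    d = 1
    while d < k:                               # up-sweep: blocks of width 2d
        for i in range(2 * d - 1, k, 2 * d):
            cell(i, i - d)
        d *= 2
    d = k // 4
    while d >= 1:                              # down-sweep: remaining prefixes
        for i in range(3 * d - 1, k, 2 * d):
            cell(i, i - d)
        d //= 2
    return a, cells, max(depth)


def add_with_prefix_network(x, y, k, network):
    """Add two k-bit numbers (carry-in 0); returns (sum, cell_count, depth)."""
    gp = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]
    prefixes, cells, depth = network(gp)
    carries = [0] + [g for g, _ in prefixes]   # c_{i+1} = g_[0,i] when c_0 = 0
    total = 0
    for i in range(k):
        total |= (gp[i][1] ^ carries[i]) << i  # s_i = p_i XOR c_i
    return total | (carries[k] << k), cells, depth
```

For k = 16 this sketch uses 49 cells in 4 levels for the Kogge-Stone pattern and 26 cells in 6 levels for the Brent-Kung pattern, in agreement with $k \log_2 k - k + 1$, $2k - 2 - \log_2 k$, and $2 \log_2 k - 2$. Counting depth by dependency is what yields 6 rather than 7 levels for Brent-Kung, since the first down-sweep cell does not wait for the last up-sweep cell.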
Brent-Kung:    6 levels   26 cells
Kogge-Stone:   4 levels   49 cells
Hybrid:        5 levels   32 cells

6.6 VLSI Implementation Aspects

Example: Radix-256 addition of 56-bit numbers as implemented in the AMD Am29050 CMOS microprocessor

Our description is based on the 64-bit version of the adder.

In radix-256, 64-bit addition, only these carries are needed: $c_{56}$, $c_{48}$, $c_{40}$, $c_{32}$, $c_{24}$, $c_{16}$, $c_8$

First, 4-bit Manchester carry chains (MCCs) of Fig. 6.12a are used to derive g and p signals for 4-bit blocks.

Next, the g and p signals for 4-bit blocks are combined to form the desired carries, using the MCCs in Fig. 6.12b.

7 Variations in Fast Adders

Chapter Goals
Study alternatives to the carry-lookahead method for designing fast adders

Chapter Highlights
• Many methods besides CLA are available (both competing and complementary)
• Best design is technology-dependent (often hybrid rather than pure)
• Knowledge of timing allows optimizations

Variations in Fast Adders: Topics

Topics in This Chapter
7.1 Simple Carry-Skip Adders
7.2 Multilevel Carry-Skip Adders
7.3 Carry-Select Adders
7.4 Conditional-Sum Adder
7.5 Hybrid Designs and Optimizations
7.6 Modular Two-Operand Adders

7.1 Simple Carry-Skip Adders

Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks: (a) ripple-carry adder, made of four 4-bit blocks of ripple-carry stages with block carries $c_0$, $c_4$, $c_8$, $c_{12}$, $c_{16}$; (b) simple carry-skip adder, in which a mux after each block is controlled by the block propagate signal $p_{[0,3]}$, $p_{[4,7]}$, $p_{[8,11]}$, or $p_{[12,15]}$.

Carry-Skip Adder with Fixed Block Size

Block width b; k/b blocks to form a k-bit adder (assume b divides k)

$T_{\text{fixed-skip-add}} = (b - 1) + (k/b - 1) + (b - 1) = 2b + k/b - 3$ stages
(the three terms are the delays in block 0, in the skips, and in the last block)

$dT/db = 2 - k/b^2 = 0 \;\Rightarrow\; b^{\text{opt}} = \sqrt{k/2}$
$T^{\text{opt}} = 2\sqrt{2k} - 3$

Example: k = 32, $b^{\text{opt}} = 4$, $T^{\text{opt}} = 13$ stages (contrast with 32 stages for a ripple-carry adder)

Carry-Skip Adder with Variable-Width Blocks

Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths. [Figure: blocks t - 1, t - 2, ..., 1, 0, with the end blocks of width b and wider blocks toward the middle; each sample carry path consists of ripple and skip segments.]

The total number of bits in the t blocks is k:

$2[b + (b + 1) + \cdots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$
$b = k/t - t/4 + 1/2$
$T_{\text{var-skip-add}} = 2(b - 1) + t - 1 = 2k/t + t/2 - 2$
$dT/dt = -2k/t^2 + 1/2 = 0 \;\Rightarrow\; t^{\text{opt}} = 2\sqrt{k}$
$T^{\text{opt}} = 2\sqrt{k} - 2$ (a factor of $\sqrt{2}$ smaller than for fixed-block)

7.2 Multilevel Carry-Skip Adders

Fig. 7.3 Schematic diagram of a one-level carry-skip adder.

Fig. 7.4 Example of a two-level carry-skip adder.

Fig. 7.5 Two-level carry-skip adder optimized by removing the short-block skip circuits.

[Figures: chains of blocks between $c_{\text{in}}$ and $c_{\text{out}}$ with level-1 skip circuits $S_1$ and level-2 skip circuits $S_2$.]

Elaboration on Two-Level Carry-Skip Adder

Example 7.2

Given the delay pair $\{\beta, \alpha\}$ for a level-2 block in Fig. 7.7a, the number of level-1 blocks that can be accommodated is $\gamma = \min(\beta - 1, \alpha)$.

The two limiting cases are a single-level carry-skip adder with $T_{\text{assimilate}} = \alpha$ and a single-level carry-skip adder with $T_{\text{produce}} = \beta$.

Width of the ith level-1 block in the level-2 block characterized by $\{\beta, \alpha\}$ is $b_i = \min(\beta - \gamma + i + 1, \alpha - i)$; the total block width is then $\sum_{i=0}^{\gamma-1} b_i$.

Carry-Skip Adder Optimization Scheme

Fig. 7.8 Generalized delay model for carry-skip adders. [Figure: a block of b full-adder units with its inputs and a level-h skip circuit; delays labeled $I(b)$, $A(b)$, $G(b)$, $E_h(b)$, and $S_h(b)$.]

7.3 Carry-Select Adders

Fig. 7.9 Carry-select adder for k-bit numbers built from three k/2-bit adders. [Figure: one k/2-bit adder produces the low k/2 bits and $c_{k/2}$; two k/2-bit adders compute the high k/2 bits for carry-in 0 and carry-in 1; a mux controlled by $c_{k/2}$ selects the high k/2 bits and $c_{\text{out}}$.]

$C_{\text{select-add}}(k) = 3C_{\text{add}}(k/2) + k/2 + 1$
$T_{\text{select-add}}(k) = T_{\text{add}}(k/2) + 1$

Conditional-Sum Addition Example

The width of the block for which the sum and carry bits are known doubles with each additional level, leading to an addition time that grows as the logarithm of the word width k.

Table 7.2 Conditional-sum addition of two 16-bit numbers. Bit positions run from 15 (left) to 0 (right) and the overall carry-in is 0. For block carry-in 1, the rightmost block is omitted because its carry-in is known to be 0. Block carry-outs are listed one per block, from the leftmost block to the rightmost.

x                       0010 0110 1110 1010
y                       0100 1011 0101 1101

Block   Block      Block sum (s) and block carry-out (c)
width   carry-in
1       0          s    0110 1101 1011 0111
                   c    0000 0010 0100 1000
        1          s    1001 0010 0100 100
                   c    0110 1111 1111 111
2       0          s    0110 1101 0011 0111
                   c    0 0 0 1 1 0 1 0
        1          s    1011 0010 0100 10
                   c    0 0 1 1 1 1 1
4       0          s    0110 0001 0011 0111
                   c    0 1 1 1
        1          s    0111 0010 0100
                   c    0 1 1
8       0          s    0111 0001 0100 0111
                   c    0 1
        1          s    0111 0010
                   c    0
16      0          s    0111 0010 0100 0111
                   c    0

The last row gives the final sum, with $c_{\text{out}} = 0$.

Elaboration on Conditional-Sum Addition

Two adjacent 4-bit blocks, forming an 8-bit block

[Figure: the left 4-bit block (positions 8j + 7 to 8j + 4) and the right 4-bit block (positions 8j + 3 to 8j) each hold two versions of their sum bits and carry-out, one per possible carry-in. These are combined into two versions of the sum bits and carry-out for the 8-bit block.]

7.5 Hybrid Designs and Optimizations

The most popular hybrid addition scheme:

Fig. 7.12 A hybrid carry-lookahead/carry-select adder. [Figure: a lookahead carry generator takes the block g, p signals and $c_{\text{in}}$ and drives the select inputs of the muxes in the carry-select blocks, each of which chooses between the carry-in 0 and carry-in 1 results.]

Optimizations in Fast Adders

What looks best at the block diagram or gate level may not be best when a circuit-level design is generated (effects of wire length, signal loading, ...)

Modern practice: Optimization at the transistor level
• Variable-block carry-lookahead adder
• Optimizations for average or peak power consumption
• Timing-based optimizations (next slide)

Optimizations Based on Signal Timing

So far, we have assumed that all input bits are presented at the same time and all output bits are also needed simultaneously.

Fig. 7.14 Example arrival times for operand bits in the final fast adder of a tree multiplier [Oklo96]. [Figure: plot of latency from inputs, in XOR-gate delays (vertical axis, 0 to 15), against bit position (horizontal axis, 0 to 60).]

Modern Low-Power Adders Implemented in CMOS

Zeydel, Kluter, Oklobdzija, ARITH-17, 2005

[Figure: 64-bit adder designs compared: Cond'l-Sum, Ling, and Three-Stage Ling.]

General Modular Adders

(x + y) mod m: if $x + y \ge m$ then $x + y - m$ else $x + y$

Fig. 7.15 Fast modular addition. [Figure: one adder computes x + y; a carry-save adder with inputs x, y, and -m followed by a second adder computes x + y - m; a mux controlled by the sign bit of x + y - m selects (x + y) mod m.]

8 Multioperand Addition

Chapter Goals
Learn methods for speeding up the addition of several numbers (needed for multiplication or inner-product)

Chapter Highlights
• Running total kept in redundant form
• Current total + Next number → New total
• Deferred carry assimilation
• Wallace/Dadda trees, parallel counters
• Modular multioperand addition

Multioperand Addition: Topics

Topics in This Chapter
8.1 Using Two-Operand Adders
8.2 Carry-Save Adders
8.3 Wallace and Dadda Trees
8.4 Parallel Counters and Compressors
8.5 Adding Multiple Signed Numbers
8.6 Modular Multioperand Adders

Pipelined Implementation for Higher Throughput

Fig. 8.3 Serial multioperand addition when each adder is a 4-stage pipeline.

Problem to think about: Ignoring start-up and other overheads, this scheme achieves a speedup of 4 with 3 adders. How is this possible?

Parallel Implementation as Tree of Adders

Fig. 8.4 Adding 7 numbers in a binary tree of adders. [Figure: the adder inputs are k bits wide at the first level; results are k + 1, k + 2, and k + 3 bits wide at successive levels.]

$\lceil \log_2 n \rceil$ adder levels; n - 1 adders

$T_{\text{tree-fast-multi-add}} = O(\log k + \log(k + 1) + \cdots + \log(k + \lceil \log_2 n \rceil - 1)) = O(\log n \log k + \log n \log\log n)$
$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$ [Justified on the next slide]

Elaboration on Tree of Ripple-Carry Adders

$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$

Fig. 8.5 Ripple-carry adders at levels i and i + 1 in the tree of adders used for multi-operand addition. [Figure: FA and HA cells at level i produce their outputs at times t, t + 1, t + 2, ...; the cells at level i + 1 start as soon as their inputs arrive, so each level adds only a constant delay.]

The absolute best latency that we can hope for is O(log k + log n).

There are kn data bits to process and, using any set of computation elements with constant fan-in, this requires O(log(kn)) time.

We will see shortly that carry-save adders achieve this optimum time.

Example Reduction by a CSA Tree

Fig. 8.10 Addition of seven 6-bit numbers in dot notation.

Fig. 8.11 Representing a seven-operand addition in tabular form. Each row gives the number of dots in each bit position, followed by the hardware that reduces it to the next row.

Bit position:   8  7  6  5  4  3  2  1  0
                         7  7  7  7  7  7    6 × 2 = 12 FAs
                      2  5  5  5  5  5  3    6 FAs
                      3  4  4  4  4  4  1    6 FAs
                   1  2  3  3  3  3  2  1    4 FAs + 1 HA
                   2  2  2  2  2  1  2  1    7-bit adder (carry-propagate adder)
                1  1  1  1  1  1  1  1  1

Total cost = 7-bit adder + 28 FAs + 1 HA

A full-adder compacts 3 dots into 2 (compression ratio of 1.5).
A half-adder rearranges 2 dots (no compression, but still useful).

Width of Adders in a CSA Tree

Fig. 8.12 Adding seven k-bit numbers and the CSA/CPA widths required. The index pair [i, j] means that bit positions from i up to j are involved. [Figure: five k-bit CSAs followed by a k-bit CPA. The seven inputs occupy positions [0, k-1]; intermediate sum and carry vectors occupy [0, k-1], [1, k], [1, k-1], [2, k+1], and [1, k+1]; result bits 0 and 1 drop out before the CPA, which produces the remaining bits up to position k + 2.]

Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.

8.3 Wallace and Dadda Trees

[Figure: an h-level CSA tree reducing n inputs to 2 outputs.]

$h(n) = 1 + h(\lceil 2n/3 \rceil)$
$n(h) = \lfloor 3n(h - 1)/2 \rfloor$
$2 \times 1.5^{h-1} < n(h) \le 2 \times 1.5^h$

Table 8.1 The maximum number n(h) of inputs for an h-level CSA tree

h    n(h)      h    n(h)      h    n(h)
0    2         7    28        14   474
1    3         8    42        15   711
2    4         9    63        16   1066
3    6         10   94        17   1599
4    9         11   141       18   2398
5    13        12   211       19   3597
6    19        13   316       20   5395

n(h): Maximum number of inputs for h levels

8.4 Parallel Counters and Compressors

Fig. 8.16 A 10-input parallel counter, also known as a (10; 4)-counter. [Figure: FA and HA cells in successive levels, followed by a 3-bit ripple-carry adder.]

1-bit full-adder = (3; 2)-counter
Circuit reducing 7 bits to their 3-bit sum = (7; 3)-counter
Circuit reducing n bits to their $\lceil \log_2(n + 1) \rceil$-bit sum = $(n; \lceil \log_2(n + 1) \rceil)$-counter

Accumulative Parallel Counters

True generalization of sequential counters

Possible application: Compare Hamming weight of a vector to a constant

[Figure: a parallel incrementer takes the q-bit initial count x from the count register and n increment signals $v_i$, with $2^{q-1} < n \le 2^q$, and produces the q-bit final count $y = x + \sum v_i$. Internally, an array of FAs forms a q-bit tally of up to $2^q - 1$ of the increment signals; the carry-out $c_q$ can be ignored or used for a decision.]

Up/Down Parallel Counters

Generalization of up/down counters

Possible application: Compare Hamming weights of two input vectors
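The $(n; \lceil \log_2(n + 1) \rceil)$-counter idea from Section 8.4 can be sketched in software as a column-by-column reduction that uses only full-adder and half-adder cells. The Python below is a minimal illustration under stated assumptions: the names `full_adder`, `half_adder`, and `parallel_counter` are hypothetical, and the reduction order is a simple greedy one that does not try to match the cell arrangement or the delay of Fig. 8.16.

```python
def full_adder(a, b, c):
    """(3; 2)-counter: returns (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def parallel_counter(bits):
    """Count the 1s among `bits` using only FA and HA cells.

    Returns (count, fa_cells, ha_cells). Each column j holds bits of weight
    2**j; a column is reduced to a single bit, sending carries to column j + 1.
    """
    columns = [list(bits)]
    fa = ha = 0
    j = 0
    while j < len(columns):
        col = columns[j]
        while len(col) > 1:
            if j + 1 == len(columns):
                columns.append([])
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
                fa += 1
            else:
                s, c = half_adder(col.pop(), col.pop())
                ha += 1
            col.append(s)                  # sum bit stays in this column
            columns[j + 1].append(c)       # carry bit moves one column up
        j += 1
    count = sum(col[0] << j for j, col in enumerate(columns) if col)
    return count, fa, ha
```

With 7 input bits this sketch uses 4 full-adders and no half-adders, giving a (7; 3)-counter. Comparing the Hamming weight of a vector to a constant, as suggested for accumulative parallel counters, then amounts to a test such as `parallel_counter(bits)[0] >= threshold`, where `threshold` is an assumed name for the constant.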
Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved