May 2007 Computer Arithmetic, Division Slide 1
Part IV Division

Parts and chapters of the textbook:
  I.   Number Representation: 1. Numbers and Arithmetic; 2. Representing Signed Numbers; 3. Redundant Number Systems; 4. Residue Number Systems
  II.  Addition / Subtraction: 5. Basic Addition and Counting; 6. Carry-Lookahead Adders; 7. Variations in Fast Adders; 8. Multioperand Addition
  III. Multiplication: 9. Basic Multiplication Schemes; 10. High-Radix Multipliers; 11. Tree and Array Multipliers; 12. Variations in Multipliers
  IV.  Division: 13. Basic Division Schemes; 14. High-Radix Dividers; 15. Variations in Dividers; 16. Division by Convergence
  V.   Real Arithmetic: 17. Floating-Point Representations; 18. Floating-Point Operations; 19. Errors and Error Control; 20. Precise and Certifiable Arithmetic
  VI.  Function Evaluation: 21. Square-Rooting Methods; 22. The CORDIC Algorithms; 23. Variations in Function Evaluation; 24. Arithmetic by Table Lookup
  VII. Implementation Topics: 25. High-Throughput Arithmetic; 26. Low-Power Arithmetic; 27. Fault-Tolerant Arithmetic; 28. Past, Present, and Future
The chart also carries the side label "Elementary Operations".

Slide 2
About This Presentation
This presentation is intended to support the use of the textbook Computer Arithmetic: Algorithms and Hardware Designs (Oxford University Press, 2000, ISBN 0-19-512583-5). It is updated regularly by the author as part of his teaching of the graduate course ECE 252B, Computer Arithmetic, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Unauthorized uses are strictly prohibited. © Behrooz Parhami

Edition: First. Released: Jan. 2000. Revised: Sep. 2001, Sep. 2003, Oct.
2005, May 2007.

Slide 5
13 Basic Division Schemes
Chapter Goals
  Study shift/subtract or bit-at-a-time dividers and set the stage for faster methods and variations to be covered in Chapters 14-16
Chapter Highlights
  Shift/subtract divide vs shift/add multiply
  Hardware, firmware, software algorithms
  Dividing 2's-complement numbers
  The special case of a constant divisor

Slide 6
Basic Division Schemes: Topics
Topics in This Chapter
  13.1. Shift/Subtract Division Algorithms
  13.2. Programmed Division
  13.3. Restoring Hardware Dividers
  13.4. Nonrestoring and Signed Division
  13.5. Division by Constants
  13.6. Preview of Fast Dividers

Slide 7
13.1 Shift/Subtract Division Algorithms
Notation for our discussion of division algorithms:
  z   Dividend                        $z_{2k-1} z_{2k-2} \ldots z_3 z_2 z_1 z_0$
  d   Divisor                         $d_{k-1} d_{k-2} \ldots d_1 d_0$
  q   Quotient                        $q_{k-1} q_{k-2} \ldots q_1 q_0$
  s   Remainder, $z - (d \times q)$   $s_{k-1} s_{k-2} \ldots s_1 s_0$
Initially, we assume unsigned operands.
[Figure: dot diagram showing the dividend z, the subtracted bit-matrix with rows $-q_3 d\,2^3$, $-q_2 d\,2^2$, $-q_1 d\,2^1$, $-q_0 d\,2^0$, and the remainder s, with the divisor d and quotient q alongside.]
Fig. 13.1 Division of an 8-bit number by a 4-bit number in dot notation.

Slide 10
Examples of Basic Division

  Integer division
  ======================================
  z               0 1 1 1 0 1 0 1
  2^4 d           1 0 1 0
  ======================================
  s(0)            0 1 1 1 0 1 0 1
  2s(0)         0 1 1 1 0 1 0 1
  −q3 2^4 d       1 0 1 0            {q3 = 1}
  --------------------------------------
  s(1)            0 1 0 0 1 0 1
  2s(1)         0 1 0 0 1 0 1
  −q2 2^4 d       0 0 0 0            {q2 = 0}
  --------------------------------------
  s(2)            1 0 0 1 0 1
  2s(2)         1 0 0 1 0 1
  −q1 2^4 d       1 0 1 0            {q1 = 1}
  --------------------------------------
  s(3)            1 0 0 0 1
  2s(3)         1 0 0 0 1
  −q0 2^4 d       1 0 1 0            {q0 = 1}
  --------------------------------------
  s(4)            0 1 1 1
  s                       0 1 1 1
  q                       1 0 1 1
  ======================================

  Fractional division
  ======================================
  z_frac          . 0 1 1 1 0 1 0 1
  d_frac          . 1 0 1 0
  ======================================
  s(0)            . 0 1 1 1 0 1 0 1
  2s(0)         0 . 1 1 1 0 1 0 1
  −q−1 d          . 1 0 1 0          {q−1 = 1}
  --------------------------------------
  s(1)            . 0 1 0 0 1 0 1
  2s(1)         0 . 1 0 0 1 0 1
  −q−2 d          . 0 0 0 0          {q−2 = 0}
  --------------------------------------
  s(2)            . 1 0 0 1 0 1
  2s(2)         1 . 0 0 1 0 1
  −q−3 d          . 1 0 1 0          {q−3 = 1}
  --------------------------------------
  s(3)            . 1 0 0 0 1
  2s(3)         1 . 0 0 0 1
  −q−4 d          . 1 0 1 0          {q−4 = 1}
  --------------------------------------
  s(4)            . 0 1 1 1
  s_frac        0 . 0 0 0 0 0 1 1 1
  q_frac          . 1 0 1 1
  ======================================

Fig. 13.2 Examples of sequential division with integer and fractional operands.

Slide 11
13.2 Programmed Division
[Figure: registers Rs and Rq, preceded by the carry flag, together hold the shifted partial remainder (2k − j bits) and the shifted partial quotient (j bits); the next quotient digit is inserted at the right end of Rq; register Rd holds the divisor d, standing for $2^k d$.]
Fig. 13.3 Register usage for programmed division.

Slide 12
Assembly Language Program for Division

  {Using left shifts, divide unsigned 2k-bit dividend,
   z_high|z_low, storing the k-bit quotient and remainder.
   Registers: R0 holds 0       Rc for counter
              Rd for divisor   Rs for z_high & remainder
              Rq for z_low & quotient}
  {Load operands into registers Rd, Rs, and Rq}
    div:     load    Rd with divisor
             load    Rs with z_high
             load    Rq with z_low
  {Check for exceptions}
             branch  d_by_0 if Rd = R0
             branch  d_ovfl if Rs ≥ Rd
  {Initialize counter}
             load    k into Rc
  {Begin division loop}
    d_loop:  shift   Rq left 1      {zero to LSB, MSB to carry}
             rotate  Rs left 1      {carry to LSB, MSB to carry}
             skip    if carry = 1
             branch  no_sub if Rs < Rd
             sub     Rd from Rs
             incr    Rq             {set quotient digit to 1}
    no_sub:  decr    Rc             {decrement counter by 1}
             branch  d_loop if Rc ≠ 0
  {Store the quotient and remainder}
             store   Rq into quotient
             store   Rs into remainder
    d_by_0:  ...
    d_ovfl:  ...
    d_done:  ...

Fig. 13.4 Programmed division using left shifts.
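The register-level loop of Fig. 13.4 can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the function name `programmed_divide` and the use of Python integers masked to k bits in place of the registers Rs and Rq are illustrative choices, not part of the slides, and overflow is flagged whenever z_high is at least d.

```python
def programmed_divide(z: int, d: int, k: int) -> tuple[int, int]:
    """Divide an unsigned 2k-bit dividend z by a k-bit divisor d with left shifts.

    Returns (quotient, remainder). The variables rs and rq play the roles of
    the registers Rs and Rq; the name and interface are hypothetical.
    """
    mask = (1 << k) - 1
    rs, rq = z >> k, z & mask              # Rs = z_high, Rq = z_low
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if rs >= d:
        raise OverflowError("quotient does not fit in k bits")
    for _ in range(k):
        carry = rq >> (k - 1)              # MSB of Rq goes to the carry flag
        rq = (rq << 1) & mask              # zero enters the LSB of Rq
        rs_carry = rs >> (k - 1)           # MSB of Rs becomes the new carry
        rs = ((rs << 1) | carry) & mask    # old carry enters the LSB of Rs
        if rs_carry or rs >= d:            # subtract when carry = 1 or Rs >= Rd
            rs = (rs - d) & mask           # k-bit subtraction
            rq |= 1                        # set the quotient digit to 1
    return rq, rs


if __name__ == "__main__":
    q, s = programmed_divide(0b01110101, 0b1010, 4)
    print(f"q = {q:04b}, s = {s:04b}")
```

With the operands of Fig. 13.2 (z = 0111 0101, d = 1010, k = 4) the quotient and remainder come out as 1011 and 0111.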
Slide 15
Indirect Signed Division
In division with signed operands, q and s are defined by
  $z = d \times q + s$,   $\mathrm{sign}(s) = \mathrm{sign}(z)$,   $|s| < |d|$
Examples of division with signed operands
  z =  5   d =  3   ⇒   q =  1   s =  2
  z =  5   d = −3   ⇒   q = −1   s =  2     (not q = −2, s = −1)
  z = −5   d =  3   ⇒   q = −1   s = −2
  z = −5   d = −3   ⇒   q =  1   s = −2
Magnitudes of q and s are unaffected by input signs
Signs of q and s are derivable from signs of z and d
Will discuss direct signed division later

Slide 16
Example of Restoring Unsigned Division

  ==============================================================
  z               0 1 1 1 0 1 0 1
  2^4 d         0 1 0 1 0
  −2^4 d        1 0 1 1 0
  ==============================================================
  s(0)          0 0 1 1 1 0 1 0 1
  2s(0)         0 1 1 1 0 1 0 1
  +(−2^4 d)     1 0 1 1 0
  --------------------------------------------------------------
  s(1)          0 0 1 0 0 1 0 1        Positive, so set q3 = 1
  2s(1)         0 1 0 0 1 0 1
  +(−2^4 d)     1 0 1 1 0
  --------------------------------------------------------------
  s(2)          1 1 1 1 1 0 1          Negative, so set q2 = 0
  s(2)=2s(1)    0 1 0 0 1 0 1          and restore
  2s(2)         1 0 0 1 0 1
  +(−2^4 d)     1 0 1 1 0
  --------------------------------------------------------------
  s(3)          0 1 0 0 0 1            Positive, so set q1 = 1
  2s(3)         1 0 0 0 1
  +(−2^4 d)     1 0 1 1 0
  --------------------------------------------------------------
  s(4)          0 0 1 1 1              Positive, so set q0 = 1
  s               0 1 1 1
  q               1 0 1 1
  ==============================================================
  No overflow, because (0111)two < (1010)two

Fig. 13.6 Example of restoring unsigned division.

Slide 17
13.4 Nonrestoring and Signed Division
The cycle time in restoring division must accommodate:
  Shifting the registers
  Allowing signals to propagate through the adder
  Determining and storing the next quotient digit
  Storing the trial difference, if required
[Figure: block diagram of a sequential restoring divider: a shifting quotient register q, a shifting partial remainder register s (initial value z), a divisor register d, a mux feeding a k-bit adder with carry-in 1, a quotient digit selector that uses the adder carry-out and the MSB of $2s^{(j-1)}$ to produce $q_{k-j}$, and a load path for the trial difference.]
Later events depend on earlier ones in the same cycle, causing a lengthening of the clock cycle
Nonrestoring division to the rescue!
  Assume $q_{k-j} = 1$ and subtract
  Store the result as the new PR (the partial remainder can become incorrect, hence the name "nonrestoring")

Slide 20
Graphical Depiction of Nonrestoring Division
Example: (0 1 1 1 0 1 0 1)two / (1 0 1 0)two, that is, (117)ten / (10)ten, with $2^4 d = 160$
[Figure: partial remainder (vertical axis, −100 to 300) plotted against the steps $s^{(0)}, s^{(1)}, s^{(2)}, s^{(3)}, s^{(4)} = 16s$.
  (a) Restoring: 117, ×2 = 234, −160 = 74, ×2 = 148, −160 = −12 (negative, so restored to 148), ×2 = 296, −160 = 136, ×2 = 272, −160 = 112.
  (b) Nonrestoring: 117, ×2 = 234, −160 = 74, ×2 = 148, −160 = −12, ×2 = −24, +160 = 136, ×2 = 272, −160 = 112.]
Fig. 13.8 Partial remainder variations for restoring and nonrestoring division.

Slide 21
Nonrestoring Division with Signed Operands
Restoring division
  $q_{k-j} = 0$ means no subtraction (or subtraction of 0)
  $q_{k-j} = 1$ means subtraction of d
Nonrestoring division
  We always subtract or add
  It is as if quotient digits are selected from the set {1, −1}:
    1 corresponds to subtraction
    −1 corresponds to addition
Our goal is to end up with a remainder that matches the sign of the dividend
This idea of trying to match the sign of s with the sign of z leads to a direct signed division algorithm:
  if sign(s) = sign(d) then $q_{k-j} = 1$ else $q_{k-j} = -1$
Example: q = . . . 0 0 0 1 . . . corresponds to . . . 1 −1 −1 −1 . . .
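The nonrestoring trace of Fig. 13.8(b) can be reproduced with a short Python sketch. It is restricted to unsigned operands with d > 0, so the rule "digit 1 when sign(s) = sign(d)" reduces to testing s ≥ 0. The function name `nonrestoring_trace` is hypothetical, and the closing correction (adding $2^k d$ back and decrementing q when the last partial remainder is negative) is included here as an assumption so that the returned remainder is a true remainder.

```python
def nonrestoring_trace(z: int, d: int, k: int) -> tuple[list[int], int, int]:
    """Nonrestoring division of z by d, assuming d > 0 and 0 <= z < 2^k * d.

    Returns (partial remainders s(0)..s(k), quotient, remainder).
    """
    dk = d << k                  # divisor aligned with the upper half: 2^k d
    s = z
    trace = [s]
    q = 0
    for _ in range(k):
        if s >= 0:               # sign(s) = sign(d): digit 1, subtract
            s = 2 * s - dk
            digit = 1
        else:                    # signs differ: digit -1, add
            s = 2 * s + dk
            digit = -1
        q = 2 * q + digit        # accumulate the {-1, 1} digits as a number
        trace.append(s)
    if s < 0:                    # assumed final correction step
        s += dk
        q -= 1
    return trace, q, s >> k      # s(k) = 2^k * s, so shift to get s


if __name__ == "__main__":
    print(nonrestoring_trace(117, 10, 4))
```

For 117 / 10 with k = 4 the partial remainders are 117, 74, −12, 136, 112, matching panel (b), and the digits 1, 1, −1, 1 evaluate to 8 + 4 − 2 + 1 = 11.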
Slide 22
Quotient Conversion and Final Correction
Partial remainder variation and selected quotient digits during nonrestoring division with d > 0
[Figure: partial remainder plotted between −d and d, starting from z; each step doubles the partial remainder (×2) and then adds or subtracts d (+d or −d); the selected quotient digits are listed below the plot.]
  Quotient with digits −1 and 1:     −1  1 −1 −1  1  1
  Replace −1s with 0s:                0  1  0  0  1  1
  Shift left, complement MSB, and set LSB to 1 to get the 2's-complement quotient:   1 1 0 0 1 1 1
  Check: −32 + 16 − 8 − 4 + 2 + 1 = −25 = −64 + 32 + 4 + 2 + 1
Final correction step if sign(s) ≠ sign(z):
  Add d to, or subtract d from, s; subtract 1 from, or add 1 to, q
  Corrected quotient when 1 is added to q:   1 1 0 1 0 0 0   (= −24)

Slide 25
13.5 Division by Constants
Software and hardware aspects: As was the case for multiplications by constants, optimizing compilers may replace some divisions by shifts/adds/subs; likewise, in custom VLSI circuits, hardware dividers may be replaced by simpler adders
Method 1: Find the reciprocal of the constant and multiply (particularly efficient if several numbers must be divided by the same divisor)
Method 2: Use the property that for each odd integer d, there exists an odd integer m such that $d \times m = 2^n - 1$; hence, $d = (2^n - 1)/m$ and
  $\frac{z}{d} = \frac{zm}{2^n - 1} = \frac{zm}{2^n (1 - 2^{-n})} = \frac{zm}{2^n} (1 + 2^{-n})(1 + 2^{-2n})(1 + 2^{-4n}) \cdots$
  (the factor $zm$ is a multiplication by a constant; the parenthesized factors are shift-adds)
Number of shift-adds required is proportional to log k

Slide 26
Example Division by a Constant
Example: Dividing the number z by 5, assuming 24 bits of precision.
We have d = 5, m = 3, n = 4; $5 \times 3 = 2^4 - 1$
  $\frac{z}{5} = \frac{3z}{2^4 - 1} = \frac{3z}{2^4 (1 - 2^{-4})} = \frac{3z}{16} (1 + 2^{-4})(1 + 2^{-8})(1 + 2^{-16}) \cdots$
Instruction sequence for division by 5
  q ← z + z shift-left 1       {$3z$ computed}
  q ← q + q shift-right 4      {$3z(1 + 2^{-4})$ computed}
  q ← q + q shift-right 8      {$3z(1 + 2^{-4})(1 + 2^{-8})$ computed}
  q ← q + q shift-right 16     {$3z(1 + 2^{-4})(1 + 2^{-8})(1 + 2^{-16})$ computed}
  q ← q shift-right 4          {$3z(1 + 2^{-4})(1 + 2^{-8})(1 + 2^{-16})/16$ computed}
5 shifts, 4 adds

Slide 27
13.6 Preview of Fast Dividers
Like multiplication, division is multioperand addition
Thus, there are but two ways to speed it up:
  a. Reducing the number of operands (divide in a higher radix)
  b. Adding them faster (keep partial remainder in carry-save form)
[Figure: (a) k × k integer multiplication, with multiplicand a, multiplier x, partial products $x_0 a\,2^0$, $x_1 a\,2^1$, $x_2 a\,2^2$, $x_3 a\,2^3$, and product p; (b) 2k / k integer division, with dividend z, divisor d, quotient q, subtracted terms $-q_3 d\,2^3$, $-q_2 d\,2^2$, $-q_1 d\,2^1$, $-q_0 d\,2^0$, and remainder s.]
Fig. 13.11 (a) Multiplication and (b) division as multioperand addition problems.
There is one complication that makes division inherently more difficult: The terms to be subtracted from (added to) the dividend are not known a priori but become known as quotient digits are computed; quotient digits in turn depend on partial remainders

Slide 30
14.1 Basics of High-Radix Division
Division with left shifts
  $s^{(j)} = r\,s^{(j-1)} - q_{k-j} (r^k d)$   with $s^{(0)} = z$ and $s^{(k)} = r^k s$
  (the term $r\,s^{(j-1)}$ is the shift; the term $q_{k-j} (r^k d)$ is the subtraction)
Radices of practical interest are powers of 2, and perhaps 10
[Figure: dot diagram with dividend z, divisor d, quotient q, subtracted terms $-(q_3 q_2)_{\mathrm{two}}\, d\, 4^1$ and $-(q_1 q_0)_{\mathrm{two}}\, d\, 4^0$, and remainder s.]
Fig. 14.1 Radix-4 division in dot notation

Slide 31
Difficulty of Quotient Digit Selection
What is the first quotient digit in the following radix-10 division?
           _______________
  2 0 4 3 | 1 2 2 5 7 9 6 8

The problem with the pencil-and-paper division algorithm is that there is no room for error in choosing the next quotient digit
In the worst case, all k digits of the divisor and k + 1 digits in the partial remainder are needed to make a correct choice
  12 / 2 = 6     122 / 20 = 6     1225 / 204 = 6     12257 / 2043 = 5
Suppose we used the redundant signed digit set [−9, 9] in radix 10
Then, we could choose 6 as the next quotient digit, knowing that we can recover from an incorrect choice by using negative digits: 5 9 = 6 −1

Slide 32
Examples of High-Radix Division

  Radix-4 integer division
  ======================================
  z               0 1 2 3 1 1 2 3
  4^4 d           1 2 0 3
  ======================================
  s(0)            0 1 2 3 1 1 2 3
  4s(0)         0 1 2 3 1 1 2 3
  −q3 4^4 d     0 1 2 0 3            {q3 = 1}
  --------------------------------------
  s(1)            0 0 2 2 1 2 3
  4s(1)         0 0 2 2 1 2 3
  −q2 4^4 d     0 0 0 0 0            {q2 = 0}
  --------------------------------------
  s(2)            0 2 2 1 2 3
  4s(2)         0 2 2 1 2 3
  −q1 4^4 d     0 1 2 0 3            {q1 = 1}
  --------------------------------------
  s(3)            1 0 0 3 3
  4s(3)         1 0 0 3 3
  −q0 4^4 d     0 3 0 1 2            {q0 = 2}
  --------------------------------------
  s(4)            1 0 2 1
  s                       1 0 2 1
  q                       1 0 1 2
  ======================================

  Radix-10 fractional division
  ======================================
  z_frac          . 7 0 0 3
  d_frac          . 9 9
  ======================================
  s(0)            . 7 0 0 3
  10s(0)        7 . 0 0 3
  −q−1 d        6 . 9 3              {q−1 = 7}
  --------------------------------------
  s(1)            . 0 7 3
  10s(1)        0 . 7 3
  −q−2 d        0 . 0 0              {q−2 = 0}
  --------------------------------------
  s(2)            . 7 3
  s_frac          . 0 0 7 3
  q_frac          . 7 0
  ======================================

Fig. 14.2 Examples of high-radix division with integer and fractional operands.

Slide 35
The Radix-2 SRT Division Algorithm
[Figure: new partial remainder $s^{(j)}$ (vertical axis, from −d to d) versus shifted old partial remainder $2s^{(j-1)}$ (horizontal axis, from −2d to 2d), with the three lines for $q_{-j} = -1$, $q_{-j} = 0$, and $q_{-j} = 1$ and the values −1, −1/2, 1/2, and 1 marked.]
Fig. 14.5 The relationship between new and old partial remainders in radix-2 SRT division.
We use the comparison constants −½ and ½ for quotient digit selection
  2s ≥ +½ means 2s = (0.1xxxxxxxx)2's-compl
  2s < −½ means 2s = (1.0xxxxxxxx)2's-compl
  $s^{(j)} = 2s^{(j-1)} - q_{-j}\, d$   with $s^{(0)} = z$ and $s^{(k)} = 2^k s$
  $s^{(j)} \in [-\tfrac{1}{2}, \tfrac{1}{2})$,   $q_{-j} \in \{-1, 0, 1\}$

Slide 36
Radix-2 SRT Division with Variable Shifts
We use the comparison constants −½ and ½ for quotient digit selection
  For 2s ≥ +½, or 2s = (0.1xxxxxxxx)2's-compl, choose $q_{-j} = 1$
  For 2s < −½, or 2s = (1.0xxxxxxxx)2's-compl, choose $q_{-j} = -1$
  Choose $q_{-j} = 0$ in other cases, that is, for:
    0 ≤ 2s < +½, or 2s = (0.0xxxxxxxx)2's-compl
    −½ ≤ 2s < 0, or 2s = (1.1xxxxxxxx)2's-compl
Observation: What happens when the magnitude of 2s is fairly small?
  2s = (0.00001xxxx)2's-compl: Choosing $q_{-j} = 0$ would lead to the same condition in the next step; generate 5 quotient digits 0 0 0 0 1
  2s = (1.1110xxxxx)2's-compl: Generate 4 quotient digits 0 0 0 −1
Use leading 0s or leading 1s detection circuit to determine how many quotient digits can be spewed out at once
Statistically, the average skipping distance will be 2.67 bits

Slide 37
Example Unsigned Radix-2 SRT Division

  ================================================================
  z               . 0 1 0 0 0 1 0 1      In [−½, ½), so okay
  d             0 . 1 0 1 0
  −d            1 . 0 1 1 0
  ================================================================
  s(0)          0 . 0 1 0 0 0 1 0 1
  2s(0)         0 . 1 0 0 0 1 0 1        ≥ ½, so set q−1 = 1
  +(−d)         1 . 0 1 1 0              and subtract
  ----------------------------------------------------------------
  s(1)          1 . 1 1 1 0 1 0 1
  2s(1)         1 . 1 1 0 1 0 1          In [−½, ½), so set q−2 = 0
  ----------------------------------------------------------------
  s(2)=2s(1)    1 . 1 1 0 1 0 1
  2s(2)         1 . 1 0 1 0 1            In [−½, ½), so set q−3 = 0
  ----------------------------------------------------------------
  s(3)=2s(2)    1 . 1 0 1 0 1
  2s(3)         1 . 0 1 0 1              < −½, so set q−4 = −1
  +d            0 . 1 0 1 0              and add
  ----------------------------------------------------------------
  s(4)          1 . 1 1 1 1              Negative,
  +d            0 . 1 0 1 0              so add to correct
  ----------------------------------------------------------------
  s(4)          0 . 1 0 0 1
  s             0 . 0 0 0 0 1 0 0 1
  q             0 . 1 0 0 −1             Uncorrected BSD quotient
  q             0 . 0 1 1 0              Convert and subtract ulp
  ================================================================

Fig. 14.6 Example of unsigned radix-2 SRT division.

Slide 40
Divider with Partial Remainder in Carry-Save Form
[Figure: block diagram with carry (v) and sum (u) registers holding the partial remainder, a shift-left path forming 2s, a small adder on 4 bits of the carry and sum feeding the selection of $q_{-j}$ (outputs Non0 = enable and Sign = select), a mux supplying 0, d, or the complement of d (with +ulp for 2's complement) from the divisor register, and a k-bit carry-save adder producing the new carry and sum.]
Fig. 14.8 Block diagram of a radix-2 divider with partial remainder in stored-carry form.

Slide 41
Why We Cannot Use Carry-Save PR with SRT Division
[Figure: $s^{(j)}$ (from −d to d) versus $2s^{(j-1)}$ (from −2d to 2d) with the lines for $q_{-j} = -1, 0, 1$; the values −1, −1/2, 1/2, 1 and the overlap-related widths 1 − d are marked.]
Fig. 14.9 Overlap regions in radix-2 SRT division.

Slide 42
14.4 Choosing the Quotient Digits
[Figure: p-d plot with d on the horizontal axis (.100, .101, .110, .111, 1.) and p on the vertical axis (from 10.0 to 01.1 in 2's complement, that is, −10.0 to 01.1); the lines 2d, d, −d, −2d bound the regions "Choose 1", "Choose 0", and "Choose −1", with overlap regions between the curves 1 min / 0 max and 0 min / −1 max; p cannot be ≥ 2d or < −2d (infeasible regions); the worst-case error margin in comparison is indicated.]
Fig. 14.10 A p-d plot for radix-2 division with d ∈ [1/2, 1), partial remainder in [−d, d), and quotient digits in [−1, 1].
[Inset, from Fig. 14.7: $s^{(j)}$ versus $2s^{(j-1)}$ with the lines for $q_{-j} = -1, 0, 1$, comparison points −1/2 and 0, regions "Choose −1", "Choose 0", "Choose 1", and overlaps −1/0 and 0/+1.]

Slide 45
Building the p-d Plot for Radix-4 Division
[Figure: upper half of the p-d plot, with d on the horizontal axis (.100, .101, .110, .111) and p on the vertical axis (00.0 to 11.1); the lines d, 2d, 3d, 4d and the curves 0 max, 1 min, 1 max, 2 min, 2 max, 3 min bound the regions "Choose 0", "Choose 1", "Choose 2", "Choose 3", with overlap regions between them and uncertainty regions marked; p cannot be ≥ 4d (infeasible region).]
Fig. 14.12 A p-d plot for radix-4 SRT division with quotient digit set [−3, 3].

Slide 46
Restricting the Quotient Digit Set in Radix 4
[Figure: new partial remainder $s^{(j)}$ (from −d to d, with ±2d/3 marked) versus shifted old partial remainder $4s^{(j-1)}$ (from −4d to 4d, with ±8d/3 marked), with lines for the digits −3, −2, −1, 0, +1, +2, +3.]
Fig. 14.13 New versus shifted old partial remainder in radix-4 division with $q_{-j}$ in [−2, 2].
Radix-4 fractional division with left shifts and $q_{-j} \in [-2, 2]$
  $s^{(j)} = 4s^{(j-1)} - q_{-j}\, d$   with $s^{(0)} = z$ and $s^{(k)} = 4^k s$
  (the term $4s^{(j-1)}$ is the shift; the term $q_{-j}\, d$ is the subtraction)
For this restriction to be feasible, we must have:
  $s \in [-hd, hd)$ for some $h < 1$, and $4hd - 2d \le hd$
This yields $h \le 2/3$ (choose $h = 2/3$ to minimize the restriction)

Slide 47
Building the p-d Plot with Restricted Radix-4 Digit Set
[Figure: upper half of the p-d plot, with d on the horizontal axis (.100, .101, .110, .111) and p on the vertical axis (00.0 to 11.1); the lines d/3, 2d/3, 4d/3, 5d/3, 8d/3 and the curves 0 max, 1 min, 1 max, 2 min, 2 max bound the regions "Choose 0", "Choose 1", "Choose 2", with overlap regions between them; p cannot be ≥ 8d/3 (infeasible region).]
Fig. 14.14 A p-d plot for radix-4 SRT division with quotient digit set [−2, 2].

Slide 50
Variations in Dividers: Topics
Topics in This Chapter
  15.1. Quotient Digit Selection Revisited
  15.2. Using p-d Plots in Practice
  15.3. Division with Prescaling
  15.4. Modular Dividers and Reducers
  15.5. Array Dividers
  15.6. Combined Multiply/Divide Units

Slide 51
15.1 Quotient Digit Selection Revisited
Radix-r division with quotient digit set $[-\alpha, \alpha]$, $\alpha < r - 1$
Restrict the partial remainder range, say to $[-hd, hd)$
From the solid rectangle in Fig. 15.1, we get $rhd - \alpha d \le hd$ or $h \le \alpha/(r - 1)$
To minimize the range restriction, we choose $h = \alpha/(r - 1)$
[Figure: new partial remainder $s^{(j)}$ (from −d to d, with ±hd marked) versus shifted old partial remainder $r\,s^{(j-1)}$ (from −rd to rd, with ±rhd and ±αd marked), with lines for the digits −r+1, . . . , −α, . . . , −1, 0, 1, . . . , α, . . . , r−1.]
Fig. 15.1 The relationship between new and shifted old partial remainders in radix-r division with quotient digits in [−α, +α].

Slide 52
Why Using Truncated p and d Values Is Acceptable
Fig. 15.2 A part of p-d plot showing the overlap region for choosing the quotient digit value β or β+1 in radix-r division with quotient digit set [−α, α].
[Figure: overlap region between "Choose β" and "Choose β + 1", starting at $d_{\min}$ and bounded by the lines $(-h + \beta + 1)d$ and $(h + \beta)d$, with the lines $(-h + \beta)d$ and $(h + \beta + 1)d$ also shown; point A is covered by a rectangle from 4 bits of p and 3 bits of d, point B by a rectangle from 3 bits of p and 4 bits of d. Standard p: xx.xxxx; carry-save p: xx.xxxxx and xx.xxxxx.]
Note: $h = \alpha / (r - 1)$

Slide 55
Example: Lower Bounds on Precision
  $\Delta d = d_{\min} \frac{2h - 1}{-h + \alpha}$,   $\Delta p = d_{\min} (2h - 1)$
[Fig. 15.4: overlap region between "Choose α − 1" and "Choose α", bounded by the lines $(-h + \alpha)d$ and $(h + \alpha - 1)d$; at $d_{\min}$ its height is Δp, from $(-h + \alpha)\, d_{\min}$ to $(h + \alpha - 1)\, d_{\min}$, and its width is Δd, from $d_{\min}$ to $d_{\min} + \Delta d$.]
For r = 4, divisor range [0.5, 1), digit set [−2, 2], we have α = 2, $d_{\min} = 1/2$, $h = \alpha/(r - 1) = 2/3$
  $\Delta d = (1/2) \frac{4/3 - 1}{-2/3 + 2} = 1/8$
  $\Delta p = (1/2)(4/3 - 1) = 1/6$
Because $1/8 = 2^{-3}$ and $2^{-3} \le 1/6 < 2^{-2}$, we must inspect at least 3 bits of d (2, given its leading 1) and 3 bits of p
These are lower bounds and may prove inadequate
In fact, 3 bits of p and 4 (3) bits of d are required
With p in carry-save form, 4 bits of each component must be inspected

Slide 56
Upper Bounds for Precision
Theorem: Once lower bounds on precision are determined based on Δd and Δp, one more bit of precision in each direction is always adequate
[Figure: overlap region between "Choose a − 1" and "Choose a", bounded by the lines $(a - h)d$ and $(a - 1 + h)d$ and starting at $d_{\min}$, with Δd, Δp, points A and B, vertical grid spacing w, and heights u and v marked.]
Proof: Let w be the spacing of vertical grid lines
  $w \le \Delta d/2 \;\Rightarrow\; v \le \Delta p/2 \;\Rightarrow\; u \ge \Delta p/2$

Slide 57
Some Implementation Details
[Figure: p-d plot from $d_{\min}$ to $d_{\max}$ showing the regions "Choose β + 1" and "Choose β" (with points A and B) in the upper half and "Choose −β" and "Choose −β + 1" in the lower half.]
Fig. 15.5 The asymmetry of quotient digit selection process.
[Figure: grid of p-d cells labelled β, β + 1, δ, and "δ or δ + 1", with four cells marked by asterisks.]
Fig. 15.6 Example of p-d plot allowing larger uncertainty rectangles, if the 4 cases marked with asterisks are handled as exceptions.
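The lower-bound arithmetic of the "Lower Bounds on Precision" example can be checked with exact fractions. The sketch below is illustrative only: the function name `precision_lower_bounds` is hypothetical, and counting "bits" as the smallest number of fractional bits whose grid spacing does not exceed the bound is an interpretation of the slide's argument, not something the slides state in this form.

```python
from fractions import Fraction


def precision_lower_bounds(r: int, alpha: int, d_min: Fraction):
    """Return (h, delta_d, delta_p, bits_d, bits_p) for radix r and digit set [-alpha, alpha]."""
    h = Fraction(alpha, r - 1)                    # h = alpha / (r - 1)
    delta_p = d_min * (2 * h - 1)                 # height of the overlap region at d_min
    delta_d = d_min * (2 * h - 1) / (alpha - h)   # width of the overlap region at d_min

    def bits(delta: Fraction) -> int:
        # Smallest b with 2^-b <= delta (assumed reading of "bits to inspect").
        b = 0
        while Fraction(1, 2 ** b) > delta:
            b += 1
        return b

    return h, delta_d, delta_p, bits(delta_d), bits(delta_p)


if __name__ == "__main__":
    print(precision_lower_bounds(4, 2, Fraction(1, 2)))
```

For r = 4, α = 2, and d_min = 1/2 this gives h = 2/3, Δd = 1/8, Δp = 1/6, and 3 bits in each direction, in agreement with the example.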
Slide 60
15.4 Modular Dividers and Reducers
Given dividend z and divisor d, with d > 0, a modular divider computes
  $q = \lfloor z / d \rfloor$   and   $s = z \bmod d = \langle z \rangle_d$
The quotient q is, by definition, an integer but the inputs z and d do not have to be integers; the modular remainder is never negative
Example: $\lfloor -3.76 / 1.23 \rfloor = -4$ and $\langle -3.76 \rangle_{1.23} = 1.16$
  The quotient and remainder of ordinary division are −3 and −0.07
A modular reducer computes only the modular remainder and is in many cases simpler than a full-blown divider

Slide 61
15.5 Array Dividers
[Figure: array of controlled subtractor (FS) cells; dividend bits $z_{-1} \ldots z_{-6}$ and divisor bits $d_{-1}, d_{-2}, d_{-3}$ enter the array, quotient bits $q_{-1}, q_{-2}, q_{-3}$ emerge from successive rows, and remainder bits $s_{-4}, s_{-5}, s_{-6}$ from the last row.]
  Dividend    $z = .z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
  Divisor     $d = .d_{-1} d_{-2} d_{-3}$
  Quotient    $q = .q_{-1} q_{-2} q_{-3}$
  Remainder   $s = .0\,0\,0\,s_{-4} s_{-5} s_{-6}$
Fig. 15.7 Restoring array divider composed of controlled subtractor cells.

Slide 62
Nonrestoring Array Divider
[Figure: array of controlled add/subtract cells (full adder plus XOR); dividend bits $z_0 \ldots z_{-6}$ and divisor bits $d_0 \ldots d_{-3}$ enter the array, quotient bits $q_0 \ldots q_{-3}$ and remainder bits $s_{-3} \ldots s_{-6}$ emerge; the critical path is marked.]
  Dividend    $z = z_0 . z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
  Divisor     $d = d_0 . d_{-1} d_{-2} d_{-3}$
  Quotient    $q = q_0 . q_{-1} q_{-2} q_{-3}$
  Remainder   $s = 0 . 0\,0\,s_{-3} s_{-4} s_{-5} s_{-6}$
Fig. 15.8 Nonrestoring array divider built of controlled add/subtract cells.
Similarity to array multiplier is deceiving

Slide 65
Single Unit for Sequential Multiplication and Division
The control unit proceeds through necessary steps for multiplication or division (including using the appropriate shift direction)
Fig. 15.9 Sequential radix-2 multiply/divide unit.
[Figure: shared datapath with a register for the multiplier x or quotient q, a register for the partial product p or partial remainder s, a register for the multiplicand a or divisor d, a mux and k-bit adder, shift control, and a multiply/divide control block that selects between Mul and Div using $x_j$ and the MSB of $p^{(j+1)}$ for multiplication, or $q_{k-j}$, the MSB of $2s^{(j-1)}$, and the divisor sign for division.]
The slight speed penalty owing to a more complex control unit is insignificant

Slide 66
Single Unit for Array Multiplication and Division
Each cell within the array can act as a modified adder or modified subtractor based on control input values
[Figure: block with inputs "Multiplicand or divisor", "Multiplier", "Additive input or dividend", and "Mul/Div", and outputs "Product or remainder" and "Quotient".]
Fig. 15.10 I/O specification of a universal circuit that can act as an array multiplier or array divider.
In some designs, squaring and square-rooting functions are also included within the same array

Slide 67
16 Division by Convergence
Chapter Goals
  Show how by using multiplication as the basic operation in each division step, the number of iterations can be reduced
Chapter Highlights
  Digit-recurrence as convergence method
  Convergence by Newton-Raphson iteration
  Computing the reciprocal of a number
  Hardware implementation and fine tuning

Slide 70
16.2 Division by Repeated Multiplications
Remainder often not needed, but can be obtained by another multiplication if desired: $s = z - qd$
Motivation: Suppose add takes 1 clock and multiply 3 clocks
  64-bit divide takes 64 clocks in radix 2, 32 in radix 4
  Division via multiplications is faster if 10 or fewer multiplications are needed
Idea:
  $q = \frac{z}{d} = \frac{z\, x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\, x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
  (the denominator is forced to 1; the numerator then converges to q)
To turn the identity into a division algorithm, we face three questions:
  1. How to select the multipliers $x^{(i)}$?
  2. How many iterations (pairs of multiplications)?
  3. How to implement in hardware?
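The identity above can be tried out numerically. The sketch below uses the multiplier choice $x^{(i)} = 2 - d^{(i)}$ that this method adopts, on ordinary Python floats rather than fixed-width hardware words; the function name `divide_by_repeated_multiplication`, the default of six iterations, and the restriction of d to [1/2, 1) are assumptions made for illustration.

```python
def divide_by_repeated_multiplication(z: float, d: float, iterations: int = 6) -> float:
    """Approximate z / d by multiplying numerator and denominator by x = 2 - d.

    Assumes d has been normalized to [1/2, 1). Each pass uses one pair of
    multiplications and one complementation (2 - d).
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d is assumed to lie in [1/2, 1)")
    for _ in range(iterations):
        x = 2.0 - d      # multiplier for this pass
        z *= x           # numerator converges to q
        d *= x           # denominator converges to 1
    return z


if __name__ == "__main__":
    print(divide_by_repeated_multiplication(0.7003, 0.99))
```

Because the distance of the denominator from 1 is squared in every pass, six passes starting from an error of at most 1/2 already go below double-precision resolution; the operands 0.7003 and 0.99 are the ones used in the radix-10 example of Fig. 14.2.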
Slide 71
Formulation as a Convergence Computation
Idea:
  $q = \frac{z}{d} = \frac{z\, x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\, x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
  (the denominator is forced to 1; the numerator then converges to q)
  $d^{(i+1)} = d^{(i)} x^{(i)}$     Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
  $z^{(i+1)} = z^{(i)} x^{(i)}$     Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
Question 1: How to select the multipliers $x^{(i)}$?
  $x^{(i)} = 2 - d^{(i)}$
This choice transforms the recurrence equations into:
  $d^{(i+1)} = d^{(i)} (2 - d^{(i)})$     Set $d^{(0)} = d$; iterate until $d^{(m)} \cong 1$
  $z^{(i+1)} = z^{(i)} (2 - d^{(i)})$     Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
Fits the general form
  $u^{(i+1)} = f(u^{(i)}, v^{(i)})$
  $v^{(i+1)} = g(u^{(i)}, v^{(i)})$

Slide 72
Determining the Rate of Convergence
  $d^{(i+1)} = d^{(i)} x^{(i)}$     Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
  $z^{(i+1)} = z^{(i)} x^{(i)}$     Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
Question 2: How quickly does $d^{(i)}$ converge to 1?
We can relate the error in step i + 1 to the error in step i:
  $d^{(i+1)} = d^{(i)} (2 - d^{(i)}) = 1 - (1 - d^{(i)})^2$
  $1 - d^{(i+1)} = (1 - d^{(i)})^2$
For $1 - d^{(i)} \le \varepsilon$, we get $1 - d^{(i+1)} \le \varepsilon^2$: Quadratic convergence
In general, for k-bit operands, we need
  $2m - 1$ multiplications and m 2's complementations, where $m = \lceil \log_2 k \rceil$

Slide 75
16.3 Division by Reciprocation
The Newton-Raphson method can be used for finding a root of $f(x) = 0$
  Start with an initial estimate $x^{(0)}$ for the root
  Iteratively refine the estimate via the recurrence
    $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$
  Justification: $\tan \alpha^{(i)} = f'(x^{(i)}) = f(x^{(i)}) / (x^{(i)} - x^{(i+1)})$
[Figure: curve f(x) versus x, with the tangent at $x^{(i)}$ (angle $\alpha^{(i)}$, height $f(x^{(i)})$) meeting the x axis at $x^{(i+1)}$, the next estimate $x^{(i+2)}$, and the root.]
Fig. 16.2 Convergence to a root of f(x) = 0 in the Newton-Raphson method.

Slide 76
Computing 1/d by Convergence
1/d is the root of $f(x) = 1/x - d$, with $f'(x) = -1/x^2$
Substitute in the Newton-Raphson recurrence $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$ to get:
  $x^{(i+1)} = x^{(i)} (2 - x^{(i)} d)$
One iteration = Two multiplications + One 2's complementation
Error analysis: Let $\delta^{(i)} = 1/d - x^{(i)}$ be the error at the ith iteration
  $\delta^{(i+1)} = 1/d - x^{(i+1)} = 1/d - x^{(i)} (2 - x^{(i)} d) = d\,(1/d - x^{(i)})^2 = d\,(\delta^{(i)})^2$
Because $d < 1$, we have $\delta^{(i+1)} < (\delta^{(i)})^2$
[Figure: plot of f(x) versus x, crossing the x axis at 1/d and approaching −d for large x.]

Slide 77
Choosing the Initial Approximation to 1/d
With $x^{(0)}$ in the range $0 < x^{(0)} < 2/d$, convergence is guaranteed
Justification:
  $|\delta^{(0)}| = |x^{(0)} - 1/d| < 1/d$
  $\delta^{(1)} = |x^{(1)} - 1/d| = d\,(\delta^{(0)})^2 = (d\,\delta^{(0)})\,\delta^{(0)} < \delta^{(0)}$
[Figure: plot of 1/x versus x over the range of interest, with the values 1 and 2 marked.]
For d in [1/2, 1):
  Simple choice: $x^{(0)} = 1.5$; Max error = 0.5 < 1/d
  Better approximation: $x^{(0)} = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d$; Max error ≅
0.1

Slide 80
Visualizing the Convergence with Table Lookup
[Figure: values of the denominator (starting at d) and the numerator (starting at z) versus iterations; the denominator approaches 1, ending within ulp below it (1 − ulp), while the numerator approaches q, ending within ε of it; the first jump is labelled "After table lookup and 1st pair of multiplications, replacing several iterations" and the next "After the 2nd pair of multiplications".]
Fig. 16.3 Convergence in division by repeated multiplications with initial table lookup.

Slide 81
Convergence Does Not Have to Be from Below
[Figure: as in Fig. 16.3, but the denominator ends within 1 ± ulp and the numerator within q ± ε.]
Fig. 16.4 Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors.

Slide 82
Using Truncated Multiplicative Factors
[Figure: values near 1 at iterations i and i + 1; from point A, $d\,x^{(0)} x^{(1)} \cdots x^{(i)}$, the precise iteration reaches $d\,x^{(0)} x^{(1)} \cdots x^{(i)} x^{(i+1)}$ and the approximate iteration reaches point B, $d\,x^{(0)} x^{(1)} \cdots x^{(i)} (x^{(i+1)})_T$; the difference is less than $2^{-a}$.]
Fig. 16.4 One step in convergence division with truncated multiplicative factors.
Problem 16.9a: A truncated denominator $d^{(i)}$, with a identical leading bits and b extra bits ($b \le a$), leads to a new denominator $d^{(i+1)}$ with a + b identical leading bits
Example (64-bit multiplication)
  Initial step: Table of size 256 × 8 = 2K bits
  Middle steps: Multiplication pairs, with 9-, 17-, and 33-bit multipliers
  Final step: Full 64 × 64 multiplication

Slide 85
16.6 Analysis of Lookup Table Size
Table 16.2 Sample entries in the lookup table replacing the first four multiplications in division by repeated multiplications
  -----------------------------------------------------------
  Address     d = 0.1 xxxx xxxx     x(0+) = 1. xxxx xxxx
  -----------------------------------------------------------
    55            0011 0111              1010 0101
    64            0100 0000              1001 1001
  -----------------------------------------------------------
Example: Table entry at address 55 ($311/512 \le d < 312/512$)
For 8 bits of convergence, the table entry f must satisfy
  $(311/512)(1 + .f) \ge 1 - 2^{-8}$
  $(312/512)(1 + .f) \le 1 + 2^{-8}$
  $199/311 \le .f \le 101/156$, or $163.81 \le 256 \times .f \le 165.74$
Two choices: 164 = (1010 0100)two or 165 = (1010 0101)two

Slide 86
A General Result for Table Size
Theorem 16.1: To get w ≥ 5 bits of convergence after the first iteration of division by repeated multiplications, w bits of d (beyond the mandatory 1) must be inspected. The factor $x^{(0+)}$ read out from the table is of the form (1.xxx . . . xxx)two, with w bits after the radix point
Proof strategy for sufficiency: Represent the table entry 1.f as the integer $v = 2^w \times .f$ and derive upper / lower bound expressions for it. Then, show that at least one integer exists between $v_{lb}$ and $v_{ub}$
Proof strategy for necessity: Show that derived conditions cannot be met if the table is of size $2^{k-1}$ (no matter how wide) or if it is of width k − 1 (no matter how large)
Excluded cases, w < 5: Practically uninteresting (allow smaller table)
General radix r: Same analysis method, and results, apply
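The table-entry bounds of the address-55 example can be recomputed with exact fractions, following the sufficiency strategy of representing the entry 1.f by the integer $v = 2^w \times .f$. This is a sketch under assumptions: the function name `table_entry_choices` is hypothetical, and the address is taken to be the w bits of d that follow the mandatory leading 1, as in Table 16.2.

```python
from fractions import Fraction
import math


def table_entry_choices(address: int, w: int = 8) -> list[int]:
    """Integers v = 2^w * .f giving w bits of convergence for one table address."""
    scale = 1 << w
    d_lo = Fraction(scale + address, 2 * scale)        # smallest d mapped to this address
    d_hi = Fraction(scale + address + 1, 2 * scale)    # upper end of the interval (exclusive)
    ulp = Fraction(1, scale)
    v_lb = ((1 - ulp) / d_lo - 1) * scale   # from d_lo * (1 + .f) >= 1 - 2^-w
    v_ub = ((1 + ulp) / d_hi - 1) * scale   # from d_hi * (1 + .f) <= 1 + 2^-w
    lo = max(0, math.ceil(v_lb))            # keep v representable in w bits
    hi = min(scale - 1, math.floor(v_ub))
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    print(table_entry_choices(55))   # candidates for address 55
    print(table_entry_choices(64))   # candidates for address 64
```

For address 55 the candidates are 164 and 165, the two choices named in the example. For address 64 the candidates include 153 = (1001 1001)two, the entry listed in Table 16.2. With w = 8 every address from 0 to 255 has at least one candidate.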