Computer Architecture Midterm Solutions: Instruction Set Architecture and Registers Sets, Exams of Computer Architecture and Organization

Solutions to the midterm exam questions related to computer architecture, specifically instruction set architecture and registers sets. It includes explanations and calculations for various instruction formats and their consequences, as well as examples and sequences of instructions to replace old formats.

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shashikanth_0p3
University of California at Berkeley
College of Engineering
Computer Science Division, EECS

CS 152, Fall 1995
D. Patterson & R. Yung

Computer Architecture and Engineering
Midterm I Solutions

Question #1: Technology and performance [20 pts]

a) Calculate the average execution time for each instruction with an infinitely fast memory. Which is faster and by what factor? Show your work. [6 pts]

\[ \text{InstructionTime}_{\text{GaAs}} = \frac{\text{CPI}}{\text{Clock rate}} = \frac{2.5}{1000\ \text{MHz}} = 2.5\ \text{ns} \]

\[ \text{InstructionTime}_{\text{CMOS}} = \frac{\text{CPI}}{\text{Clock rate}} = \frac{0.75}{200\ \text{MHz}} = 3.75\ \text{ns} \]

\[ \frac{\text{InstructionTime}_{\text{CMOS}}}{\text{InstructionTime}_{\text{GaAs}}} = \frac{3.75}{2.5} = 1.5 \]

So, the GaAs microprocessor is 1.5 times faster than the CMOS microprocessor.

Grading: 4 points for showing your work. 1 point for having correct instruction execution times. 1 point for having the correct performance factor.

b) How many seconds will each CPU take to execute a 1 billion instruction program? [3 pts]

\[ \text{Execution time of program} = (\text{number of instructions}) \times (\text{avg. inst. exec. time}) \]

\[ \text{Execution time of program on GaAs} = (1 \times 10^{9}) \times (2.5 \times 10^{-9}\ \text{s}) = 2.5\ \text{seconds} \]

\[ \text{Execution time of program on CMOS} = (1 \times 10^{9}) \times (3.75 \times 10^{-9}\ \text{s}) = 3.75\ \text{seconds} \]

Grading: 1 point for showing your work. 2 points for having correct program execution times.

c) What is the cost of an untested GaAs die for this CPU? Repeat the calculation for a CMOS die. Show your work.
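[7 pts]

\[ \text{dies/wafer} = \left\lfloor \frac{\pi \times (\text{wafer diameter}/2)^{2}}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} - \text{test dies per wafer} \right\rfloor \]

\[ \text{die yield} = \text{wafer yield} \times \left( 1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha} \right)^{-\alpha} \]

\[ \text{cost of die} = \frac{\text{cost of wafer}}{\lfloor \text{dies per wafer} \times \text{die yield} \rfloor} \]

For the GaAs die:

\[ \text{dies/wafer}_{\text{GaAs}} = \left\lfloor \frac{\pi \times (10/2)^{2}}{1} - \frac{\pi \times 10}{\sqrt{2 \times 1}} - 4 \right\rfloor = 52 \]

\[ \text{die yield}_{\text{GaAs}} = 0.8 \times \left( 1 + \frac{4 \times 1}{2} \right)^{-2} \approx 0.088 \]

\[ \text{cost of die}_{\text{GaAs}} = \frac{\$2000}{\lfloor 52 \times 0.088 \rfloor} = \$500 \]

For the CMOS die:

\[ \text{dies/wafer}_{\text{CMOS}} = \left\lfloor \frac{\pi \times (20/2)^{2}}{2} - \frac{\pi \times 20}{\sqrt{2 \times 2}} - 4 \right\rfloor = 121 \]

\[ \text{die yield}_{\text{CMOS}} = 0.9 \times \left( 1 + \frac{1 \times 2}{2} \right)^{-2} = 0.225 \]

\[ \text{cost of die}_{\text{CMOS}} = \frac{\$1000}{\lfloor 121 \times 0.225 \rfloor} = \$37.04 \]

Grading: 1 point for correctly using floor functions in formulas. 1 point deducted if the formulas used are good but the computed values are wrong. The rest of the points were for showing your work.

d) What is the ratio of the cost of the GaAs die to the cost of the CMOS die? [1 pt]

\[ \frac{\text{Cost of GaAs die}}{\text{Cost of CMOS die}} = \frac{\$500.00}{\$37.04} = 13.5 \]

Grading: 1 point for using an equation which sets up the ratio correctly.

e) Based on the costs and performance ratios of the CPU calculated above, what is the ratio of cost/performance of the CMOS CPU to the GaAs CPU? [3 pts]

\[ \text{performance} = \frac{1}{\text{execution time}} \]

\[ \frac{\text{cost}_{\text{CMOS}} / \text{perf.}_{\text{CMOS}}}{\text{cost}_{\text{GaAs}} / \text{perf.}_{\text{GaAs}}} = \frac{\text{cost}_{\text{CMOS}} \times \text{exec time}_{\text{CMOS}}}{\text{cost}_{\text{GaAs}} \times \text{exec time}_{\text{GaAs}}} = \frac{\$37.04 \times 3.75\ \text{ns}}{\$500.00 \times 2.5\ \text{ns}} = 0.111 \]

Grading: 2 points for using the correct formula. 1 point for the correct value.

Question #3: Left Shift vs. Multiply [20 pts]

Design a 16-bit left shifter that shifts 0 to 15 bits using only 4:1 multiplexors.

a) How many levels of 4:1 multiplexors are needed? Show your work. [4 pts]

This question only makes sense if you read it as asking for the minimum number of multiplexor levels needed for the operation. It can clearly be done with arbitrarily more. The minimum number of levels is \( \log_{4} 16 = 2 \).

Grading: For an answer of 2 levels: 2 pts. For "showing your work," either by writing the formula or answering #3 correctly: 2 pts. An answer of 3 levels, if supported by reasonable logic here or in #3: 2 pts.

b) If the delay per multiplexor is 2 ns, what is the speed of this shifter? (Assume zero delay for the wires.)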
[3 pts]

2 levels × 2 ns/level = 4 ns.

Grading: 1 pt if you can multiply correctly. 2 pts for the correct answer.

c) Draw the four leftmost bits of the multiplexors with the proper connections. (You might want to practice drawing it on the back of a page, and then transfer the final version here.) [6 pts]

The simplest solution here was to draw each output o15 ... o12 separately, each using the shift amount bit pattern as the control, i.e.:

    i0 ---|\                    0  ---|\
    i1 ---| |                   i0 ---| |
    i2 ---| |---.               i1 ---| |---.
    i3 ---|/    |               i2 ---|/    |
                |                           |
    i4 ---|\    |               i3 ---|\    |
    i5 ---| |   `-|\            i4 ---| |   `-|\
    i6 ---| |--.  | |           i5 ---| |--.  | |
    i7 ---|/   `--| |-- o15     i6 ---|/   `--| |-- o14
              ,---| |                     ,---| |
    i8 ---|\  |   | |           i7 ---|\  |   | |
    i9 ---| | | ,-|/            i8 ---| | | ,-|/
    i10 --| |-' |               i9 ---| |-' |
    i11 --|/    |               i10 --|/    |
                |                           |
    i12 --|\    |               i11 --|\    |
    i13 --| |   |               i12 --| |   |
    i14 --| |---'               i13 --| |---'
    i15 --|/                    i14 --|/

    0  ---|\                    0  ---|\
    0  ---| |                   0  ---| |
    i0 ---| |---.               0  ---| |---.
    i1 ---|/    |               i0 ---|/    |
                |                           |
    i2 ---|\    |               i1 ---|\    |
    i3 ---| |   `-|\            i2 ---| |   `-|\
    i4 ---| |--.  | |           i3 ---| |--.  | |
    i5 ---|/   `--| |-- o13     i4 ---|/   `--| |-- o12
              ,---| |                     ,---| |
    i6 ---|\  |   | |           i5 ---|\  |   | |
    i7 ---| | | ,-|/            i6 ---| | | ,-|/
    i8 ---| |-' |               i7 ---| |-' |
    i9 ---|/    |               i8 ---|/    |
                |                           |
    i10 --|\    |               i9 ---|\    |
    i11 --| |   |               i10 --| |   |
    i12 --| |---'               i11 --| |---'
    i13 --|/                    i12 --|/

Here s is the shift amount bit pattern s3s2s1s0. In each tree, the four input-side muxes choose among four adjacent inputs, so they take s1s0 as their control signals, and the output-side mux chooses among the four groups, so it takes s3s2.

Grading: We traced three arbitrary paths through the presented array. If all three worked, and all inputs/outputs were labeled: 6 pts. Leaving off the input/output labeling resulted in a loss of 1 to 3 points, depending on the severity. From 1 to 2 points were taken off for improper or no labeling of the control inputs, again depending on their complexity and how severe the omissions were. Partial credit was awarded for partially working solutions.
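The per-output mux trees in part (c) can be checked with a small simulation. The following is a minimal Python sketch, not part of the original solutions: the function names (`mux4`, `shifter_output_bit`, `left_shift16`) and the constants are illustrative. It assumes the input-side muxes are selected by the two low-order shift bits and the output-side mux by the two high-order bits, which is what the wiring in the drawings requires, and that taps below bit 0 are tied to 0.

```python
LEVELS = 2          # log4(16) levels of 4:1 muxes, from part (a)
MUX_DELAY_NS = 2    # delay per mux, from part (b)


def mux4(inputs, select):
    """4:1 multiplexor: return the input picked by a 2-bit select value."""
    return inputs[select]


def shifter_output_bit(in_bits, k, shift):
    """Compute output bit k of the 16-bit left shifter with a two-level mux tree.

    in_bits[j] is input bit i_j (j = 0 is the least significant bit).
    """
    def tap(j):
        # Positions below bit 0 are constant 0, as in the o14..o12 drawings.
        return in_bits[j] if j >= 0 else 0

    low = shift & 0b11          # s1s0
    high = (shift >> 2) & 0b11  # s3s2
    # Input-side level: four muxes, each choosing among four adjacent taps.
    groups = [mux4([tap(k - 4 * g - a) for a in range(4)], low)
              for g in range(4)]
    # Output-side level: one mux choosing among the four groups.
    return mux4(groups, high)


def left_shift16(value, shift):
    """Shift a 16-bit value left by 0..15 bits, one mux tree per output bit."""
    in_bits = [(value >> j) & 1 for j in range(16)]
    out = 0
    for k in range(16):
        out |= shifter_output_bit(in_bits, k, shift) << k
    return out
```

For example, `left_shift16(0x0001, 15)` returns `0x8000`, and `LEVELS * MUX_DELAY_NS` reproduces the 4 ns delay from part (b).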
d) Multiplies can be accomplished by left shifts if the multiplier is a power of two. Write the MIPS instruction(s) that perform a multiply via left shifts for a multiplier that is positive and a power of two. Assume the multiplicand is in register $4, the multiplier is in register $5, and that the least significant 32 bits of the product should be left in $6. You can ignore the potential for overflow in the result register. [7 pts]

A good solution has four instructions in the loop, a single branch instruction, and avoids a delayed branch stall and/or error:

          LI    $C, -1
          ANDI  $T, $5, 1
    LOOP: ADDI  $C, $C, 1
          SHR   $5, $5, 1
          BEQZ  $T, LOOP
          ANDI  $T, $5, 1    ; Delay slot, taken from top of target
    EXIT: SHL   $6, $4, $C

Here $C and $T can be read as any two free temporary registers, and SHR and SHL as the MIPS shift instructions SRL and SLLV.

Grading: Start with full credit for nominally working code. Then, -1 pt for the delay slot being wasted or doing useless work (i.e. the SHL). -2 pts for the delay slot causing improper execution. -2 pts for using div rather than shr. -2 pts for using mult rather than shl somewhere. -1 to -2 pts for failing on certain cases, such as when $5 = 1. 2 pts were given to solutions that assumed that $5 contained the log of the multiplier and implemented a single instruction. -1 pt for illegal MIPS instructions, -1 pt for wrong MIPS instructions. Just implementing the hardware multiply algorithm from class in software was worth 2 pts. No points were deducted for checking if the multiplier was zero, although zero is not a power of two.

Extra credit) In class, we've seen three versions of the multiply operation (ignoring Booth encoding). Approximately how many clock cycles does a multiply take using the fastest algorithm? Approximately how much faster are multiplies that use the shifting technique (assuming the multiplier is a power of two)? [+4 pts]

The original multiply takes from 1 × 32 = 32 to 2 × 32 = 64 clock cycles (answering just one or the other was accepted). The most common mistake here was forgetting that it is an algorithm in hardware, not a MIPS program doing the same thing.
The shifting technique takes a variable amount of time, based on the location of the 1 in the multiplier. The total number of clock cycles also depends on the number of instructions in the loop. For the code above, number of clock cycles = 3 + 4 × (size of multiplier in bits). Assuming the multiplier is 2^i, that i ranges from 0 to 31 (so the size ranges from 1 to 32 bits), and that i is randomly distributed (not necessarily true!), the minimum, average, and maximum times to execute are 3 + 4 = 7, 3 + 4 × 16 = 67, and 3 + 4 × 32 = 131 clock cycles.

Thus, the shifting algorithm is faster when the size of the multiplier is less than or equal to 15 bits (a multiplier of at most 2^14 = 16384) for the 64 clock cycle original multiply, or less than or equal to 7 bits (a multiplier of at most 2^6 = 64) for the 32 clock cycle original.

Grading: +1 pt for the original multiply cycle time. +1 pt for the shift method's cycle time. +2 pts for realizing the performance is variable based on the multiplier, and analyzing the performance accordingly. +3 pts max for an otherwise correct solution with a minor error.
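The cycle counts for the shifting technique can be reproduced by stepping through the part (d) loop. The following is a minimal Python sketch, not part of the original solutions: the function name `shift_multiply` is illustrative, and it assumes one clock cycle per instruction, a delay slot that always executes, and a multiplier that is a positive power of two.

```python
def shift_multiply(multiplicand, multiplier):
    """Simulate the part (d) loop; return (low 32 bits of product, cycles).

    Assumption: every instruction costs one clock cycle.
    """
    if multiplier <= 0 or multiplier & (multiplier - 1):
        raise ValueError("multiplier must be a positive power of two")

    r5 = multiplier
    count = -1              # LI   $C, -1
    t = r5 & 1              # ANDI $T, $5, 1
    cycles = 2

    while True:
        count += 1          # ADDI $C, $C, 1
        r5 >>= 1            # SHR  $5, $5, 1
        taken = (t == 0)    # BEQZ $T, LOOP (tests the bit sampled before this shift)
        t = r5 & 1          # ANDI $T, $5, 1 in the delay slot, always executed
        cycles += 4
        if not taken:
            break

    product = (multiplicand << count) & 0xFFFFFFFF  # SHL $6, $4, $C
    cycles += 1
    return product, cycles
```

Under these assumptions, `shift_multiply(5, 1)` takes 7 cycles and `shift_multiply(5, 1 << 31)` takes 131 cycles, matching the minimum and maximum quoted above; a multiplier of 2^i runs the loop i + 1 times.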