CS433g: Computer System Organization - Fall 2005, Practice Problem Set 2 Solutions, Assignments of Computer Architecture and Organization

Solutions to Practice Problem Set 2 of the CS433g Computer System Organization course, Fall 2005. The solutions cover topics such as pipeline design, instruction execution, and data hazards.

Typology: Assignments

Pre 2010

Uploaded on 03/16/2009

koofers-user-hb7

CS433g: Computer System Organization – Fall 2005
Practice Problem Set 2 - Solution

Question 1

Consider two different machines. The first has a single cycle datapath (i.e., a single stage, non-pipelined machine) with a cycle time of 4ns. The second is a pipelined machine with four pipeline stages and a cycle time of 1ns.

a. What is the speedup of the pipelined machine versus the single cycle machine assuming there are no stalls?

Solution: The speedup is 4ns/1ns = 4.

b. What is the speedup of the pipelined machine versus the single cycle machine if the pipeline stalls 1 cycle for 30% of the instructions?

Solution: Since the pipeline loses 1 cycle 30% of the time, its average CPI is no longer 1 but 1.30. The equation for execution time is:

    CPI * cycle time * # of instructions

As the # of instructions is the same on both machines, the speedup is (1 CPI * 4ns)/(1.3 CPI * 1ns) = 3.08.

c. Now consider a 3 stage pipeline machine with a cycle time of 1.1ns. Again assuming no stalls, is this implementation faster or slower than the original 4 stage pipeline? Explain your answer.

Solution: The 4 stage machine is faster. With no stalls, both pipelines have a CPI of 1, so execution time depends only on cycle time, and the 4 stage machine has the smaller cycle time (1ns versus 1.1ns).

Question 2

Consider the following code fragment:

    Loop: LW    R1, 0(R2)
          DADDI R1, R1, 1
          SW    R1, 0(R2)
          DADDI R2, R2, 4
          DADDI R4, R4, -4
          BNEZ  R4, Loop

Consider the standard 5 stage pipeline machine (IF ID EX MEM WB). Assume the initial value of R4 is 396 and all memory accesses hit in the cache.

a. Show the timing of the above code fragment for one iteration as well as for the load of the second iteration. For this part, assume there is no forwarding or bypassing hardware.
Assume a register write occurs in the first half of the cycle and a register read occurs in the last half of the cycle. Also, assume that branches are resolved in the memory stage and are handled by flushing the pipeline. Use a pipeline timing chart to show the timing. How many cycles does this loop take to complete (for all iterations, not just one iteration)?

Solution: R4 starts at 396 and is decremented by 4 on every iteration, so the loop iterates 396/4 = 99 times. To calculate the total time the loop takes, we look at the length of the first 98 iterations, then factor in the 99th iteration, which takes one cycle longer to execute.

The pipeline diagram:

Instruction       C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
LW R1, 0(R2)      F   D   X   M   W
DADDI R1, R1, 1       F   D   S   S   X   M   W
SW R1, 0(R2)              F   S   S   D   S   S   X   M   W
DADDI R2, R2, 4                       F   S   S   D   X   M   W
DADDI R4, R4, -4                                  F   D   X   M   W
BNEZ R4, Loop                                         F   D   S   S   X   M   W
LW R1, 0(R2)                                                                  F   D   X   M   W

Here, “S” indicates a stall. The last cycle of an iteration is overlapped with the first cycle of the next, so it is not counted until the end. Therefore, the first 98 iterations take 15 cycles each, while the last iteration takes 16 cycles. The total time for the code to execute is thus 98 x 15 + 16 = 1486 clock cycles.

b. Show the timing for the same instruction sequence for the pipeline with full forwarding and bypassing hardware (as discussed in class). Assume that branches are resolved in the MEM stage and are predicted as not taken. How many cycles does this loop take to complete?

Solution:

Instruction       C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12 C13 C14 C15
LW R1, 0(R2)      F   D   X   M   W
DADDI R1, R1, 1       F   D   S   X   M   W
SW R1, 0(R2)              F   S   D   X   M   W
DADDI R2, R2, 4                   F   D   X   M   W
DADDI R4, R4, -4                      F   D   X   M   W
BNEZ R4, Loop                             F   D   X   M   W
LW R1, 0(R2)                                              F   D   X   M   W

The last cycle of an iteration is overlapped with the first cycle of the next, so it is not counted until the end. Therefore, the first 98 iterations take 10 cycles each, while the last iteration takes 11 cycles.
The total time for the code to execute is thus 98 x 10 + 11 = 991 clock cycles.
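
The arithmetic in both questions can be checked with a short script. The following is a minimal Python sketch and not part of the original solution set. The function names `speedup` and `loop_cycles` are illustrative. `loop_cycles` assumes, as the solutions do, that every iteration except the last overlaps its final cycle with the first cycle of the next iteration.

```python
def speedup(cpi_base, cycle_base_ns, cpi_new, cycle_new_ns):
    """Speedup of the new machine over the base machine.

    Execution time is CPI * cycle time * instruction count; the
    instruction count is the same on both machines, so it cancels.
    """
    return (cpi_base * cycle_base_ns) / (cpi_new * cycle_new_ns)


def loop_cycles(initial_r4, decrement, cycles_per_iteration):
    """Total cycles for the Question 2 loop (hypothetical helper).

    Assumes R4 counts down to zero in equal steps and that only the
    final iteration pays the one extra, non-overlapped cycle.
    """
    iterations = initial_r4 // decrement
    return (iterations - 1) * cycles_per_iteration + (cycles_per_iteration + 1)


if __name__ == "__main__":
    # Question 1a: no stalls
    print(speedup(1.0, 4.0, 1.0, 1.0))            # 4.0
    # Question 1b: 30% of instructions stall one cycle, so CPI = 1.3
    print(round(speedup(1.0, 4.0, 1.3, 1.0), 2))  # 3.08
    # Question 2a: 15 cycles per iteration without forwarding
    print(loop_cycles(396, 4, 15))                # 1486
    # Question 2b: 10 cycles per iteration with forwarding
    print(loop_cycles(396, 4, 10))                # 991
```

The per-iteration cycle counts (15 and 10) are read off the timing charts; the script only reproduces the final arithmetic and does not simulate the pipeline itself.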