Lecture Slides on Instruction Scheduling - Introduction to Compilers | CMSC 430, Lab Reports of Computer Science

Material Type: Lab; Class: INTRO TO COMPILERS; Subject: Computer Science; University: University of Maryland; Term: Spring 2009;

Uploaded on 07/30/2009

koofers-user-10r-1

Instruction Scheduling
CS430

What Makes Code Run Fast?

• Many operations have non-zero latencies
• Modern machines can issue several operations per cycle
• Execution time is order-dependent (and has been since the 60’s)

Assumed latencies (conservative)

Operation   Cycles
load        3
store       3
loadI       1
add         1
mult        2
fadd        1
fmult       2
shift       1
branch      0 to 8

• Loads & stores may or may not block
  > Non-blocking ⇒ fill those issue slots
• Branch costs vary with path taken
• Branches typically have delay slots
  > Fill slots with unrelated operations
  > Percolates branch upward
• Scheduler should hide the latencies

Lab 3 will build a local scheduler

Example

w ← w * 2 * x * y * z

Simple schedule (2 registers, 20 cycles)

Cycle  Operation
 1     loadAI  r0,@w ⇒ r1
 4     add     r1,r1 ⇒ r1
 5     loadAI  r0,@x ⇒ r2
 8     mult    r1,r2 ⇒ r1
 9     loadAI  r0,@y ⇒ r2
12     mult    r1,r2 ⇒ r1
13     loadAI  r0,@z ⇒ r2
16     mult    r1,r2 ⇒ r1
18     storeAI r1 ⇒ r0,@w
21     r1 is free

Schedule loads early (3 registers, 13 cycles)

Cycle  Operation
 1     loadAI  r0,@w ⇒ r1
 2     loadAI  r0,@x ⇒ r2
 3     loadAI  r0,@y ⇒ r3
 4     add     r1,r1 ⇒ r1
 5     mult    r1,r2 ⇒ r1
 6     loadAI  r0,@z ⇒ r2
 7     mult    r1,r3 ⇒ r1
 9     mult    r1,r2 ⇒ r1
11     storeAI r1 ⇒ r0,@w
14     r1 is free

Reordering operations for speed is called instruction scheduling

Instruction Scheduling (Engineer’s View)

The Problem
Given a code fragment for some target machine and the latencies for each individual operation, reorder the operations to minimize execution time

The Concept
[Figure: slow code → Scheduler → fast code, with a machine description as input to the scheduler]

The task
• Produce correct code
• Minimize wasted cycles
• Avoid spilling registers
• Operate efficiently

Instruction Scheduling (The Abstract View)

To capture properties of the code, build a precedence graph G = (N, E)
• Nodes n ∈ N are operations with type(n) and delay(n)
• An edge e = (n1,n2) ∈ E if & only if n2 uses the result of n1

The Code

a: loadAI  r0,@w ⇒ r1
b: add     r1,r1 ⇒ r1
c: loadAI  r0,@x ⇒ r2
d: mult    r1,r2 ⇒ r1
e: loadAI  r0,@y ⇒ r2
f: mult    r1,r2 ⇒ r1
g: loadAI  r0,@z ⇒ r2
h: mult    r1,r2 ⇒ r1
i: storeAI r1 ⇒ r0,@w

[Figure: The Precedence Graph over nodes a through i]

Instruction Scheduling (Definitions)

A correct schedule S maps each n ∈ N into a non-negative integer representing its cycle number, and
1. S(n) ≥ 0, for all n ∈ N, obviously
2. If (n1,n2) ∈ E, S(n1) + delay(n1) ≤ S(n2)
3. For each type t, there are no more operations of type t in any cycle than the target machine can issue

The length of a schedule S, denoted L(S), is

L(S) = max_{n ∈ N} (S(n) + delay(n))

The goal is to find the shortest possible correct schedule.
S is time-optimal if L(S) ≤ L(S1), for all other schedules S1
A schedule might also be optimal in terms of registers, power, or space….

Instruction Scheduling (What’s so difficult?)

Critical Points
• All operands must be available
• Multiple operations can be ready
• Moving operations can lengthen register lifetimes
• Placing uses near definitions can shorten register lifetimes
• Operands can have multiple predecessors
Together, these issues make scheduling hard (NP-Complete)

Local scheduling is the simple case
• Restricted to straight-line code
• Consistent and predictable latencies

Instruction Scheduling

The big picture
1. Build a precedence graph, P
2. Compute a priority function over the nodes in P
3. Use list scheduling to construct a schedule, one cycle at a time
   a. Use a queue of operations that are ready
   b. At each cycle
      I. Choose a ready operation and schedule it
      II. Update the ready queue

Local list scheduling
• The dominant algorithm for twenty years
• A greedy, heuristic, local technique

Local List Scheduling

Cycle ← 1
Ready ← leaves of P
Active ← Ø
while (Ready ∪ Active ≠ Ø)
    if (Ready ≠ Ø) then
        remove an op from Ready              // removal in priority order
        S(op) ← Cycle
        Active ← Active ∪ op
    Cycle ← Cycle + 1
    for each op ∈ Active
        if (S(op) + delay(op) ≤ Cycle) then  // op has completed execution
            remove op from Active
            for each successor s of op in P
                if (s is ready) then         // if successor’s operands are ready, put it on Ready
                    Ready ← Ready ∪ s

Scheduling Example

1. Build the precedence graph
2. Determine priorities: longest latency-weighted path
3. Perform list scheduling

Node  Operation             Priority  Cycle
a     loadAI  r0,@w ⇒ r1    13         1
b     add     r1,r1 ⇒ r1    10         4
c     loadAI  r0,@x ⇒ r2    12         2
d     mult    r1,r2 ⇒ r1     9         5
e     loadAI  r0,@y ⇒ r3    10         3
f     mult    r1,r3 ⇒ r1     7         7
g     loadAI  r0,@z ⇒ r2     8         6
h     mult    r1,r2 ⇒ r1     5         9
i     storeAI r1 ⇒ r0,@w     3        11

[Figure: The Precedence Graph over nodes a through i, each node labeled with its priority]

New register name used: e now loads into r3, and f reads r3, so that e can issue before d has consumed r2.
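The priority computation and the list-scheduling loop from the slides can be tried out on the example block with a short Python sketch. This is an illustration under stated assumptions, not the Lab 3 solution: the names `DELAY`, `OPS`, `SUCCS`, `priorities`, and `list_schedule` are hypothetical, the edges are the true dependences read off the example code, the machine issues one operation per cycle, and register reuse (such as r2) is assumed to be handled by renaming rather than by extra edges.

```python
# Latencies taken from the slides' "assumed latencies" table (only the ops used here).
DELAY = {"loadAI": 3, "add": 1, "mult": 2, "storeAI": 3}

# Example block from the slides: node label -> opcode.
OPS = {"a": "loadAI", "b": "add", "c": "loadAI", "d": "mult", "e": "loadAI",
       "f": "mult", "g": "loadAI", "h": "mult", "i": "storeAI"}

# Assumed edges: producer -> consumers (n2 uses the result of n1).
SUCCS = {"a": ["b"], "b": ["d"], "c": ["d"], "d": ["f"], "e": ["f"],
         "f": ["h"], "g": ["h"], "h": ["i"], "i": []}


def priorities(ops, succs):
    """Longest latency-weighted path from each node to the end of the block."""
    prio = {}

    def visit(n):
        if n not in prio:
            longest_below = max((visit(s) for s in succs[n]), default=0)
            prio[n] = DELAY[ops[n]] + longest_below
        return prio[n]

    for n in ops:
        visit(n)
    return prio


def list_schedule(ops, succs):
    """Single-issue local list scheduling, following the loop on the slide."""
    prio = priorities(ops, succs)

    # Count unfinished predecessors so we know when a successor becomes ready.
    preds_left = {n: 0 for n in ops}
    for n in ops:
        for s in succs[n]:
            preds_left[s] += 1

    cycle = 1
    ready = [n for n in ops if preds_left[n] == 0]
    active = []
    sched = {}

    while ready or active:
        if ready:
            op = max(ready, key=lambda n: prio[n])  # removal in priority order
            ready.remove(op)
            sched[op] = cycle
            active.append(op)
        cycle += 1
        for op in list(active):  # iterate over a copy; we remove while looping
            if sched[op] + DELAY[ops[op]] <= cycle:  # op has completed execution
                active.remove(op)
                for s in succs[op]:
                    preds_left[s] -= 1
                    if preds_left[s] == 0:  # all operands of s are available
                        ready.append(s)
    return sched


if __name__ == "__main__":
    schedule = list_schedule(OPS, SUCCS)
    for node in sorted(schedule, key=schedule.get):
        print(schedule[node], node, OPS[node])
```

Run on the example, the sketch issues a, c, e, b, d, g, f, h, i in cycles 1, 2, 3, 4, 5, 6, 7, 9, 11, the same cycle numbers as the "schedule loads early" listing. Ties in priority are broken by whichever operation `max` sees first, which is an arbitrary choice; a real scheduler would usually add a secondary criterion.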