CS 162 Computer Architecture
Lecture 10: Multithreading

Pipeline Hazards

    LW   r1, 0(r2)
    LW   r5, 12(r1)
    ADDI r5, r5, #12
    SW   12(r1), r5

• Each instruction may depend on the next
  – Without bypassing, interlocks are needed
• Bypassing cannot completely eliminate interlocks or delay slots

Simple Multithreaded Pipeline
• Have to carry the thread select down the pipeline to ensure the correct state bits are read/written at each pipe stage

Multithreading Costs
• Appears to software (including the OS) as multiple slower CPUs
• Each thread requires its own user state
  – GPRs
  – PC
• Each thread also needs its own OS control state
  – virtual-memory page-table base register
  – exception-handling registers
• Other costs?

Thread Scheduling Policies
• Fixed interleave (CDC 6600 PPUs, 1965)
  – each of N threads executes one instruction every N cycles
  – if a thread is not ready to go in its slot, insert a pipeline bubble
• Software-controlled interleave (TI ASC PPUs, 1971)
  – OS allocates S pipeline slots amongst N threads
  – hardware performs a fixed interleave over the S slots, executing whichever thread is in that slot
• Hardware-controlled thread scheduling (HEP, 1982)
  – hardware keeps track of which threads are ready to go
  – picks the next thread to execute based on a hardware priority scheme

Denelcor HEP (Burton Smith, 1982)
• First commercial machine to use hardware threading in the main CPU
  – 120 threads per processor
  – 10 MHz clock rate
  – up to 8 processors
  – precursor to the Tera MTA (Multithreaded Architecture)

Tera MTA Overview
• Up to 256 processors
• Up to 128 active threads per processor
• Processors and memory modules populate a sparse 3D torus interconnection fabric
• Flat, shared main memory
  – No data cache
  – Sustains one main memory access per cycle per
processor
• 50 W/processor @ 260 MHz

MTA Instruction Format
• Three operations packed into a 64-bit instruction word (short VLIW)
• One memory operation, one arithmetic operation, plus one arithmetic or branch operation
• Memory operations incur ~150 cycles of latency
• An explicit 3-bit "lookahead" field in the instruction gives the number of subsequent instructions (0-7) that are independent of this one
  – cf. instruction grouping in VLIW
  – allows fewer threads to fill the machine pipeline
  – used for variable-sized branch delay slots
• Thread creation and termination instructions

Coarse-Grain Multithreading
• Tera MTA was designed for supercomputing applications with large data sets and low locality
  – No data cache
  – Many parallel threads needed to hide large memory latency
• Other applications are more cache-friendly
  – Few pipeline bubbles when the cache is getting hits
  – Just add a few threads to hide occasional cache-miss latencies
  – Swap threads on cache misses

MIT Alewife
• Modified SPARC chips
  – register windows hold different thread contexts
• Up to four threads per node
• Thread switch on local cache miss

IBM PowerPC RS64-III (Pulsar)
• Commercial coarse-grain multithreading CPU
• Based on PowerPC with a quad-issue, in-order, five-stage pipeline
• Each physical CPU supports two virtual CPUs
• On an L2 cache miss, the pipeline is flushed and execution switches to the second thread
  – the short pipeline minimizes the flush penalty (4 cycles), small compared to memory access latency
  – flushing the pipeline also simplifies exception handling

Vertical Multithreading
• Cycle-by-cycle interleaving of a second thread removes vertical waste

Ideal Multithreading for Superscalar
• Interleave multiple threads across multiple issue slots with no restrictions

Simultaneous Multithreading
• Add multiple contexts and fetch engines to a wide out-of-order superscalar processor [Tullsen, Eggers, Levy, UW, 1995]
• The OOO instruction window already has most of the
circuitry required to schedule from multiple threads
• Any single thread can utilize the whole machine

From Superscalar to SMT
• Extra pipeline stages for accessing thread-shared register files

From Superscalar to SMT
• Fetch from the two highest-throughput threads
• Why?

From Superscalar to SMT
• Small items:
  – per-thread program counters
  – per-thread return stacks
  – per-thread bookkeeping for instruction retirement, trap & instruction dispatch queue flush
  – thread identifiers, e.g., with BTB & TLB entries
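The fixed-interleave policy and latency-hiding argument above can be sketched as a toy cycle-by-cycle model. This is an illustrative simulation only; the function name, parameters, and the latency/miss numbers are hypothetical, not taken from the slides.

```python
# Toy model of fixed-interleave multithreading (CDC 6600 PPU style):
# each of N threads gets one issue slot every N cycles; if a thread's
# outstanding memory access has not completed by its slot, the slot
# becomes a pipeline bubble. All names and numbers are illustrative.

def run(n_threads, n_instrs_per_thread, mem_latency, mem_every):
    """Return (total_cycles, issue-slot utilization).

    Every `mem_every`-th instruction of a thread is a memory access
    that keeps the thread busy for `mem_latency` cycles before it can
    issue again.
    """
    done = [0] * n_threads      # instructions retired per thread
    ready_at = [0] * n_threads  # cycle at which each thread may issue next
    issued = 0
    cycle = 0
    while min(done) < n_instrs_per_thread:
        t = cycle % n_threads   # fixed round-robin issue slot
        if done[t] < n_instrs_per_thread and ready_at[t] <= cycle:
            done[t] += 1
            issued += 1
            if done[t] % mem_every == 0:   # this was a memory access
                ready_at[t] = cycle + mem_latency
        # else: bubble -- thread not ready in its slot
        cycle += 1
    return cycle, issued / cycle

# With one thread, every memory access stalls the pipeline;
# with more threads, the bubbles are filled by other threads.
for n in (1, 2, 4, 8):
    cycles, util = run(n, 100, mem_latency=8, mem_every=4)
    print(f"{n} thread(s): {cycles} cycles, utilization {util:.2f}")
```

With a single thread the issue slot sits idle for most of each memory access, while with enough threads the slot is almost always full. This is the same reason the Tera MTA supports up to 128 active threads per processor and the HEP ran 120: with no data cache, only thread-level parallelism can cover the ~150-cycle memory latency.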