Concurrency Mapping - Parallel Computing - Lecture Slides

Parallel computing is an emerging subject in the field of computer science. This course is designed to introduce the architecture and basic concepts of parallel computing. This lecture covers: Concurrency Mapping, Parallel Algorithms, Tasks and Decomposition, Processes and Mapping, Processes versus Processors, Decomposition Techniques, Recursive Decomposition, Data Decomposition, Hybrid Decomposition, and Dynamic Mappings.

Principles of Parallel Algorithm Design: Concurrency and Mapping

• Introduction to parallel algorithms
—tasks and decomposition
—processes and mapping
—processes versus processors
• Decomposition techniques - part 1
—recursive decomposition
—data decomposition

Exploratory Decomposition Example
Solving a 15 puzzle
• Sequence of three moves from state (a) to final state (d)
• From an arbitrary state, must search for a solution

Exploratory Decomposition: Example
Solving a 15 puzzle
Search
—generate successor states of the current state
—explore each as an independent task
[Figure: puzzle boards showing the initial state, the state after the first move, and the final state (solution)]

Exploratory Decomposition Speedup
• Parallel formulation may perform a different amount of work than the serial one
• Can cause super- or sub-linear speedup
[Figure: two search trees over branches of length m; in one, total serial work = 2m + 1 while total parallel work = 4; in the other, total serial work = m while total parallel work = 4m]

Hybrid Decomposition
Use multiple decomposition strategies together
Often necessary for adequate concurrency
• Quicksort
—recursive decomposition alone limits concurrency (why? see the sketch below)
—augmenting recursive with data decomposition is better
– can use data decomposition on the input data to compute a split
• Discrete event simulation
—data parallelism may be possible when processing a task
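The quicksort point is worth a concrete illustration. Below is a minimal sketch (my own addition, not from the slides), assuming Python's concurrent.futures: plain recursive decomposition spawns only two independent tasks per level, while the O(n) partition step at each level remains serial, which is exactly why recursive decomposition alone limits concurrency and why data-decomposing the partition itself helps.

```python
# Sketch (illustrative only): recursive decomposition of quicksort.
# Only the two recursive calls become independent tasks; the O(n)
# partition at each level stays serial, so the top levels of the
# recursion expose almost no concurrency. (Python threads also won't
# speed up CPU-bound work because of the GIL; this shows structure only.)
from concurrent.futures import ThreadPoolExecutor

def quicksort(a, pool, depth=0, max_depth=3):
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    lo = [x for x in a if x < pivot]   # serial partition step: a data
    eq = [x for x in a if x == pivot]  # decomposition could split this
    hi = [x for x in a if x > pivot]   # work across processes as well
    if depth < max_depth:
        # recursive decomposition: the two halves are independent tasks
        f_lo = pool.submit(quicksort, lo, pool, depth + 1, max_depth)
        f_hi = pool.submit(quicksort, hi, pool, depth + 1, max_depth)
        return f_lo.result() + eq + f_hi.result()
    return (quicksort(lo, pool, depth + 1, max_depth) + eq
            + quicksort(hi, pool, depth + 1, max_depth))

with ThreadPoolExecutor(max_workers=8) as pool:
    print(quicksort([5, 3, 8, 1, 9, 2, 7, 4], pool))
```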
Topics for Today
• Decomposition techniques - part 2
—exploratory decomposition
—hybrid decomposition
• Characteristics of tasks and interactions
—task generation, granularity, and context
—characteristics of task interactions
• Mapping techniques for load balancing
—static mappings
—dynamic mappings
• Methods for minimizing interaction overheads
• Parallel algorithm design templates

Characteristics of Tasks
• Key characteristics
—generation strategy
—associated work
—associated data size
• These impact the choice and performance of parallel algorithms

Size of Data Associated with Tasks
• Data may be small or large compared to the computation
—size(input) < size(computation), e.g., 15 puzzle
—size(input) = size(computation) > size(output), e.g., min
—size(input) = size(output) < size(computation), e.g., sort
• Implications
—small data: task can easily migrate to another process
—large data: ties the task to a process
– possibly can avoid communicating the task context; reconstruct/recompute the context elsewhere instead

Characteristics of Task Interactions
Orthogonal classification criteria
• Static vs. dynamic
• Regular vs. irregular
• Read-only vs. read-write
• One-sided vs. two-sided

Characteristics of Task Interactions
• Static interactions
—tasks and interactions are known a priori
—simpler to code
• Dynamic interactions
—the timing or identity of interacting tasks cannot be determined a priori
—harder to code
– especially using two-sided message-passing APIs

Static Irregular Task Interaction Pattern
Sparse matrix-vector multiply
[Figure: sparse matrix and the irregular interaction pattern it induces]

Characteristics of Task Interactions
• Read-only interactions
—tasks only read data associated with other tasks
• Read-write interactions
—tasks read and modify data associated with other tasks
—harder to code: require synchronization
– need to avoid read-write and write-write ordering races

Characteristics of Task Interactions
• One-sided
—initiated & completed independently by one of the two interacting tasks
– GET
– PUT
• Two-sided
—both tasks coordinate in an interaction
– SEND + RECV

Mapping Techniques for Minimum Idling
• Must simultaneously minimize idling and balance load
• Balancing load alone does not minimize idling
[Figure: two execution timelines contrasting a balanced mapping that still idles with one that minimizes idling]

Mapping Techniques for Minimum Idling
Static vs. dynamic mappings
• Static mapping
—a priori mapping of tasks to processes
—requirements
– a good estimate of task size
– even so, computing an optimal mapping may be NP-complete, e.g., the multiple knapsack problem
• Dynamic mapping
—map tasks to processes at runtime
—why?
– tasks are generated at runtime, or
– their sizes are unknown
Factors that influence the choice of mapping
• size of data associated with a task
• nature of the underlying domain

Schemes for Static Mapping
• Data partitionings
• Task graph partitionings
• Hybrid strategies

Block Array Distribution Example
Multiplying two dense matrices: C = A x B
• Partition the output matrix C using a block decomposition
• Give each task the same number of elements of C
—each element of C corresponds to a dot product
—even load balance
• Obvious choices: 1D or 2D decomposition
• Select to minimize the associated communication overhead

Data Usage in Dense Matrix Multiplication
[Figure: blocks of A and B read by each process for (a) a 1D and (b) a 2D partitioning of C among processes P0 through P15]

Consider: Gaussian Elimination
Active submatrix shrinks as elimination progresses
[Figure: active submatrix after k elimination steps, with element A[k,j] highlighted]

Block-Cyclic Distribution
[Figure: (a) 1D block-cyclic and (b) 2D block-cyclic distributions]
• Cyclic distribution: special case with block size = 1
• Block distribution: special case with block size = n/p
—n is the dimension of the matrix; p is the # of processes (both special cases are checked in the sketch below)
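To make the special cases concrete, here is a small sketch (my own addition, assuming Python; the helper name owner is hypothetical) computing which of p processes owns row i of an n-row matrix under a 1D block-cyclic distribution:

```python
# Sketch (illustrative only): owner of row i under a 1D block-cyclic
# distribution of n rows over p processes with block size b.
def owner(i, b, p):
    return (i // b) % p

n, p = 8, 2
print([owner(i, 1, p) for i in range(n)])       # cyclic (b = 1):       [0, 1, 0, 1, 0, 1, 0, 1]
print([owner(i, n // p, p) for i in range(n)])  # block (b = n/p):      [0, 0, 0, 0, 1, 1, 1, 1]
print([owner(i, 2, p) for i in range(n)])       # block-cyclic (b = 2): [0, 0, 1, 1, 0, 0, 1, 1]
```

For Gaussian elimination this matters because the active submatrix shrinks: a block distribution leaves the processes owning the top rows idle after the early steps, while a block-cyclic distribution keeps every process holding rows of the active submatrix.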
Decomposition by Graph Partitioning
Sparse matrix-vector multiply
• Graph of the matrix is useful for decomposition
—work ~ number of edges
—communication for a node ~ node degree
• Goal: balance work & minimize communication
• Partition the graph
—assign an equal number of nodes to each process
—minimize the edge count of the graph partition

Partitioning a Graph of Lake Superior
[Figure: random partitioning vs. partitioning for minimum edge-cut]

Mapping a Sparse Graph
Sparse matrix-vector product
[Figure: sparse matrix structure, partitioning, and a mapping requiring 17 items to communicate]

Mapping a Sparse Graph
Sparse matrix-vector product
[Figure: the same sparse matrix structure and partitioning with a better mapping requiring 13 items to communicate, vs. 17 before]

Hierarchical Mappings
• Sometimes a single mapping is inadequate
—e.g., the task mapping of the quicksort binary tree cannot readily use a large number of processors
• Hierarchical approach
—use a task mapping at the top level
—data partitioning within each task

Centralized Dynamic Mapping
• Processes = masters or slaves
• General strategy
—when a slave runs out of work → request more from the master
• Challenge
—the master may become a bottleneck for a large # of processes
• Approach
—chunk scheduling: a process picks up several tasks at once
—however
– large chunk sizes may cause significant load imbalances
– gradually decrease the chunk size as the computation progresses (sketched below)

Distributed Dynamic Mapping
• All processes are peers
• Each process can send work to or receive work from other processes
—avoids the centralized bottleneck
• Four critical design questions
—how are sending and receiving processes paired?
—who initiates a work transfer?
—how much work is transferred?
—when is a transfer triggered?
• Ideal answers can be application specific
• Cilk uses a distributed dynamic mapping: "work stealing"

Parallel Algorithm Model
• Definition: ways of structuring a parallel algorithm
• Aspects of a model
—decomposition
—mapping technique
—strategy to minimize interactions

Common Parallel Algorithm Models
• Data parallel
—each task performs similar operations on different data
—typically statically map tasks to processes
• Task graph
—use task dependency graph relationships to promote locality or reduce interaction costs
• Master-slave
—one or more master processes generate work
—allocate it to worker processes
—allocation may be static or dynamic
• Pipeline / producer-consumer
—pass a stream of data through a sequence of processes
—each performs some operation on the stream
• Hybrid
—apply multiple models hierarchically, or
—apply multiple models in sequence to different phases
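Tying the master-slave model back to the chunk-scheduling idea from the centralized dynamic mapping slide, here is a minimal sketch (my own addition, assuming Python threads; all names are illustrative): the master hands each requesting worker a chunk sized to a fraction of the remaining work, so chunks shrink as the computation progresses, trading lower master traffic early for finer load balance at the end.

```python
# Sketch (illustrative only): centralized dynamic mapping with
# decreasing-size chunk scheduling. The "master" is a generator handing
# out (start, end) ranges; each "slave" thread requests a new chunk
# whenever it runs out of work.
import threading

def master_chunks(num_tasks, num_workers, min_chunk=1):
    """Yield chunks; each is a fraction of the remaining work, so
    chunk sizes shrink as the computation progresses."""
    i = 0
    while i < num_tasks:
        size = max((num_tasks - i) // (2 * num_workers), min_chunk)
        yield (i, i + size)
        i += size

def slave(chunks, lock, results):
    while True:
        with lock:                        # ask the master for more work
            chunk = next(chunks, None)
        if chunk is None:                 # no work left: terminate
            return
        start, end = chunk
        results.extend(x * x for x in range(start, end))  # stand-in task

NUM_TASKS, NUM_WORKERS = 100, 4
chunks = master_chunks(NUM_TASKS, NUM_WORKERS)
lock, results = threading.Lock(), []
threads = [threading.Thread(target=slave, args=(chunks, lock, results))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # all 100 tasks completed, in shrinking chunks
```

A distributed alternative would replace the single lock-protected generator with per-process work queues and stealing between peers, as in Cilk's work-stealing scheduler mentioned above.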