EE392C: Advanced Topics in Computer Architecture    Lecture #9
Polymorphic Processors                              Stanford University
                                                    Tuesday, May 6, 2003

Programming Models

Lecture #9: Tuesday, 29 April 2003
Lecturer: Paul W. Lee, Rebecca Schultz, Varun S. Malhotra
Scribe: Sorav Bansal, Brad Schumitsch

1 Introduction

The need for a simple, correct, and general programming model cannot be over-emphasized. A good programming model is instrumental in mapping the parallelism in the application to the support for parallelism in the hardware. There are two perspectives to keep in mind when designing a programming model: the perspective of the programmer and the perspective of the compiler writer.

[Figure: Parallelism in Application → Programming Model → Support for Parallelism in Hardware]

From the point of view of the programmer:

1. The interface should be easy and intuitive.
2. The model should make it easy for the programmer to identify parallelism and indicate it. The complexity should be hidden by the compiler.
3. The programming model should be architecture-independent.
4. The programmer should not need to worry about details like locality of usage.

From the point of view of the compiler writer:

The program structure should convey some parallelism that the compiler can exploit. This parallelism could be ILP, TLP, or DLP. Present-day programming languages like C hide everything but ILP.

The papers we discussed ([1], [2], [3]) present three different approaches to programming model design.

2 Paper Summaries

2.1 Container-Centric Approach for Parallelizing Applications ([1])

Parallelizing compilers have been successful in array-based numerical applications, but their success has been limited in other areas. To effectively parallelize applications for these other areas, the compiler should be aware of the attributes of a program in terms of its aggregate data structures.
This paper presents the concept of containers: any general-purpose aggregate data type, such as matrices, lists, tables, graphs, and I/O streams. The use of containers is very similar across most programs, independent of implementation, and can be abstracted for use in data analysis and parallelization. The paper uses two of the most common kinds of containers to demonstrate its ideas. Linear containers are containers whose elements are accessed in a linear manner, such as lists, stacks, and queues. Content-addressable containers are containers whose elements are accessed through keys; examples are tables, sets, and maps.

Containers are central to the flow of data and are a primary source of data parallelism, but conventional compilers are not aware of the structure of containers and thus cannot take advantage of information at this level. In order to describe concrete containers to the compiler, the paper introduces abstract containers and methods. The compiler takes as input, in addition to the source code, a container description in terms of abstract containers, and performs container-specific analysis and transformation to parallelize the program.

The paper discusses container-based transformation techniques such as loop parallelization, container privatization, and exploiting associativity. All of these transformations are analogous to similar transformations on arrays. Dependency test methods are also explored: range checks are used for linear containers, and commutativity analysis is used for content-addressable containers. Disjoint keys result in commutativity, but overlapping keys are very difficult to analyze. The pattern search-otherwise-insert is suggested as a possible commutative case.

A number of Java, C++, and C applications were studied to identify possible transformation techniques. The applications were manually parallelized using the above-mentioned techniques, and large parts of the applications could be parallelized.
The paper unfortunately does not give any results to indicate how effective these transformations were in terms of performance improvement.

The paper proposes using information at the data-structure level to parallelize programs. The widespread use of containers such as the C++ STL makes this approach very attractive as a way to attain higher levels of parallelization without burdening the programmer. The notion of abstract containers is useful as a specification language for …

[Pages 3 and 4 of the notes, covering the intervening material, are missing from this preview.]

… FIFO. Within this block, the stream data can be accessed via three methods, push, pop, and peek, allowing addition, removal, and inspection without removal of the stream items, respectively. The statements for connecting the filter nodes together are pipeline statements, which establish a sequential series of filters; splitjoin statements, which generate parallel streams that diverge from a single stream; and feedbackloop statements, which create a cycle in the graph.

The StreamIt compiler analyzes the stream graph and maps it onto the hardware in a three-phase process. The first phase is the partitioning phase: the compiler merges and splits the filter statements until there is a single filter for each processing element in the hardware topology, and these filters are balanced to have approximately the same execution time. The second phase is the layout phase: the compiler takes the filters from the partitioning phase and chooses which processing element should execute each one, using a cost function that represents the cost of communication latencies and a simple simulated annealing algorithm. The third phase is the communication scheduling phase: the compiler converts the communication between filters, implicit in the stream graph, into communication instructions appropriate for the hardware. The compiler must be careful to avoid starvation or deadlock.
The StreamIt language and compiler are appropriate for the RAW architecture for which they were designed, but despite the authors' claim that StreamIt is useful for other parallel architectures, a number of features reduce the reusability of the design. For one thing, the model fixes the available parallelism during the partitioning phase: once filters have been merged and split into a single filter for each processing node, that filter is executed sequentially on its processing element, and there are no opportunities to extract additional parallelism at the elements. Additionally, while the implicit communication model in the stream language maps well to the communication network in RAW, it may not work well on other architectures.

The StreamIt programming language also has the additional weakness that it requires static-rate streams. Also, sends and receives cannot be interleaved; all data must be received by a filter before it can be processed. Despite some weaknesses, the programming model does provide a way of efficiently executing a stream program that fits its requirements on a hardware architecture with which it is compatible.

3 Discussion Notes

After studying three different programming models, the discussion time was spent identifying the strengths and weaknesses of the three suggested approaches.

3.1 Perspectives

Why have these approaches not been widely adopted?

It has been noticed that, while people have suggested various programming models, most of them have found it very hard to be widely adopted. Some of the reasons for this are:

1. Processors have been improving at 55% each year, so people do not feel the need to shift to a totally new programming paradigm when they can get the desired performance by waiting for a faster processor.
2. There have not been many parallel machines in use. Now, parallel machines are becoming a lot more common.
3. It is hard to get a programmer to shift to a new programming model.

What would it take for a programmer to shift to a new programming approach?

1. It should be as close to present-day languages (say, C) as possible.
2. There should be very good tool/library support, so that it is easy for the programmer to use.
3. If possible, the new programming model should provide backwards compatibility with existing programming languages.
4. The programming model should be general: it should be applicable to a wide class of applications.
5. The programming model should enable a very large performance gain. A performance gain of 100% is not enough to justify the shift; the gain should be on the order of 10X to get programmers to shift.

This is from the perspective of a programmer. The new programming model should cater to the needs of the compiler writer as well. This brings us to the question:

As a compiler writer, what do you want to know?

1. Control dependences, which let you identify independent parallel tasks.
2. Shared/non-shared variables: the data dependences between tasks.
3. Memory access patterns: if possible, the programming model should be able to disambiguate memory accesses.

Which of these issues are addressed by each of these programming models?

• Containers: Containers disambiguate memory accesses for the compiler; there are no random pointers. They tell the compiler about the data access pattern. However, containers do not help at all in identifying task-level parallelism.

• Jade: The Jade programming model tells the compiler about the data and control dependences. It delineates different tasks and explicitly identifies the shared data. However, it tells the compiler nothing about memory access patterns.

• StreamIt: The StreamIt model helps identify task-level parallelism (e.g., different filters), data dependences, as well as memory access patterns (e.g., FIFOs).
Thus, StreamIt makes the job of the compiler writer relatively easy. The catch, however, is that the programming model caters to a very limited set of applications: it requires fixed-rate streams exhibiting fixed communication patterns. Variable output rates (e.g., data compression) cannot be handled very well by the StreamIt programming model.

• Brook: Brook is designed to program the next version of Imagine. It has support for multiple-dimension streams to allow for scientific computing. In Brook, function calls to variable-rate streams are allowed, thus ameliorating some of the drawbacks of StreamIt. The programming model is not tied to the number of processors in the underlying machine. However, no loop-carried dependences are allowed in a kernel; the kernels need to be data-parallel. In StreamC, the programmer had to do cluster communication herself, while in Brook communication is inferred.

3.2 Shared Memory vs. Message Passing Models

The programming models for parallel programming can be broadly categorized into the shared-memory model (SMP) and the message-passing model (MPP). The shared-memory model works by using fork and join to create and synchronize processes. The main difficulty in the shared-memory model is fine-grained synchronization of shared memory; the canonical ways to synchronize shared memory are locks and barriers. Programmers often complain about the difficulty of guarding against race conditions and deadlocks in the SMP model. The Jade programming model also suffers from this lacuna, since the programmer needs to explicitly provide information about shared variables.

The StreamIt model falls into the MPP category. The programmer does not need to worry about synchronization problems; however, writing the program in StreamIt may itself be quite tough.

Using containers, the programmer doesn't have to worry about synchronization. Different sections of the container may be executed in parallel, but this is managed by libraries, and the programmer need not be concerned.

In general, message-passing code is harder to write the first time, but it is easier to debug. Conversely, a program with locks and joins is easy to get running on the first pass; however, it is hard to debug and optimize such programs, since the locks and joins hide memory transactions.