Message Passing Interface (Chapter 4)
Sanjay Rajopadhye
Colorado State University
Fall 2008, Week 3

Outline
- Background & history
- Message passing programming model
  - SPMD
  - distributed memory
- Writing MPI programs
- Compiling, running and benchmarking

Background & History
- Prior to 1990: vendor-specific libraries
  - no portability
- Parallel Virtual Machine (PVM) [ORNL 1989]
  - public release 1993
  - enables portable parallel programming
- Parallel [sic] efforts to develop an open, portable, general-purpose parallel programming standard (MPI): 1992-1994
- Parallel I/O added in MPI-2 [1997]
- Nearly ubiquitous on "big iron"
- "Assembly language" of parallel programming
  - similar advantages/disadvantages

Foundation
- SPMD: Single Program Multiple Data
  - write a single program using a special library
  - compile it with a special compiler (one that knows about the library)
  - execute "it" in an environment created by
    - the environment variables of your session (as on bassi), or
    - a special command on your local machine (e.g., LAM/MPI)
  - multiple instances of the executable run in parallel
- Each instance has its own local data (distributed memory)
- The MPI library provides functions for coordinating and communicating amongst the instances

Example library calls
- MPI_Init
- MPI_Comm_rank
- MPI_Comm_size
- MPI_Reduce
- MPI_Finalize

Example program
- A contrived program: don't take it (too) seriously
- Boolean circuit satisfiability
  - given a Boolean circuit, i.e., a function that takes n inputs and produces one Boolean output (using and, or and not)
  - 2^n possible inputs (n-bit sequences)
  - which input combinations produce a 1 on the output (i.e., satisfy the circuit)?
- A difficult problem in computer science (NP-complete)

Solution (algorithm)
- NP-complete: no known polynomial-time algorithm
  - accept approximate solutions, or
  - solve by "exhaustive" enumeration (exponential time)
    - our approach (we seek all solutions)

Example circuit
- (the specific circuit is given as a diagram in the original slides; a code sketch of the enumeration follows below)
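A minimal sketch of how this SPMD enumeration might look in C with MPI. The condition inside check_circuit is a stand-in (the actual example circuit from the slides is not reproduced here) and N = 16 is an assumed input count; the overall structure — MPI_Init, querying rank and size, a cyclic split of the 2^n inputs, MPI_Finalize — follows the library calls listed above.

#include <stdio.h>
#include <mpi.h>

#define N 16                                 /* assumed number of circuit inputs */

/* Stand-in for the lecture's example circuit: prints z if the N-bit
   pattern z satisfies it. The condition below is a placeholder, not
   the actual circuit from the slides. */
static void check_circuit (int id, int z) {
    int v[N], i;
    for (i = 0; i < N; i++)
        v[i] = (z >> i) & 1;                 /* extract the n input bits */
    if ((v[0] || v[1]) && !v[2] && v[3]) {   /* hypothetical circuit */
        printf ("Process %d found a solution: %d\n", id, z);
        fflush (stdout);                     /* note the flush */
    }
}

int main (int argc, char *argv[]) {
    int id, p, i;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);     /* which instance am I?  */
    MPI_Comm_size (MPI_COMM_WORLD, &p);      /* how many instances?   */
    for (i = id; i < (1 << N); i += p)       /* cyclic split of the 2^n inputs */
        check_circuit (id, i);
    printf ("Process %d is done\n", id);
    fflush (stdout);
    MPI_Finalize ();
    return 0;
}

Each process independently tests the inputs id, id + p, id + 2p, ... and prints the ones it finds, which is what gives rise to the output-ordering issues discussed next.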
Execution notes
- Non-deterministic order of printf outputs
  - multiple prints by the same process will always appear in the order they are executed
  - calls by different processes will be queued (hence the importance of flush)
  - the system may interleave the processing of the commands

Extensions
- In addition to printing the solutions, we want to count how many inputs satisfy the circuit
- Introduces your first (collective) communication call
  - MPI_Reduce: combines values from different processes into a single result

Details
- Each process keeps a counter of how many solutions it found
- Modify check_circuit
  - return 0 if not satisfiable
  - return 1 if satisfiable
- Your first (collective) communication call goes at the end of the loop
  - MPI_Reduce: combines values from different processes into a single result
  - call it after the for loop

MPI_Reduce
int MPI_Reduce (
    void         *sendbuf,   /* ptr to first argument       */
    void         *recvbuf,   /* ptr to first result         */
    int           count,     /* number of values to combine */
    MPI_Datatype  datatype,  /* type                        */
    MPI_Op        op,        /* operator                    */
    int           root,      /* destination                 */
    MPI_Comm      comm )     /* communicator                */

Datatypes allowed
- MPI_CHAR
- MPI_DOUBLE
- MPI_FLOAT
- MPI_INT
- MPI_LONG
- MPI_LONG_DOUBLE
- MPI_SHORT
- MPI_UNSIGNED_CHAR
- MPI_UNSIGNED
- MPI_UNSIGNED_LONG
- MPI_UNSIGNED_SHORT

Operators allowed
- MPI_BAND
- MPI_BOR
- MPI_BXOR
- MPI_LAND
- MPI_LOR
- MPI_LXOR
- MPI_MAX
- MPI_MAXLOC
- MPI_MIN
- MPI_MINLOC
- MPI_PROD
- MPI_SUM

Benchmarking
- First debug the program and make sure that the outputs are correct
- Remove all printf and debug statements
- Introduce calls to time the program
  - MPI_Barrier
  - MPI_Wtime
  - MPI_Wtick

General rules
- Time the entire program, including all essential file I/O, etc.
- Also time the "main" part of the program (to get an idea of the overheads)
- Plot speedups (but beware: don't set up "strawmen")

Summary
- MPI programming made simple
  - pleasantly (never embarrassingly) parallel
- Basic functions (pulled together in the sketch below)
  - Initialize (MPI_Init)
  - Rank (MPI_Comm_rank)
  - Size (MPI_Comm_size)
  - Reduce (MPI_Reduce)
  - Finalize (MPI_Finalize)
  - Barrier (MPI_Barrier)
  - Wtime (MPI_Wtime)
  - Wtick (MPI_Wtick)
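As a recap of the basic functions just listed, here is a sketch of how the counting extension and the timing calls might fit together. It uses the modified check_circuit described under "Details" (returning 0 or 1, with the same placeholder circuit and assumed N = 16 as before); the -MPI_Wtime()/+MPI_Wtime() idiom times the main loop, and MPI_Wtick() can be queried for the timer's resolution.

#include <stdio.h>
#include <mpi.h>

#define N 16                               /* assumed number of circuit inputs */

/* Modified check_circuit, as in "Details": returns 1 if the N-bit
   pattern z satisfies the (placeholder) circuit, 0 otherwise. */
static int check_circuit (int id, int z) {
    int v[N], i;
    (void) id;                             /* printf removed for benchmarking, as advised above */
    for (i = 0; i < N; i++)
        v[i] = (z >> i) & 1;
    return ((v[0] || v[1]) && !v[2] && v[3]) ? 1 : 0;
}

int main (int argc, char *argv[]) {
    int id, p, i, count = 0, global_count = 0;
    double elapsed;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &id);
    MPI_Comm_size (MPI_COMM_WORLD, &p);

    MPI_Barrier (MPI_COMM_WORLD);          /* synchronize before starting the clock */
    elapsed = -MPI_Wtime ();               /* MPI_Wtick() gives the clock resolution */

    for (i = id; i < (1 << N); i += p)     /* cyclic allocation, as before */
        count += check_circuit (id, i);

    /* Combine the per-process counters into a single total on process 0 */
    MPI_Reduce (&count, &global_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    elapsed += MPI_Wtime ();
    if (id == 0)
        printf ("Found %d solutions in %.6f s on %d processes\n",
                global_count, elapsed, p);

    MPI_Finalize ();
    return 0;
}

Only process 0 (the root of the reduction) holds the final count and reports it, along with the elapsed time of the enumeration loop.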