Sparse Matrix Storage Formats: Implementation and Performance Analysis, Slides of Applications of Computer Sciences

This presentation covers the importance of sparse storage formats for large data sets consisting mainly of zeros. The project aims to extend the existing SparseLib++ library with more storage formats and efficient matrix-vector multiplication routines for both sequential and parallel processing. It also covers various sparse storage techniques, how they work, and their performance analysis.

Typology: Slides

2011/2012

Uploaded on 07/18/2012

padmavati

Introduction
• Sparse storage formats are of great importance when dealing with large data sets consisting mostly of zeros
• Sparse storage formats are techniques for storing and processing matrix data efficiently

Project Aim
• The existing library for sparse matrices (SparseLib++) provides only three sparse storage techniques, with routines for their matrix-vector multiplication. The aim of this project is to implement a sparse library with more storage formats and with matrix-vector multiplication in both sequential and parallel form
• For performance evaluation I apply these storage techniques to different test matrices taken from Matrix Market

Cont…
• If more users are added to an existing network, the matrix grows; this is the expanding-matrix problem, which is easily resolved in a sparse environment
• If the network is yet to be initialized, it falls into the category of dynamic matrices

Review
• So far I have studied the sparse storage formats and how they work, and implemented some of the important techniques sequentially
• The point entry storage techniques that I have implemented are:
– Coordinate storage format
– Compressed Row storage format
– Compressed Column storage format

Cont…
– Diagonal storage format
– Jagged Diagonal storage format
– Transpose Jagged Diagonal storage format
– Java sparse array
– Skyline (symmetric)
• The block entry storage formats that I have implemented are:
– Block Coordinate storage format
– Block Compressed Row format
– Block Compressed Column format

Analysis
• Two factors influence the performance and storage size of sparse matrices:
• Number of nonzeros
– The number of nonzeros has the same effect on all the storage formats.
– The more nonzeros a matrix has, the more storage and processing time it consumes.

Cont…
– Storage techniques that access values directly, such as COO, are affected more in storage size than in processing time, while techniques such as CSR and CSC, which access values indirectly through pointers, are affected more in processing time than in storage size

Cont…
• Organization of nonzeros
– The way the nonzeros are spread over the matrix also affects the storage size and processing time
– The effect of the organization of nonzeros varies from technique to technique: an organization that is efficient with one technique may not be efficient with another
– The COO technique is an exception and is not affected by the organization of nonzeros
– “No storage format is efficient for all the sparse matrices however the selection of a suitable one can give better results” (Jack Dongarra)

Cont…
– If the nonzeros lie mostly on diagonals, the DIA technique will give better results

Cont…
• If the nonzeros are randomly spread and there are no patterns to exploit, we can use JDS or TJDS

Cont…
– If the matrix is symmetric, we can store only one half of the matrix data
– If we require high efficiency at the cost of storage size, we can use the skyline technique, which embeds zeros in the matrix data at their locations
– If the matrix is too large to handle efficiently, we have to use block entry storage formats instead of point entry
– Java sparse array is a recently devised technique and has proved very efficient using an OO approach

CSC-vector Multiplication
[Figure]

Test Matrices data
• BCS structural engineering matrices of linear equations

Name      Size (MB)  % nonzeros
bcsstk16  4.411      0.006189
bcsstk25  4.129      0.000561
bcsstk28  3.327      0.005744
bcsstk17  6.657      0.001825
bcsstk24  2.438      0.006442
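
To make the contrast between direct access (COO) and pointer-based access (CSR) from the Analysis slides more concrete, here is a minimal Python sketch of both formats and their matrix-vector products. It is an illustration only, not the project's code: the function names (`coo_to_csr`, `coo_matvec`, `csr_matvec`) and the 3x3 example matrix are assumptions and do not come from SparseLib++.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triples into CSR arrays (row_ptr, col_idx, values)."""
    # Count the nonzeros in each row, then prefix-sum into row pointers.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    # Scatter every entry into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing makes a copy
    col_idx = [0] * len(vals)
    values = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        values[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, values


def coo_matvec(n_rows, rows, cols, vals, x):
    """y = A x with A in COO: each entry carries its own (row, col)."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def csr_matvec(row_ptr, col_idx, values, x):
    """y = A x with A in CSR: rows are located through row_ptr."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Hypothetical example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

row_ptr, col_idx, values = coo_to_csr(3, rows, cols, vals)
y_coo = coo_matvec(3, rows, cols, vals, x)
y_csr = csr_matvec(row_ptr, col_idx, values, x)
```

COO keeps three arrays of length nnz (row, column, value), while CSR keeps two arrays of length nnz plus n + 1 row pointers. That is one way to read the slide's remark that COO pays more in storage and CSR pays more in indirect access.
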
Test matrix source (Matrix Market): http://math.nist.gov/MatrixMarket/index.html

Results
[Figure: bar chart comparing COO, CSR and CSC on bcsstk16, bcsstk28 and bcsstk24; axis scale 0 to 70]

MPI
• MPI (Message Passing Interface) is a standard for message-passing communication libraries
• MPI is designed for high performance in parallel processing
• The MPI subroutines distribute the processing load among computers in the network
• It is portable and efficient

MPI basic commands
• MPI_Init()
• MPI_Comm_size()
• MPI_Comm_rank()
• MPI_Send()
• MPI_Recv()
• MPI_Barrier()
• MPI_Finalize()

MPICH2
• MPICH2 is an implementation of the MPI standard and the successor to MPICH1
• I have preferred MPICH2 over MPICH1 for these reasons:
– MPICH2 modifies some of the functionality of MPICH1
– It provides additional functionality such as:
– Remote memory access
– Parallel I/O
– Dynamic process management
– Threads

Cont…
• MPICH2 is provided in compressed format as mpich2.tar.gz at http://www-unix.mcs.anl.gov/mpi/mpich2
• Unpack and configure it:
    tar xfz mpich2.tar.gz
    cd /tmp/you/mpich2-1.0.5
    /home/you/libraries/mpich2-1.0.5/configure \
        -prefix=/home/you/mpich2-install |& tee configure.log

Cont…
• To build, use the command
    make |& tee make.log
• Then install it using
    make install |& tee install.log
• To add the bin subdirectory, which contains all the compilers, to the path:
    export PATH=/home/you/mpich2-install/bin:$PATH

Confirmation
    [taimoor@taimoor ~]$ which mpd
    ~/mpich2-install/bin/mpd
    [taimoor@taimoor ~]$ which mpiexec
    ~/mpich2-install/bin/mpiexec
    [taimoor@taimoor ~]$ which mpicc
    ~/mpich2-install/bin/mpicc
    [taimoor@taimoor ~]$ which mpirun
    ~/mpich2-install/bin/mpirun

Cont…
– The third issue concerns how mpiexec connects to the local mpd: the file system preserves the security of this communication
– All the connection messages are encrypted
– No message is encrypted once the connection is established.
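
The secretword check that guards the ring (its steps are listed on the following slide) can be sketched in Python as a challenge-response exchange. This is a minimal sketch under stated assumptions, not MPICH2's actual code: the slides do not say which algorithm "encrypts" refers to, so a SHA-256 digest stands in for it, and the function names `make_challenge`, `response` and `may_join` are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """B's side: generate the random number sent to the joining mpd A."""
    return secrets.token_hex(16)


def response(challenge, secretword):
    """Combine the challenge with a secretword and digest the result.

    Assumption: SHA-256 is only a stand-in for the unspecified
    'encryption' step mentioned in the slides.
    """
    data = (challenge + secretword).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def may_join(challenge, secretword_b, response_from_a):
    """B's side: recompute the value locally and compare with A's answer."""
    expected = response(challenge, secretword_b)
    return hmac.compare_digest(expected, response_from_a)


# A and B share the same secretword: A is allowed to join.
challenge = make_challenge()
accepted = may_join(challenge, "s3cret", response(challenge, "s3cret"))

# A holds a different secretword: the connection would be closed.
rejected = may_join(challenge, "s3cret", response(challenge, "wrong"))
```

The secretword itself never travels over the connection in this scheme; only the random number and the digest do, which is why a fresh random number is generated for every join attempt.
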

How a new mpd joins the existing ring
• B temporarily accepts the connection
• B generates a random number and sends it to A
• A concatenates the random number with the value of secretword in its .mpd.conf, encrypts the result and sends it to B
• Meanwhile, B concatenates the random number with its own secretword and encrypts it
• B receives the encrypted value from A and compares it with its own encrypted value
• If the two match, A is allowed to join the ring; otherwise the connection is closed

Bringing up an mpd ring on a set of machines
• Create a file containing a list of machine names, one per line
• Name this file mpd.hosts
• These machine names will be used as targets for ssh or rsh
• Check access to each machine with
    ssh machineName date
  or
    rsh machineName date

Running mpd
• When a job is started with mpiexec without any further specification, the processes are started on the ring in round-robin fashion until all the processes have been started
• If there are more hosts than processes, some hosts will remain free

MPICH2 Directories
• The MPICH2 installation has three directories
• The include and lib directories contain the headers and libraries necessary to compile MPI programs
• The bin directory contains the process manager mpd, the MPI job manager mpiexec, and the compilers for the supported languages

mpiexec
• mpiexec is the job launcher for MPI
• It is affected by the following environment variables:
– MPIEXEC_TIMEOUT
– MPIEXEC_PORT_RANGE
– MPD_CON_EXT
• mpicc is used for compiling and linking C programs; mpicxx is its C++ counterpart

Time schedule
6th semester
• Task 1: Project selection and understanding
• Task 2: Study of sparse storage formats
7th semester
• Task 3: Study of storage formats
• Task 4: Sequential implementation of these storage formats
• Task 5: Creating a small MPI cluster
8th semester
• Task 6: Parallel implementation
• Task 7: Matrix-vector multiplication
• Task 8: Performance evaluation

References
• Albert Y. Zomaya, J. Desharnais, A. Milli, J. Mullins, Y. Slimani, “Parallel and Distributed Computing Handbook”, McGraw-Hill Series
• http://www-unix.mcs.anl.gov/mpi/mpich2
• http://math.nist.gov/MatrixMarket/index.html
• http://mrccs.man.ac.uk/hpctec/courses/IntroPVM/PVM_25.html