Download Conjugate Gradient Methods for Solving Linear Systems and Optimization Problems and more Slides Computer Science in PDF only on Docsity! Conjugate Gradient Methods Docsity.com Optimization Problems • Solution of equations can be formulated as an optimization problem, e.g., density functional theory in electronic structure, conformation of proteins, etc • Minimization with constraints – operations research (linear programming, optimal conditions in management science, traveling salesman problem, etc) Docsity.com Bracketing and Search in 1D Bracket a minimum means that for given a < b < c, we have f(b) < f(a), and f(b) < f(c). There is a minimum in the interval (a,c). a b c Docsity.com How accurate can we locate a minimum? • Let b a minimum of function f(x),Taylor expanding around b, we have • The best we can do is when the second correction term reaches machine epsilon comparing to the function value, so 21( ) ( ) ( )( ) 2 f x f b f b x b 2 2 | ( ) | | | | | ( ) f b x b b b f b Docsity.com Golden Section Search • Choose x such that the ratio of intervals [a,b] to [b,c] is the same as [a,x] to [x,b]. Remove [a,x] if f[x] > f[b], or remove [b,c] if f[x] < f[b]. • The asymptotic limit of the ratio is the Golden mean a b c x 5 1 0.61803 2 Docsity.com Local Properties near Minimum • Let P be some point of interest which is at the origin x=0. Taylor expansion gives • Minimizing f is the same as solving the equation 2 , 1 ( ) ( ) 2 1 2 i i j i i ji i j T T f f f f x x x x x x c x P b x x A x A x b T for transpose of a matrix Docsity.com Search along Coordinate Directions Search minimum along x direction, followed by search minimum along y direction, and so on. Such method takes a very large number of steps to converge. The curved loops represent f(x,y) = const. Docsity.com Conjugate Condition n1 T A n2 = 0 Make a linear coordinate transformation, such that contour is circular and (search) vectors are orthogonal Docsity.com CG Method – Conjugate Gradient (CG) Method M. R. Hestenes & E. Stiefel, 1952 BiCG Method – Biconjugate Gradient (BiCG) Method C. Lanczos, 1952 D. A. H. Jacobs, 1981 C. F. Smith et al., 1990 R. Barret et al., 1994 CGS Method – Conjugate Gradient Squared (CGS) Method (MATLAB Function) P. Sonneveld, 1989 GMRES Method – Generalized Minimal – Residual (GMRES) Method Y. Saad & M. H. Schultz, 1986 R. Barret et al., 1994 Y. Saad, 1996 QMR Method – Quasi–Minimal–Residual (QMR) Method R. Freund & N. Nachtigal, 1990 N. Nachtigal, 1991 R. Barret et al., 1994 Y. Saad, 1996 Docsity.com cgs Conjugate Gradients Squared method Syntaxx = cgs(A,b) cgs(A,b,tol) cgs(A,b,tol,maxit) cgs(A,b,tol,maxit,M) cgs(A,b,tol,maxit,M1,M2) cgs(A,b,tol,maxit,M1,M2,x0) cgs(afun,b,tol,maxit,m1fun,m2fun,x0,p1,p2,...) [x,flag] = cgs(A,b,...) [x,flag,relres] = cgs(A,b,...) [x,flag,relres,iter] = cgs(A,b,...) [x,flag,relres,iter,resvec] = cgs(A,b,...) Docsity.com Conjugate Gradient Method 1. Introduction, Notation and Basic Terms 2. Eigenvalues and Eigenvectors 3. The Method of Steepest Descent 4. Convergence Analysis of the Method of Steepest Descent 5. The Method of Conjugate Directions 6. The Method of Conjugate Gradients 7. Convergence Analysis of Conjugate Gradient Method 8. Complexity of the Conjugate Gradient Method 9. Preconditioning Techniques 10. Conjugate Gradient Type Algorithms for Non- Symmetric Matrices (CGS, Bi-CGSTAB Method) Docsity.com 1. Introduction, Notation and Basic Terms (Cont’d) • A matrix is a rectangular array of numbers, called elements. • The transpose of an mn matrix A is the nm matrix AT with elements aij T = aji • Two square nn matrices A and B are similar if there is a nonsin- gular matrix S such that B=S-1AS • Matrices having a single row are called row vectors; matrices having a single column are called column vectors. Row vector: a = [a1, a2, …, an] Column vector: a = (a1, a2, …, an) • The inner product of two vectors is written as n i ii 1 T yxyx Docsity.com 1. Introduction, Notation and Basic Terms (Cont’d) • A matrix is called symmetric if aij = aji . • A matrix is positive definite if, for every nonzero vector x xTAx > 0 • Basic matrix identities: (AB)T=BTAT and (AB)-1=B-1A-1 • A quadratic form is a scalar, quadratic function of a vector with the form • The gradient of a quadratic form is c 2 1 )f( TT xbAxx x bAxxA x x x 2 1 2 1 )f( x )f( x )(f T n 1 ' Docsity.com 1. Introduction, Notation and Basic Terms (Cont’d) • Various quadratic forms for an arbitrary 22 matrix: Positive-definite matrix Negative-definite matrix Singular positive-indefinite matrix Indefinite matrix Docsity.com 2. Eigenvalues and Eigenvectors (Cont’d) =-0.5 2 • Examples for ||<1 and || >1 • The spectral radius of a matrix is: r(B)= max|i| 1=0.7 2=-2 Docsity.com 2. Eigenvalues and Eigenvectors (Cont’d) • Example: The Jacobi Method for the solution of Ax=b - split the matrix A such that A=D+E - Instead of solving the system Ax=b, one solves the modifi- ed system x=Bx+z, where B=-D-1E and z=D-1b - The iterative method is: x(i+1)= Bx(i)+z, or it terms of the error vector: e(i+1)= Be(i) - For our particular example, we have that the eigenvalues and the eigenvectors of the matrix A are: 1=7, v1=[1, 2] T 2=2, v2=[-2, 1] T - The eigenvalues and eigenvectors of the matrix B are: 1=-2/3=-0.47, v1=[2, 1] T 2= 2/3=0.47, v2=[-2, 1] T Docsity.com • Graphical representation of the convergence of the Jacobi method 2. Eigenvalues and Eigenvectors (Cont’d) Eigenvectors of B together with their eigenvalues Convergence of the Jacobi method which starts at [-2,-2]T and converges to [2, -2]T The error vector e(0) The error vector e(1) The error vector e(2) Docsity.com • Geometrical representation (e) (f) 3. The Method of Steepest Descent (Cont’d) Docsity.com 3. The Method of Steepest Descent (Cont’d) • The algorithm • To avoid one matrix-vector multiplication, one uses (i)(i)(i))1(i(i)(i)(i))1(i (i) (i) (i) (i)(i) (i) (i) ree rxx Arr rr Axbr T T (i)(i)(i))1(i Arrr Two matrix-vector multiplications are required. The disadvantage of using this recurrence is that the residual sequence is determined without any feedback from the value of x(i), so that round-off errors may cause x(i) to converge to some point near x. Docsity.com 3. The Method of Steepest Descent (Cont’d) • Summary of the algorithm • To avoid one matrix-vector multiplication, one uses instead (i)(i)(i))1(i Arrr (i)(i)(i))1(i (i) (i) (i) (i)(i) (i) (i) rxx Arr rr Axbr T T • Efficient implementation of the method of steepest descent 1ii endif - else - 50le by is divisibi If andiiWhile 0i T T 0 2 max T rr q r r Ax br rxx qr Arq rr Axbr Docsity.com 4. Convergence Analysis of the Method of Steepest Descent (Cont’d) The influence of the spectral condition number on the convergence Worst-case scenario: =± Docsity.com Steepest Descent Problem • The gradient at the minimum of a line search is orthogonal to the direction of that search the steepest descent algorithm tends to make right angle turns, taking many steps down a long narrow potential well. Too many steps to get to a simple minimum. Docsity.com Steepest Descent Search in the direction with the largest decrease, i.e., n = -f Constant f contour line (surface) is perpendicular to n, because df = dxf = 0. The current search direction n and next search direction are orthogonal, because for minimum we have y’() = df(P+n)/d = nT f|P+n = 0 n n’ nT n’ = 0 Docsity.com 5. The Method of Conjugate Directions (Cont’d) • The new requirement is now that e(i+1) is A-orthogonal to d(i) (i) (i) )i( )i()i()i( T )i( )1i( T )i( (i) T )1(i )1(iT )1(i)1(i (i) (i) 0 0 0 0 d d )('f)f( d d Add rd deAd Aed dr x xx T T If the search vectors were the residuals, this formula would be identical to the method of steepest descent. Docsity.com 5. The Method of Conjugate Directions (Cont’d) • Proof that this process computes x in n-steps - express the error terms as a linear combination of the search directions - use the fact that the search directions are A-orthogonal )j( 1n 0j )j()0( de Docsity.com 5. The Method of Conjugate Directions (Cont’d) • Calculation of the A-ortogonal search directions by a conjugate Gram-Schmidth process 1. Take a set of linearly independent vectors u0, u1, … , un-1 2. Assume that d(0)=u0 3. For i>0, take an ui and subtracts all the components from it that are not A-orthogonal to the previous search directions )j( T (j) )j( T (i) ij)j( 1i 0j ij)i()i( , Add Adu dud Docsity.com 6. The Method of Conjugate Gradients • The method of Conjugate Gradients is simply the method of conjugate directions where the search directions are constructed by conjugation of the residuals, i.e. ui=r(i) • This allows us to simplify the calculation of the new search direction because • The new search direction is determined as a linear combination of the previous search direction and the new residual 1ji0 1ji 1 )1i( T )1i( )i( T )i( )1i( T )1i( )i( T )i( )1i(ij rr rr Add rr (i)i)1i()1(i drd Docsity.com 6. The Method of Conjugate Gradients (Cont’d) • Efficient implementation 1ii endif - else - 50y ivisible b If i is d andiiWhile 0i old new T new newold T new 0 2 newmax new0 T new dr d rr q r r Ax br dxx qd Adq rr rd Axbr Docsity.com 7. Convergence Analysis of the CG Method • If the algorithm is performed in exact arithmetic, the exact solution is obtained in at most n-steps. • When finite precision arithmetic is used, rounding errors lead to gradual loss of orthogonality among the residuals, and the finite termination property of the method is lost. • If the matrix A has only m distinct eigenvalues, then the CG will converge in at most m iterations • An error bound on the CG method can be obtained in terms of the A-norm, and after k-iterations: Docsity.com Conjugate gradient iteration • One matrix-vector multiplication per iteration • Two vector dot products per iteration • Four n-vectors of working storage x0 = 0, r0 = b, d0 = r0 for k = 1, 2, 3, . . . αk = (r T k-1rk-1) / (d T k-1Adk-1) step length xk = xk-1 + αk dk-1 approx solution rk = rk-1 – αk Adk-1 residual βk = (r T k rk) / (r T k-1rk-1) improvement dk = rk + βk dk-1 search direction Docsity.com Complexity of linear solvers 2D 3D Sparse Cholesky: O(n1.5 ) O(n2 ) CG, exact arithmetic: O(n2 ) O(n2 ) CG, no precond: O(n1.5 ) O(n1.33 ) CG, modified IC: O(n1.25 ) O(n1.17 ) CG, support trees: O(n1.20 ) -> O(n1+ ) O(n1.75 ) -> O(n1.31 ) Multigrid: O(n) O(n) n1/2 n1/3 Time to solve model problem (Poisson’s equation) on regular mesh Docsity.com 9. Preconditioning Techniques References: • J.A. Meijerink and H.A. van der Vorst, “An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix,” Mathematics of Computation, Vol. 31, No. 137, pp. 148-162 (1977). • J.A. Meijerink and H.A. van der Vorst, “Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems,” Journal of Computational Physics, Vol. 44, pp. 134-155 (1981). • D.S. Kershaw, “The incomplete Cholesky-Conjugate gradient method for the iterative solution of systems of linear equations,” Journal of Computational Physics, Vol. 26, pp. 43-65 (1978). • H.A. van der Vorst, “High performance preconditioning,” SIAM Journal of Scientific Statistical Computations, Vol. 10, No. 6, pp. 1174-1185 (1989). Docsity.com 9. Preconditioning Techniques (Cont’d) There are sevaral types of preconditioners that can be used: (a) Preconditioners based on splitting of the matrix A, i.e. A = M - N (b) Complete or incomplete factorization of A, e.g. A = LLT + E (c) Approximation of M=A-1 (d) Reordering of the equations and/or unknowns, e.g. Domain Decomposition Docsity.com 9. Preconditioning Techniques (Cont’d) (a) Preconditioning based on splittings of A • Diagonal Preconditioning: M=D=diag{a11, a22, … ann} • Tridiagonal Preconditioning • Block Diagonal Preconditioning 2 2 0c 8 2 62 23 x b A ,, Docsity.com 9. Preconditioning Techniques (Cont’d) (b) Preconditioning based on factorization of A • Cholesky Factorization (for symmetric and positive definite matrix) as a direct method: A = LLT x = (L-T)(L-1b) where: • Modified Cholesky factorization: A = LDLT niij L LLA L LAL ii i k ikjkji ji i k ikiiii ,,2,1 , 1 1 1 1 Docsity.com 9. Preconditioning Techniques (Cont’d) (d) Domain Decomposition Preconditioning Domain I Domain II SBB 0M0 00M L L S00 0M0 00M LM TT T- - where 21 2 1 2 1 1 1 1 , f d d z x x QBB BA0 B0A bAx 2 1 2 1 22 11 21 , TT 2 1 1 1* * 22 11 2211 21 , BMBBMBSS SBB BM0 B0M M -T-T TT Docsity.com 10. Conjugate Gradient Type Algorithms for Nonsymmetric Matrices • The Bi-Conjugate Gradient (BCG) Algorithm was propo- sed by Lanczos in 1954. • It solves not only the original system Ax=b but also the dual linear system ATx*=b*. • Each step of this algorithm requires a matrix-by-vector product with both A and AT. • The search direction pj * does not contribute to the solution directly. 1 ,, ,0 *** * * * 0 2 max 0 ii - - andiiWhile , i old new T new newold *T new T* new new T new p rp pr p rr q rr q r r pxx qp pAq Apq rr rp r,p 0r*rAxbr ** *** T Docsity.com 10. Conjugate Gradient Type Algorithms for Nonsymmetric Matrices (Cont’d) • The Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) Algorithm was developed by van der Vorst in 1992. • Rather than producing iterates whose residual vectors are of the form • it produces iterates with residual vectors of the form 1 ( , arbitrary is,0 * 0 * 0 0 2 max 0 * 0 * 000 0 ii ) - andiiWhile , i old new T new newold T T T new new new T new Appr p rr As sr spxx AsAs Ass Aprs Apr rr rp rAxbr 0 2 )( rAr j j 0 0 )()( )()( rAAp rAAr jjj jjj Docsity.com