Lecture Notes on Linear System Theory

John Lygeros∗ and Federico A. Ramponi†

∗Automatic Control Laboratory, ETH Zurich
CH-8092, Zurich, Switzerland
lygeros@control.ee.ethz.ch

†Department of Information Engineering, University of Brescia
Via Branze 38, 25123, Brescia, Italy
federico.ramponi@unibs.it

January 3, 2015

Contents

1 Introduction
  1.1 Objectives of the course
  1.2 Proof methods
  1.3 Functions and maps
2 Introduction to Algebra
  2.1 Groups
  2.2 Rings and fields
  2.3 Linear spaces
  2.4 Subspaces and bases
  2.5 Linear maps
  2.6 Linear maps generated by matrices
  2.7 Matrix representation of linear maps
  2.8 Change of basis
3 Introduction to Analysis
  3.1 Norms and continuity
  3.2 Equivalent norms
  3.3 Infinite-dimensional normed spaces
  3.4 Completeness
  3.5 Induced norms and matrix norms
  3.6 Ordinary differential equations
  3.7 Existence and uniqueness of solutions
    3.7.1 Background lemmas
    3.7.2 Proof of existence
    3.7.3 Proof of uniqueness
4 Time varying linear systems: Solutions
  4.1 Motivation: Linearization about a trajectory

Chapter 1

Introduction

1.1 Objectives of the course

This course has two main objectives. The first (and more obvious) is for students to learn something about linear systems. Most of the course will be devoted to linear time varying systems that evolve in continuous time t ∈ R+. These are dynamical systems whose evolution is defined through state space equations of the form

ẋ(t) = A(t)x(t) + B(t)u(t),
y(t) = C(t)x(t) + D(t)u(t),

where x(t) ∈ Rn denotes the system state, u(t) ∈ Rm denotes the system inputs, y(t) ∈ Rp denotes the system outputs, A(t) ∈ Rn×n, B(t) ∈ Rn×m, C(t) ∈ Rp×n, and D(t) ∈ Rp×m are matrices of appropriate dimensions, and where, as usual, ẋ(t) = dx(t)/dt denotes the derivative of x(t) with respect to time. Time varying linear systems are useful in many application areas.
They frequently arise as models of mechanical or electrical systems whose parameters (for example, the stiffness of a spring or the inductance of a coil) change in time. As we will see, time varying linear systems also arise when one linearizes a non-linear system around a trajectory. This is very common in practice. Faced with a nonlinear system one often uses the full nonlinear dynamics to design an optimal trajectory to guide the system from its initial state to a desired final state. However, ensuring that the system will actually track this trajectory in the presence of disturbances is not an easy task. One solution is to linearize the nonlinear system (i.e. approximate it by a linear system) around the optimal trajectory; the approximation is accurate as long as the nonlinear system does not drift too far away from the optimal trajectory. The result of the linearization is a time varying linear system, which can be controlled using the methods developed in this course. If the control design is done well, the state of the nonlinear system will always stay close to the optimal trajectory, hence ensuring that the linear approximation remains valid. A special class of linear time varying systems are linear time invariant systems, usually referred to by the acronym LTI. LTI systems are described by state equations of the form ẋ(t) = Ax(t) +Bu(t), y(t) = Cx(t) +Du(t), where the matrices A ∈ Rn×n, B ∈ Rn×m, C ∈ Rp×n, and D ∈ Rp×m are constant for all times t ∈ R+. LTI systems are somewhat easier to deal with and will be treated in the course as a special case of the more general linear time varying systems. 1 Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 2 The second and less obvious objective of the course is for students to experience something about doing automatic control research, in particular developing mathematical proofs and formal logical arguments. Linear systems are ideally suited for this task. There are two main reasons for this. The first is that almost all the derivations given in the class can be carried out in complete detail, down to the level of basic algebra. There are very few places where one has to invoke “higher powers”, such as an obscure mathematical theorem whose proof is outside the scope of the course. One can generally continue the calculations until he/she is convinced that the claim is true. The second reason is that linear systems theory brings together two areas of mathematics, algebra and analysis. As we will soon see, the state space, Rn, of the systems has both an algebraic structure (it is a vector space) and a topological structure (it is a normed space). The algebraic structure allows us to perform linear algebra operations, compute projections, eigenvalues, etc. The topological structure, on the other hand, forms the basis of analysis, the definition of derivatives, etc. The main point of linear systems theory is to exploit the algebraic structure to develop tractable “algorithms” that allow us to answer analysis questions which appear intractable by themselves. For example, consider the time invariant linear system ẋ(t) = Ax(t) +Bu(t) (1.1) with x(t) ∈ Rn, u(t) ∈ Rm, A ∈ Rn×n, and B ∈ Rn×m. Given x0 ∈ Rn, T > 0 and a continuous function u(·) : [0, T ] → Rm (known as the input trajectory) one can show (Chapter 3) that there exists a unique function x(·) : [0, T ]→ Rn such that x(0) = x0 and ẋ(t) = Ax(t) +Bu(t), for all t ∈ [0, T ]. 
(1.2) This function is called the state trajectory (or simply the solution) of system (1.1) with initial condition x0 under the input u(·). As we will see in Chapter 3, u(·) does not even need to be continuous for (1.2) to be true, provided one appropriately qualifies the statement “for all t ∈ [0, T ]”. System (1.1) is called controllable (Chapter 8) if and only if for all x0 ∈ Rn, for all x̂ ∈ Rn, and for all T > 0, there exists u(·) : [0, T ]→ Rm such that the solution of system (1.1) with initial condition x0 under the input u(·) is such that x(T ) = x̂. Controllability is clearly an interesting property for a system to have. If the system is controllable then we can guide it from any initial state to any final state by selecting an appropriate input. If not, there may be some desirable parts of the state space that we cannot reach from some initial states. Unfortunately, determining whether a system is controllable directly from the definition is impossible. This would require calculating all trajectories that start at all initial conditions. Except for trivial cases (like the linear system ẋ(t) = u(t)) this calculation is intractable, since the initial states, x0, the times T of interest, and the possible input trajectories u(·) : [0, T ]→ Rm are all infinite. Fortunately, linear algebra can be used to answer the question without even computing a single solution (Chapter 8). Theorem 1.1 System (1.1) is controllable if and only if the matrix [ B AB . . . An−1B ] ∈ Rn×nm has rank n. The theorem shows how the seemingly intractable analysis question “is the system (1.1) control- lable?” can be answered by a simple algebraic calculation of the rank of a matrix. The treatment in these notes is inspired by [6] in terms of the level of mathematical rigour and at places the notation and conventions. Coverage and style of presentation of course differ substantially. There are many good reference books for linear systems theory, including [5, 1, 2, 9] and, primarily for linear time invariant systems, [11]. 1.2 Proof methods Most of the course will be devoted to proving theorems. The proof methods that we will encounter are just a set of tools, grounded in mathematical logic and widely accepted in the mathematical Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 3 community, that let us say that a proposition is true, given that others are true. A “Theorem” is indeed a logical statement that can be proven: This means that the truth of such statement can be established by applying our proof methods to other statements that we already accept as true, either because they have been proven before, or because we postulate so (for example the “axioms” of logic), or because we assume so in a certain context (for example, when we say “Let V be a vector space . . . ” we mean “Assume that the set V verifies the axioms of a vector space . . . ”). Theorems of minor importance, or theorems whose main point is to establish an intermediate step in the proof of another theorem, will be called “Lemmas”, “Facts”, or “Propositions”; An immediate consequence of a theorem that deserves to be highlighted separately is usually called a “Corollary”. And a logical statement that we think may be true but cannot prove so is called a “Conjecture”. The logical statements we will most be interested in typically take the form p⇒ q (p implies q). p is called the hypothesis and q the consequence. 
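Before turning to examples of implications, here is a small numerical aside on Theorem 1.1 above. The sketch below is not part of the notes; it assumes the numpy library and a hypothetical double-integrator system, and simply checks the rank condition without computing any trajectory.

```python
# Illustration of Theorem 1.1 (assumed example, not from the notes):
# controllability of xdot = Ax + Bu is decided by the rank of [B  AB ... A^{n-1}B].
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # double integrator dynamics
B = np.array([[0.0],
              [1.0]])
n = A.shape[0]

# Stack B, AB, ..., A^{n-1}B column-wise and compute the rank.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(np.linalg.matrix_rank(ctrb) == n)   # True: the system is controllable
```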
Example (No smoke without fire) It is generally accepted that when there is smoke, there must be some a fire somewhere. This knowledge can be encoded by the logical implication If there is smoke then there is a fire p ⇒ q. This is a statement of the form p ⇒ q with p the statement “there is smoke” and q the statement “there is a fire”. Hypotheses and consequences may typically depend on one or more free variables, that is, objects that in the formulation of hypotheses and consequences are left free to change. Example (Greeks) Despite recent economic turbulence, it is generally accepted that Greek citizens are also Europeans. This knowledge can be encoded by the logical implication If X is a Greek then X is a European p(X) ⇒ q(X). A sentence like “X is a . . . ” is the verbal way of saying something belongs to a set; for example the above statement can also be written as X ∈ Greeks⇒ X ∈ Europeans, where “Greeks” and “Europeans” are supposed to be sets; the assertion that this implication is true for arbitrary X (∀X, X ∈ Greeks⇒ X ∈ Europeans) is equivalent to the set-theoretic statement of inclusion: Greeks ⊆ Europeans. You can visualize the implication and its set-theoretic interpretation in Figure 1.1. There are several ways of proving that logical statements are true. The most obvious one is a direct proof: Start from p and establish a finite sequence of intermediate implications, p1, p2, . . . , pn leading to q p⇒ p1 ⇒ p2 ⇒ . . .⇒ pn ⇒ q. We illustrate this proof technique using a statement about the natural numbers. Definition 1.1 A natural number n ∈ N is called odd if and only if there exists k ∈ N such that n = 2k + 1. It is called even if and only if there exists k ∈ N such that n = 2k. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 6 Another common method that can be used to indirectly prove that p ⇒ q is to suppose that p is true, to suppose that q is false, and to apply other proof methods to derive a contradiction. A contradiction is a proposition of the form r ∧ ¬r (like “There is smoke and there is no smoke”, or “n is even and n is odd”); all such statements are postulated to be false by virtue of their mere structure, and irrespective of the proposition r. If, by assuming p is true and q is false we are able to reach a false assertion, we must admit that if p is true the consequence q cannot be false, in other words that p implies q. This method is known as proof by contradiction. Example (Greeks and Chinese) Suppose the following implications: for all X , X is a Greek⇒ X is a European X is a Chinese⇒ X is an Asian X is an Asian⇒ X is not a European We show by contradiction that every Greek is not a Chinese, more formally If X is a Greek then X is not a Chinese p(X) ⇒ q(X) Indeed, suppose p(X) and the converse of q(X), that is, X is a Chinese. By direct deduction, X is a Greek ∧X is a Chinese ⇓ X is a European ∧X is an Asian ⇓ X is a European ∧X is not a European Since the conclusion is a contradiction for all X , we must admit that p(X)∧¬q(X) is false or, which is the same, that p(X)⇒ q(X). The set-theoretic interpretation is as follows: By postulate, Europeans ∩ non-Europeans = ∅ On the other hand, by deduction, (Greeks ∩ Chinese) ⊆ (Europeans ∩ non-Europeans) It follows that Greeks ∩ Chinese is also equal to the empty set. Therefore (here is the point of the above proof), Greeks ⊆ non-Chinese. Exercise 1.2 Visualize this set theoretic interpretation by a picture similar to Figure 1.1. 
We will illustrate this fundamental proof technique with another statement, about rational numbers. Definition 1.3 The real number x ∈ R is called rational if and only if there exist integers n,m ∈ Z with m 6= 0 such that x = n/m. Theorem 1.5 (Pythagoras) √ 2 is not rational. Proof: (Euclid) Assume, for the sake of contradiction, that √ 2 is rational. Then there exist n,m ∈ Z with m 6= 0 such that √ 2 = n/m. Since √ 2 > 0, without loss of generality we can take n,m ∈ N; if they happen to be both negative multiply both by −1 and replace them by the resulting numbers. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 7 Without loss of generality, we can further assume that m and n have no common divisor; if they do, divide both by their common divisors until there are no common divisors left and replace m and n by the resulting numbers. Now √ 2 = n m ⇒ 2 = n2 m2 ⇒ n2 = 2m2 ⇒ n2 is even ⇒ n is even (Theorem 1.4 and Problem 1.1) ⇒ ∃k ∈ N : n = 2k ⇒ ∃k ∈ N : 2m2 = n2 = 4k2 ⇒ ∃k ∈ N : m2 = 2k2 ⇒ m2 is even ⇒ m is even (Theorem 1.4 and Problem 1.1). Therefore, n and m are both even and, according to Definition 1.1, 2 divides both. This contradicts the fact that n and m have no common divisor. Therefore √ 2 cannot be rational. Exercise 1.3 What is the statement p in Theorem 1.5? What is the statement q? What is the statement r in the logical contradiction r ∧ ¬r reached at the end of the proof? Two statements are equivalent if one implies the other and vice versa, (p⇔ q) is the same as (p⇒ q) ∧ (q ⇒ p) Usually showing that two statements are equivalent is done in two steps: Show that p⇒ q and then show that q ⇒ p. For example, consider the following statement about the natural numbers. Theorem 1.6 n2 is odd if and only if n is odd. Proof: n is odd implies that n2 is odd (by Theorem 1.2) and n2 is odd implies that n is odd (by Theorem 1.4). Therefore the two statements are equivalent. This is argument is related to the canonical way of proving that two sets are equal, by proving two set inclusions A ⊆ B and B ⊆ A. To prove these inclusions one proves two implications: X ∈ A⇒ X ∈ B X ∈ B ⇒ X ∈ A or, in other words, X ∈ A⇔ X ∈ B. Finally, let us close this brief discussion on proof techniques with a subtle caveat: If p is a false statement then any implication of the form p⇒ q is true, irrespective of what q is. Example (Maximal natural number) Here is a proof that there is no number larger than 1. Theorem 1.7 Let N ∈ N be the largest natural number. Then N = 1. Proof: Assume, for the sake of contradiction, that N > 1. Then N2 is also a natural number and N2 > N . This contradicts the fact that N is the largest natural number. Therefore we must have N = 1. Obviously the “theorem” in this example is saying something quite silly. The problem, however, is not that the proof is incorrect, but that the starting hypothesis “let N be the largest natural number” is false, since there is no largest natural number. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 8 X Y Z f g f ◦ g Figure 1.2: Commutative diagram of function composition. 1.3 Functions and maps A function f : X → Y maps the set X (known as the domain of f) into the set Y (known as the co-domain of f). This means that for all x ∈ X there exists a unique y ∈ Y such that f(x) = y. The element f(x) ∈ Y is known as the value of f at x. The set {y ∈ Y | ∃x ∈ X : f(x) = y} ⊆ Y is called the range of f (sometimes denoted by f(X)) and the set {(x, y) ∈ X × Y | y = f(x)} ⊆ X × Y is called the graph of f . 
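As a concrete illustration of these set-theoretic notions (a hypothetical toy example, not from the notes), a function between finite sets can be tabulated explicitly and its range and graph computed directly:

```python
# A function f : X -> Y between finite sets, represented as a Python dict.
X = {1, 2, 3}
Y = {"a", "b", "c", "d"}              # the co-domain
f = {1: "a", 2: "b", 3: "a"}

range_f = {f[x] for x in X}           # {y in Y | exists x in X : f(x) = y}
graph_f = {(x, f[x]) for x in X}      # {(x, y) in X x Y | y = f(x)}

print(range_f)        # {'a', 'b'}
print(range_f <= Y)   # True: the range is a subset of the co-domain Y
print(graph_f)
```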
Definition 1.4 A function f : X → Y is called: 1. Injective (or one-to-one) if and only if f(x1) = f(x2) implies that x1 = x2. 2. Surjective (or onto) if and only if for all y ∈ Y there exists x ∈ X such that y = f(x). 3. Bijective if and only if it is both injective and surjective, i.e. for all y ∈ Y there exists a unique x ∈ X such that y = f(x). Given two functions g : X → Y and f : Y → Z their composition is the function (f ◦ g) : X → Z defined by (f ◦ g)(x) = f(g(x)). Commutative diagrams help visualize function compositions (Figure 1.2). Exercise 1.4 Show that composition is associative. In other words, for any three functions g : X → Y , f : Y → Z and h :W → X and for all w ∈ W , f ◦ (g ◦ h)(w) = (f ◦ g) ◦ h(w). By virtue of this associativity property, we will simply use f◦g◦h :W → Y to denote the composition of three (or more) functions. A special function that can always be defined on any set is the identity function, also called the identity map, or simply the identity. Definition 1.5 The identity map on X is the function 1X : X → X defined by 1X(x) = x for all x ∈ X. Exercise 1.5 Show that the identity map is bijective. Using the identity map one can also define various inverses of functions. Chapter 2 Introduction to Algebra 2.1 Groups Definition 2.1 A group (G, ∗) is a set G equipped with a binary operation ∗ : G × G → G such that: 1. The operation ∗ is associative: ∀a, b, c ∈ G, a ∗ (b ∗ c) = (a ∗ b) ∗ c. 2. There exists an identity element: ∃e ∈ G, ∀a ∈ G, a ∗ e = e ∗ a = a. 3. Every element has an inverse element: ∀a ∈ G, ∃a−1 ∈ G, a ∗ a−1 = a−1 ∗ a = e. (G, ∗) is called commutative (or Abelian) if and only if in addition to 1-3 above 4. ∗ is commutative: ∀a, b ∈ G, a ∗ b = b ∗ a. Example (Common groups) (R,+) is a commutative group. What is the identity element? What is the inverse? (R, ·) is not a group, since 0 has no inverse. The set ({0, 1, 2},+mod3) is a group. What is the identity element? What is the inverse? Is it commutative? Recall that (0 + 1)mod3 = 1, (1 + 2)mod3 = 0, (2 + 2)mod3 = 1, etc. The set of rotations of R2 (usually denoted by SO(2), or U(1) or S(1)) given by ({[ cos(θ) − sin(θ) sin(θ) cos(θ) ]∣∣∣∣ θ ∈ (−π, π] } , · ) with the usual operation of matrix multiplication is a group1. What is the identity? What is the inverse? Fact 2.1 For a group (G, ∗) the identity element, e, is unique. Moreover, for all a ∈ G the inverse element, a−1, is unique. 1For the time being, the reader is asked to excuse the use of matrices in the examples. Matrices will be formally defined in the next section, but will be used in the meantime for informal illustrations. 11 Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 12 Proof: To show the first statement, assume, for the sake of contradiction, that there exist two identity elements e, e′ ∈ G with e 6= e′. Then for all a ∈ G, e ∗ a = a ∗ e = a and e′ ∗ a = a ∗ e′ = a. Then: e = e ∗ e′ = e′ which contradicts the assumption that e 6= e′. To show the second statement, assume, for the sake of contradiction, that there exists a ∈ G with two inverse elements, say a1 and a2 with a1 6= a2. Then a1 = a1 ∗ e = a1 ∗ (a ∗ a2) = (a1 ∗ a) ∗ a2 = e ∗ a2 = a2, which contradicts the assumption that a1 6= a2. 2.2 Rings and fields Definition 2.2 A ring (R,+, ·) is a set R equipped with two binary operations, + : R × R → R (called addition) and · : R×R→ R (called multiplication) such that: 1. Addition satisfies the following properties: • It is associative: ∀a, b, c ∈ R, a+ (b + c) = (a+ b) + c. 
• It is commutative: ∀a, b ∈ R, a+ b = b+ a. • There exists an identity element: ∃0 ∈ R, ∀a ∈ R, a+ 0 = a. • Every element has an inverse element: ∀a ∈ R, ∃(−a) ∈ R, a+ (−a) = 0. 2. Multiplication satisfies the following properties: • It is associative: ∀a, b, c ∈ R, a · (b · c) = (a · b) · c. • There exists an identity element: ∃1 ∈ R, ∀a ∈ R, 1 · a = a · 1 = a. 3. Multiplication is distributive with respect to addition: ∀a, b, c ∈ R, a · (b+ c) = a · b+ a · c and (b+ c) · a = b · a+ c · a. (R,+, ·) is called commutative if in addition ∀a, b ∈ R, a · b = b · a. Example (Common rings) (R,+, ·) is a commutative ring. (Rn×n,+, ·) with the usual operations of matrix addition and multiplication is a non-commutative ring. The set of rotations ({[ cos(θ) − sin(θ) sin(θ) cos(θ) ]∣∣∣∣ θ ∈ (−π, π] } ,+, · ) with the same operations is not a ring, since it is not closed under addition. (R[s],+, ·), the set of polynomials of s with real coefficients, i.e. ans n+an−1s n−1+ . . .+a0 for some n ∈ N and a0, . . . , an ∈ R is a commutative ring. (R(s),+, ·), the set of rational functions of s with real coefficients, i.e. ams m + am−1s m−1 + . . .+ a0 bnsn + bn−1sn−1 + . . .+ b0 Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 13 for some n,m ∈ N and a0, . . . , am, b0, . . . , bn ∈ R with bn 6= 0 is a commutative ring. We implicitly assume here that the numerator and denominator polynomials are co-prime, that is they do not have any common factors; if they do one can simply cancel these factors until the two polynomials are co-prime. For example, it is easy to see that with such cancellations any rational function of the form 0 bnsn + bn−1sn−1 + . . .+ b0 can be identified with the rational function 0/1, which is the identity element of addition for this ring. (Rp(s),+, ·), the set of proper rational functions of s with real coefficients, i.e. ans n + an−1s n−1 + . . .+ a0 bnsn + bn−1sn−1 + . . .+ b0 for some n ∈ N with a0, . . . , an, b0, . . . , bn ∈ R with bn 6= 0 is a commutative ring. Note that an = 0 is allowed, i.e. it is possible for the degree of the numerator polynomial to be less than or equal to that of the denominator polynomial. Exercise 2.1 Show that for every ring (R,+, ·) the identity elements 0 and 1 are unique. Moreover, for all a ∈ R the inverse element (−a) is unique. Fact 2.2 If (R,+, ·) is a ring then: 1. For all a ∈ R, a · 0 = 0 · a = 0. 2. For all a, b ∈ R, (−a) · b = −(a · b) = a · (−b). Proof: To show the first statement note that a+ 0 = a⇒ a · (a+ 0) = a · a⇒ a · a+ a · 0 = a · a ⇒ −(a · a) + a · a+ a · 0 = −(a · a) + a · a ⇒ 0 + a · 0 = 0⇒ a · 0 = 0. The second equation is similar. For the second statement note that 0 = 0 · b = (a+ (−a)) · b = a · b+ (−a) · b⇒ −(a · b) = (−a) · b. The second equation is again similar. Definition 2.3 A field (F,+, ·) is a commutative ring that in addition satisfies • Multiplication inverse: ∀a ∈ F with a 6= 0, ∃a−1 ∈ F , a · a−1 = 1. Example (Common fields) (R,+, ·) is a field. (Rn×n,+, ·) is not a field, since singular matrices have no inverse. ({A ∈ Rn×n | Det(A) 6= 0},+, ·) is not a field, since it is not closed under addition. The set of rotations ({[ cos(θ) − sin(θ) sin(θ) cos(θ) ]∣∣∣∣ θ ∈ (−π, π] } ,+.· ) is not a field, it is not even a ring. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 16 and ⊙ : F × Fn → Fn by a⊙ x = (a · x1, . . . , a · xn). Note that both operations are well defined since a, x1, . . . , xn, y1, . . . , yn all take values in the same field, F . 
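For instance (a small sketch, not from the notes), taking F to be the integers modulo 3, which form a field, the operations ⊕ and ⊙ on F³ can be written out componentwise:

```python
# The linear space (F^n, F, ⊕, ⊙) for F = {0, 1, 2} with arithmetic mod 3.
p = 3

def vec_add(x, y):
    # ⊕ : componentwise addition in F
    return [(xi + yi) % p for xi, yi in zip(x, y)]

def scal_mul(a, x):
    # ⊙ : componentwise scalar multiplication by a in F
    return [(a * xi) % p for xi in x]

x, y = [1, 2, 0], [2, 2, 1]
print(vec_add(x, y))    # [0, 1, 1]
print(scal_mul(2, x))   # [2, 1, 0]
```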
Exercise 2.7 Show that (Fn, F,⊕,⊙) is a linear space. What is the identity element θ? What is the inverse element ⊖x of x ∈ Fn? The most important instance of this type of linear space in these notes will be (Rn,R,+, ·) with the usual addition and scalar multiplication for vectors. The state, input, and output spaces of linear systems will be linear spaces of this type. The second class of linear spaces that will play a key role in linear system theory are function spaces. Example (Function spaces) Let (V, F,⊕V ,⊙V ) be a linear space and D be any set. Let F(D,V ) denote the set of functions of the form f : D → V . Consider f, g ∈ F(D,V ) and a ∈ F and define ⊕ : F(D,V )×F(D,V )→ F(D,V ) by (f ⊕ g) : D → V such that (f ⊕ g)(d) = f(d)⊕V g(d) ∀d ∈ D and ⊙ : F ×F(D,V )→ F(D,V ) by (a⊙ f) : D → V such that (a⊙ f)(d) = a⊙V f(d) ∀d ∈ D Note that both operations are well defined since a ∈ F , f(d), g(d) ∈ V and (V, F,⊕V ,⊙V ) is a linear space. Exercise 2.8 Show that (F(D,V ), F,⊕,⊙) is a linear space. What is the identity element? What is the inverse element? The most important instance of this type of linear space in these notes will be (F([t0, t1],Rn),R,+, ·) for real numbers t0 < t1. The trajectories of the state, input, and output of the dynamical systems we consider will take values in linear spaces of this type. The state, input and output trajectories will differ in terms of their “smoothness” as functions of time. We will use the following notation to distinguish the level of smoothness of the function in question: • C([t0, t1],Rn) will be the linear space of continuous functions f : [t0, t1]→ Rn. • C1([t0, t1],R n) will be the linear space of differentiable functions f : [t0, t1]→ Rn. • Ck([t0, t1],R n) will be the linear space of k-times differentiable functions f : [t0, t1]→ Rn. • C∞([t0, t1],R n) will be the linear space of infinitely differentiable functions f : [t0, t1]→ Rn. • Cω([t0, t1],R n) will be the linear space of analytic functions f : [t0, t1] → Rn, i.e. functions which are infinitely differentiable and whose Taylor series expansion converges for all t ∈ [t0, t1]. Exercise 2.9 Show that all of these sets are linear spaces. You only need to check that they are closed under addition and scalar multiplication. E.g. if f and g are differentiable, then so is f ⊕ g. Exercise 2.10 Show that for all k = 2, 3, . . . Cω([t0, t1],R n) ⊂ C∞([t0, t1],R n) ⊂ Ck([t0, t1],R n) ⊂ Ck−1([t0, t1],R n) ⊂ C([t0, t1],Rn) ⊂ (F([t0, t1],Rn),R,+, ·). Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 17 Note that all subsets are strict, so there must be functions that belong to one set but not the previous one. Try to think of examples. To simplify the notation, unless there is special reason to distinguish the operations and identity element of a linear space from those of the field, from now on we will use the regular symbols + and · instead of ⊕ and ⊙ for the linear space operations of vector addition and scalar multiplication respectively; in fact as for real numbers we will mostly ommit · and simply write av instead of a⊙ v for a ∈ F , v ∈ V . Likewise, unless explicitly needed we will also use 0 instead of θ to denote the identity element of addition. Finally we will stop writing the operations explicitly when we define the vector space and write (V, F ) or simply V whenever the field is clear from the context. 2.4 Subspaces and bases Definition 2.6 Let (V, F ) be a linear space andW ⊆ V . 
(W,F ) is a linear subspace of V if and only if it is itself a linear space, i.e. for all w1, w2 ∈W and all a1, a2 ∈ F , we have that a1w1+a2w2 ∈ W . Note that by definition a linear space and all its subspaces are linear spaces over the same field. The equation provides a way of testing whether a given set W is a subspace or not: One needs to check whether linear combinations of elements of W with coefficients in F are also elements of W . Exercise 2.11 Show that if W is a subspace then for all n ∈ N, and ai ∈ F , wi ∈W for i = 1, . . . , n n∑ i=1 aiwi ∈W. Show further that θV ∈ W . Hence show that θW = θV . Example (Linear subspaces) In R2, the set {(x1, x2) ∈ R2 | x1 = 0} is subspace. So is the set {(x1, x2) ∈ R2 | x1 = x2}. But the set {(x1, x2) ∈ R2 | x2 = x1 + 1} is not a subspace and neither is the set {(x1, x2) ∈ R2 | (x1 = 0) ∨ (x2 = 0)}. In R3, all subspaces are: 1. R3 2. 2D planes through the origin. 3. 1D lines through the origin. 4. {0}. For examples of subspace of function spaces consider (R[t],R) (polynomials of t ∈ R with real coefficients). This is a linear subspace of C∞(R,R), which in turn is a linear subspace of C(R,R). The set {f : R→ R | ∀t ∈ R, |f(t)| ≤ 1} on the other hand is not a subspace of F(R,R). Exercise 2.12 Show that {f : R → R | ∀t ∈ R, |f(t)| ≤ 1} is not a subspace. How about {f : R→ R | ∃M > 0, ∀t ∈ R, |f(t)| ≤M}? It is easy to see that the family of subspaces of a given a linear space is closed under finite addition and intersection. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 18 Exercise 2.13 Let {(Wi, F )}ni=1 be a finite family of subspaces of a linear space (V, F ). Show that (∩ni=1Wi, F ) is also a subspace. Is (∪ni=1Wi, F ) a subspace? Exercise 2.14 Let (W1, F ) and (W2, F ) be subspaces of (V, F ) and define W1 +W2 = {w1 + w2 | w1 ∈W1, w2 ∈ W2}. Show that (W1 +W2, F ) is a subspace of (V, F ). A subset of a linear space will of course not be a subspace in general. Each subset of a linear space does, however, generate a subspace in a natural way. Definition 2.7 Let (V, F ) be a linear space and S ⊆ V . The linear subspace of (V, F ) generated by S is the smallest subspace of (V, F ) containing S. Here, “smallest” is to be understood in the sense of set inclusion. Exercise 2.15 What is the subspace generated by {(x1, x2) ∈ R2 | (x1 = 0) ∨ (x2 = 0)}? What is the subspace generated by {(x1, x2) ∈ R2 | x2 = x1 + 1}? What is the subspace of R2 generated by {(1, 2)}? Definition 2.8 Let (V, F ) be a linear space and S ⊆ V . The span of S is the set defined by Span(S) = { n∑ i=1 aivi ∣∣ n ∈ N, ai ∈ F, vi ∈ S, i = 1, . . . , n } . Fact 2.5 Let (V, F ) be a linear space and S ⊆ V . The subspace generated by S coincides with Span(S). Proof: The fact that Span(S) is a subspace and contains S is easy to check. To show that is is the smallest subspace containing S, consider another subspace, W , that contains S and an arbitrary v ∈ Span(S); we will show that v ∈ W and hence Span(S) ⊆ W . Since v ∈ Span(S) it can be written as v = n∑ i=1 aivi for some n ∈ N and ai ∈ F , vi ∈ S, i = 1, . . . , n. Since S ⊆ W we must also have vi ∈ W , i = 1, . . . , n and hence v ∈ W (since W is a subspace). The elements of Span(S) are known as linear combinations of elements of S. Notice that in general the set S may contain an infinite number of elements; this was for example the case for the set {(x1, x2) ∈ R2 | (x1 = 0) ∨ (x2 = 0)} in Exercise 2.15. The span of S, however, is defined as the set of all finite linear combinations of elements of S. 
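As a numerical side note (assuming numpy; the vectors are hypothetical and not from the notes), whether a given vector of Rⁿ lies in the span of a finite set S can be checked by a rank computation: appending the vector as an extra column must not increase the rank.

```python
import numpy as np

# S = {(1,0,1), (0,1,1)} and a candidate vector v.
S = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
v = np.array([2.0, 3.0, 5.0])

M = np.column_stack(S)                      # columns are the elements of S
in_span = (np.linalg.matrix_rank(np.column_stack([M, v]))
           == np.linalg.matrix_rank(M))
print(in_span)   # True, since v = 2*(1,0,1) + 3*(0,1,1)
```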
Definition 2.9 Let (V, F ) be a linear space. A set S ⊆ V is called linearly independent if and only if for all n ∈ N, vi ∈ S for i = 1, . . . , n with vi 6= vj if i 6= j, n∑ i=1 aivi = 0⇔ ai = 0, ∀i = 1, . . . , n. A set which is not linearly independent is called linearly dependent. Note again that the set S may be infinite, but we only consider finite linear combinations to define linear independence. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 21 The linear space F([−1, 1],R) is infinite dimensional. We have already shown that the collection {tk | k ∈ N} ⊆ F([−1, 1],R) is linearly independent. The collection contains an infinite number of elements and may or may not span F([−1, 1],R). Therefore any basis of F([−1, 1],R) (which must by definition span the set) must contain at least as many elements. Let {b1, b2, . . . , bn} be a basis of a finite dimensional linear space (V, F ). By definition, Span({b1, . . . , bn}) = V therefore ∀x ∈ V ∃ξ1, . . . , ξn ∈ F : x = n∑ i=1 ξibi. The vector ξ = (ξ1, . . . , ξn) ∈ Fn is called the representation of x ∈ V with respect to the basis {b1, b2, . . . , bn}. Fact 2.8 The representation of a given x ∈ V with respect to a basis {b1, . . . , bn} is unique. The proof is left as an exercise (Problem 2.6). Representations of the same vector with respect to different bases can of course be different. Example (Representations) Let x = (x1, x2, x3) ∈ (R3,R). The representation of x with respect to the canonical basis is simply ξ = (x1, x2, x3). The representation with respect to the basis {(1, 1, 0), (0, 1, 0), (0, 1, 1)}, however, is ξ′ = (x1, x2 − x1 − x3, x3) since x = x1   1 0 0  + x2   0 1 0  + x3   0 0 1   = x1   1 1 0  + (x2 − x1 − x3)   0 1 0  + x3   0 1 1   . Representations can also be defined for infinite dimensional spaces, but we will not get into the details here. As an example, consider f(t) ∈ Cω([−1, 1],R). One can consider a “representation” of f(t) with respect to the basis {tk | k ∈ N} defined through the Taylor series expansion. For example, expansion about t = 0 gives f(t) = f(0) + df dt (0)t+ 1 2 d2f dt2 (0)t2 + . . . . which suggests that the representation of f(t) is ξ = (f(0), dfdt (0), 1 2 d2f dt2 (0), . . .). Making this state- ment formal, however, is beyond the scope of these notes. It turns out that all representations of an element of a linear space are related to one another: Knowing one we can compute all others. To do this we need the concept of linear maps. 2.5 Linear maps Definition 2.12 Let (U, F ) and (V, F ) be two linear spaces. The function A : U → V is called linear if and only if ∀u1, u2 ∈ U , a1, a2 ∈ F A(a1u1 + a2u2) = a1A(u1) + a2A(u2). Note that both linear spaces have to be defined over the same field. For clarity we will sometimes write A : (U, F )→ (V, F ) if we need to specify the field over which the linear spaces are defined. Example (Linear maps) Let (U, F ) = (Rn,R), (V, F ) = (Rm,R) and consider a matrix A ∈ Rm×n. Define A : Rn → Rm u 7→ A · u. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 22 It is easy to show that A is a linear map. Indeed: A(a1u1 + a2u2) = A · (a1u1 + a2u2) = a1A · u1 + a2A · u2 = a1A(u1) + a2A(u2). Consider now f ∈ C([0, 1],R) and define the functions A : (C([0, 1],R),R) → (R,R) f 7→ ∫ 1 0 f(t)dt (integration) A′ : (C([0, 1],R),R) → (C([0, 1],R),R) f 7→ g(t) = ∫ t 0 e−a(t−τ)f(τ)dτ (convolution with e−at). Exercise 2.19 Show that the functions A and A′ are both linear. 
It is easy to see that linear maps map the zero element of their domain to the zero element of their co-domain. Exercise 2.20 Show that if A : U → V is linear then A(θU ) = θV . Other elements of the domain may also be mapped to the zero element of the co-domain, however. Definition 2.13 Let A : U → V linear. The null space of A is the set Null(A) = {u ∈ U | A(u) = θV } ⊆ U and the range space of A is the set Range(A) = {v ∈ V | ∃u ∈ U : v = A(u)} ⊆ V. The word “space” in “null space” and “range space” is of course not accidental. Fact 2.9 Show that Null(A) is a linear subspace of (U, F ) and Range(A) is a linear subspace of (V, F ). The proof is left as an exercise (Problem 2.5). It is easy to see that the properties of the null and range spaces are closely related to the injectivity and surjectivity of the corresponding linear map, and hence its invertibility (Problem 1.2). Theorem 2.1 Let A : U → V be a linear map and let b ∈ V . 1. A vector u ∈ U such that A(u) = b exists if and only if b ∈ Range(A). In particular, A is surjective if and only if Range(A) = V . 2. If b ∈ Range(A) and for some u0 ∈ U we have that A(u0) = b then for all u ∈ U : A(u) = b⇔ u = u0 + z with z ∈ Null(A). 3. If b ∈ Range(A) there exists a unique u ∈ U such that A(u) = b if and only if Null(A) = {θU}. In other words, A is injective if and only if Null(A) = {θU}. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 23 Proof: Part 1 follows by definition. Part 2, (⇐): u = u0 + z ⇒ A(u) = A(u0 + z) = A(u0) +A(z) = A(u0) = b. Part 2, (⇒): A(u) = A(u0) = b⇒ A(u − u0) = θV ⇒ z = (u − u0) ∈ Null(A). Part 3 follows from part 2. Finally, we generalise the concept of an eigenvalue to more general linear maps of a (potentially infinite dimensional) linear space. Definition 2.14 Let (V, F ) be a linear space and consider a linear map A : V → V . An element λ ∈ F is called an eigenvalue of A if and only if there exists v ∈ V such that v 6= θV and A(v) = λ·v. In this case, v is called an eigenvector of A for the eigenvalue λ. Example (Eigenvalues) For maps between finite dimensional spaces defined by matrices, the in- terpretation of eigenvalues is the familiar one from linear algebra. Since eigenvalues and eigenvectors are in general complex numbers/vectors we consider matrices as linear maps between complex finite dimensional spaces (even if the entries of the matrix itself are real). For example, consider the linear space (C2,C) and the linear map A : C2 → C2 defined by the matrix A = [ 0 1 −1 0 ] through matrix multiplication; in other words, for all x ∈ C2, A(x) = Ax. It is easy to see that the eigenvalues of A are λ1 = j and λ2 = −j. Moreover, any vector of the form c [ j −1 ] for any c ∈ C with c 6= 0 is an eigenvector of λ1 and any vector of the form c [ j 1 ] is an eigenvector of λ2. Definition 2.14 also applies to infinite dimensional spaces, however. Consider, for example, the linear space (C∞([t0, t1],R),R) of infinitely differentiable real valued functions of the interval [t0, t1]. Consider also the function A : C∞([t0, t1],R)→ C∞([t0, t1],R) defined by differentiation, i.e. for all f : [t0, t1]→ R infinitely differentiable define (A(f))(t) = df dt (t), ∀t ∈ [t0, t1]. Exercise 2.21 Show that A is well defined, i.e. if f ∈ C∞([t0, t1],R) then A(f) ∈ C∞([t0, t1],R). Show further that A is linear. One can see that in this case the linear map A has infinitely many eigenvalues. 
Indeed, any function f : [t0, t1]→ R of the form f(t) = eλt for λ ∈ R is an eigenvector with eigenvalue λ, since (A(f))(t) = d dt eλt = λeλt = λf(t), ∀t ∈ [t0, t1] which is equivalent to A(f) = λ · f . Exercise 2.22 Let A : (V, F )→ (V, F ) be a linear map and consider any λ ∈ F . Show that the set {v ∈ V | A(v) = λv} is a subspace of V . Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 26 The vectors yj ∈ V can all be represented with respect to the basis {vi}mi=1 of (V, F ). In other words for all j = 1, . . . , n there exist unique aij ∈ F such that yj = A(uj) = m∑ i=1 aijvi. The aij ∈ F can then be used to form a matrix A =   a11 . . . a1n ... . . . ... am1 . . . amn   ∈ Fm×n. Since representations are unique (Fact 2.8), the linear map A and the bases {uj}nj=1 and {vi}mi=1 uniquely define the matrix A ∈ Fm×n. Consider now an arbitrary x ∈ U . Again there exists unique representation ξ ∈ Fn of x with respect to the basis {uj}nj=1, x = n∑ j=1 ξjuj Let us now see what happens to this representation if we apply the linear map A to x. Clearly A(x) ∈ V , therefore there exists a unique representation η ∈ Fm of A(x) with respect to the basis {vi}mi=1. It turns out that the two representations of x and A(x) are related by matrix multiplication. Fact 2.10 ξ ∈ Fn of a vector x ∈ U with respect to the basis {uj}nj=1 and η ∈ Fm of A(x) ∈ V with respect to the basis {vi}mi=1. Then η = A · ξ, where · denotes standard matrix multiplication. Proof: By definition A(x) = m∑ i=1 ηivi. Recall that A(x) = A   n∑ j=1 ξjuj   = n∑ j=1 ξjA(uj) = n∑ j=1 ξj m∑ i=1 aijvi = m∑ i=1   n∑ j=1 aijξj   vi. By uniqueness of representation ηi = n∑ j=1 aijξj ⇒ η = A · ξ, where · denotes the standard matrix multiplication. Therefore, when one looks at the representations of vectors with respect to given bases, application of the linear map A to x ∈ U is equivalent to multiplication of its representation (an element of Fn) with the matrix A ∈ Fm×n. To illustrate this fact we will write things like (U, F ) A−→ (V, F ) x 7−→ A(x) {uj}nj=1 A∈Fm×n −→ {vi}mi=1 ξ ∈ Fn 7−→ Aξ ∈ Fm Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 27 Theorem 2.6 The following relations between linear map operations and the corresponding matrix representations hold: 1. Consider linear maps B : (U, F ) → (V, F ) and A : (V, F ) → (W,F ) where U , V and W are finite dimensional linear spaces of dimensions n, m and p respectively. Then composition C = A ◦ B : (U, F )→ (W,F ) is also a linear map. Moreover, if we fix bases {uk}nk=1, {vi}mi=1 and {wj}pj=1 for the three spaces and (U, F ) B−→ (V, F ) {uk}nk=1 B∈Fm×n −→ {vi}mi=1 and (V, F ) A−→ (W,F ) {vi}mi=1 A∈Fp×m −→ {wj}pj=1 then (U, F ) C=A◦B−→ (W,F ) {uk}nk=1 C=A·B∈Fp×n −→ {wj}pj=1 where · denotes the standard matrix multiplication. 2. Consider an invertible linear map A : (V, F ) → (V, F ) on an n-dimensional linear space V and let A−1 : (V, F ) → (V, F ) denote its inverse. If A is the representation of A with respect to a given basis of V , then A−1 is the representation of A−1 with respect to the same basis. The proof is left as an exercise (Problem 2.9). Analogous statements can of course be made about the representations of linear maps obtained by adding, or scaling other linear maps. 2.8 Change of basis Given a linear map, A : (U, F ) → (V, F ), selecting different bases for the linear spaces (U, F ) and (V, F ) leads to different representations. 
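To make the preceding constructions concrete, here is a small numerical sketch (assuming numpy; the example is not from the notes): the differentiation map on polynomials of degree at most 3, represented with respect to the monomial basis {1, t, t², t³}. Column j of the matrix is the representation of the image of the j-th basis vector, and composition of maps becomes matrix multiplication.

```python
import numpy as np

# Representation of d/dt on span{1, t, t^2, t^3} with respect to that basis:
# d/dt maps 1 -> 0, t -> 1, t^2 -> 2t, t^3 -> 3t^2.
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]])

xi = np.array([2, 3, 0, 1])    # representation of p(t) = 2 + 3t + t^3
print(A @ xi)                  # [3, 0, 3, 0], i.e. dp/dt = 3 + 3t^2  (eta = A xi)
print(A @ A)                   # representation of the composition d^2/dt^2
```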
(U, F ) A−→ (V, F ) {uj}nj=1 A∈Fm×n −→ {vi}mi=1 {ũj}nj=1 Ã∈Fm×n −→ {ṽi}mi=1 In this section we investigate the relation between the two representations A and Ã. Recall first that changing basis changes the representations of all vectors in the linear spaces. It is therefore expected that the representation of a linear map will also change. Example (Change of basis) Consider x = (x1, x2) ∈ R2. The representation of x with respect to the canonical basis {u1, u2} = {(1, 0), (0, 1)} is simply ξ = (x1, x2). The representation with respect to the basis {ũ1, ũ2} = {(1, 0), (1, 1)} is ξ̃ = (x1 − x2, x2) since x = x1 [ 1 0 ] + x2 [ 0 1 ] = (x1 − x2) [ 1 0 ] + x2 [ 1 1 ] . To derive the relation between A and Ã, consider first the identity map (U, F ) 1U−→ (U, F ) x 7−→ 1U (x) = x {uj}nj=1 I∈Fn×n −→ {uj}nj=1 {ũj}nj=1 Q∈Fn×n −→ {uj}nj=1 Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 28 I denotes the usual identity matrix in Fn×n I =   1 0 . . . 0 0 1 . . . 0 ... ... . . . ... 0 0 . . . 1   ∈ F n×n where 0 and 1 are the addition and multiplication identity of F . The argument used to derive the representation of a linear map as a matrix in Section 2.7 suggests that the elements of Q ∈ Fn×n are simply the representations of 1U (ũj) = ũj (i.e. the elements of the basis {ũj}nj=1) with respect to the basis {uj}nj=1. Likewise (V, F ) 1V−→ (V, F ) x 7−→ 1V (x) = x {vi}mi=1 I∈Fm×m −→ {vi}mi=1 {vi}mi=1 P∈Fm×m −→ {ṽi}mi=1 Exercise 2.23 Show that the matrices Q ∈ Fn×n and P ∈ Fm×m are invertible. (Recall that 1U and 1V are bijective functions.) Example (Change of basis (cont.)) Consider the identity map R2 1 R2−→ R2 x 7−→ x {(1, 0), (0, 1)} I=   1 0 0 1  ∈R 2×2 −→ {(1, 0), (0, 1)} (x1, x2) 7−→ (x1, x2) On the other hand, R2 1 R2−→ R2 x 7−→ x {(1, 0), (1, 1)} Q=   1 1 0 1  ∈R 2×2 −→ {(1, 0), (0, 1)} (x̃1, x̃2) 7−→ (x1, x2) = (x̃1 + x̃2, x̃2) and R2 1 R2−→ R2 x 7−→ x {(1, 0), (0, 1)} Q−1=   1 −1 0 1  ∈R 2×2 −→ {(1, 0), (1, 1)} (x1, x2) 7−→ (x̃1, x̃2) = (x1 − x2, x2) Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 31 Problem 2.5 (Subspaces) 1. Let U and V be linear spaces and let L(U, V ) denote the set of linear functions A : U → V . Show that L(U, V ) is a linear subspace of F(U, V ), the space of all functions mapping U into V with the usual operations of function addition and scalar multiplication. 2. Let U and V be linear spaces and A : U → V be a linear function. Show that Range(A) is a subspace of V and Null(A) is a subspace of U . 3. Let {Wi}ni=1 be a finite family of subspaces of V . Show that the the intersection and the direct sum of these subspaces n⋂ i=1 Wi = {v ∈ V | ∀i = 1, . . . , n, v ∈ Wi} n⊕ i=1 Wi = {v ∈ V | ∃wi ∈Wi, i = 1, . . . n, v = w1 + . . . wn} are themselves subspaces of V . Problem 2.6 (Basis and vector representation) Let V be a finite dimensional linear space. 1. Let W be a subspace of V . Show that W is also finite dimensional and its dimension can be no greater than that of V . 2. Show that the representation of a given x ∈ V with respect to a basis {b1, . . . , bn} is unique. Problem 2.7 (Rank and nullity) Let (F,+, ·) be a field and consider the linear mapsA : (Fn, F )→ (Fm, F ) and B : (F p, F )→ (Fn, F ) represented by matrices A ∈ Fm×n and B ∈ F p×n respectively. Show that: 1. 0 ≤ Rank(A) ≤ min{n,m} and Rank(A) +Nullity(A) = n. 2. Rank(A) +Rank(B)− n ≤ Rank(BA) ≤ min{Rank(A),Rank(B)}. (Hint: Let A′ : Range(B)→ Fm be the restriction of A to Range(B). 
Then: (a) Range(A ◦ B) = Range(A′) ⊆ Range(A), (b) Null(A′) ⊆ Null(A). To show part 2 apply the result from part 1 to A′.) Problem 2.8 (Invertible matrices) Let F be a field, A ∈ Fn×n be a matrix, and A : Fn → Fn the linear map defined by A(x) = Ax for all x ∈ Fn. Show that the following statements are equivalent: 1. A is invertible. 2. None of the eigenvalues of A is equal to zero. 3. A is bijective. 4. A in injective. 5. A is surjective. 6. Rank(A) = n. 7. Nullity(A) = 0. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 32 8. The columns aj = (a1j , . . . , anj) ∈ Fn form a linearly independent set {aj}nj=1. 9. The rows a′i = (ai1, . . . , ain) ∈ Fn form a linearly independent set {a′i}ni=1. Problem 2.9 (Matrix representation properties) 1. Consider linear maps A : (V, F ) → (W,F ) and B : (U, F ) → (V, F ). Assume that U, V,W have finite dimensions m,n, p respectively, and that A and B have representations A ∈ F p×n and B ∈ Fn×m with respect to given bases for the three spaces. Show that the composition C = A ◦ B : (U, F )→ (W,F ) has representation C = AB with respect to the same bases. 2. Consider a linear map A : (U, F ) → (U, F ) where U has finite dimension n. Assume that A has representation A ∈ Fn×n with respect to a given basis for U . Show that if A is invertible, then A−1 has representation A−1 with respect to the same basis. Problem 2.10 (Matrix representation examples) 1. Consider a linear map A : (U, F ) → (U, F ) where U has finite dimension n. Assume there exists a vector b ∈ U such that the collection {b,A(b),A◦A(b), . . . ,An−1(b)} forms a basis for U . Derive the representation of A and b with respect to this basis. 2. Consider a linear map A : (U, F )→ (U, F ) where U has finite dimension n. Assume there exists a basis bi, i = 1, . . . , n for U such that A(bn) = λbn and A(bi) = λbi + bi+1, i = 1, . . . , n − 1. Derive the representation of A with respect to this basis. 3. Consider two matrices A, à ∈ Rn×n related by a similarity transformation; i.e. there exists Q ∈ Rn×n invertible such that à = Q−1AQ. Show that Spec[A] = Spec[Ã]. Problem 2.11 (Matrix eigenvalues) Let F be a field, A ∈ Fn×n be a matrix, and A : Fn → Fn be the linear map defined by A(x) = Ax for all x ∈ Fn. The following statements are equivalent: 1. λ ∈ C is an eigenvalue of A. 2. Det(λI −A) = 0. 3. There exists v ∈ Cn such that v 6= 0 and Av = λv. 4. There exists η ∈ Cn such that η 6= 0 and ηTA = ληT . Problem 2.12 (Linear function spaces) Show that linear functions A : U → V between two linear spaces (U, F ) and (V, F ) form a linear space over the field F under the usual operations of function addition and scalar multiplication. Chapter 3 Introduction to Analysis Consider a linear space (V, F ) and assume the F = R or F = C so that for a ∈ F the absolute value (or modulus) |a| is well defined. 3.1 Norms and continuity Definition 3.1 A norm on a linear space (V, F ) is a function ‖ · ‖ : V → R+ such that: 1. ∀v1, v2 ∈ V , ‖v1 + v2‖ ≤ ‖v1‖+ ‖v2‖ (triangle inequality). 2. ∀v ∈ V , ∀a ∈ F , ‖av‖ = |a| · ‖v‖. 3. ‖v‖ = 0⇔ v = 0. A linear space equipped with such a norm is called a normed linear space and is denoted by (V, F, ‖·‖). v = 0 in the last line refers of course to the zero vector in V . A norm provides a notion of “length” for an element of a linear space. The norm can also be used to define a notion of “distance” between elements of a linear space; one can think of ‖v1−v2‖ as the distance between two vectors v1, v2 ∈ V . 
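A quick numerical aside (assuming numpy; the vectors are hypothetical and not from the notes): the familiar Euclidean norm on R³ illustrates the three axioms and the distance interpretation.

```python
import numpy as np

v1, v2 = np.array([1.0, -2.0, 0.5]), np.array([0.0, 3.0, 1.0])
norm = np.linalg.norm                                 # Euclidean (2-)norm

print(norm(v1 + v2) <= norm(v1) + norm(v2))           # triangle inequality: True
print(np.isclose(norm(-2.0 * v1), 2.0 * norm(v1)))    # ||a v|| = |a| ||v||: True
print(norm(v1 - v2))                                  # the "distance" between v1 and v2
```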
Example (Normed spaces) In (Rn,R), the following are examples of norms: ‖x‖1 = n∑ i=1 |xi|, (1-norm) ‖x‖2 = √√√√ n∑ i=1 |xi|2 (Euclidean or 2-norm) ‖x‖p = ( n∑ i=1 |xi|p ) 1 p for p ≥ 1, (p-norm) ‖x‖∞ = max i=1,...,n |xi| (infinity norm) Exercise 3.1 Show that ‖x‖1, ‖x‖2, and ‖x‖∞ satisfy the axioms of a norm. 33 Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 36 Fact 3.2 Let ‖ ·‖a and ‖ ·‖b be two equivalent norms on a linear space (V, F ) with F = R or F = C. A sequence {vi}∞i=0 ⊆ V converges to some v ∈ V in (V, F, ‖ · ‖a) if and only if it converges to v in (V, F, ‖ · ‖b). Proof: Suppose that the sequence {vi}∞i=0 converges to v ∈ V with respect to the norm ‖ · ‖a, that is for all ǫ > 0 there exists N ∈ N such that ‖vm− v‖a < ǫ for all m ≥ N . Due to equivalence of the norms, there exists mu > 0 such that ml‖v‖a ≤ ‖v‖b ≤ mu‖v‖a for all v. Fix an arbitrary ǫ > 0. Then there exists N ∈ N such that, for all m ≥ N , ‖vm − v‖a < ǫ mu ; But then, with the same N , for all m ≥ N ‖vm − v‖b ≤ mu‖vm − v‖a < mu ǫ mu = ǫ. Since function continuity, open and closed sets were all defined in terms of convergence of sequences, this fact further implies that open/closed sets defined using one norm remain open/closed for any other equivalent norm. Likewise, continuous functions defined with respect to a pair of norms remain continuous with respect to any other pair of respectively equivalent norms. The fact that ‖x‖1 and ‖x‖∞ are equivalent norms on Rn is not a coincidence. A remarkable result states, indeed, that any two norms on a finite-dimensional space are equivalent. To show this, we will use the following fact, which is indeed a corollary of two fundamental theorems in real analysis. Fact 3.3 (A corollary to the Weierstrass Theorem). A continuous function f : S → R defined on a subset S ⊆ Rn that is compact in (Rn, ‖ · ‖2) attains a minimum on S. In other words, if the function f : S → R is continuous and the set S is compact there exist xm ∈ S and a real number m such that f(xm) = m ≤ f(x) for all x ∈ S, or m = inf x∈S f(x) = min x∈S f(x) = f(xm) > −∞. The proof of this fact can be found in [16]. Recall that the infimum infx∈S f(x) is the greatest lower bound of f on S, i.e. the largest number m ∈ R such that f(x) ≥ m for all x ∈ S; likewise, the supremum supx∈S f(x) is the least upper bound of f on S, i.e. the smallest number M ∈ R such that f(x) ≤ M for all x ∈ S. Fact 3.3 states that if the function f is continuous and the set S is compact the infimum (and by adding a minus sign also the supremum) is finite and attained for some xm ∈ S; in this case the infimum coincides with the minimum minx∈S f(x) of the function (and the supremum with the maximum maxx∈S f(x)). This is not the case in general of course. For example the function f : R+ → R defined by f(x) = e−x is continuous and defined over the closed but unbounded (hence not compact) set [0,∞). The maximum and supremum of the function coincide 1 = supx∈R+ f(x) = maxx∈R+ f(x) = f(0). The infimum of the function, on the other hand, is 0 = infx∈R+ f(x), but is not attained for any x ∈ [0,∞); hence the minimum is not defined. Likewise, the function f(x) = { −1− x if x ∈ [−1, 0) 1− x if x ∈ [0, 1] is defined over the compact set S = [−1, 1] but us discontinuous at 0. Again supx∈[−1,1] f(x) = maxx∈[−1,1] f(x) = f(0) = 1 but infx∈[−1,1] f(x) = −1 is not attained for any x ∈ [−1, 1] and minx∈[−1,1] f(x) is undefined. 
Finally, for the function f(x) = −1/x on (0, 1] the infimum is not a finite number, since the function tends to −∞ as x tends to 0; this is precisely the situation we need to exclude in the proof of Proposition 3.1 below. For completeness we also recall the following fact. Fact 3.4 (Cauchy Inequality). For ai ∈ R bi ∈ R, i = 1, . . . , n, ( n∑ i=1 aibi )2 ≤ ( n∑ i=1 a2i )( n∑ i=1 b2i ) Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 37 The proof is left as an exercise. Theorem 3.1 Any two norms on a finite-dimensional space V are equivalent. Proof: For simplicity we assume that F = R; the proof for F = C is similar, e.g. by identifying C with R2. Assume V is finite dimensional of dimension n and let {vi}ni=1 be a basis. For an arbitrary element x ∈ V consider the representation ξ ∈ Rn, i.e. x = ∑n i=1 ξivi and define ‖x‖a = √√√√ n∑ i=1 |ξi|2 One can show that ‖ · ‖a : V → R+ is indeed a norm on V (along the lines of Exercise 3.1). By Exercise 3.4 it suffices to show that an arbitrary norm ‖ · ‖b is equivalent to ‖ · ‖a, i.e. there exist mu > ml > 0 such that for all x ∈ V , ml‖x‖a ≤ ‖x‖b ≤ mu‖x‖a. By the norm axioms for ‖ · ‖b and Fact 3.4 ‖x‖b = ∥∥∥∥∥ n∑ i=1 ξivi ∥∥∥∥∥ b ≤ n∑ i=1 |ξi| · ‖vi‖b ≤ √√√√ n∑ i=1 |ξi|2 √√√√ n∑ i=1 ‖vi‖2b ≤ mu‖x‖a. where we have set mu = √∑n i=1 ‖vi‖2b . Consider now the function f : Rn → R defined by f(α) = ∥∥∥∥∥ n∑ i=1 αivi ∥∥∥∥∥ b for α ∈ Rn. We show that f is continuous as a function from (Rn,R, ‖ · ‖2) to (R,R, | · |). Indeed, given two elements α, α′ ∈ Rn |f(α)− f(α′)| = ∣∣∣∣∣ ∥∥∥∥∥ n∑ i=1 αivi ∥∥∥∥∥ b − ∥∥∥∥∥ n∑ i=1 α′ ivi ∥∥∥∥∥ b ∣∣∣∣∣ ≤ ∥∥∥∥∥ n∑ i=1 (αi − α′ i)vi ∥∥∥∥∥ b see proof of Fact 3.1 ≤ n∑ i=1 |αi − α′ i| ‖vi‖b by the properties of the norm ‖ · ‖b ≤ √√√√ n∑ i=1 |αi − α′ i|2 √√√√ n∑ i=1 ‖vi‖2b by Fact 3.4 = mu‖α− α′‖. Therefore, for any α ∈ Rn and any ǫ > 0 if we select δ = ǫ/mu then for all α′ ∈ Rn such that ‖α− α′‖2 < δ it is true that |f(α)− f(α′)| < ǫ; hence f is continuous. Finally, consider the set S = {α ∈ Rn | ∑n i=1 α 2 i = 1} ⊆ Rn. Clearly ‖α‖2 ≤ 1 for all α ∈ S, hence S is bounded in (Rn,R, ‖ · ‖2). Moreover, S is also closed in (Rn,R, ‖ · ‖2) as it is the inverse image of the closed set {1} under the continuous (by Fact 3.1) function α 7→ √∑n i=1 α 2 mapping (Rn,R, ‖ · ‖2) into (R,R, | · |) (see Problem 3.4). Hence, S is compact in (Rn,R, ‖ · ‖2) and the continuous function f : S → R attains a minimum m for some α∗ ∈ S. Then for any x ∈ V with a Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 38 representation ξ ∈ Rn with respect to the basis vi, ‖x‖b = ∥∥∥∥∥ n∑ i=1 ξivi ∥∥∥∥∥ b = √√√√ n∑ i=1 |ξi|2 ∥∥∥∥∥ n∑ i=1 ξi√∑n i=1 |ξi|2 vi ∥∥∥∥∥ = √√√√ n∑ i=1 |ξi|2 ∥∥∥∥∥ n∑ i=1 αivi ∥∥∥∥∥ where we set αi = ξi√∑n i=1 |ξi|2 = ‖x‖af(α) ≥ m‖x‖a The last inequality follows since by construction ∑n i=1 α 2 i = 1, hence α ∈ S and f is lower bounded by m. Setting ml = m completes the proof. 3.3 Infinite-dimensional normed spaces Consider now real numbers t0 ≤ t1 and the linear space C([t0, t1],R n) of continuous functions f : [t0, t1]→ Rn. For each t ∈ [t0, t1], f(t) ∈ Rn; consider its standard 2-norm ‖f(t)‖2 as a vector in Rn and define the function ‖ · ‖∞ : C([t0, t1],R n)→ R+ by ‖f‖∞ = max t∈[t0,t1] ‖f(t)‖2. Note that, by Fact 3.3, the maximum is indeed attained, that is there exists some t∗ ∈ [t0, t1] such that ‖f‖∞ = ‖f(t∗)‖2. More generally, e.g. if the functions can be discontinuous or the domain is not compact, one can use the supremum instead of the maximum in the definition of ‖f‖∞. 
One can show that the function ‖f‖∞ defined in this way is a norm on (C([t0, t1],R n),R). Indeed, for all continuous functions f , ‖f‖∞ is greater than or equal to zero and it is trivially equal to zero if and only if f is the zero function (i.e. f(t) = 0 ∈′ Ren for all t ∈ [t0, t1]). Moreover, ‖αf‖∞ = max t∈[t0,t1] ‖αf(t)‖2 = |α| max t∈[t0,t1] ‖f(t)‖2 = |α| ‖f‖∞, ‖f + g‖∞ = max t∈[t0,t1] ‖f(t) + g(t)‖2 ≤ max t∈[t0,t1] ‖f(t)‖2 + max t∈[t0,t1] ‖g(t)‖2 = ‖f‖∞ + ‖g‖∞. Norms on function spaces can also be defined using integration. For example, for f ∈ C([t0, t1],Rn) we can define the following: ‖f‖1 = ∫ t1 t0 ‖f(t)‖2 dt ‖f‖2 = √∫ t1 t0 ‖f(t)‖22 dt ‖f‖p = (∫ t1 t0 ‖f(t)‖p2 dt ) 1 p Here, the integral takes the role of the finite summation in the definition of the corresponding norms on finite-dimensional spaces. Since the 2-norm that appears in the integrands is a continuous function (Fact3.1), the integrands are also continuous functions of the variable t, and all these quantities are well-defined. Moreover, all of them are norms. Take for example the first one. Of course we have ‖f‖1 ≥ 0 for all f ∈ C([t0, t1],Rn); the norm of the zero-function is of course equal to zero; on the other hand, given a continuous function f , if f is nonzero at a point it must be nonzero in a whole Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 41 t 1 1−1 1 m 1 N fN(t) fm(t) ‖fm − fN‖∞ ‖fm − fN‖1 Figure 3.1: Examples of functions in non-Banach space. Proof: According to Definition 3.5, W is closed if for all sequences {wi}∞i=1 ⊆ W , if wi → w ∈ V , then w ∈ W . Now if an arbitrary sequence {wi}∞i=1 ⊆W is convergent in V , then it is Cauchy both in V and in W . Since W is Banach, its limit point w belongs to W . This fact will come in handy in Chapter 7 in the context of inner product spaces. The state, input and output spaces of our linear systems will be of the form Rn, and will all be Banach spaces. One may be tempted to think that this is more generally true. Unfortunately infinite-dimensional spaces are less well-behaved. Example (Non-Banach space) Consider the normed space (C([−1, 1],R),R, ‖ · ‖1) of continuous functions f : [−1, 1]→ R with the 1−norm ‖f‖1 = ∫ 1 −1 |f(t)|dt. For i = 1, 2, . . . consider the sequence of functions (Figure 3.1) fi(t) =    0 if t < 0 it if 0 ≤ t < 1/i 1 if t ≥ 1/i. (3.2) It is easy to see that the sequence is Cauchy. Indeed, if we take N,m ∈ {1, 2, . . .} with m ≥ N , ‖fm − fN‖1 = ∫ 1 −1 |fm(t)− fN (t)|dt = ∫ 1/m 0 (mt−Nt)dt+ ∫ 1/N 1/m (1−Nt)dt = (m−N) [ t2 2 ]1/m 0 + [ 1−N t2 2 ]1/N 1/m = m−N 2mN ≤ 1 2N . Therefore, given any ǫ > 0 by selecting N > 1/(2ǫ) we can ensure that for allm ≥ N , ‖fm−fN‖1 < ǫ. One can guess that the sequence {fi}∞i=1 converges to the function f(t) = { 0 if t < 0 1 if t ≥ 0. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 42 Indeed ‖fi − f‖1 = ∫ 1/i 0 (1− it)dt = 1 2i → 0. The problem is that the limit function f ∈ F([−1, 1],R) is discontinuous and therefore f 6∈ C([−1, 1],R). Hence C([−1, 1],R) is not a Banach space since the Cauchy sequence {fi}ni=1 does not converge to an element of C([−1, 1],R). Fortunately there are several infinite dimensional linear spaces that are Banach. The important one for this chapter is the space of continuous functions under the infinity norm. Theorem 3.3 (C([t0, t1],R n),R, ‖ · ‖∞) is a Banach space. 
Proof: We articulate the proof in three steps: First, given a sequence of continuous functions which is Cauchy in the norm ‖ · ‖∞, we define a “pointwise limit” function; second, we prove that the sequence converges to this function; third, we prove that this function is continuous. Let {fn}∞n=1 be a Cauchy sequence of continuous functions f : [t0, t1] → Rn. For each t ∈ [t0, t1], {fn(t)}∞n=1 is a Cauchy sequence of vectors in Rn (why?). Since Rn is a Banach space (Theorem 3.2), every such sequence has a limit. We define the function f as follows: f(t) = lim n→∞ fn(t) Next, we show that {fn}∞n=1 converges to f also with respect to the norm ‖ · ‖∞. Indeed, the fact that {fn}∞n=1 is Cauchy means that, for every ǫ > 0, there exists N ∈ N such that ∀n,m ≥ N, ∀t ∈ [t0, t1], ‖fn(t)− fm(t)‖ ≤ ‖fn − fm‖∞ < ǫ In this equation, fix t and n, and let m→∞: ∀n ≥ N, ∀t ∈ [t0, t1], ‖fn(t)− f(t)‖ ≤ ǫ Taking the supremum over t, ∀n ≥ N, ‖fn − f‖∞ ≤ ǫ hence ‖fn − f‖∞ → 0. It remains to show that f is continuous. Let t̄ ∈ [t0, t1]. It holds: ‖f(t)− f(t̄)‖ = ‖f(t)− fn(t) + fn(t)− fn(t̄) + fn(t̄)− f(t̄)‖ ≤ ‖f(t)− fn(t)‖+ ‖fn(t)− fn(t̄)‖+ ‖fn(t̄)− f(t̄)‖ Now fix an ǫ > 0. Since {fn}∞n=1 converges to f with respect to ‖ · ‖∞, there exists n such that ‖f − fn‖∞ < ǫ/3, and therefore ‖f(t)− fn(t)‖ < ǫ/3 for all t and in particular ‖fn(t̄)− f(t̄)‖ < ǫ/3. On the other hand, since each function of the sequence is continuous, and in particular so is fn, there exists δ > 0 such that ‖fn(t) − fn(t̄)‖ < ǫ/3 whenever |t − t̄| < δ. Thus, for all ǫ > 0 there exists (n and) δ > 0 such that, if |t− t̄| < δ, then ‖f(t)− f(t̄)‖ ≤ ǫ/3 + ǫ/3 + ǫ/3 = ǫ, and f is continuous. Summarizing, given an arbitrary Cauchy sequence {fn}∞n=1 in C([t0, t1],R,R n), we can construct a function f ∈ C([t0, t1],R,Rn) such that ‖fn − f‖∞ → 0. Exercise 3.7 Show that the sequence {fi}∞i=1 defined in (3.2) is not Cauchy in (C([−1, 1],R),R, ‖ · ‖∞). The fact that (C([t0, t1],R n),R, ‖ · ‖∞) is a Banach space will be exploited below for the proof of existence of solutions for ordinary differential equations. In Chapter 7 we will encounter another infinite dimensional linear space, the so-called space of square integrable functions, which will play a central role in the discussion of controllability and observability. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 43 3.5 Induced norms and matrix norms Consider now the space (Rm×n,R). Exercise 3.8 Show that (Rm×n,R) is a linear space with the usual operations of matrix addition and scalar multiplication. (Hint: One way to do this is to identify Rm×n with Rnm.) The following are examples of norms on (Rm×n,R): m∑ i=1 n∑ j=1 |aij | (cf. 1 norm in Rnm) ‖A‖F =   m∑ i=1 n∑ j=1 a2ij   1 2 (Frobenius norm, cf. 2 norm in Rnm) max i=1,...,m max j=1,...,n |aij | (cf. ∞ norm in Rnm) More commonly used are the norms derived when one considers matrices as linear maps between linear spaces. We start with a more general definition. Definition 3.10 Consider the linear space of continuous functions f : (U, F, ‖ · ‖U )→ (V, F, ‖ · ‖V ) between two normed spaces. The induced norm of f is defined as ‖f‖ = sup u6=0 ‖f(u)‖V ‖u‖U . One can check that, whenever the supremum is finite, it indeed defines a norm on the space of continuous functions (Problem 3.6). Notice that the induced norm depends not only on the function, but also on the norms imposed on the two spaces. For continuous linear functions between normed space, the definition of the induced norm simplifies somewhat. 
It turns out that it is not necessary to take the supremum over all non-zero vectors in U ; it suffices to consider vectors with norm equal to 1. Fact 3.6 Consider two normed spaces (U, F, ‖·‖U ) and (V, F, ‖·‖V ) and a continuous linear function A : U → V . Then ‖A‖ = sup ‖u‖U=1 ‖A(u)‖V . The proof is left as an exercise (Problem 3.6). Example (Induced matrix norms) Consider A ∈ Fm×n and consider the linear map A : Fn → Fm defined by A(x) = A · x for all x ∈ Fn. By considering different norms on Fm and Fn different induced norms for the linear map (and hence the matrix) can be defined: ‖A‖p = sup x∈Fn ‖Ax‖p ‖x‖p ‖A‖1 = max j=1,...,n m∑ i=1 |aij | (maximum column sum) ‖A‖2 = max λ∈Spec[ATA] √ λ (maximum singular value) ‖A‖∞ = max i=1,...,m n∑ j=1 |aij | (maximum row sum) Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 46 Proof: With the exception of the last statement, all others are immediate consequences of the definition of the induced norm and the fact that it is indeed a norm (Problem 3.6). To show the last statement, consider ‖A ◦ B‖ = sup ‖u‖U=1 ‖(A ◦ B)(u)‖W = sup ‖u‖U=1 ‖A(B(u))‖W ≤ sup ‖u‖U=1 ‖A‖ · ‖B(u)‖V (by the first statement) =‖A‖ · sup ‖u‖U=1 ‖B(u)‖V = ‖A‖ · ‖B‖. 3.6 Ordinary differential equations The main topic of these notes are dynamical systems of the form ẋ(t) = A(t)x(t) +B(t)u(t) (3.3) y(t) = C(t)x(t) +D(t)u(t) (3.4) where t ∈ R, x(t) ∈ Rn, u(t) ∈ Rm, y(t) ∈ Rp A(t) ∈ Rn×n, B(t) ∈ Rn×m, C(t) ∈ Rp×n, D(t) ∈ Rp×m The difficult part conceptually is equation (3.3), a linear ordinary differential equation (ODE) with time varying coefficients A(t) and B(t). Equation (3.3) is a special case of the (generally non-linear) ODE ẋ(t) = f(x(t), u(t), t) (3.5) with t ∈ R, x(t) ∈ Rn, u(t) ∈ Rm f : Rn × Rm × R −→ Rn. The only difference is that for linear systems the function f(x, u, t) is a linear function of x and u for all t ∈ R. In this section we are interested in finding “solutions” (also known as “trajectories”, “flows”, . . . ) of the ODE (3.5). In other words: • Given: f : Rn × Rm × R→ Rn dynamics (t0, x0) ∈ R× Rn “initial” condition u(·) : R→ Rm input trajectory • Find: x(·) : R→ Rn state trajectory • Such that: x(t0) = x0 d dt x(t) = f(x(t), u(t), t) ∀t ∈ R Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 47 While this definition is acceptable mathematically it tends to be too restrictive in practice. The problem is that according to this definition x(t) should not only be continuous as a function of time, but also differentiable; since the definition makes use of the derivative dx(t)/dt, it implicitly assumes that the derivative is well defined. This will in general disallow input trajectories u(t) which are discontinuous as a function of time. We could in principle only allow continuous inputs (in which case the above definition of a solution should be sufficient) but unfortunately many interesting input functions turn out to be discontinuous. Example (Hasty driver) Consider a car moving on a road. Let y ∈ R denote the position of the car with respect to a fixed point on the road, v ∈ R denote the velocity of the car and a ∈ R denote its acceleration. We can then write a “state space” model for the car by defining x = [ y v ] ∈ R2, u = a ∈ R and observing that ẋ(t) = [ ẏ(t) v̇(t) ] = [ x2(t) u(t) ] . Defining f(x, u) = [ x2 u ] the dynamics of our car can then be described by the (linear, time invariant) ODE ẋ(t) = f(x(t), u(t)). 
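As a small simulation sketch (ours; the acceleration profile, horizon and step size are arbitrary illustrative choices, and scipy is used purely as a numerical integrator), the car model can be integrated for a given continuous input:

import numpy as np
from scipy.integrate import solve_ivp

# A sketch (ours): integrate the car model y' = v, v' = u(t) for one
# particular continuous acceleration profile; all numbers are illustrative.
def u(t):
    return 0.5 * np.cos(t)          # acceleration input

def f(t, x):
    return [x[1], u(t)]             # f(x, u) = (x2, u) with x = (y, v)

sol = solve_ivp(f, (0.0, 10.0), [0.0, 0.0], max_step=0.01)
print(sol.y[:, -1])                 # position and velocity at t = 10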
Assume now that we would like to select the input u(t) to take the car from the initial state x(0) = [ 0 0 ] to the terminal state x(T ) = [ yF 0 ] in the shortest time possible (T ) while respecting constraints on the speed and acceleration v(t) ∈ [0, Vmax] and a(t) ∈ [amin, amax] ∀t ∈ [0, T ] for some Vmax > 0, amin < 0 < amax. It turns out that the optimal solution for u(t) is discontinuous. Assuming that yF is large enough it involves three phases: 1. Accelerate as much as possible (u(t) = amax) until the maximum speed is reached (x2(t) = Vmax). 2. “Coast” (u(t) = 0) until just before reaching yF . 3. Decelerate as much as possible (u(t) = amin) and stop exactly at yF . Unfortunately this optimal solution is not allowed by our definition of solution given above. To make the definition of solution more relevant in practice we would like to allow discontinuous u(t), albeit ones that are not too “wild”. Measurable functions provide the best notion of how “wild” input functions are allowed to be and still give rise to reasonable solutions for differential equations. Unfortunately the proper definition of measurable functions requires some exposure to measure theory and is beyond the scope of these notes; the interested reader is referred to [18] for a treatment of dynamical systems from this perspective. For our purposes the following, somewhat simpler definition will suffice. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 48 Definition 3.11 A function u : R → Rm is piecewise continuous if and only if it is continuous at all t ∈ R except those in a set of discontinuity points D ⊆ R that satisfy: 1. ∀τ ∈ D left and right limits of u exist, i.e. limt→τ+ u(t) and limt→τ− u(t) exist and are finite. Moreover, u(τ) = limt→τ+ u(t). 2. ∀t0, t1 ∈ R with t0 < t1, D ∩ [t0, t1] contains a finite number of points. The symbols limt→τ+ u(t) and limt→τ− u(t) indicate the limit of u(t) as t approaches τ from above (t ≥ τ) and from below (t ≤ τ). We will use the symbol PC([t0, t1],R m) to denote the linear space of piecewise continuous functions f : [t0, t1]→ Rm (and similarly for PC(R,Rm)). Exercise 3.9 Show that (PC([t0, t1],R m),R) is a linear space under the usual operations of function addition and scalar multiplication. Definition 3.11 includes all continuous functions, the solution to the hasty driver example, square waves, etc. Functions that are not included are things like 1/t or tan(t) (that go to infinity for some t ∈ R), and obscure constructs like u(t) =    0 (t ≥ 1/2) ∨ (t ≤ 0) −1 t ∈ [ 1 2k+1 , 1 2k ) 1 t ∈ [ 1 2(k+1) , 1 2k+1 ) k = 1, 2, . . . that have an infinite number of discontinuity points in the finite interval [0, 1/2]. Let us now return to our differential equation (3.5). Let us generalize the picture somewhat by considering ẋ(t) = p(x(t), t) (3.6) where we have obscured the presence of u(t) by defining p(x(t), t) = f(x(t), u(t), t). We will impose the following assumption of p. Assumption 3.1 The function p : Rn × R → Rn is piecewise continuous in its second argument, i.e. there exists a set of discontinuity points D ⊆ R such that for all x ∈ Rn 1. p(x, ·) : R→ Rn is continuous for all t ∈ R \D. 2. For all τ ∈ D, limt→τ+ p(x, t) and limt→τ− p(x, t) exist and are finite and p(x, τ) = limt→τ+ p(x, t). 3. ∀t0, t1 ∈ R with t0 < t1, D ∩ [t0, t1] contains a finite number of points. 
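For instance, the accelerate, coast, decelerate input discussed in the hasty driver example can be encoded as a piecewise constant function that is continuous from the right at its switching times, and it induces a function p(x, t) = f(x, u(t), t) of the type covered by Assumption 3.1. A small sketch (ours; the switching times and acceleration bounds are made up):

import numpy as np

# A sketch (ours): a piecewise constant acceleration profile; u is continuous
# from the right at its two discontinuity points, as required by Definition 3.11.
def u(t):
    if t < 2.0:
        return 1.0       # accelerate, u = a_max
    elif t < 8.0:
        return 0.0       # coast
    else:
        return -1.0      # decelerate, u = a_min

def f(x, u_val):
    return np.array([x[1], u_val])   # the car dynamics from the example above

def p(x, t):
    return f(x, u(t))                # p(x, t) = f(x, u(t), t), piecewise continuous in t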
Exercise 3.10 Show that if u : R → Rm is piecewise continuous (according to Definition 3.11) and f : Rn × Rm × R → Rn is continuous then p(x, t) = f(x, u(t), t) satisfies the conditions of Assumption 3.1. We are now in a position to provide a formal definition of the solution of ODE (3.6). Definition 3.12 Consider a function p : Rn×R→ Rn satisfying the conditions of Assumption 3.1. A continuous function φ : R→ Rn is called a solution of (3.6) passing through (t0, x0) ∈ R× Rn if and only if 1. φ(t0) = x0, and 2. ∀t ∈ R \D, φ is differentiable at t, and d dtφ(t) = p(φ(t), t). Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 51 defined by non-Lipschitz functions that do posess unique solutions for all intial conditions; to estab- lish this fact, however, more work is needed on a case by case basis. The existence and uniqueness results discussed here can be further fine tuned (assuming for example local Lipschitz continuity to derive local existence of solutions). In the Chapter 4, however, we will show that linear differential equations, the main topic of these notes, always satisfy the global Lipschitz assumption. We will therefore not pursue such refinements here, instead we refer the interested reader to [12, 17]. 3.7 Existence and uniqueness of solutions We are now in a position to state and prove a fundamental fact about the solutions of ordinary differential equations. Theorem 3.6 (Existence and uniqueness) Assume p : Rn × R → Rn is piecewise continuous with respect to its second argument (with discontinuity set D ⊆ R) and globally Lipschitz with respect to its first argument. Then for all (t0, x0) ∈ R×Rn there exists a unique continuous function φ : R→ Rn such that: 1. φ(t0) = x0. 2. ∀t ∈ R \D, d dtφ(t) = p(φ(t), t). The proof of this theorem is rather sophisticated. We will build it up in three steps: 1. Background lemmas. 2. Proof of existence (construction of a solution). 3. Proof of uniqueness. 3.7.1 Background lemmas For a function f : [t0, t1]→ Rn with f(t) = (f1(t), . . . , fn(t)) define ∫ t1 t0 f(t)dt =   ∫ t1 t0 f1(t)dt ...∫ t1 t0 fn(t)dt   . Fact 3.7 Let ‖ · ‖ be any norm on Rn. Then for all t0, t1 ∈ R, ∥∥∥∥ ∫ t1 t0 f(t)dt ∥∥∥∥ ≤ ∣∣∣∣ ∫ t1 t0 ‖f(t)‖ dt ∣∣∣∣ . Proof:(Sketch) Roughly speaking one can approximate the integral by a sum, use triangle inequality on the sum and take a limit. Note that the absolute value on the right had side is necessary, since the integral there will be negative if t1 < t0. Recall that m! = 1 · 2 · . . . ·m denotes the factorial of a natural number m ∈ N. Fact 3.8 The following hold: 1. ∀m, k ∈ N, (m+ k)! ≥ m! · k!. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 52 2. ∀c ∈ R, limm→∞ cm m! = 0. Proof: Part 1 is easy to prove by induction (Theorem 1.2). For part 2, the case |c| < 1 is trivial, since cm → 0 while m!→∞. For c > 1, take M > 2c integer. Then for m > M cm m! = cMcm−M M !(M + 1)(M + 2) . . .m ≤ cM M ! · cm−M (2c)m−M = cM M ! · 1 2m−M → 0. The proof for c < −1 is similar. Theorem 3.7 (Fundamental theorem of calculus) Let g : R → R piecewise continuous with discontinuity set D ⊆ R. Then for all t0 ∈ R the function f(t) = ∫ t t0 g(τ)dτ is continuous and for all t ∈ R \D, d dt f(t) = g(t). Theorem 3.8 (Gronwall Lemma) Consider u(·), k(·) : R → R+ piecewise continuous, c1 ≥ 0, and t0 ∈ R. If for all t ∈ R u(t) ≤ c1 + ∣∣∣∣ ∫ t t0 k(τ)u(τ)dτ ∣∣∣∣ then for all t ∈ R u(t) ≤ c1 exp ∣∣∣∣ ∫ t t0 k(τ)dτ ∣∣∣∣ . 
Proof: Consider t > t0 (the proof for t < t0 is symmetric and gives rise to the absolute values in the theorem statement). Let U(t) = c1 + ∫ t t0 k(τ)u(τ)dτ. Notice that u(t) ≤ U(t), since for t ≥ t0 the absolute value is redundant as u(·) and k(·) are non-negative. By the fundamental theorem of calculus U is continuous and wherever k and u are continuous d dt U(t) = k(t)u(t). Then u(t) ≤ U(t)⇒ u(t)k(t)e − ∫ t t0 k(τ)dτ ≤ U(t)k(t)e − ∫ t t0 k(τ)dτ ⇒ ( d dt U(t) ) e − ∫ t t0 k(τ)dτ − U(t)k(t)e − ∫ t t0 k(τ)dτ ≤ 0 ⇒ ( d dt U(t) ) e − ∫ t t0 k(τ)dτ + U(t) d dt ( e − ∫ t t0 k(τ)dτ ) ≤ 0 ⇒ d dt ( U(t)e − ∫ t t0 k(τ)dτ ) ≤ 0 ⇒ U(t)e − ∫ t t0 k(τ)dτ decreases as t increases ⇒ U(t)e − ∫ t t0 k(τ)dτ ≤ U(t0)e − ∫ t0 t0 k(τ)dτ ∀t ≥ t0 ⇒ U(t)e − ∫ t t0 k(τ)dτ ≤ U(t0) = c1 ∀t ≥ t0 ⇒ u(t) ≤ U(t) ≤ c1e ∫ t t0 k(τ)dτ which concludes the proof. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 53 3.7.2 Proof of existence We will now construct a solution for the differential equation ẋ(t) = p(x(t), t) passing through (t0, x0) ∈ R × Rn using an iterative procedure. This second step of our existence- uniqueness proof will itself involve three steps: 1. Construct a sequence of functions xm(·) : R→ Rn for m = 1, 2, . . .. 2. Show that for all t1 ≤ t0 ≤ t2 this sequence is a Cauchy sequence in the Banach space C([t1, t2],R n), ‖ · ‖∞). Therefore the sequence converges to a limit φ(·) ∈ C([t1, t2],Rn). 3. Show that the limit φ(·) is a solution to the differential equation. Step 2.1: We construct a sequence of functions xm(·) : R → Rn for m = 1, 2, . . . by the so called Picard iteration: 1. x0(t) = x0 ∀t ∈ R 2. xm+1(t) = x0 + ∫ t t0 p(xm(τ), τ)dτ, ∀t ∈ R. The generated sequence of functions is known at the Picard Iteration. Notice that all the functions xm(·) generated in this way are continuous by construction. Consider any t1, t2 ∈ R such that t1 ≤ t0 ≤ t2. Let k = sup t∈[t1,t2] k(t) and T = t2 − t1. Notice that under the conditions of the theorem both k and T are non-negative and finite. Let ‖ · ‖ be the infinity norm on Rn. Then for all t ∈ [t1, t2] ‖xm+1(t)− xm(t)‖ = ∥∥∥∥x0 + ∫ t t0 p(xm(τ), τ)dτ − x0 − ∫ t t0 p(xm−1(τ), τ)dτ ∥∥∥∥ = ∥∥∥∥ ∫ t t0 [p(xm(τ), τ) − p(xm−1(τ), τ)]dτ ∥∥∥∥ ≤ ∣∣∣∣ ∫ t t0 ‖p(xm(τ), τ) − p(xm−1(τ), τ)‖ dτ ∣∣∣∣ (Fact 3.7) ≤ ∣∣∣∣ ∫ t t0 k(τ) ‖xm(τ) − xm−1(τ)‖ dτ ∣∣∣∣ (p is Lipschitz in x) ≤ k ∣∣∣∣ ∫ t t0 ‖xm(τ) − xm−1(τ)‖ dτ ∣∣∣∣ For m = 0 ‖x1(t)− x0(t)‖ ≤ ∣∣∣∣ ∫ t t0 ‖p(x0, τ)‖ dτ ∣∣∣∣ ≤ ∣∣∣∣ ∫ t2 t1 ‖p(x0, τ)‖ dτ ∣∣∣∣ =M, for some non-negative, finite number M (which of course depends on t1 and t2). For m = 1 ‖x2(t)− x1(t)‖ ≤ k ∣∣∣∣ ∫ t t0 ‖x1(τ)− x0(τ)‖ dτ ∣∣∣∣ ≤ k ∣∣∣∣ ∫ t t0 Mdτ ∣∣∣∣ = kM |t− t0|. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 56 3.7.3 Proof of uniqueness Can there be more solutions besides the φ that we constructed in Section 3.7.2? It turns out that this is not the case. Assume, for the sake of contradiction, that there are two different solutions φ(·), ψ(·) : R→ Rn. In other words 1. φ(t0) = ψ(t0) = x0; 2. ∀τ ∈ R \D, d dτ φ(τ) = p(φ(τ), τ) and d dτ ψ(τ) = p(ψ(τ), τ); and 3. there exits t̂ ∈ R such that ψ(t̂) 6= φ(t̂). By construction t̂ 6= t0 and φ(t0)− ψ(t0) = x0 − x0 = 0. Moreover, d dτ (φ(τ) − ψ(τ)) = p(φ(τ), τ) − p(ψ(τ), τ), ∀τ ∈ R \D ⇒φ(t) − ψ(t) = ∫ t t0 [p(φ(τ), τ) − p(ψ(τ), τ)]dτ, ∀t ∈ R ⇒‖φ(t)− ψ(t)‖ ≤ ∣∣∣∣ ∫ t t0 ‖p(φ(τ), τ) − p(ψ(τ), τ)‖dτ ∣∣∣∣ ≤ ∣∣∣∣ ∫ t t0 k(τ)‖φ(τ) − ψ(τ)‖dτ ∣∣∣∣ ⇒‖φ(t)− ψ(t)‖ ≤ c1 + ∣∣∣∣ ∫ t t0 k(τ)‖φ(τ) − ψ(τ)‖dτ ∣∣∣∣ ∀c1 ≥ 0. 
Letting u(t) = ‖φ(t)− ψ(t)‖ and applying the Gronwall lemma leads to 0 ≤ ‖φ(t)− ψ(t)‖ ≤ c1e ∫ t t0 k(τ)dτ , ∀c1 ≥ 0. Letting c1 → 0 leads to ‖φ(t)− ψ(t)‖ = 0⇒ φ(t) = ψ(t) ∀t ∈ R which contradicts the assumption that ψ(t̂) 6= φ(t̂). This concludes the proof of existence and uniqueness of solutions of ordinary differential equations. From now on we can talk about “the solution” of the differential equation, as long as the conditions of the theorem are satisfied. It turns out that the solution has several other nice properties, for example it varies continuously as a function of the initial condition and parameters of the function p. Unfortunately, the solution cannot usually be computed explicitly as a function of time. In this case we have to rely on simulation algorithms to approximate the solution using a computer. The nice properties of the solution (its guarantees of existence, uniqueness and continuity) come very handy in this case, since they allow one to design algorithms to numerically approximate the solutions and rigorously evaluate their convergence properties. Unfortunately for more general classes of systems, such as hybrid systems, one cannot rely on such properties and the task of simulation becomes much more challenging. One exception to the general rule that solutions cannot be explicitly computed is the case of linear systems, where the extra structure afforded by the linearity of the function p allows us to study the solution in greater detail. We pursue this direction in Chapter 4. Problems for chapter 3 Problem 3.1 (Norms) Let F be either R or C. Show that the following are well-defined norms for the linear spaces (Fn, F ), (Fm×n, F ) and ( C([t0, t1], Fn), F ) , respectively: Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 57 1. ||x||∞ = maxi |xi|, where x = (x1, . . . , xn) ∈ Fn; 2. ||A||∞ = maxi ∑ j |ai,j |, where A = [ai,j ] ∈ Fm×n; 3. ||f ||∞ = maxt∈[t0,t1] ||f(t)||∞, where f ∈ C([t0, t1], Fn). For x ∈ Fn, show in addition that the norms ||x||∞, ||x||1 and ||x||2 are equivalent (Hint: you may assume Schwarz’s inequality: ∑n i=1(|xi| · |yi|) ≤ ||x||2 · ||y||2, ∀x, y ∈ Fn). Problem 3.2 (Ball of a given norm) Consider a normed vector space (V, F, || · ||) and v ∈ V and r ∈ R+ define the open and closed balls centered at v with radius r as the sets B(v, r) = {v′ ∈ V | ‖v − v′‖ < r} and B(v, r) = {v′ ∈ V | ‖v − v′‖ < r} respectively. Show that: 1. B(v, r) is open and B(a, r) is closed. 2. v1, v2 ∈ B(v, r)⇒ λv1 + (1− λ)v2 ∈ B(v, r), ∀λ ∈ [0, 1] (B(v, r) is convex); 3. v ∈ B(0, r)⇒ −v ∈ B(0, r) (B(0, r) is balanced); 4. ∀v′ ∈ V ∃r ∈ (0,+∞) such that v′ ∈ B(0, r). Problem 3.3 Let (V, F, ‖ · ‖) be a normed space. Show that: 1. The sets V and ∅ are both open and closed. 2. If K1,K2 ⊆ V are open sets then K1 ∩K2 is an open set. 3. If K1,K2 ⊆ V are closed sets then K1 ∩K2 is a closed set. 4. Let {Ki ⊆ V | i ∈ I} be a collection of open sets, where I is an arbitrary (finite, or infinite) index set. Then ∪i∈IKi is an open set. 5. Let {Ki ⊆ V | i ∈ I} be a collection of closed sets, where I is an arbitrary (finite, or infinite) index set. Then ∩i∈IKi is a closed set. (Hint: Show 1, 3 and 5, then show that 3 implies 2 and 5 implies 4.) Problem 3.4 (Continuity) Let f : (U, F, ‖·‖U )→ (V, F, ‖·‖V ) be a function between two normed spaces. Show that the following statements are equivalent: 1. f is continuous. 2. For all sequences {ui}∞i=1 ⊆ U , limi→∞ ui = u⇒ limi→∞ f(ui) = f(u). 3. For all K ⊆ V open, the set f−1(K) = {u ∈ U | f(u) ∈ K} is open. 4. 
For all K ⊆ V closed, the set f−1(K) is closed. Problem 3.5 (Equivalent Norms) Let (V, F ) be a linear space. Let || · ||a and || · ||b be equivalent norms on (V, F ). Let v ∈ V , X ⊆ V and let {vi}i∈N be a sequence of elements of V . Show that: 1. {vi}i∈N is Cauchy w.r.t. || · ||a ⇔ {vi}i∈N is Cauchy w.r.t. || · ||b; 2. vi i→∞−→ v w.r.t. || · ||a ⇔ vi i→∞−→ v w.r.t. || · ||b; 3. X is dense in V w.r.t. || · ||a ⇔ X is dense in V w.r.t. || · ||b. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 58 Problem 3.6 (Induced norms) Let (U, F, || · ||U ) and (V, F, || · ||V ) be normed spaces, and let θU be the zero vector of U . 1. Show that the induced norm ||F|| = sup u∈U : u6=θU ||F(u)||V ||u||U is a well-defined norm for the space of continuous operators F : U → V (you may assume that the space of operators is a linear space over F ). 2. Let A : U → V be a linear operator. For the induced norm ||A||, show that ||A|| = sup u: ||u||U=1 ||A(u)||V . Problem 3.7 (ODE solution properties) Consider p : Rn×R→ Rn Lipschitz continuous in its first argument and piecewise continuous in its second. For t, t0 ∈ R, x0 ∈ Rn, let s(t, t0, x0) denote the unique solution of the differential equation ∂ ∂t s(t, to, x0) = p(s(t, t0, x0), t), with s(t0, t0, x0) = x0. Consider arbitrary t, t1, t0 ∈ R. Show that: 1. For all x0 ∈ Rn, s(t, t1, s(t1, t0, x0)) = s(t, to, x0). 2. For all x0 ∈ Rn, s(t0, t, s(t, t0, x0)) = x0. 3. The function s(t, t0, ·) : Rn → Rn is continuous. Problem 3.8 (Population dynamics) Problem 3.9 (Discontinuous dynamics) Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 61 Then ẋ(t) = f(x(t), u(t)) = f(x∗(t) + δx(t), u∗(t) + δu(t)) = f(x∗(t), u∗(t)) + ∂f ∂x (x∗(t), u∗(t))δx(t) + ∂f ∂u (x∗(t), u∗(t))δu(t) + higher order terms If δx(t) and δu(t) are small then the higher order terms are small compared to the terms linear in δx(t) and δu(t) and the evolution of δx(t) is approximately described by the linear time varying system d dt (δx(t)) = A(t)δx(t) +B(t)δu(t). We can now use the theory developed in subsequent chapters to ensure that δx(t) remains small and hence the nonlinear system tracks the optimal trajectory x∗(t) closely. 4.2 Existence and structure of solutions More formally, let (X,R), (U,R), and (Y,R) be finite dimensional linear spaces, of dimensions n, m, and p respectively1. Consider families of linear functions, A(t) : X → X B(t) : U → X C(t) : X → Y D(t) : U → Y parametrized by a real number t ∈ R. Fix bases, {ei}ni=1 for (X,R), {fi}mi=1 for (U,R), and {gi}pi=1 for (Y,R). Let A(t), B(t), C(t), and D(t) denote respectively the representation of the linear maps A(t), B(t), C(t), and D(t) with respect to those bases, (X,R) A(t)−→ (X,R) {ei}ni=1 A(t)∈R n×n −→ {ei}ni=1 (U,R) B(t)−→ (X,R) {fi}mi=1 B(t)∈R n×m −→ {ei}ni=1 (X,R) C(t)−→ (Y,R) {ei}ni=1 C(t)∈R p×n −→ {gi}pi=1 (U,R) D(t)−→ (Y,R) {fi}mi=1 D(t)∈R p×m −→ {gi}pi=1. Here we will be interested in dynamical systems of the form ẋ(t) = A(t)x(t) +B(t)u(t) (4.4) y(t) = C(t)x(t) +D(t)u(t) (4.5) where x(t) ∈ Rn, u(t) ∈ Rm, and y(t) ∈ Rp denote the representations of elements of (X,R), (U,R), and (Y,R) with respect to the bases {ei}ni=1, {fi}mi=1, and {gi}pi=1 respectively. To ensure that the system (4.4)–(4.5) is well-posed we will from now on impose following assumption. Assumption 4.1 A(·), B(·), C(·), D(·) and u(·) are piecewise continuous. It is easy to see that this assumption ensures that the solution of (4.4)–(4.5) is well defined. 
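One way to get a concrete feel for why this is so is to revisit the Picard iteration of Section 3.7.2, which underlies the existence proof, in a numerical sketch (ours; the scalar system, the constant input and the iteration count are arbitrary illustrative choices):

import numpy as np

# A sketch (ours): Picard iteration for the scalar linear ODE x' = a*x + b*u,
# x(0) = x0, with constant input u; compare with the exact solution.
a, b, u, x0 = -1.0, 1.0, 2.0, 1.0
t = np.linspace(0.0, 5.0, 501)

x = np.full_like(t, x0)                         # x_0(t) = x0
for _ in range(20):                             # Picard iterations
    integrand = a * x + b * u
    integral = np.concatenate(([0.0], np.cumsum(
        0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
    x = x0 + integral                           # x_{m+1}(t) = x0 + int_0^t (a*x_m + b*u) dtau

exact = (x0 + b * u / a) * np.exp(a * t) - b * u / a
print(np.max(np.abs(x - exact)))                # small: the iterates have converged

On the compact interval considered the iterates converge uniformly to the exact solution, mirroring the argument used in the proof of Theorem 3.6.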
Fact 4.1 For all u(·) : R→ Rm piecewise continuous and all (t0, x0) ∈ R×Rn there exists a unique solution x(·) : R→ Rn and y(·) : R→ Rp for the system (4.4)–(4.5). 1The complex numbers, C, can also be used as the field, at the expense of some additional complication in dimension counting. For simplicity we will think of linear spaces as defined over the field of real numbers, unless otherwise specified (e.g. for eigenvalue calculations). Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 62 Proof:(Sketch) Define p : Rn × R→ Rn by p(x, t) = A(t)x +B(t)u(t). Exercise 4.1 Show that under Assumption 4.1 p satisfies the conditions of Theorem 3.6. (Take D to be the union of the discontinuity sets of A(·), B(·), and u(·)). The existence and uniqueness of x(·) : R→ Rn then follows by Theorem 3.6. Defining y(·) : R→ Rp by y(t) = C(t)x(t) +D(t)u(t) ∀t ∈ R completes the proof. The unique solution of (4.4)–(4.5) defines two functions x(t) = s(t, t0, x0, u) state transition map y(t) = ρ(t, t0, x0, u) output response map mapping the input trajectory u(·) : R→ Rm and initial condition (t0, x0) ∈ R×Rn to the state and output at time t ∈ R respectively. It is easy to see that the function s and the input u also implicitly define the function ρ through ρ(t, t0, x0, u) = C(t)s(t, t0, x0, u) +D(t)u(t). Therefore the main properties of the solution functions can be understood by analysing the properties of the state solution function s. Theorem 4.1 Let Dx be the union of the discontinuity sets of A(·), B(·) and u(·) and Dy the union of the discontinuity sets of C(·), D(·) and u(·). 1. For all (t0, x0) ∈ R× Rn, u(·) ∈ PC(R,Rm) • x(·) = s(·, t0, x0, u) : R→ Rn is continuous and differentiable for all t ∈ R \Dx. • y(·) = ρ(·, t0, x0, u) : R→ Rp is piecewise continuous with discontinuity set Dy. 2. For all t, t0 ∈ R, u(·) ∈ PC(R,Rm), x(·) = s(t, t0, ·, u) : Rn → Rn and ρ(t, t0, ·, u) : Rn → Rp are continuous. 3. For all t, t0 ∈ R, x01, x02 ∈ Rn, u1(·), u2(·) ∈ PC(R,Rm), a1, a2 ∈ R s(t, t0, a1x01 + a2x02, a1u1 + a2u2) = a1s(t, t0, x01, u1) + a2s(t, t0, x02, u2) ρ(t, t0, a1x01 + a2x02, a1u1 + a2u2) = a1ρ(t, t0, x01, u1) + a2ρ(t, t0, x02, u2). 4. For all t, t0 ∈ R, x0 ∈ Rn, u ∈ PC(R,Rm), s(t, t0, x0, u) = s(t, t0, x0, 0) + s(t, t0, 0, u) ρ(t, t0, x0, u) = ρ(t, t0, x0, 0) + ρ(t, t0, 0, u) The last statement requires some care, as 0 is used in two different ways: As the zero element in Rn (θRn = (0, . . . , 0) in the notation of Chapter 3) and as the zero element in the space of piecewise continuous function (θPC(t) = (0, . . . , 0) for all t ∈ R in the notation of Chapter 3). The interpre- tation should hopefully be clear from the location of 0 in the list of arguments of s and ρ. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 63 Proof: Part 1 follows from the definition of the solution. Part 4 follows from Part 3, by setting u1 = 0, u2 = u, x01 = x0, x02 = 0, and a1 = a2 = 1. Part 2 follows from Part 4, by noting that s(t, t0, ·, u) = s(t, t0, ·, 0) + s(t, t0, 0, u) and s(t, t0, ·, 0) : Rn → Rn is a linear function between finite dimensional linear spaces (and hence continuous by Corollary 3.2); the argument for ρ is similar. So we only need to establish Part 3. Let x1(t) = s(t, t0, x01, u1), x2(t) = s(t, t0, x02, u2), x(t) = s(t, t0, a1x01 + a2x02, a1u1 + a2u2), and φ(t) = a1x1(t) + a2x2(t). We would like to show that x(t) = φ(t) for all t ∈ R. By definition x(t0) = a1x01 + a2x02 = a1x1(t0) + a2x2(t0) = φ(t0). 
Moreover, if we let u(t) = a1u1(t) + a2u(t) then for all t ∈ R \D ẋ(t) = A(t)x(t) +B(t)u(t) φ̇(t) = a1ẋ1(t) + a2ẋ2(t) = a1(A(t)x1(t) +B(t)u1(t)) + a2(A(t)x2(t) +B(t)u2(t)) = A(t)(a1x1(t) + a2x2(t)) +B(t)(a1u1(t) + a2u2(t)) = A(t)φ(t) +B(t)u(t). Therefore x(t) = φ(t) since the solution to the linear ODE is unique. Linearity of ρ follows from the fact that C(t)x+D(t)u is linear in x and u. 4.3 State transition matrix By Part 4 of Theorem 4.1 the solution of the system can be partitioned into two distinct components: s(t, t0, x0, u) = s(t, t0, x0, 0) + s(t, t0, 0, u) state transition = zero input transition + zero state transition ρ(t, t0, x0, u) = ρ(t, t0, x0, 0) + ρ(t, t0, 0, u) output response = zero input response + zero state response. Moreover, by Part 3 of Theorem 4.1, the zero input components s(t, t0, x0, 0) and ρ(t, t0, x0, 0) are linear in x0 ∈ Rn. Therefore, in the basis {ei}ni=1 used for the representation of A(·), the linear map s(t, t0, ·, 0) : Rn → Rn has a matrix representation. This representation, that will of course depend on t and t0 in general, is called the state transition matrix and is denoted by Φ(t, t0). Therefore, assuming s(t, t0, x0, 0) refers to the representation of the solution with respect to the basis {ei}ni=1, s(t, t0, x0, 0) = Φ(t, t0)x0. (4.6) Exercise 4.2 Show that the representation of ρ(t, t0, ·, 0) : Rn → Rp with respect to the bases {ei}ni=1 and {gi}pi=1 is given by C(t)Φ(t, t0); in other words ρ(t, t0, x0, 0) = C(t)Φ(t, t0)x0. Therefore the state transition matrix Φ(t, t0) completely characterizes the zero input state transi- tion and output response. We will soon see that, together with the input trajectory u(·), it also characterizes the complete state transition and output response. Theorem 4.2 Φ(t, t0) has the following properties: 1. Φ(·, t0) : R→ Rn×n is the unique solution of the linear matrix ordinary differential equation ∂ ∂t Φ(t, t0) = A(t)Φ(t, t0) with Φ(t0, t0) = I. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 66 We will show that R(t) satisfies the same differential equation with the same initial condition; the claim then follows by the existence and uniqueness theorem. Note first that R(t0) = Φ(t0, t0)x0 + ∫ t0 t0 Φ(t, τ)B(τ)u(τ)dτ = I · x0 + 0 = x0 = L(t0). Moreover, by the Leibniz rule d dt R(t) = d dt [ Φ(t, t0)x0 + ∫ t t0 Φ(t, τ)B(τ)u(τ)dτ ] = [ d dt Φ(t, t0) ] x0 + d dt [∫ t t0 Φ(t, τ)B(τ)u(τ)dτ ] = A(t)Φ(t, t0)x0 + ∫ t t0 ∂ ∂t Φ(t, τ)B(τ)u(τ)dτ +Φ(t, t)B(t)u(t) d dt t− Φ(t0, t0)B(t0)u(t0) d dt t0 = A(t)Φ(t, t0)x0 + ∫ t t0 A(t)Φ(t, τ)B(τ)u(τ)dτ + I · B(t)u(t) = A(t) [ Φ(t, t0)x0 + ∫ t t0 Φ(t, τ)B(τ)u(τ)dτ ] +B(t)u(t) = A(t)R(t) +B(t)u(t). Therefore R(t) and L(t) satisfy the same linear differential equation for the same initial condition, hence they are equal for all t by uniqueness of solutions. To obtain the formula for ρ(t, t0, x0, u) simply substitute the formula for s(t, t0, x0, u) into y(t) = C(t)x(t) +D(t)u(t). Let us now analyze the zero state transition and response in greater detail. By Theorem 4.1, the zero state transition and the zero state response are both linear functions s(t, t0, 0, ·) : PC(R,Rm)→ Rn and ρ(t, t0, 0, ·) : PC(R,Rm)→ Rp respectively. (PC(R,Rm),R) s(t,t0,0,·)−→ (Rn,R) u(·) 7−→ ∫ t t0 Φ(t, τ)B(τ)u(τ)dτ and (PC(R,Rm),R) ρ(t,t0,0,·)−→ (Rp,R) u(·) 7−→ C(t) ∫ t t0 Φ(t, τ)B(τ)u(τ)dτ +D(t)u(t). Fix the basis {fj}mj=1 for (Rm,R) used in the representation of B(t) ∈ Rn×m. 
Fix σ ≥ t0 and consider a family of functions δ(σ,ǫ)(·) ∈ PC(R,Rm) parametrized by ǫ > 0 and defined by δ(σ,ǫ)(t) =    0 if t < σ 1 ǫ if σ ≤ t < σ + ǫ 0 if t ≥ σ + ǫ. For j = 1, . . . ,m, consider the zero state transition2 s(t, t0, 0, δ(σ,ǫ)(t)fj) under input δ(σ,ǫ)(t)fj . Since s(t0, t0, 0, δ(σ,ǫ)(t)fj) = 0 and the input is zero until t = σ, s(t, t0, 0, δ(σ,ǫ)(t)fj) = 0 ∀t < σ. 2Strictly speaking, we should use the Rm representation of the basis vector fi ∈ U instead of fi itself. The reader is asked to excuse this slight abuse of the notation. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 67 Exercise 4.3 Show this by invoking the existence-uniqueness theorem. For t ≥ σ + ǫ and assuming ǫ is small s(t, t0, 0, δ(σ,ǫ)(t)fj) = ∫ t t0 Φ(t, τ)B(τ)δ(σ,ǫ)(τ)fjdτ = ∫ σ+ǫ σ Φ(t, τ)B(τ) 1 ǫ fjdτ = 1 ǫ ∫ σ+ǫ σ Φ(t, σ + ǫ)Φ(σ + ǫ, τ)B(τ)fjdτ = Φ(t, σ + ǫ) ǫ ∫ σ+ǫ σ Φ(σ + ǫ, τ)B(τ)fjdτ ≈ Φ(t, σ + ǫ) ǫ [Φ(σ + ǫ, σ)B(σ)fj ] ǫ ǫ→0−→ Φ(t, σ)Φ(σ, σ)B(σ)fj = Φ(t, σ)B(σ)fj Therefore lim ǫ→0 s(t, t0, 0, δ(σ,ǫ)(t)fj) = Φ(t, σ)B(σ)fj . Formally, if we pass the limit inside the function s and define δσ(t) = lim ǫ→0 δ(σ,ǫ)(t) we obtain s(t, t0, 0, δσ(t)fj) = Φ(t, σ)B(σ)fj ∈ Rm. The statement is “formal” since to pass the limit inside the function we first need to ensure that the function is continuous. Exercise 4.4 We already know that the function s(t, t0, 0, ·) : PC(R,Rm) → Rn is linear. What more do we need to check to make sure that it is continuous? Moreover, strictly speaking δσ(t) is not an acceptable input function, since it is equal to infinity at t = σ and hence not piecewise continuous. Indeed, δσ(t) is not a real valued function at all, it just serves as a mathematical abstraction for an input pulse of arbitrarily small length. This mathematical abstraction is known as the impulse function or the Dirac pulse. Even though in practice the response of a real system to such an impulsive input cannot be observed, by applying as input piecewise continuous functions δ(σ,ǫ)(t) for ǫ small enough one can approximate the state transition s(t, t0, 0, δσ(t)fj) arbitrarily closely. Notice also that for t ≥ σ s(t, t0, 0, δσ(t)fj) = s(t, σ, B(σ)fj , 0) i.e. the zero state transition due to the impulse δσ(t)fj is also a zero input transition starting with state B(σ)fj at time σ. Repeating the process for all {fj}mj=1 leads to m vectors Φ(t, σ)B(σ)fj for j = 1, . . . ,m. Order- ing these vectors according to their index j and putting them one next to the other leads to the impulse transition matrix, K(t, σ) ∈ Rn×m, defined by K(t, σ) = { Φ(t, σ)B(σ) if t ≥ σ 0 if t < σ. The (i, j) element of K(t, σ) contains the trajectory of state xi when the impulse function δσ(t) is applied to input uj . Note that, even though these elements cannot be measured in practice, the impulse transition matrix is still a well defined, matrix valued function for all t, σ ∈ R. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 68 Substituting s(t, 0, 0, δσ(t)fj) into the equation y(t) = C(t)x(t) +D(t)u(t) leads to ρ(t, t0, 0, δσ(t)fj) = C(t)Φ(t, σ)B(σ)fj +D(t)fjδσ(t) ∈ Rm. and the impulse response matrix, H(t, σ) ∈ Rp×m, defined by H(t, σ) = { C(t)Φ(t, σ)B(σ) +D(t)δσ(t) if t ≥ σ 0 if t < σ. Note that, unlike K(t, σ), H(t, σ) is in general not a well defined matrix valued function since it in general contains an impulse in its definition (unless of course D(t) = 0 for all t ∈ R). 
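As a numerical check (ours, not from the notes), the limit computed above can be reproduced for a time invariant pair (A,B): apply the narrow pulse δ(σ,ǫ) to one input channel, integrate the zero state transition, and compare the state at a later time with Φ(t, σ)B(σ). For constant A the transition matrix is the matrix exponential eA(t−σ) developed in Chapter 5, and scipy's expm is used for it here; all matrices and numbers below are illustrative choices.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# A sketch (ours): approximate K(t, sigma) f_j = Phi(t, sigma) B f_j by applying
# the narrow pulse delta_(sigma, eps) to input j of a time invariant system.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
sigma, eps, t_final = 1.0, 1e-3, 3.0

def pulse(t):
    return 1.0 / eps if sigma <= t < sigma + eps else 0.0

def rhs(t, x):
    return A @ x + B[:, 0] * pulse(t)

sol = solve_ivp(rhs, (0.0, t_final), [0.0, 0.0], max_step=eps / 5, rtol=1e-8)
approx = sol.y[:, -1]                            # zero state response to the pulse
exact = expm(A * (t_final - sigma)) @ B[:, 0]    # Phi(t, sigma) B
print(approx, exact)                             # agree up to terms of order eps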
Problems for chapter 4 Problem 4.1 (Invariant Subspaces) Let L : V → V be a linear map on an n-dimensional vector space V over the field F . Recall that a subspace M ⊂ V is called L-invariant if Lx ∈ M for every x ∈ M . Suppose that V is a direct sum of two subspaces M1 and M2, i.e., M1 ∩M2 = {0}, and M1 +M2 = V . If both M1 and M2 are L-invariant, show that there exists a matrix representation A ∈ Fn×n of the form A = [ A11 0 0 A22 ] with Dim(A11) = Dim(M1) and Dim(A22) = Dim(M2). (Recall that the sum of subspaces M and N of a vector space X is the set of all vectors of the form m + n where m ∈ M and n ∈ N . A vector space X is the direct sum of two subspaces M and N if every vector x ∈ X has a unique prepresentation of the form x = m+ n where m ∈M and n ∈ N ; we write X =M ⊕N .) Problem 4.2 (Eigenvalues and Invariant Subspaces) Let A be a real-valued n × n matrix. Suppose that λ+iµ is a complex eigenvalue of A and x+iy is the corresponding complex eigenvector, where λ, µ ∈ R and x, y ∈ Rn. Show that x − iy is also an eigenvector with eigenvalue λ − iµ. Let V be the 2-dimensional subspace spanned by x and y, i.e., V is the set of linear combinations with real-valued coefficients of the real-valued vectors x and y. Show that V is an invariant subspace of A, namely, if z ∈ V then we have Az ∈ V . Problem 4.3 (ODEs) 1. Let f : Rn → Rn be Lipschitz with Lipschitz constant K ∈ [0,+∞). For t ∈ R, let x(t) be the solution of ẋ(t) = f(x(t)), with x(0) = x0. Let x̄ ∈ Rn be such that f(x̄) = 0. Show that ||x(t)− x̄|| ≤ eKt||x0 − x̄||, ∀t ∈ R+. (Here || · || is the Euclidean norm on (Rn,R) for which K is defined. Hint: use the Gronwall Lemma). 2. Let A : R+ → Rn×n, B : R+ → Rn×m and u : R+ → Rm be piecewise continuous functions. Show that, for any x0 ∈ Rn, the linear ODE { ẋ(t) = A(t)x(t) +B(t)u(t), t ∈ R+, x(0) = x0 has a unique solution x(·) : R+ → Rn. (Hint: you may assume that if A(t) is piecewise continuous then so is its induced norm). Problem 4.4 (Linear ODEs) Let A(·) : R → Rn×n be piecewise continuous. Consider the fol- lowing linear ODE: ẋ(t) = A(t)x(t), (4.8) and let Φ(t, t0) be the state transition matrix. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 71 3. For all t, t0 ∈ R, x0 ∈ Rn, u(·) ∈ PC(R,Rm), s(t, t0, x0, u) = eA(t−t0)x0 + ∫ t t0 eA(t−τ)Bu(τ)dτ ρ(t, t0, x0, u) = CeA(t−t0)x0 + C ∫ t t0 eA(t−τ)Bu(τ)dτ +Du(t). 4. For all t, σ ∈ R the K(t, σ) = K(t− σ, 0) = { eA(t−σ)B if t ≥ σ 0 if t < σ. H(t, σ) = H(t− σ, 0) = { CeA(t−σ)B +Dδ0(t− σ) if t ≥ σ 0 if t < σ. From the above it becomes clear that for linear time invariant systems the solution is independent of the initial time t0; all that matters is how much time has elapsed since then, i.e. t− t0. Without loss of generality we will therefore take t0 = 0 and write x(t) = s(t, 0, x0, u) = eAtx0 + ∫ t 0 eA(t−τ)B(τ)u(τ)dτ y(t) = ρ(t, 0, x0, u) = CeAtx0 + C ∫ t 0 eA(t−τ)B(τ)u(τ)dτ +Du(t) K(t) = K(t, 0) = { eAtB if t ≥ 0 0 if t < 0. H(t) = H(t, 0) = { CeAtB +Dδ0(t) if t ≥ 0 0 if t < 0. Notice that in this case the integral that appears in the state transition and output response is simply the convolution of the input u(·) with the impulse transition and impulse response matrices respectively, x(t) = eAtx0 + (K ∗ u)(t) y(t) = CeAtx0 + (H ∗ u)(t). Exercise 5.1 Verify this. 5.2 Semi-simple matrices For the time invariant case, the state transition matrix eAt can be computed explicitly. In some cases this can be done directly from the infinite series; this is the case, for example, for nilpotent matrices, i.e. 
matrices for which there exists N ∈ N such that AN = 0; conditions to determine when this is the case will be given in Section 5.4. More generally, one can use Laplace transforms or eigenvectors to do this. We concentrate primarily on the latter method; Laplace transforms will be briefly discussed in Section 5.4. We start with the simpler case of the so-called semi-simple matrices. Definition 5.1 A matrix A ∈ Rn×n is semi-simple if and only if its right eigenvectors {vi}ni=1 ⊆ Cn are linearly independent in the linear space (Cn,C). Theorem 5.2 Matrix A ∈ Rn×n is semi-simple if and only if there exists a nonsingular matrix T ∈ Cn×n and a diagonal matrix Λ ∈ Cn×n such that A = T−1ΛT . Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 72 Proof: (⇒) Recall that if {vi}ni=1 are linearly independent then the matrix [v1 . . . vn] ∈ Cn×n is invertible. Let T = [v1 v2 . . . vn] −1 ∈ Cn×n Then AT−1 = [Av1 Av2 . . . Avn] = [λ1v1 λ2v2 . . . λnvn] = T−1Λ where λi ∈ C are the corresponding eigenvalues. Multiplying on the right by T leads to A = T−1ΛT. (⇐) Assume that there exists matrices T ∈ Cn×n nonsingular and Λ ∈ Cn×n diagonal such that A = T−1ΛT ⇒ AT−1 = T−1Λ. Let T−1 = [w1 . . . wn] where wi ∈ Cn denoted the ith column of T−1 and Λ =   σ1 . . . 0 ... . . . ... 0 . . . σn   with σi ∈ C. Then Awi = σiwi and therefore wi is a right eigenvector of A with eigenvalue σi. Since T−1 is invertible its columns (and eigenvectors of A) {wi}ni=1 are linearly independent. Now let us see how this fact helps is compute the state transition matrix eAt. Recall that eAt is related to the solution of the differential equation ẋ(t) = Ax(t); in particular the solution of the differential equation starting at x(0) = x0 can be written as x(t) = eAtx0. Recall also that A ∈ Rn×n can be thought of as the representation of some linear operator A : Rn → Rn with respect to some (usually the canonical) basis {ei}ni=1. If A is semi-simple, then its eigenvectors are linearly independent and can also be used as a basis. Let us see what the representation of the linear operator A with respect to this basis is. (Cn,C) A−→ (Cn,C) {ei}ni=1 A∈R n×n −→ {ei}ni=1 (basis leading to representation of A by A) {vi}ni=1 Ã=TAT−1=Λ−→ {vi}ni=1 (eigenvector basis) Recall that if x = (x1, . . . , xn) ∈ Rn is the representation of x with respect to the basis {ei}ni=1 its representation with respect to the complex basis {vi}ni=1 will be the complex vector x̃ = Tx ∈ Cn. The above formula simply states that ˙̃x = T ẋ = TAx = TAT−1x̃ = Λx̃. Therefore, if A is semi-simple, its representation with respect to the basis of its eigenvectors is the diagonal matrix Λ of its eigenvalues. Notice that even though A is a real matrix, its representation Λ is in general complex, since the basis {vi}ni=1 is also complex. What about the state transition matrix? Fact 5.1 If A is semi-simple eAt = T−1eΛtT = T−1   eλ1t . . . 0 ... . . . ... 0 . . . eλnt  T (5.4) Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 73 Proof: Exercise. Show that Ak = T−1ΛkT and substitute into the expansion (5.3). In other words: (Rn,R) s(t,0,·,θU)−→ (Rn,R) x0 7−→ x(t) {ei}ni=1 eAt∈R n×n −→ {ei}ni=1 (basis leading to representation of A by A) {vi}ni=1 eΛt=TeAtT−1 −→ {vi}ni=1 (eigenvector basis). Once again, note that the matrices T , T−1 and eΛt will general be complex. Fact 5.1, however, shows that when taking their product the imaginary parts will all cancel and we will be left with a real matrix. 
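The following sketch (ours) checks Fact 5.1 numerically on one particular semi-simple matrix with complex conjugate eigenvalues: assembling T−1eΛtT from the eigendecomposition reproduces eAt, and the imaginary parts indeed cancel up to round-off. numpy's eig returns the matrix of eigenvectors, which plays the role of T−1 here.

import numpy as np
from scipy.linalg import expm

# A sketch (ours): verify exp(A t) = T^{-1} exp(Lambda t) T for a semi-simple matrix.
A = np.array([[0.0, 1.0], [-2.0, -2.0]])     # eigenvalues -1 +/- j (distinct, hence semi-simple)
t = 0.7

lam, V = np.linalg.eig(A)                    # columns of V are eigenvectors, so V = T^{-1}
via_eig = V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)
direct = expm(A * t)

print(np.max(np.abs(via_eig.imag)))          # imaginary parts cancel (up to round-off)
print(np.max(np.abs(via_eig.real - direct))) # the two computations agree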
Fact 5.1 shows that if a matrix is semi-simple the calculation of the matrix exponential is rather straightforward. It would therefore be desirable to establish conditions under which a matrix is semisimple. Definition 5.2 A matrix A ∈ Rn×n is simple if and only if its eigenvalues are distinct, i.e. λi 6= λj for all i 6= j. Theorem 5.3 All simple matrices are semi-simple. Proof: Assume, for the sake of contradiction, that A ∈ Rn×n is simple, but not semi-simple. Then λi 6= λj for all i 6= j but {vi}ni=1 are linearly dependent in (Cn,C). Hence, there exist a1, . . . , an ∈ C not all zero, such that n∑ i=1 aivi = 0. Without loss of generality, assume that a1 6= 0 and multiply the above identity by (A − λ2I)(A − λ3I) . . . (A− λnI) on the left. Then a1(A− λ2I)(A− λ3I) . . . (A− λnI)v1 + n∑ i=2 ai(A− λ2I)(A− λ3I) . . . (A− λnI)vi = 0. Concentrating on the first product and unraveling it from the right leads to a1(A− λ2I)(A− λ3I) . . . (A− λnI)v1 =a1(A− λ2I)(A − λ3I) . . . (Av1 − λnv1) =a1(A− λ2I)(A − λ3I) . . . (λ1v1 − λnv1) =a1(A− λ2I)(A − λ3I) . . . v1(λ1 − λn) =a1(A− λ2I)(A − λ3I) . . . (Av1 − λn−1v1)(λ1 − λn) etc. which leads to a1(λ1 − λ2)(λ1 − λ3) . . . (λ1 − λn)v1 + n∑ i=2 ai(λi − λ2)(λi − λ3) . . . (λi − λn)vi = 0. Each term of the sum on the right will contain a term of the form (λi − λi) = 0. Hence the sum on the right is zero, leading to a1(λ1 − λ2)(λ1 − λ3) . . . (λ1 − λn)v1 = 0. But, v1 6= 0 (since it is an eigenvector) and a1 6= 0 (by the assumption that {vi}ni=1 are linearly dependent) and (λ1 − λi) 6= 0 for i = 2, . . . , n (since the eigenvalues are distinct). This leads to a contradiction. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 76 Notice that in all three cases the generalized eigenvectors are the same, but they are partitioned differently into chains. Note also that in all three cases the generalized eigenvectors taken all together form a linearly independent family of n = 3 vectors. It can be shown that the last observation is not a coincidence: The collection of all generalised eigenvectors is always linearly independent. Lemma 5.1 Assume that the matrix A ∈ Rn×n has k linearly independent eigenvectors v1, . . . , vk ∈ Cn with corresponding maximal Jordan chains {vji }µi j=0 ⊆ Cn, i = 1, . . . , k. Then the matrix[ v11 . . . vµ1 1 . . . v1k . . . vµk k . . . ] ∈ Cn×n is invertible. In particular, ∑k i=1 µi = n. The proof of this fact is rather tedious and will be omitted, see [19]. Consider now a change of basis T = [ v11 . . . vµ1 1 v12 . . . vµ2 2 . . . ]−1 ∈ Cn×n (5.5) comprising the generalised eigenvectors as the columns of the matrix T−1. Theorem 5.4 With the definition of T in equation (5.5), the matrix A ∈ Rn×n can be written as A = T−1JT where J ∈ Cn×n is block-diagonal J =   J1 0 . . . 0 0 J2 . . . 0 ... ... . . . ... 0 0 . . . Jk   ∈ Cn×n, Ji =   λi 1 0 . . . 0 0 0 λi 1 . . . 0 0 ... ... ... . . . ... ... 0 0 0 . . . λi 1 0 0 0 . . . 0 λi   ∈ Cµi×µi , i = 1, . . . , k and λi ∈ C is the eigenvalue corresponding to the Jordan chain {vji }µi j=0. Notice that there may be multiple Jordan chains for the same eigenvalue λ, in fact their number will be the same as the number of linearly independent eigenvectors associated with λ. If k = n (equivalently, all Jordan chains have length 1) then the matrix is semi-simple, T−1 is the matrix of eigenvectors of A, and J = Λ. 
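A symbolic check of the decomposition of Theorem 5.4 can be carried out with a computer algebra system. In the sketch below (ours) sympy's jordan_form returns a pair (P, J) with A = PJP−1, so P plays the role of T−1; the matrix is a made-up defective example with a double eigenvalue and a single eigenvector.

import sympy as sp

# A sketch (ours): Jordan decomposition with sympy; jordan_form() returns (P, J)
# such that A = P * J * P**-1, so P corresponds to T^{-1} in Theorem 5.4.
A = sp.Matrix([[3, 1],
               [-1, 1]])            # double eigenvalue 2, only one eigenvector
P, J = A.jordan_form()
sp.pprint(J)                        # a single 2x2 Jordan block with lambda = 2
assert sp.simplify(P * J * P**-1 - A) == sp.zeros(2, 2)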
The theorem demonstrates that any matrix can be brought into a special, block diagonal form using its generalised eigenvectors as a change of basis. This special block diagonal form is known as the Jordan canonical form. Definition 5.5 The block diagonal matrix J in Theorem 5.4 is the called the Jordan canonical form of the matrix A. The matrices Ji are known as the Jordan blocks of the matrix A. Example (Non semi-simple matrices (cont.)) In the above example, the three matrices A1, A2 and A3 are already in Jordan canonical form. A1 comprises one Jordan block of size 3, A2 two Jordan blocks of sizes 2 and 1 and A3 three Jordan blocks, each of size 1. How does this help with the computation of eAt? Theorem 5.5 eAt = T−1eJtT where eJt =   eJ1t 0 . . . 0 0 eJ2t . . . 0 ... ... . . . ... 0 0 . . . eJkt   and eJit =   eλit teλit t2eλit 2! . . . tµi−1eλit (µi−1)! 0 eλit teλit . . . tµi−2eλit (µi−2)! ... ... ... . . . ... 0 0 0 . . . eλit   , i = 1, . . . , k. Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 77 Proof: Exercise. Show that Aj = T−1JjT , then that Jj =   Jj 1 0 . . . 0 0 Jj 2 . . . 0 ... ... . . . ... 0 0 . . . Jj k   , and hence that eJt =   eJ1t 0 . . . 0 0 eJ2t . . . 0 ... ... . . . ... 0 0 . . . eJkt   . Finally show that eJit =   eλit teλit t2eλit 2! . . . tµi−1eλit (µi−1)! 0 eλit teλit . . . tµi−2eλit (µi−2)! ... ... ... . . . ... 0 0 0 . . . eλit   by differentiating with respect to t, showing that the result is equal to Jie Jit and invoking uniqueness (or in a number of other ways). So the computation of the matrix exponential becomes easy once again. Notice that if k = n then we are back to the semi-simple case. As in the semi-simple case, all matrices involved in the product will in general be complex. However, the theorem ensures that when taking the product the imaginary parts will cancel and the result will be a real matrix. Example (Non semi-simple matrices (cont.)) In the above example, eA1t =   eλt teλt t2 2 e λt 0 eλt teλt 0 0 eλt   , eA2t =   eλt teλt 0 0 eλt 0 0 0 eλt   , eA3t =   eλt 0 0 0 eλt 0 0 0 eλt   , Notice that in all cases eAt consists of linear combinations of elements of the form eλit, teλit, . . . , tµi−1eλit for λi ∈ Spec[A] and µi the length of the longest Jordan chain at λi. In other words eAt = ∑ λ∈Spec[A] Πλ(t)e λt (5.6) where for each λ ∈ Spec[A], Πλ(t) ∈ C[t]n×n is a matrix of polynomials of t with complex coefficients and degree at most equal to the length of the longest Jordan chain at λ. In particular, if A is semi- simple all Jordan chains have length equal to 1 and the matrix exponential reduces to eAt = ∑ λ∈Spec[A] Πλe λt where Πλ ∈ Cn×n are constant complex matrices. Notice again that even though in general both the eigenvalues λ and the coefficients of the corresponding polynomials Πλ(t) are complex numbers, because the eigenvalues appear in complex conjugate pairs the imaginary parts for the sum cancel out and the result is the real matrix eAt ∈ Rn×n. 5.4 Laplace transforms To establish a connection to more conventional control notation, we recall the definition of the Laplace transform of a signal f(·) : R+ → Rn×m mapping the non-negative real numbers to the Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. 
Ramponi, 2015 78 linear space of n×m real matrices: F (s) = L{f(t)} = ∫ ∞ 0 f(t)e−stdt ∈ Cn×m where the integral is interpreted element by element and we assume that it is well defined (for a careful discussion of this point see, for example, [7]). The Laplace transform L{f(t)} transforms the real matrix valued function f(t) ∈ Rn×m of the real number t ∈ R+ to the complex matrix valued function F (s) ∈ Cn×m of the complex number s ∈ C. The inverse Laplace transform L−1{F (s)} performs the inverse operation; it can also be expressed as an integral, even though in the calculations considered here one mostly encounters functions F (s) that are recognisable Laplace transforms of known functions f(t); in particular the functions F (s) will typically be proper rational functions of s whose inverse Laplace transform can be computed by partial fraction expansion. Fact 5.3 The Laplace transform (assuming that it is defined for all functions concerned) has the following properties: 1. It is linear, i.e. for all A1, A2 ∈ Rp×n and all f1(·) : R+ → Rn×m, f2(·) : R+ → Rn×m L{A1f1(t) +A2f2(t)} = A1L{f1(t)} +A2L{f2(t)} = A1F1(s) +A2F2(s) 2. L { d dtf(t) } = sF (s)− f(0). 3. L{(f ∗ g)(t)} = F (s)G(s) where (f ∗ g)(·) : R+ → Rp×m denotes the convolution of f(·) : R+ → Rp×n and g(·) : R+ → Rn×m defined by (f ∗ g)(t) = ∫ t 0 f(t− τ)g(τ)dτ. Proof: Exercise, just use the definition and elementary calculus. Fact 5.4 For all A ∈ Rn×n and t ∈ R+, L { eAt } = (sI −A)−1. Proof: Recall that d dt eAt = AeAt ⇒ L { d dt eAt } = L { AeAt } ⇒ sL { eAt } − eA0 = AL { eAt } ⇒ sL { eAt } − I = AL { eAt } ⇒ (sI −A)L { eAt } = I. The claim follows by multiplying on the left by (sI − A)−1; notice that the matrix is invertible for all s ∈ C, except the eigenvalues of A. Let us look somewhat more closely to the structure of the Laplace transform, (sI − A)−1, of the state transition matrix. This is an n× n matrix of strictly proper rational functions of s, which as we saw in Chapter 2 form a sub-ring of (Rp(s),+, ·). To see this recall that by definition (sI −A)−1 = Adj[sI −A] Det[sI −A] The denominator is simply the characteristic polynomial, χA(s) = sn + χ1s n−1 + . . .+ χn ∈ R[s] Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 81 with the input u(t). Taking Laplace transforms we obtain X(s) = L{x(t)} =L{eAtx0 + (K ∗ u)(t)} =L{eAt}x0 + L{(K ∗ u)(t)} =(sI −A)−1x0 + L{eAtB}L{u(t)} =(sI −A)−1x0 + (sI −A)−1BU(s) which coincides with equation (5.8), as expected. Equation (5.8) provides an indirect, purely algebraic way of computing the solution of the differential equation, without having to compute a matrix exponential or a convolution integral. One can form the Laplace transform of the solution, X(s), by inverting (sI − A) (using, for example, the matrix multiplication of Theorem 5.6) and substituting into equation (5.8). From there the solution x(t) can be computed by taking an inverse Laplace transform. Since (sI−A)−1 ∈ Rp(s) n×n is a matrix of strictly proper rational functions, X(s) ∈ Rp(s) n will be a vector of strictly proper rational functions, with the characteristic polynomial in the denominator. The inverse Laplace transform can therefore be computed by partial fraction expansions, at least for many reasonable input functions (constants, sines and cosines, exponentials, ramps, polynomials, and combinations thereof). Taking the Laplace transform of the output equation leads to y(t) = Cx(t) +Du(t) L =⇒ Y (s) = CX(s) +DU(s). 
By (5.8) Y (s) = C(sI −A)−1x0 + (sI −A)−1BU(s) +DU(s) which for x0 = 0 (zero state response) reduces to Y (s) = C(sI −A)−1BU(s) +DU(s) = G(s)U(s). (5.9) Definition 5.7 The function G(·) : C→ Cp×m defined by G(s) = C(sI −A)−1B +D (5.10) is called the transfer function of the system. Comparing equation (5.9) with the zero state response that we computed earlier y(t) = C ∫ t 0 eA(t−τ)Bu(τ)dτ +Du(t) = (H ∗ u)(t) it is clear that the transfer function is the Laplace transform of the impulse response H(t) of the system G(s) = L{H(t)} = L{CeAtB +Dδ0(t)} = C(sI −A)−1B +D. Substituting equation (5.7) into (5.10) we obtain G(s) = C M(s) χA(s) B +D = CM(s)B +DχA(s) χA(s) . (5.11) Since K(s) is a matrix of polynomials of degree at most n− 1 and χA(s) is a polynomial of degree n we see that G(s) ∈ Rp(s) p×m is a matrix of proper rational functions. If moreover D = 0 then the rational functions are strictly proper. Definition 5.8 The poles of the system are the values of s ∈ C are the roots of the denominator polynomial of G(s). Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 82 From equation (5.11) it becomes apparent that all poles of the system are eigenvalues (i.e. are contained in the spectrum) of the matrix A. Note, however, that not all eigenvalues of A are necessarily poles of the system, since there may be cancellations of common factors in the numerator and denominator when forming the fraction (5.11). It turns out that such cancellations are related to the controllability and observability properties of the system. We will return to this point in Chapter 8, after introducing these notions. Problems for chapter 5 Problem 5.1 (Change of basis) Let {ui}mi=1, {xi}ni=1, {yi}pi=1 be bases of the linear spaces (R m,R), (Rn,R) and (Rp,R), respectively. Let u(·) : R+ → Rm be piecewise continuous. For t ∈ R+, consider the linear time-invariant system: { ẋ(t) = Ax(t) +Bu(t), y(t) = Cx(t) +Du(t), (5.12) with x(t0) = x0, where all matrix representations are given w.r.t. {ui}mi=1, {xi}ni=1, {yi}pi=1. Now let {x̃i}ni=1 be another basis of (Rn,R) and let T ∈ Rn×n represent the change of basis from {xi}ni=1 to {x̃i}ni=1. 1. Derive the representation of the system w.r.t. bases {ui}mi=1, {x̃i}ni=1, {yi}pi=1. 2. Compute the transition map Φ̃(t, t0) and the impulse response matrix H̃(t, τ) w.r.t. the new representation. How do they compare with the corresponding quantities Φ(t, t0) and H(t, τ) in the original representation? Problem 5.2 (Time-invariant Systems) Consider the linear time-invariant system of Problem 5.1. 1. Show that Φ(t, t0) = exp ( A(t− t0) ) = ∞∑ k=0 ( A(t− t0) )k k! . 2. Given two matricesA1, A2 ∈ Rn×n show that, if A1A2 = A2A1, then A2 exp(A1t) = exp(A1t)A2 and exp ( (A1 + A2)t ) = exp(A1t) exp(A2t). Also show that these properties may not hold if A1A2 6= A2A1. 3. Show that the impulse response matrix satisfies H(t, τ) = H(t− τ, 0) (i.e. it depends only on the difference t− τ). Problem 5.3 (Discretization of Continuous-time Systems) 1. Consider the time-varying linear system ẋ(t) = A(t)x(t) +B(t)u(t), y(t) = C(t)x(t) +D(t)u(t), (⊳) with initial condition x(t0) = x0 ∈ Rn, t ≥ t0. Consider a set of time instants tk, with k = 0, 1, 2, . . . , such that tk < tk+1 for all k. Let u(t) be constant between subsequent time instants: u(t) = uk ∀k ∈ N, ∀t ∈ [tk, tk+1). Let x̄k+1 = x(tk+1) and ȳk+1 = y(tk+1) be the state and the output of system (⊳) sampled at times tk. Show that there exist matrices Āk, B̄k, C̄k and D̄k such that x̄k+1 = Ākx̄k + B̄kuk, ȳk = C̄kx̄k + D̄kuk. 
() Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 83 2. Now assume that (⊳) is time-invariant, i.e. ( A(t), B(t), C(t), D(t) ) = (A,B,C,D), ∀t ≥ t0, and that there exists a fixed T > 0, tk+1 − tk = T , ∀k. Provide simplified expressions for Āk, B̄k, C̄k and D̄k and show that they are independent of k. Problem 5.4 (Realization) Consider the following n-th order scalar differential equation with constant coefficients: y(n)(t) + a1y (n−1)(t) + . . .+ an−1y (1)(t) + any(t) = u(t), t ∈ R+, (5.13) where y(i)(t) denotes the i-th derivative of y at t, {ai} ⊂ R and u(·) : R+ → R is a piecewise continuous input. Show that (5.13) can be put in the form (5.12) for an appropriate definition of the state x(t) ∈ Rn and of matrices A,B,C,D. Problem 5.5 (Jordan Blocks) Let λ, λ1, λ2 ∈ F , with F = R or F = C. Compute exp(A · t) for the following definitions of matrix A: 1. A = [ λ1 0 0 λ2 ] ; 2. A = [ λ 1 0 λ ] ; 3. A =   λ 1 . . . . . . λ 1 λ   ∈ F n×n, where the elements not shown are zeroes. (Hint: in 3, consider the decomposition A = λI +N , and make use of (λI +N)k = ∑k i=0 k! i!(k−i)! (λI) i ·Nk−i.) Problem 5.6 (Jordan blocks and matrix exponential) For i = 1, . . . ,m, let Λi ∈ Cni×ni . De- fine n = n1 + . . .+ nm and the block diagonal matrix Λ = diag(Λ1,Λ2, . . . ,Λm) =   Λ1 Λ2 . . . Λm   ∈ Rn×n. (◦) 1. Show that exp(Λ) = diag ( exp(Λ1), exp(Λ2), . . . , exp(Λm) ) . 2. Compute exp(Λi) for the following definitions of Λi: (assume each entry is real) (a) Λi =   λ1 . . . λni   ; (b) Λi =   λ 1 . . . . . . λ 1 λ   ; (c) Λi = [ ω σ −σ ω ] ∈ R2×2, where the elements not shown are zeroes. [Hint: in (b), consider the decomposition Λ = λI+N , and make use of (λI +N)k = ∑k i=0 k! i!(k−i)! (λI) i ·Nk−i.] 3. Assume that v = x+ iy and v∗ = x− iy, with x, y ∈ Rn, are complex eigenvectors of a matrix A ∈ Rn×n, i.e. Av = λv and Av∗ = λ∗v∗ for some λ = σ + iω with σ, ω ∈ R. Let (◦), with Λ1 ∈ C2×2, be the Jordan decomposition of A ∈ Rn×n corresponding to a basis of the form {v, v∗, v3, v4, . . . vn}. (a) Write the expression of Λ1. (b) Find a new basis and the corresponding change of basis T such that TΛT−1 = diag(Λ̃1,Λ2, . . . ,Λm) with Λ̃1 ∈ R2×2 (real). What is the expression of Λ̃1? Lecture Notes on Linear System Theory, c© J. Lygeros & F. A. Ramponi, 2015 86 Proof: Note that for the proposed solution s(t0, t0, x̂) = x̂ and d dt s(t, t0, x̂) = d dt x̂ = 0 = p(x̂, t) = p(s(t, t0, x̂), t). The conclusion follows by existence and uniqueness of solutions. The fact shows that a solution which passes through an equilibrium, x̂, at some point in time is forced to stay on the equilibrium for all times. We call this constant solution the equilibrium solution defined by the equilibrium x̂. What if a solution passes close to the equilibrium, but not exactly through it? Clearly such a solution will no longer be identically equal to the equilibrium, but will it move away from the equilibrium, or will it remain close to it? Will it converge to the equilibrium and if so at what rate? To answer these questions we first need to fix a norm on Rn to be able to measure distances. Any norm will do since they are all equivalent, for simplicity we will use the Euclidean norm throughout. Equiped with this norm, we can now formalize the above questions in the following definition. Definition 6.2 Let x̂ ∈ Rn be an equilibrium of system (6.2). This equilibrium is called: 1. 
1. Stable if and only if for all t0 ∈ R and all ǫ > 0, there exists δ > 0 such that
‖x0 − x̂‖ < δ ⇒ ‖s(t, t0, x0) − x̂‖ < ǫ, ∀t ≥ t0.

2. Unstable if and only if it is not stable.

3. Uniformly stable if and only if for all ǫ > 0 there exists δ > 0 such that for all t0 ∈ R
‖x0 − x̂‖ < δ ⇒ ‖s(t, t0, x0) − x̂‖ < ǫ, ∀t ≥ t0.

4. Locally asymptotically stable if and only if it is stable and for all t0 ∈ R there exists M > 0 such that
‖x0 − x̂‖ ≤ M ⇒ lim_{t→∞} ‖s(t, t0, x0) − x̂‖ = 0.

5. Globally asymptotically stable if and only if it is stable and for all (t0, x0) ∈ R × Rn
lim_{t→∞} ‖s(t, t0, x0) − x̂‖ = 0.

6. Locally exponentially stable if and only if for all t0 ∈ R there exist α, m, M > 0 such that for all t ≥ t0
‖x0 − x̂‖ ≤ M ⇒ ‖s(t, t0, x0) − x̂‖ ≤ m‖x0 − x̂‖e^{−α(t−t0)}.

7. Globally exponentially stable if and only if for all t0 ∈ R there exist α, m > 0 such that for all x0 ∈ Rn and all t ≥ t0
‖s(t, t0, x0) − x̂‖ ≤ m‖x0 − x̂‖e^{−α(t−t0)}.

Special care is needed in the above definition: the order of the quantifiers is very important. Note for example that the definition of stability implicitly allows δ to depend on t0 and ǫ; one sometimes writes δ(t0, ǫ) to highlight this dependence. On the other hand, in the definition of uniform stability δ can depend on ǫ but not on t0, i.e. the same δ must work for all t0; one sometimes uses the notation δ(ǫ) to highlight this fact. Likewise, the definition of global exponential stability requires α and m to be independent of x0, i.e. the same α and m must work for all x0 ∈ Rn; a variant of this definition where m and α are allowed to depend on x0 is sometimes referred to as semi-global exponential stability.

The definition distinguishes stability concepts along three axes. The most fundamental distinction deals with the convergence of nearby solutions to the equilibrium. The equilibrium is called unstable if we cannot keep solutions close to it by starting sufficiently close, stable if we can keep solutions as close as we want by starting them sufficiently close, asymptotically stable if in addition nearby solutions converge to the equilibrium, and exponentially stable if they do so at an exponential rate. The second distinction deals with how these properties depend on the starting time, t0: for uniform stability the starting time is irrelevant, the property holds the same way irrespective of when we look at the system. The third distinction deals with how these properties depend on the starting state, x0: “local” implies that the property holds provided we start close enough to the equilibrium, whereas “global” requires that the property holds irrespective of where we start. Note that this distinction is irrelevant for stability and uniform stability, since the conditions listed in the definition are required to hold provided we start close enough. One can also combinatorially mix these qualities to define other variants of stability notions: uniform local asymptotic stability (where the equilibrium is uniformly stable and the convergence rate is independent of the starting time), uniform global exponential stability, etc. We will not pursue these variants of the definitions here, since most of them turn out to be irrelevant when dealing with linear systems.

It is easy to see that the notions of stability introduced in Definition 6.2 are progressively stronger.

Fact 6.2 Consider an equilibrium of system (6.2).
Then the following statements are true:

If the equilibrium is                     then it is also
uniformly stable                          stable
locally asymptotically stable             stable
globally asymptotically stable            locally asymptotically stable
locally exponentially stable              locally asymptotically stable
globally exponentially stable             globally asymptotically stable
globally exponentially stable             locally exponentially stable

Proof: Most of the statements are obvious from the definition. Asymptotic stability requires stability, global asymptotic stability implies that the conditions of local asymptotic stability hold for any M > 0, etc. The only part that requires any work is showing that local/global exponential stability implies local/global asymptotic stability. Consider a globally exponentially stable equilibrium x̂ (the argument for local exponential stability is effectively the same), i.e. assume that for all t0 there exist α, m > 0 such that for all x0, ‖s(t, t0, x0) − x̂‖ ≤ m‖x0 − x̂‖e^{−α(t−t0)} for all t ≥ t0. For t0 ∈ R and ǫ > 0 take δ = ǫ/m. Then for all x0 ∈ Rn such that ‖x0 − x̂‖ < δ and all t ≥ t0

‖s(t, t0, x0) − x̂‖ ≤ m‖x0 − x̂‖e^{−α(t−t0)} < mδe^{−α(t−t0)} = ǫe^{−α(t−t0)} ≤ ǫ.

Hence the equilibrium is stable. Moreover, since by the properties of the norm ‖s(t, t0, x0) − x̂‖ ≥ 0,

0 ≤ lim_{t→∞} ‖s(t, t0, x0) − x̂‖ ≤ lim_{t→∞} m‖x0 − x̂‖e^{−α(t−t0)} = 0.

Hence lim_{t→∞} ‖s(t, t0, x0) − x̂‖ = 0 and the equilibrium is asymptotically stable.

It is also easy to see that the stability notions of Definition 6.2 are strictly stronger one than the next; in other words, the converse implications in the table of Fact 6.2 are in general not true. We show this through a series of counter-examples.

Example (Stable, non-uniformly stable equilibrium) For x(t) ∈ R consider the linear, time varying system

ẋ(t) = − (2t/(1 + t^2)) x(t)   (6.3)

[Figure 6.1: Three trajectories of the linear time varying system of equation (6.3) with initial condition x(t0) = 1 and t0 = 0, −1 and −2 respectively.]

Exercise 6.1 Show that the system has a unique equilibrium at x̂ = 0. Show further that

s(t, t0, x0) = ((1 + t0^2)/(1 + t^2)) x0

by differentiating and invoking existence and uniqueness of solutions.

Typical trajectories of the system for x0 = 1 and different values of t0 are shown in Figure 6.1. It is easy to see that x̂ = 0 is a stable equilibrium. Indeed, given t0 ∈ R and ǫ > 0 let δ = ǫ/(1 + t0^2). Then for all x0 ∈ R such that ‖x0‖ < δ,

‖s(t, t0, x0)‖ = ‖((1 + t0^2)/(1 + t^2)) x0‖ < ǫ/(1 + t^2) ≤ ǫ.

However, the equilibrium is not uniformly stable: for a given ǫ we cannot find a δ that works for all t0 ∈ R. To see this, notice that for t0 ≤ 0, ‖s(t, t0, x0)‖ reaches a maximum of (1 + t0^2)‖x0‖ at t = 0. Hence to ensure that ‖s(t, t0, x0)‖ < ǫ we need to ensure that (1 + t0^2)‖x0‖ < ǫ, which is impossible to do by restricting x0 alone; for any 0 < δ < ǫ and ‖x0‖ < δ we can make ‖s(0, t0, x0)‖ > ǫ by taking t0 < −√(ǫ/δ − 1).

Example (Stable, non asymptotically stable equilibrium) For x(t) ∈ R2 consider the linear, time invariant system

ẋ(t) = [ 0 −ω; ω 0 ] x(t)   with x(0) = x0 = [ x01; x02 ].   (6.4)

Since the system is linear time invariant we can take t0 = 0 without loss of generality.

Exercise 6.2 Show that the system has a unique equilibrium x̂ = 0. Show further that

Φ(t, 0) = [ cos(ωt) −sin(ωt); sin(ωt) cos(ωt) ].
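As a quick numerical sanity check (a minimal sketch in Python, assuming numpy and scipy are available; the value of ω, the initial state and the time grid are arbitrary choices, not taken from the notes), one can verify that Φ(t, 0) = exp(At) indeed coincides with the rotation matrix of Exercise 6.2 and that it preserves the Euclidean norm of the state, so solutions of (6.4) neither approach nor leave the equilibrium x̂ = 0.

import numpy as np
from scipy.linalg import expm

omega = 2.0                                   # arbitrary choice of omega
A = np.array([[0.0, -omega],
              [omega, 0.0]])                  # system matrix of (6.4)
x0 = np.array([1.0, -0.5])                    # arbitrary initial state

for t in np.linspace(0.0, 5.0, 6):
    Phi = expm(A * t)                         # state transition matrix Phi(t, 0)
    R = np.array([[np.cos(omega * t), -np.sin(omega * t)],
                  [np.sin(omega * t),  np.cos(omega * t)]])
    assert np.allclose(Phi, R)                # exp(At) equals the rotation matrix
    print(f"t = {t:3.1f}   ||x(t)|| = {np.linalg.norm(Phi @ x0):.6f}")
# every printed norm equals ||x0||: the solution stays on a circle around 0

Since every solution simply rotates on the circle of radius ‖x0‖, it is natural to examine the two components of x(t) together rather than each one separately against time.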
The most informative way of doing this is to generate a parametric plot of x1(t) against x2(t), parametrized by t. This so-called phase plane plot for this system is shown in Figure 6.3.

Before restricting our attention to linear systems we point out two more general facts about the stability concepts introduced in Definition 6.2. The first is an intimate relation between stability and continuity. To expose this link we need to think of the function mapping initial conditions to state trajectories from the initial time t0 onwards. Since state trajectories are continuous functions of time, for each t0 ∈ R one can think of this function as a map between the state space Rn and the space of continuous functions C([t0,∞), Rn):

s(·, t0, ⊙) : Rn → C([t0,∞), Rn)
x0 ↦ {s(·, t0, x0) : [t0,∞) → Rn}.

The strange notation is meant to alert the reader to the fact that we consider s(·, t0, ⊙) for fixed t0 as a function mapping a vector (denoted by the placeholder ⊙) to a function of time (denoted by the placeholder ·, left over after x0 is substituted for ⊙). Recall that for the stability definitions we have equipped Rn with a norm ‖ · ‖. We now equip C([t0,∞), Rn) with the corresponding infinity norm

‖s(·, t0, x0)‖_{t0,∞} = sup_{t≥t0} ‖s(t, t0, x0)‖, (6.6)

where we include t0 in the notation to make the dependence on the initial time explicit. Notice that the first norm in Equation (6.6) is a norm on the infinite dimensional function space (i.e., s(·, t0, x0) ∈ C([t0,∞), Rn) is thought of as a function of time), whereas the second norm is a norm on the finite dimensional state space (i.e., s(t, t0, x0) ∈ Rn is the value of this function for the specific time t ∈ [t0,∞)).

Fact 6.3 An equilibrium, x̂, of system (6.2) is stable if and only if for all t0 ∈ R the function s(·, t0, ⊙) mapping the normed space (Rn, ‖ · ‖) into the normed space (C([t0,∞), Rn), ‖ · ‖_{t0,∞}) is continuous at x̂.

Proof: The statement is effectively a tautology. Fix t0 ∈ R and recall that, according to Definition 3.6, s(·, t0, ⊙) is continuous at x̂ if and only if for all ǫ > 0 there exists δ > 0 such that

‖x0 − x̂‖ < δ ⇒ ‖s(·, t0, x0) − s(·, t0, x̂)‖_{t0,∞} < ǫ.

By Equation (6.6) this is equivalent to

‖x0 − x̂‖ < δ ⇒ ‖s(t, t0, x0) − s(t, t0, x̂)‖ < ǫ ∀t ≥ t0,

which, recalling that s(t, t0, x̂) = x̂ for all t ≥ t0, is in turn equivalent to

‖x0 − x̂‖ < δ ⇒ ‖s(t, t0, x0) − x̂‖ < ǫ ∀t ≥ t0,

which is precisely the definition of stability.

A similar relation between uniform stability and uniform continuity (where the δ above is independent of t0) can also be derived in the same way.

The second general fact relates to the possible rate of convergence. The strongest notion of stability in Definition 6.2, namely exponential stability, requires that solutions converge to the equilibrium exponentially (i.e. rather quickly) in time. Could they converge even faster? Could we, for example, introduce another meaningful stability definition that requires solutions to converge with a rate of e^{−αt^2} for some α > 0? And if not, can we at least increase α in the exponential convergence? The following fact reveals that Lipschitz continuity imposes a fundamental limit on how fast convergence can be.

Fact 6.4 Let x̂ be an equilibrium of system (6.2) and assume that there exists k > 0 such that for all x, x′ ∈ Rn, ‖p(x, t) − p(x′, t)‖ ≤ k‖x − x′‖.
Then for all t0 ∈ R and all t ≥ t0

‖x0 − x̂‖e^{−k(t−t0)} ≤ ‖s(t, t0, x0) − x̂‖ ≤ ‖x0 − x̂‖e^{k(t−t0)}.

Proof: If x0 = x̂ the claim is trivially true; we therefore restrict attention to the case x0 ≠ x̂. Note that in this case we must have s(t, t0, x0) ≠ x̂ for all t; if s(t, t0, x0) = x̂ for some t then s(t, t0, x0) must be the equilibrium solution and s(t, t0, x0) = x̂ for all t which, setting t = t0, contradicts the fact that x0 ≠ x̂. Recall that for simplicity we are using the Euclidean norm. Hence ‖s(t, t0, x0) − x̂‖^2 = (s(t, t0, x0) − x̂)^T (s(t, t0, x0) − x̂) and

| d/dt ‖s(t, t0, x0) − x̂‖^2 | = | (d/dt s(t, t0, x0))^T (s(t, t0, x0) − x̂) + (s(t, t0, x0) − x̂)^T d/dt s(t, t0, x0) |
= | p(s(t, t0, x0), t)^T (s(t, t0, x0) − x̂) + (s(t, t0, x0) − x̂)^T p(s(t, t0, x0), t) |
≤ | p(s(t, t0, x0), t)^T (s(t, t0, x0) − x̂) | + | (s(t, t0, x0) − x̂)^T p(s(t, t0, x0), t) |
≤ ‖p(s(t, t0, x0), t)^T‖ · ‖s(t, t0, x0) − x̂‖ + ‖(s(t, t0, x0) − x̂)^T‖ · ‖p(s(t, t0, x0), t)‖
= 2‖s(t, t0, x0) − x̂‖ · ‖p(s(t, t0, x0), t)‖
= 2‖s(t, t0, x0) − x̂‖ · ‖p(s(t, t0, x0), t) − p(x̂, t)‖
≤ 2k‖s(t, t0, x0) − x̂‖ · ‖s(t, t0, x0) − x̂‖.

On the other hand,

| d/dt ‖s(t, t0, x0) − x̂‖^2 | = | 2‖s(t, t0, x0) − x̂‖ d/dt ‖s(t, t0, x0) − x̂‖ |.

Since s(t, t0, x0) ≠ x̂, combining the two equations we must have

| d/dt ‖s(t, t0, x0) − x̂‖ | ≤ k‖s(t, t0, x0) − x̂‖,

or in other words

−k‖s(t, t0, x0) − x̂‖ ≤ d/dt ‖s(t, t0, x0) − x̂‖ ≤ k‖s(t, t0, x0) − x̂‖.

Applying the Gronwall Lemma (Theorem 3.8) to the right inequality leads to

‖s(t, t0, x0) − x̂‖ ≤ ‖x0 − x̂‖e^{k(t−t0)}.

From the left inequality (adapting the steps of the proof of the Gronwall Lemma) we have

d/dt ( ‖s(t, t0, x0) − x̂‖e^{k(t−t0)} ) = (d/dt ‖s(t, t0, x0) − x̂‖) e^{k(t−t0)} + ‖s(t, t0, x0) − x̂‖ d/dt e^{k(t−t0)}
≥ −k‖s(t, t0, x0) − x̂‖e^{k(t−t0)} + k‖s(t, t0, x0) − x̂‖e^{k(t−t0)} = 0.

Hence for all t ≥ t0,

‖s(t, t0, x0) − x̂‖e^{k(t−t0)} ≥ ‖s(t0, t0, x0) − x̂‖e^{k(t0−t0)} = ‖x0 − x̂‖,

which leads to ‖s(t, t0, x0) − x̂‖ ≥ ‖x0 − x̂‖e^{−k(t−t0)}.

In summary, convergence to an equilibrium can be at most exponential. The fact also shows that if an equilibrium is unstable, divergence cannot be any faster than exponential. Even though a fixed Lipschitz constant is assumed to simplify the proof, it is easy to see that the claim still holds if the Lipschitz constant is time varying but bounded from above and below; one simply needs to replace k by its lower bound in the left inequality and its upper bound in the right inequality. The lower bound on the Lipschitz constant also provides a bound for the rate of exponential convergence.

6.2 Linear time varying systems

We note that Definition 6.2 is very general and works also for nonlinear systems. Since for linear systems we know something more about the solution of the system, it turns out that the conditions of the definition are somewhat redundant in this case. Consider now the linear time varying system

ẋ(t) = A(t)x(t) (6.7)

and let s(t, t0, x0) denote the solution at time t starting at x0 at time t0. Since s(t, t0, x0) = Φ(t, t0)x0, the solution is linear with respect to the initial state and all the stability definitions reduce to checking properties of the state transition matrix Φ(t, t0). First note that for all t0 ∈ R+, if x0 = 0 then

s(t, t0, 0) = Φ(t, t0)x0 = 0 ∈ Rn ∀t ∈ R+

is the solution of (6.7). Another way to think of this observation is that if x(t0) = 0 for some t0 ∈ R+, then ẋ(t0) = A(t0)x(t0) = 0, therefore the solution of the differential equation does not move from 0.
Either way, the solution of (6.7) that passes through the state x(t0) = 0 at some time t0 ∈ R+ will be identically equal to zero for all times, and x̂ = 0 is an equilibrium of (6.7).

Exercise 6.5 Can there be other x0 ≠ 0 such that s(t, t0, x0) = x0 for all t ∈ R+?

Theorem 6.1 Let ‖Φ(t, 0)‖ denote the norm of the matrix Φ(t, 0) ∈ Rn×n induced by the Euclidean norm in Rn. The equilibrium x̂ = 0 of (6.7) is:

1. Stable if and only if for all t0 ∈ R, there exists K > 0 such that ‖Φ(t, 0)‖ ≤ K for all t ≥ 0.

2. Locally asymptotically stable if and only if lim_{t→∞} ‖Φ(t, 0)‖ = 0.

Proof: Part 1: We first show that if there exists K > 0 such that ‖Φ(t, 0)‖ ≤ K for all t ≥ 0 then the equilibrium x̂ = 0 is stable; without loss of generality, we can take K > 1. Fix t0 ∈ R and, for simplicity, distinguish two cases:

1. t0 < 0. In this case let M(t0) = sup_{τ∈[t0,0]} ‖Φ(τ, t0)‖.

2. t0 ≥ 0. In this case let M(t0) = sup_{τ∈[0,t0]} ‖Φ(τ, t0)‖.

In the first case, if t ∈ [t0, 0] note that

‖s(t, t0, x0)‖ ≤ ‖Φ(t, t0)‖ · ‖x0‖ ≤ sup_{τ∈[t0,0]} ‖Φ(τ, t0)‖ · ‖x0‖ = M(t0)‖x0‖.

If t > 0,

‖s(t, t0, x0)‖ = ‖Φ(t, t0)x0‖ = ‖Φ(t, 0)Φ(0, t0)x0‖ ≤ ‖Φ(t, 0)‖ · ‖Φ(0, t0)‖ · ‖x0‖ ≤ ‖Φ(t, 0)‖ sup_{τ∈[t0,0]} ‖Φ(τ, t0)‖ · ‖x0‖ = KM(t0)‖x0‖.
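As a side illustration of the criteria in Theorem 6.1 (a minimal sketch in Python, assuming numpy and scipy; the two matrices are arbitrary time-invariant examples, for which Φ(t, 0) = exp(At)), one can tabulate the induced 2-norm ‖Φ(t, 0)‖ on a finite time grid. A bounded norm is consistent with stability and a norm decaying to zero with asymptotic stability, although a finite grid can of course only suggest, not prove, either property.

import numpy as np
from scipy.linalg import expm

# two arbitrary time-invariant examples: A1 generates a rotation (norm stays
# bounded but does not decay), A2 is Hurwitz (all eigenvalues have negative
# real part, so the norm decays to zero)
A1 = np.array([[0.0, -1.0], [1.0, 0.0]])
A2 = np.array([[-1.0, 2.0], [0.0, -3.0]])

for name, A in [("rotation", A1), ("Hurwitz", A2)]:
    # induced 2-norm of Phi(t, 0) = exp(At), i.e. its largest singular value
    norms = [np.linalg.norm(expm(A * t), 2) for t in np.linspace(0.0, 10.0, 11)]
    print(name, np.round(norms, 4))
# bounded norms for the rotation example (stable, not asymptotically stable);
# norms tending to 0 for the Hurwitz example (asymptotically stable)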