NOTES ON COMBINATORIAL OPTIMIZATION

GEIR DAHL* and CARLO MANNINO†

October 1, 2012

* Department of Mathematics and Department of Informatics, CMA, University of Oslo, Norway. geird@math.uio.no
† University of Rome, La Sapienza, Department of Computer Science, Rome, Italy, and visiting scientist at CMA. mannino@dis.uniroma1.it

List of Figures

1.1 The feasible points (solid circles) of Example 1.1
1.2 An example of Hamilton tour
1.3 An example of maximum weight forest
1.4 A formulation for the set of feasible solutions of Example 1.1
1.5 The natural formulation for the set of solutions of Example 1.1
1.6 Comparisons among different formulations of Example 1.1
1.7 Best against natural formulation of Example 1.1
3.1 The greedy solution of Example 3.2
3.2 A Eulerian graph and a corresponding Eulerian tour
3.3 Step 1 of Christofides Algorithm: build a Minimum Spanning Tree
3.4 Steps 2, 3 and 4 of Christofides Algorithm: the final Hamilton tour is shown in solid lines
3.5 A 2-exchange move
3.6 Two distinct tours in the Sarvanov and Doroshko neighborhood
3.7 A tour and its associated matching problem
4.1 A (partial) enumeration tree
4.2 A branching tree
4.3 The (partial) branch-and-bound tree of Example 4.1
4.4 A vehicle routing for three vehicles
4.5 From sequencing to TSP
4.6 A 1-tree: the solid edges form a spanning tree F in G[V \ {1}]
4.7 Shrinking nodes u and v (zero-weight edges are omitted)
4.8 A comb
4.9 Shrinking edge uv (st) maps a Hamilton tour of G into a graphical tour of G|uv (G|st)

[Figure 1.2: An example of Hamilton tour]

The area of polyhedral combinatorics attempts to solve (1.2) (and thus (CO)) by means of linear programming. This is done by exploiting the properties of the convex hull conv(S) of S, which is a polytope(1). If we denote by ext(P) the set of vertices of a polyhedron P, we have S = ext(conv(S)). Consider now the following linear programming problem:

max{w^T x : x ∈ conv(S)}     (1.3)

If S is empty, so is conv(S). Otherwise conv(S) ≠ ∅ and there exists an optimal solution x* ∈ ext(conv(S)) = S. So, in principle, (CO) can be solved by applying the simplex method to (1.3), provided we are given an external representation of conv(S) in terms of a system of linear inequalities. Unfortunately, for the great majority of combinatorial optimization problems such a representation is not at hand, not even implicitly.

There exist, however, some remarkable exceptions. One such exception is the maximum weight forest problem: find a maximum weight forest in a given connected, undirected graph G = (V, E) with weight function w; this example is from [18]. Recall that a forest is an acyclic undirected graph (whereas a tree is a connected forest). The weight of a forest is simply the sum of the weights of its edges, which implies that when looking for maximum weight forests we are not interested in including isolated nodes(2). So, it is natural to identify a forest H = (V', F) of G = (V, E) with the set F of its edges. Note also that if all weights are nonnegative, the maximum weight forest problem is equivalent to the maximum weight spanning tree problem (show this!). With negative weights present, an optimal solution may be a forest and not a tree.

(1) Recall that a polytope P is the convex hull of a finite set of points in ℝ^n and that, by the main representation theorem for polyhedra, a set is a polytope if and only if it is a bounded polyhedron.
(2) A node is isolated in a graph H if no edges of H are incident to it.
[Figure 1.3: An example of maximum weight forest]

Let G = (V, E) be a graph and, for each S ⊆ V, denote by E[S] the set of edges with both ends in S. It is easy to show that G is a forest if and only if, for each nonempty S ⊆ V, we have |E[S]| ≤ |S| − 1. This translates immediately into the following characterization of the incidence vectors of forests:

Lemma 1.1. Let G = (V, E) be a graph. Then x ∈ {0, 1}^E is the incidence vector of (the edges of) a forest if and only if

x(E[S]) ≤ |S| − 1 for all nonempty S ⊆ V.     (1.4)

Inequalities of type (1.4) are called subtour elimination constraints.

In Chapter 1 of [21] the spanning tree problem is solved in polynomial time by the greedy algorithm. The idea is to iteratively extend a forest by adding a minimum weight edge which does not introduce any cycle. The same greedy scheme (now adding a maximum weight edge in each iteration) also solves the maximum weight forest problem, provided that we terminate as soon as the next candidate edge has nonpositive weight.

First we define the polytope of interest. We let the forest polytope F(G) be the convex hull of the incidence vectors of all forests in G. Jack Edmonds showed in [7] the following theorem, which gives a complete linear description of F(G).

Theorem 1.2. F(G) ⊆ ℝ^E is the solution set of the following linear system:

(i) x_e ≥ 0 for all e ∈ E;
(ii) x(E[S]) ≤ |S| − 1 for all nonempty S ⊆ V.     (1.5)

Proof. We follow the presentation of [18]. The idea of this proof, due to Edmonds, is to show that every vertex of the bounded polyhedron defined by (1.5) is the incidence vector of a forest of G. This in turn is done by using the greedy algorithm on the LP dual of the problem of maximizing c^T x subject to the linear system (1.5). Also, via complementary slackness, an optimal primal solution is constructed.

Let Q denote the polytope given by the solution set of (1.5); we shall prove that Q = F(G). Note that the integral vectors in Q are precisely the incidence vectors of forests, by Lemma 1.1. By convexity this implies that F(G) ⊆ Q. To prove the reverse inclusion, let x̄ be a vertex of Q, i.e., this point is the unique optimal solution of an LP problem max{c^T x : x ∈ Q} for a suitable c ∈ ℝ^E. The LP dual of this problem is

min  ∑_{S ⊆ V, S ≠ ∅} y_S (|S| − 1)
subject to
(i)  ∑_{S : e ∈ E[S]} y_S ≥ c_e  for all e ∈ E;     (1.6)
(ii) y_S ≥ 0  for all nonempty S ⊆ V.

Consider the greedy algorithm applied to the primal problem, and assume that the edges found are F = {e_1, . . . , e_r}, in that order. Thus, in the i'th iteration the edge e_i is added; it joins two components of the current solution and forms the new component V_i. (For instance, when i = 1 we have n components and e_1 = [u, v] joins the components {u} and {v}, so V_1 = {u, v}.) The dual solution y is defined next. We let y_S = 0 for all sets S not among the sets V_i, i = 1, . . . , r. The values for the sets V_i are determined in reverse order as follows. Let y_{V_r} = c(e_r). Consider the primal greedy solution x' found above. The complementary slackness condition for edge e_r says, as x'_{e_r} > 0, that ∑_{S : e_r ∈ E[S]} y_S = c(e_r). With our definition of y_{V_r} this equality already holds, and we are forced to define y_S = 0 for all other sets S that contain both endnodes of e_r.
To define the remaining components of y, let, for each j ∈ {1, . . . , r − 1}, I(j) = {i : j + 1 ≤ i ≤ r and both end nodes of ej are in Vi} and define yVj = c(ej) − ∑ i∈I(j) yVi for j = r − 1, r − 2, . . . , 1. One can now check that y is dual feasible, x′ is primal feasible and that the com- plementary slackness condition holds. Thus it follows from LP theory that both these solutions are optimal in the respective problems. But x̄ was also optimal in the primal, and due to the uniqueness, we must have x̄ = x′, which proves that every vertex of Q is integral and we have Q = F (G) as desired. In most cases, an external representation of conv(S) as the solution set of a system Ax ≤ b is not available. However, we will show in the next section that it may be very useful to have at hand some suitable relaxations of conv(S), denoted as formulations (of S). 5 elimination constraints: x(E[S]) ≤ |S| − 1 for all S ⊆ V , S 6= ∅, S 6= V . These constraints were introduced in the 1954 paper by Dantzig, Fulkerson and Johnson [5]. Note that the number of 2-connectivity inequalities, or subtour elim- ination constraints, grows exponentially as a function of the number of nodes. Relaxations. In what follows we will often deal with mathematical optimization problems of the form max{f(x) : x ∈ S ⊆ IRn}, and with their optimal values. Conventionally, we will denote such a problem by a capital roman letter between brackets, e.g. (Q); also, we denote its value by v(Q) = max{f(x) : x ∈ S ⊆ IRn} where we permit the values v(Q) = −∞ (when the problem is infeasible) and v(Q) = ∞ (when the problem is unbounded). Now, suppose we want to solve the combinatorial optimization max{wT x : x ∈ S} (where S ⊆ {0, 1}n}) denoted by (Q), and assume we have at hand a feasible solution x̄ ∈ S to (Q). How good is x̄? In other words, how does the value wT x̄ of x̄ compare with the optimal value v(Q) to (Q)? We will see that formulations can provide such an answer. First we need to introduce the concept of a relaxation of an optimization problem. Definition 1.2. Let S ⊆ IRn and let (Q) be max{w(x) : x ∈ S}. The problem (R) max{g(x) : x ∈ T} is a relaxation of (Q) iff T ⊇ S and g(x) ≥ w(x) for all x ∈ S. The following theorem is a straightforward consequence of the above definition: Theorem 1.3. Let (Q) be a maximization problem (as above) and let (R) be a relaxation of (Q). Then v(R) ≥ v(Q).relaxation quality In other words, the optimal value of (R) provides an upper bound (UB) on the optimal value of (Q). Let’s go back to our original question. In many cases, it may be very difficult to find an optimal solution x∗ of an optimization problem Q, whereas it can be easy to obtain a feasible solution x̄ ∈ S. This is often the case when (Q) is a combinatorial optimization. Clearly, w(x̄) ≤ w(x∗) = v(Q), and w(x̄) is a lower bound (LB) on the optimal value to (Q). Suppose now that we have at hand a relaxation (R) {max g(x) : x ∈ T} of (Q) and assume that an optimal solution y∗ to R can be computed efficiently. We have UB = v(R) = g(y∗) ≥ w(x∗) ≥ w(x̄) = LB. 8 The optimal value of (Q) belongs to the interval between the lower and the upper bound (we are assuming that both can be computed efficiently). Its size UB−LB is called the gap. The smaller the gap is, the better it is. If gap = 0 then UB = LB, which in turn implies UB = v(R) = g(y∗) = w(x∗) = w(x̄) = LB, which proves that x̄ is an optimal solution to (Q). 
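To make the chain LB = w(x̄) ≤ v(Q) ≤ v(R) = UB concrete, here is a minimal numerical sketch in Python (using scipy). The instance is the small 0-1 problem whose constraints reappear later in Example 4.1 (max 4x1 + 5x2 subject to 3x1 + 4x2 ≤ 6 and 5x1 + 2x2 ≤ 6); the variable and function names are ours, not the notes'.

```python
# A minimal sketch of the UB/LB gap for a tiny 0-1 instance (illustrative only).
from itertools import product
from scipy.optimize import linprog

c = [4, 5]
A = [[3, 4], [5, 2]]
b = [6, 6]

# Upper bound UB = v(R): solve the LP relaxation over the natural formulation.
# linprog minimizes, so the objective is negated.
lp = linprog(c=[-ci for ci in c], A_ub=A, b_ub=b,
             bounds=[(0, 1), (0, 1)], method="highs")
UB = -lp.fun

# Lower bound LB = w(x_bar): value of a feasible 0-1 point; here the best one,
# found by brute force (possible only because the instance is tiny).
feasible = [x for x in product([0, 1], repeat=2)
            if all(sum(a * xi for a, xi in zip(row, x)) <= bi
                   for row, bi in zip(A, b))]
LB = max(c[0] * x[0] + c[1] * x[1] for x in feasible)

print(f"UB = {UB:.3f}, LB = {LB}, gap = {UB - LB:.3f}")
# Expected: UB ≈ 7.714 (attained at x = (6/7, 6/7)), LB = 5 (at x = (0, 1)).
```

Here the gap is positive because the natural formulation is strictly larger than conv(S); a stronger formulation would bring UB down towards the integer optimum.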
So, in general, we would like the upper bound to be as small as possible; in contrast, the lower bound should be as large as possible. In other words, we want to have at hand

• strong relaxations of (Q), providing small upper bounds on v(Q), and
• effective heuristic methods, capable of finding good quality solutions to (Q), hopefully optimal ones.

The rest of this chapter is devoted to the first goal, whereas heuristic methods will be discussed in a subsequent chapter.

Now, let (Q) be max{w^T x : x ∈ S}, with S ⊆ {0, 1}^n, a combinatorial optimization problem, and let P ⊆ ℝ^n be a formulation for S. By definition, S ⊆ P. It follows from the above discussion that the following problem

(R) max{w^T x : x ∈ P}

is a relaxation of (Q). Since problem (R) amounts to maximizing a linear function over a polyhedron, (R) is a linear program (LP), and there are a number of efficient algorithms to solve LPs. The simplex method is one of these, even though no polynomial-time variant of it is known: the simplex method works very well in practice, but there exist highly contrived instances on which it requires an enormous (exponential) number of iterations. Interestingly, theoretically more efficient methods, such as interior point algorithms, do not perform significantly better in practice. So, problem (R), along with the simplex method, can be used to compute upper bounds on v(Q).

Comparing formulations. As already mentioned, we may have different formulations for the same set S. How can we distinguish among them? We need some criteria to compare formulations and to measure their quality. So, let (Q) be max{w^T x : x ∈ S ⊆ {0, 1}^n} a combinatorial optimization problem, and let P1 and P2 be two distinct formulations for S. Since we want to use P1 and P2 to compute upper bounds on v(Q), one candidate quality criterion is the following: P1 is better than P2 if and only if max{w^T x : x ∈ P1} ≤ max{w^T x : x ∈ P2}. So, if we consider the two formulations in Figure 1.6 for the problem of Example 1.1, the natural formulation P1 has to be preferred to the other when the objective function is the original one, namely maximize 4x1 + 5x2. However, if the objective becomes max x2, then P2 has to be preferred to P1. So, this indicator depends on the objective function and thus on the problem.

[Figure 1.6: Comparisons among different formulations of Example 1.1]
[Figure 1.7: Best against natural formulation of Example 1.1]

It is preferable to establish a criterion which depends only on the formulations. So, we say that P1 is better than P2 iff P1 ⊆ P2. This implies that max{w^T x : x ∈ P1} ≤ max{w^T x : x ∈ P2} for any w ∈ ℝ^n, and P1 will always return a bound which is not worse than that associated with P2. Actually, from convexity we have the following theorem:

Theorem 1.4. Let P1 and P2 be polytopes in ℝ^n. Then P1 ⊆ P2 if and only if max{w^T x : x ∈ P1} ≤ max{w^T x : x ∈ P2} for all w ∈ ℝ^n.

Now, it is not difficult to see that conv(S) is contained in every formulation for S, and we say that conv(S) is the best formulation for S. Indeed, conv(S) returns the best possible upper bound, since for all w ∈ ℝ^n we have

max{w^T x : x ∈ conv(S)} = max{w^T x : x ∈ S}.

To this end, let z ∈ {0, 1}^V be the incidence vector of a set S ⊆ V.
Observe that w ∈ {0, 1}E is the incidence vector of E[S] if and only if wij = 1 precisely when zi = 1 and zj = 1, whereas wij = 0 otherwise. This can be expressed by the following system of linear inequalities (plus the 0-1 constraints on the variables): we ≤ zi, we ≤ zj for e = (i, j) ∈ E SUB(V, E) we ≥ zi + zj − 1 for e = (i, j) ∈ E (1.12) w ∈ {0, 1}E, z ∈ {0, 1}V Denote by SUB(V, E) the set of feasible solutions to (1.12). So (z̄, w̄) identifies a subtour elimination constraint associated to S̄ and is violated by x̄ if and only if - z̄ is the incidence vector of S̄ - (z̄, w̄) ∈ SUB(V, E) and - ∑ e∈E[S̄] x̄e > |S̄| − 1. By observing that ∑ e∈E[S̄] x̄e = ∑ e∈E x̄ew̄e and |S̄| = ∑ i∈V z̄i, the last condition can be rewritten as follows: ∑ e∈E x̄ew̄e > ∑ i∈V z̄i − 1 (1.13) We are finally able to write a separation oracle for the subtour elimination con- straints. Define the following 0-1 linear optimization problem: max ∑ e∈E x̄ewe − ∑ i∈V zi s.t. (1.14) (z, v) ∈ SUB(V, E) Denote by (z∗, w∗) an optimal solution to (1.14), and let v∗ be its value. Then it is easy to see that • if v(z∗, w∗) > −1, the (z∗, w∗) identifies a subtour elimination constraint violated by x̄ which is returned by the oracle. • if v(z∗, w∗) ≤ −1 then no solutions in SUB(V, E) satisfy (1.13) and the oracle returns no violated constraint. 13 The separation oracle just described requires the solution of the 0-1 linear pro- gram (1.14), which in turn may be difficult. However, the next theorem states that we can simply solve the linear relaxation of (1.14), that is, the linear pro- gram obtained from (1.14) by replacing the binary stipulation on the variables with the following linear (box) constraints: 0 ≤ we ≤ 1 for e ∈ E, 0 ≤ zi ≤ 1 for i ∈ V. (1.15) The proof of this theorem can be found in [23]. Theorem 1.5. The linear relaxation of ( 1.14) always has an integer optimal solution solving ( 1.14). The dynamic simplex method Separation oracles and the simplex method are the building blocks of a dynamic simplex algorithm for the LP program (Q) defined as max{wT x : x ∈ P}, wheredynamic sim- plex method P = {x ∈ IRn : Ax ≤ b} is a polyhedron, with A ∈ IRm×n and b ∈ IRm. In particular the dynamic simplex method solves a sequence of LP programs (Qi) defined as max{wTx : x ∈ P i}, for i = 1, . . . , t, with P 1 ⊃ P 2 ⊃ · · · ⊃ P t ⊇ P. This implies that for i = 1, . . . , t, (Qi) is a relaxation of (Qi+1), and we have v(Qi) ≥ v(Qi+1) ≥ v(Q). For i = 1, . . . , t, we define P i = {x ∈ IRn : Dix ≤ di}, where [Di di] is a matrix obtained by considering a suitable subset of rows of [A b]. Also, [Di+1 di+1] is obtained from [Di di] by including an additional row from [A b]. To simplify the discussion, we suppose that P 1, . . . , P t, P are bounded, which implies that the associated linear programs are either empty or have an optimal solution. The dynamic simplex method starts by selecting a suitable subset of rows [A b], denoted by [D1 d1]. Then, at each iteration i, the linear program max{wTx : x ∈ P i} denoted by (Qi) is solved by the simplex method. If P i is empty, then P ⊆ P i is also empty and the dynamic simplex method terminates. Otherwise, let x̄i be an optimal solution to (Q)i. At this stage, we invoke a separation oracle to establish whether x̄i ∈ P or not. If x̄i ∈ P , then wT x̄i ≤ v(Q). Also, since (Qi) is a relaxation of (Q), we have wT x̄i = v(Qi) ≥ v(Q): so, we can conclude that wT x̄i = v(Q) and x̄i is optimal to (Q). 14 Finally, suppose x̄i /∈ P . 
Then the separation oracle returns a constraint of the description of P violated by x̄i. We obtain [Di+1 di+1] by adding this new row to [Di di] and iterate. The idea behind the dynamic simplex method is that the algorithm terminates long before we have added all the defining constraints of P . Next, we show an application of the dynamic simplex method to solve the Maximum Weight Forest. This is only to illustrate the methodology with our example, as there exists a simpler and purely combinatorial algorithm to solve such problem. Example 1.4. Let P be the forest polytope associated to the graph in Figure 1. Let P 1 be obtained from P by dropping all subtour elimination constraints but those associated with the sets of two elements. Then Problem (Q1) is: max 4x12 − x13 + 3x14 + 4x24 + 3x25 − x34 + x36 + 2x45 − x46 − 2x56 s.t. 0 ≤ xe ≤ 1 for all e ∈ E The optimal solution is simply x1 12 = x1 14 = x1 24 = x1 25 = x1 36 = x1 45 = 1, and all other variables are 0, whose value is 17. The separation oracle applied to x1 return x1 /∈ P plus the violated subtour inequality x12 +x14 +x24 ≤ 2, associated to S = {1, 2, 4}. This is added to the linear description of P 1 to obtain P 2 and the simplex algorithm resolves the following problem (Q2): max 4x12 − x13 + 3x14 + 4x24 + 3x25 − x34 + x36 + 2x45 − x46 − 2x56 s.t. x12 + x14 + x24 ≤ 2 0 ≤ xe ≤ 1 for all e ∈ E An optimal solution to (Q2) is x2 12 = x2 24 = x2 25 = x2 36 = x2 45 = 1 and all other variables are 0, with value 14. The next invocation to the oracle produces the violated inequality x24 + x25 + x45 ≤ 2, associated to S = {2, 4, 5}. We include it in P 3 and solve Q3 obtaining x3 12 = x3 14 = x3 25 = x3 36 = x3 45 = 1, with value 13. 15 Let p be an integral vector in P , so p = q + c for some q ∈ Q and c ∈ C. Then c = ∑ j≤m µjgj = c′ + c′′ where we define c′ = ∑ j≤m(µj − ⌊µj⌋)gj and c′′ = ∑ j≤m⌊µj⌋gj ∈ C. Then c′′ is integral and c′ ∈ M (as 0 ≤ µj − ⌊µj⌋ ≤ 1). This gives p = (q+c′)+c′′ ∈ (Q+M)I+C because q+c′ ∈ Q+M and q+c′ = p−c′′ which is integral. We have therefore shown that PI ⊆ (Q+M)I +C. Furthermore, since Q+M ⊆ P and C = CI we obtain (Q+M)I +C ⊆ PI +CI ⊆ (P +C)I = PI . This proves (2.3). The proof can now be completed by applying the decomposition theorem for polyhedra. In (2.3) Q+M is a bounded polyhedron (i.e., a polytope) and therefore (Q + M)I is polytope. Thus, by (2.3), PI is algebraic sum of a polytope and a convex cone which proves that it is a polyhedron. Furthermore, if PI is nonempty, we also get char.cone(PI) = char.cone(P ) (from the uniqueness of polyhedral decomposition). A polyhedron is called integral if P = PI , i.e. it equals the convex hull of its inte- gral vectors. (For convenience, we also say that the empty polyhedron is integral.) Integral polyhedra are interesting in connection with integer linear programming. In fact, we have in general that max{cT x : x ∈ P, x is integral} = max{cT x : x ∈ PI} ≤ max{cT x : x ∈ P}. (2.4) If P is integral the inequality in (2.4) can be replaced by equality, and the values of the ILP and the corresponding LP coincide. In fact, among the optimal solutions of the LP problem max {cT x : x ∈ P} there is an integral one. Some equivalent descriptions of integral polyhedra are listed next; the proof is left for an exercise. Proposition 2.2. The following conditions are all equivalent whenever P ⊆ IRn is a nonempty polyhedron. (i) P is integral. (ii) Each minimal face of P contains an integral vector. 
(iii) max {cT x : x ∈ P} is attained for an integral vector for each c ∈ IRn for which the maximum is finite. Furthermore, if P is pointed (e.g., if P ⊆ IRn +), then P is integral if and only if each vertex is integral. 18 There are some further equivalent conditions describing the integrality of a poly- hedron. One such interesting condition is that the optimal value is integral for any LP over the polyhedron with integral objective function. This result leads to the concept of total dual integrality which gives a method for proving that a polyhedron is integral. Proposition 2.3. Let P be a rational polyhedron in IRn. Then P is integral if and only if max{cTx : x ∈ P} is an integer for each c ∈ Zn such that v(LP ) is finite. Proof. In order to simplify a bit, we only prove this result for the situation with P pointed. Assume that P is integral, and consider max{cT x : x ∈ P} with c ∈ Zn and assume that the optimal value is finite. Then, from Proposition 2.2 this LP has an optimal solution x∗ which is integral. But then clearly the optimal value cT x∗ is an integer, which proves the “only if” part. We prove the converse implication by contradiction. So assume that P is not integral. As P is pointed, and rational, this means that there is a vertex x̄ of P with a fractional component x̄j . (Add details .....). Therefore, there is a c̄ ∈ Zn such that x̄ is the unique optimal solution of max {c̄T x : x ∈ P}. One can then show that there is an ǫ > 0 such that for each c′ ∈ B(c̄, ǫ) x̄ is the unique optimal solution of the problem max {(c′)T x : x ∈ P}. For a suitably large positive integer s we have that d = c̄+(1/s)ej ∈ B(c̄, ǫ). Note that x̄ is the optimal solution of the LP problem over P with objective function sd = sc̄+ej and therefore its optimal value equals sdT x̄ = sc̄T x̄ + x̄j . But then one of the two optimal values sdT x̄ and sc̄T x̄ must be fractional (as x̄ is fractional), i.e., there is an integer objective function such that the corresponding optimal value is fractional; a contradiction. This completes the proof. We call a linear system Ax ≤ b totally dual integral, or TDI for short, if for all integral c with max{cT x : Ax ≤ b} finite, the dual LP problem min{yT b : yTA = cT , y ≥ 0} has an integral optimal solution. Note here that this definition concerns a given linear system and not the corresponding polyhedron P = {x ∈ IRn : Ax ≤ b}. In fact, a polyhedron may be represented by both a TDI system and another non-TDI system. The importance of the TDI property is seen from the next result. Corollary 2.4. Assume that the system Ax ≤ b is TDI and that b is integral. Then the associated polyhedron P = {x : Ax ≤ b} is integral. Proof. By definition of TDI the dual problem (D) min{yT b : yTA = cT , y ≥ 0} has an integer optimal solution y∗ for each c ∈ Zn with max{cT x : Ax ≤ b} finite. 19 But as b is integral we get that the optimal value v(D) = (y∗)T b is also integral for such problems. By the LP duality theorem, this shows that max{cT x : x ∈ P} is an integer for each c ∈ Zn such that v(LP ) is finite, and P is integral by Proposition 2.3. Note here that the fact that Ax ≤ b is TDI is not sufficent to guarantee that P = {x : Ax ≤ b} is integral, because with fractional b the optimal value of the dual problem may be fractional (although the optimal solution is integral). It can be shown that if Ax ≤ b is a rational linear system, then there is an integer M such that the scaled system (1/M)Ax ≤ (1/M)b is TDI. 
As a consequence, we see that each rational polyhedron has a TDI representation. An example of a TDI system is x(δ+(U)) ≥ 1 for all U ⊂ V , r ∈ U ; x ≥ 0 (2.5) where r is a given node in a directed graph D = (V, E). The solution set of this system is the dominant of the convex hull of incidence vectors of r-arborescences in D. An r-arborescence is an arc set F such that the subgraph (V, F ) contains a directed rt-path for each node t 6= r and where each such node t has exactly one ingoing arc in F . We remark that these graphs are of interest in the design of certain transportation or communication networks. 2.2 Totally unimodular matrices In the previous section we studied the integrality of a polyhedron in connection with TDI linear systems. The topic of the present section is to study a stronger property, namely for the matrix A alone, which assures the integrality of the associated polyhedron. An m×n matrix A is called totally unimodular, TU for short, if the determinant of each square submatrix is -1, 0 eller 1. Thus, in particular, a TU matrix has entries being -1, 0 or 1. Such matrices arise naturally in connection with graphs, see Section 2.3. A relation between integrality of a polyhedron and total unimodularity is given next. Theorem 2.5. Let A ∈ IRm,n be a TU matrix and b ∈ IRm an integral vector. Then the polyhedron P = {x ∈ IRn : Ax ≤ b} is integral. 20 v. Note that since AG is an incidence matrix (all elements are 0 or 1), it may be totally unimodular. The precise answer is that it is TU exactly when the graph contains no odd cycles. Proposition 2.10. AG is TU if and only if G is bipartite. Proof. Let first G be bipartite with the two color classes I1 and I2. We shall then show that AG is TU using Ghouila-Houri’s TU characterization. Let I be a subset of the rows of AG, and let I ′ 1 = I ∩ I1 and I ′ 2 = I ∩ I2. Let a be the vector obtained by summing the rows of AG associated with I ′ 1 and subtracting the rows associated with I ′ 2. The component ae of a correponding to an edge e = [u, v] must be either 1 (if one of the endnodes is in I ′ 1 and the other is not in I ′ 2), -1 (the converse situation) or 0 (if both or none of the endnodes lie in I). This a is a vector with componenents -1,0 and 1 and therefore AG is TU according to Proposition 2.9. Conversely, assume that G is not bipartite. It follows that G has an odd cycle C. Let B be the square submatrix of AG indiced by the rows and columns corre- sponding to the nodes and edges of C, respectively. With suitable permutations we transform this matrix into the circulant matrix C2,t where t is the length of the cycle. (This is a 0,1-matrix where the i’th row, for i = 1, . . . , t − 1 has two nonzeros (1’s) namely in position i and i + 1, while the two nonzeros (1’s) of row t are in the first and last position). One can show that |det(C2,t)| = 2 and it follows that AG is not TU. The previous result may be combined with Corollary 2.6 to get important com- binatorial minmax theorems. We illustrate this for problems concerning packing and/or covering of nodes and edges of a graph. First, we give a more “symmetric” integrality/LP theorem derived from Corollary 2.6. Corollary 2.11. Let A ∈ IRm,n be a TU matrix and b ∈ IRm and c ∈ IRn two integral vectors. Then each of the dual LP problems in the duality relation max{cT x : Ax ≤ b, x ≥ 0} = min{yT b : yTA ≥ cT , y ≥ 0}. (2.8) has an integral optimal solution provided that the optimal values are finite. Proof. We apply Corollary 2.6 to the matrix [ A −I ] which is TU as A is TU. 
The primal problem becomes max {cT x : Ax ≤ b, x ≥ 0} as desired. The dual problem is min{yT b : yT A − z = cT , y ≥ 0, z ≥ 0} = min{yT b : yT A ≥ cT , y ≥ 0}. 23 Note here that there is an optimal integral solution (y, z) of the first problem if and only if there is an optimal integral solution y of the second problem. The result then follows from the duality relation. Let G be a bipartite graph, so by the previous proposition, its incidence matrix AG is TU. Consider the LP duality relation (2.8) with A replaced by AG and with c = 1 and b = 1. We then obtain max{1T x : AGx ≤ 1, x ≥ 0} = min{yT1 : yTAG ≥ 1T , y ≥ 0}. (2.9) We know that each of these two problems has an integral optimal solution and we interprete this combinatorially. First, we note that in the system Ax ≤ 1, x ≥ 0 we have one variable xe for each edge e ∈ E, and that x is a solution if and only if x is nonnegative and satisfies x(δ(v)) ≤ 1 for each v ∈ V . Thus an integer solution here must in fact be 0,1-valued, and it represents a matching in G. A matching in a graph is an edge subset such that no node is incident to more than one edge. Therefore, the maximization problem in (2.9) is to find a maximum cardinality matching in the bipratite graph G. This is a classical problem which is polynomially solvable by e.g. a combinatorial algorithm called the Hungarian algorithm. Now, we turn to the minimization problem, and observe that there is only one variable xv for each node v ∈ V . An integer feasible solution of this problem assigns a nonnegative integer to each node in such way that each edge has an endnode with a strictly positive integer. Clearly, we may restrict our attention to such integers that are 0 or 1, and the the minimization problem becomes: find a node cover in G of minimum cardinality. A node cover is a subset V0 of the nodes such that each edge has at least one endnode in V0. Therefore, due to integrality, the relation (2.9) says that the maximum cardinality of a matching in a bipartite graph equals the minimum cardinality of a node cover. This result is known as the König-Egervary theorem and dates to 1931. As a second application of Corollary 2.11 we study the effect of the choice A = AT G, i.e., the transpose of the node-edge incidence matrix of G. Again we use c = 1 and b = 1. We shall assume that G contains no isolated node (otherwise one of the problems studied would become infeasible). We then obtain (when we let the role of x and y be changed) max{1T y : AT Gy ≤ 1, y ≥ 0} = min{xT1 : xT AT G ≥ 1T , x ≥ 0}. (2.10) We recall the interpretations above and see that an integral feasible y in the max- imization problem corresponds to a node packing (also called independent set or stable set). Thus this problem may be viewed as to find a maximum cardinality 24 node packing in G. A node packing is a subset S of the node set such that no pair of nodes are adjacent. As remarked above in connection with the node cover prob- lem, one may restrict the attention to 0-1 valued solutions in the minimization problem of (2.10). The problem therefore reduces to that of finding a minimum cardinality edge cover. An edge cover is a subset of the edge set such that each node is incident to one of the chosen edges. The interpretation of the minmax relation of (2.10) then becomes: the maximum cardinality of a node packing in a bipartite graph equals the minimum cardinality of an edge cover. This result is also due to König (1933) and is often called König’s covering theorem. 
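As a quick numerical check of the minmax relation (2.9), one can hand both LPs to an LP solver on a small bipartite graph and observe that the optimal values coincide and that the returned vertex solutions are 0-1 valued. The sketch below uses scipy and a hypothetical 5-node graph; it is only an illustration of Corollary 2.11 and Proposition 2.10, not part of the notes.

```python
import numpy as np
from scipy.optimize import linprog

# A hypothetical bipartite graph with color classes {u1,u2,u3} and {v1,v2}.
nodes = ["u1", "u2", "u3", "v1", "v2"]
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]

# Node-edge incidence matrix A_G (rows: nodes, columns: edges).
A = np.zeros((len(nodes), len(edges)))
for j, (u, v) in enumerate(edges):
    A[nodes.index(u), j] = 1
    A[nodes.index(v), j] = 1

# Maximum matching LP:  max 1^T x  s.t.  A_G x <= 1, x >= 0.
matching = linprog(c=-np.ones(len(edges)), A_ub=A, b_ub=np.ones(len(nodes)),
                   bounds=(0, None), method="highs")

# Minimum node cover LP:  min 1^T y  s.t.  A_G^T y >= 1, y >= 0.
cover = linprog(c=np.ones(len(nodes)), A_ub=-A.T, b_ub=-np.ones(len(edges)),
                bounds=(0, None), method="highs")

print("max matching value  :", -matching.fun, matching.x.round(3))
print("min node cover value:", cover.fun, cover.x.round(3))
# Both optimal values equal 2, and the vertex solutions returned by a
# simplex-type solver are 0-1 valued, as total unimodularity predicts.
```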
Note that all of the four combinatorial problems discussed above are polynomially solvable in bipartite graphs. In fact, it follows from the results above that they may be solved by a polynomial algorithm for LP which finds optimal vertex solutions (and such algorithms do exist, e.g., based on the ellipsoid method). There are also purely combinatorial algorithms for each of these problems that are polynomial, see e.g. [14]. We next study problems in directed graphs. Let D = (V, E) be a directed graph and let AD be its node-arc incidence matrix. The basic tool is the following result. Proposition 2.12. AD is TU for each digraph D. Proof. We give an elegant and short proof of this fact due to Veblen and Franklin (1921), see [20]. We prove by induction that each subdeterminant is -1, 0 or 1. Assume that this is true for all submatrices of dimension t, and let N be a t × t submatrix of AD. If N contains a column with all zeros, then det(N) = 0. If a column of N contains exactly one nonzero, this number is either -1 or 1, and when we expand the determinant for this column we get that det(N) ∈ {−1, 0, 1} by the induction hypothesis. Finally, if each column of N contains two nonzeros, the sum of all row vectors is the zero vector, so the row vectors ar linearly dependent and det(N) = 0. Using this fact combined with Corollary 2.11 we obtain that max{cT x : ADx = 0, 0 ≤ x ≤ d, x integral} = min{yTd : zT AD + y ≥ cT , y ≥ 0, y, z integral}. (2.11) We here assume that c and d are integral vectors and allow components of d to be ∞ (in which case the corresponding dual variable is not present). An inter- pretation may be given as follows. Recall that x ∈ IRE is called a circulation if 25 monotonization. This transformation may, at least, be useful in the analysis of the problem. For instance, in the Travelling Salesman Problem, we may consider the class of edge sets being subsets of Hamilton cycles. For an independence system the greedy algorithm may be used to construct a feasible solution: Greedy algorithm Step 1. Order the elements in E in a sequence e1, . . . , em such that w(e1) ≥ . . . ≥ w(em). Set F := ∅. Step 2. While F ∈ F add the next element in the sequence to F . It is easy to see that Kruskal algorithm for spanning trees ([21]) is a greedy algorithm. This algorithm terminates with a feasible solution, and it is simply called the greedy solution. Example 3.1. Consider the following combinatorial optimization problem. Let the ground set be E = {1, 2, 3, 4}, and let w(1) = 4, w(2) = 3, w(3) = 2, w(4) = −8. Let F = {{1, 3}, {1, 3, 4}, {1, 2, 4}, {2, 3}}. After monotonization, the extended solution set becomes F∗ = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, , {2, 3}, {2, 4}, {3, 4} {1, 3, 4}, {1, 2, 4}}. According to the weight function, the ordered set of elements is {1, 2, 3, 4}. If we apply the greedy algorithm, then at the first iteration we have F = {1}. Next, we have F = {1, 2}. Finally, we have F = {1, 2, 4}. No more elements can be included in F to get a feasible solution and we are done. Remark that w(F ) = −1 and the optimal solution is F ∗ = {1, 3}, with w(F ∗) = 6. Example 3.2. Consider the weighted graph in Figure 3.1. Let us apply the greedy algorithm to the TSP problem introduced in Chapter 1. The sequence produced by the greedy algorithm will be {}, {(3, 5)}, {(3, 5), (2, 4)}, {(3, 5), (2, 4), (4, 5)}, {(3, 5), (2, 4), (4, 5), (1, 2)}, and, finally H = {(3, 5), (2, 4), (4, 5), (1, 2), (1, 3)}. We have w(H) = 26, which can be easily verified to be non-optimal. 
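The greedy scheme above is short enough to state as code. The following sketch (function names are ours) runs it on the data of Example 3.1, with the monotonized family F* given explicitly, and reproduces the greedy solution {1, 2, 4} of weight −1 found in the example.

```python
# A sketch of the greedy algorithm for an independence system (E, F*),
# run on the data of Example 3.1.  The family F* is listed explicitly,
# exactly as in the example.
def greedy(elements, weights, independent):
    """Scan elements by non-increasing weight and keep each element whose
    addition leaves the current set inside the independence system."""
    F = set()
    for e in sorted(elements, key=lambda e: weights[e], reverse=True):
        if independent(F | {e}):
            F |= {e}
    return F

E = [1, 2, 3, 4]
w = {1: 4, 2: 3, 3: 2, 4: -8}
F_star = [set(), {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4},
          {3, 4}, {1, 3, 4}, {1, 2, 4}]

F = greedy(E, w, lambda s: s in F_star)
print(F, sum(w[e] for e in F))   # {1, 2, 4} with weight -1, as in the text;
                                 # the optimum is {1, 3} with weight 6.
```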
So, in general the greedy algorithm does not terminate with an optimal solution. However, it can be shown (see [21]) that the greedy solution is optimal (for every possible weight function w) if and only if the independence system is also a so- called matroid. 28 5 34 2 1 3 4 2 1 155 3 7 5 5 5 34 2 1 3 4 2 1 155 3 7 5 5 Figure 3.1: The greedy solution of Example 3.2 The greedy algorithm is probably the most simple and natural constructive heuristic for combinatorial optimization problems, but there are other approaches as well, often more effective. approximation guarantee We illustrate one such approach to the metric TSP problem, due to Christofides [3]. This method has an additional, interesting feature as it belongs to the class of heuristics with approximation guarantee. To introduce this concept, let (Q) be a maximization problem and let v(Q) be its optimal value. Let A be an approximation algorithm for (Q), i.e. A finds a feasible solution x to (Q), and let w(x) denote the value of x. Then A has a α-approximation factor guarantee if and only if αw(x) ≥ v(Q). When (Q) is a minimization problem, the condition becomes w(x) ≤ αv(Q). Let G = (V, E) be an undirected, complete graph and let w : E → IR be a weight function. We also suppose that w satisfies the triangle inequality, that is for all u, v, z ∈ V we have w(u, v) ≤ w(u, z) + w(z, v). In order to describe Christofides approximation algorithm for TSP, we need to introduce the definition of Eulerian tour in a (multi-graph). Definition 3.1. Let G = (V, E) be a connected multigraph (edge repetitions are allowed). A Eulerian tour is a tour in G passing through each edge exactly once. One can show that a graph admits a Eulerian tour if and only if every node has even degree1. In this case, the graph is said to be Eulerian. A Eulerian tour can be easily constructed in a Eulerian graph. 1. Actually, this problem is considered as the mother of all network problems. It was first stated and solved by Euler while attacking the famous Königsberg 7-Bridges Problem 29 1 2 4 5 3 1 5 4 3 2145 Figure 3.2: A Eulerian graph and a corresponding Eulerian tour The following algorithm finds a factor 2-approx Hamilton tour H in G, i.e. if H∗ is an optimal Hamilton tour, then w(H) ≤ 2w(H∗). Christofides Algorithm Step 1. Find a minimum spanning tree T of G. Step 2. Double every edge of T to obtain a Eulerian graph Step 3. Find a Eulerian tour T on this graph Step 4. Return the Hamilton tour that visits the vertices of G in the order of their first appearance in T . Theorem 3.1. Christofides algorithm is a factor 2 approximation algorithm for the metric TSP. Proof. Denote by H∗ an optimal Hamilton tour in G and let H̄ the tour returned by the algorithm. First observe that (i) w(T ) ≤ w(H∗). In fact, every Hamilton tour H contains a spanning tree: take any edge e of H and remove it from H ; the remaining graph is a Hamilton path (a path passing through each node exactly once) and thus it is a spanning tree. (i) follows from the non-negativity of w(e). At Step 2, observe that T contains each edge of T exactly twice, hence w(T ) = 2w(T ). Now, if T does not contain repeated nodes, then T is a Hamilton tour and H̄ = T . It follows that w(H̄) = w(T ) = 2w(T ) ≤ 2w(H∗) (the last inequality holds from 30 optimization problems, whose solution does not even ensure global optimality. Is there any advantages in doing so? Of course, it all depends on our capacity to efficiently solve the optimization problem (3.2), and on the quality of the final solution. 
Concerning quality, this is typically measured by exploiting well-established benchmark test-beds (there are relevant issues concerning this question, but we will not discuss them here). There are two major lines of attack to reach efficiency. The most common one is based on the definition of suitably "small" neighborhoods, so that the optimal solution can be found by complete enumeration: we call such an approach small neighborhood search. The second approach is somewhat smarter. Indeed, there are several combinatorial optimization problems which can be solved efficiently, even though the number of solutions is very large. Examples are the minimum spanning tree problem, the matching problem, the shortest path problem, etc. So, the game here is to define the neighborhood function N in such a way that the corresponding optimization problem (3.2) falls into this category. This is the subject of exponential neighborhood search.

Small Neighborhood Search. There is a huge (and perhaps boring?) literature on the definition of small but still effective neighborhoods for myriad versions (of versions) of combinatorial optimization problems. However, a few of these neighborhood definitions are quite interesting and very effective in practice, and provide the basis for various extensions. One such example is the 2-exchange neighborhood for the TSP introduced by Lin and Kernighan [12]. We discuss this neighborhood in detail hereafter.

Typically, small neighborhoods are defined through moves. A move is an algorithm which receives a feasible solution S ∈ F to a combinatorial optimization problem, plus some additional input parameters, and returns a new solution T obtained from S by applying some small change to S. Moves are normally such that the distance d(S, T) between S and T is small, where d(S, T) = |S Δ T|. In other words, S and T are not too different from each other. The neighborhood of S is then defined as the set of solutions T which may be obtained by invoking the move on S with different additional input parameters.

The neighborhood proposed by Lin and Kernighan for the TSP belongs to the class of distance-k neighborhoods, defined for each S ∈ F as N_k(S) = {T ∈ F : d(S, T) ≤ k}. In particular, they implemented a distance-4 neighborhood, which corresponds to applying a particular move called 2-exchange (the derived local search algorithm is called 2-opt). A 2-exchange move is shown in Figure 3.5.

[Figure 3.5: A 2-exchange move]

The 2-exchange move consists of selecting two non-adjacent edges of the given Hamilton tour S and replacing them with two new edges so as to obtain a distinct Hamilton tour T. It is easy to see that, once we have chosen the two exiting edges, the entering pair is fixed. In the example above, if we remove edge (4, 5) and edge (1, 3), then the entering edges are (1, 5) and (3, 4) (remark that the initial tour has value 26, whereas the final one has value 18, which means that the move is improving). The final neighborhood is obtained by choosing every possible pair of non-adjacent edges and applying the 2-exchange move to each pair. Observe now that any Hamilton tour S contains |V| edges. There are roughly O(|V|^2) ways to select a pair of non-adjacent edges, which in turn implies that |N(S)| is also O(|V|^2) for each Hamilton tour S.
So, even when V is large, such a neighborhood can be easily constructed explicitly and the optimal solution can be found by enumeration. Exponential Neighborhood Search. Clearly, the larger the neighborhood, the more likely we generate good quality solutions. As an extreme case, if N(S) = F , then the restricted optimization problem (3.2) actually coincides with the original problem (3.1). However, when we deal with large neighborhoods, complete enu- meration becomes impractical and we need to resort to smarter search techniques. More specifically, a careful definition of the neighborhood function can allow us to resort to efficient solution algorithms, even if the number of solutions in the neighborhood grows exponentially with the input size. We discuss an example for the TSP introduced by Sarvanov and Doroshko [19]. Let G = (V, E) be a complete graph, with V = {1, . . . , n} and w : E → IR a weight function. First observe that the set of Hamilton tours of G is in correspon- dence with the set of linear orders on V . So, for example, the tour in Figure 3.1 corresponds to the order (1, 2, 4, 5, 3). Actually, distinct orders may represent the 34 same tour (e.g. (2, 4, 5, 3, 1)), but we can fix our attention to those orders having node 1 in first position. It is easy to see that there is a one-to-one correspondence between the Hamilton tours of the complete graph G and the linear orders of its nodes having node 1 in first position. Positions 2, . . . , n may be occupied by any other node. For a given Hamilton tour S represented as an order, the Sarvanov Sarvanov and Doroshko neighborhood and Doroshko neighborhood N(S) consists in all those orders which can be ob- tained from S by keeping fixed the nodes in odd position, while letting all even nodes change their position in all possible ways. a b c d e f 2 1 3 4 5 6 a d c f e b 2 1 3 4 5 6 Figure 3.6: Two distinct tours in the Sarvanov and Doroshko neighborhood Thus, all nodes in even position may appear in a different, but still even position in the new Hamilton tour. So, for example, if we consider the following tour on 6 nodes S1 = (a,b, c,d, e, f) (in bold nodes in even position), then its neighbor- hood will be N(S) = {S1, S2, S3, S4, S5, S6}, where S2 = (a,b, c, f , e,d), S3 = (a,d, c,b, e, f), S4 = (a,d, c, f , e,b), S5 = (a, f , c,b, e,d), , S6 = (a, f , c,d, e,b). Observe that if in this example node b occupies position 6 in the final order, then arc (e, b) and arc (b, a) will appear in the corresponding Hamilton tour, and the contribution to the overall cost of such two arcs will be w(e, b)+w(b, a). In other words, the cost of assigning each even node to an even position can be easily com- puted in advance by looking at the odd nodes adjacent to such position, which stay unchanged. So, assume for the sake of simplicity n to be even, and let S = (v1, v2, . . . , vn) be a given order. The cost of assigning node v2i, for i = 1, . . . , n/2 to position v2k, for k = 1, . . . , n/2 will be cik = w(v2k−1, v2i)+w(v2i, v2k+1) (where n+1 = 1). We have n/2 even nodes and n/2 even positions and each tour in the neighborhood of S corresponds to a perfect matching between the set of even nodes and the set of even positions. Hence, an optimal solution in the neighborhood of S is simply 35 x ∈ T} be a relaxation of (Q) (i.e. S ⊆ T ), with the property that (R) can be solved efficiently (to optimality) by some exact method. Since we are dealing with combinatorial optimization problems, we may assume that both S and T are bounded. 
Also, recall that, by definition, we have v(R) ≥ v(Q). Finally, let LB be a lower bound on the optimal value v(Q) to (Q). For example, if we have at hand a feasible solution x̄ ∈ S then we can set LB = w(x̄), or we may let LB = −∞ if no such solution is available. Now, since we know how to do it efficiently, we solve (R) instead of (Q). Let xR be an optimal solution to (R). Several cases can occur and in some of these cases the solution to (R) also provides us with a solution to (Q). In particular, this happens when (i) (Infeasibility): (R) admits no solution, which implies that T is empty, which in turn implies that S is empty and (Q) has no solution. (ii) (Optimality): xR ∈ S. Then xR is an optimal solution to (Q). In fact, xR ∈ S implies w(xR) ≤ v(Q). Since xR is optimum to (R), then w(xR) = v(R) ≥ v(Q). (iii) (Value dominance) v(R) = w(xR) ≤ LB. Since w(xR) ≥ w(x) for all x ∈ T , and S ⊆ T , this implies that the solution at hand x̄ ∈ S is already optimal or, in any case, no solution in S can improve over the lower bound LB. We can then state the following Proposition 4.1. Let (Q) max{w(x) : x ∈ S} be a combinatorial optimization problem, let LB be a lower bound on v(Q) and let (R) be a relaxation of (Q). Suppose (R) is solved to optimality. Then the solution of (R) also provides a solution of (Q) whenever: (i) (Infeasibility): (R) has no solution. (ii) (Optimality): (R) has an optimal solution xR and xR ∈ S (iii) (Value dominance): (R) has an optimal solution and v(R) ≤ LB. Even if we have at hand a ”good” relaxation (R) of (Q), it is often the case that the solution of (R) does not fall into one of the conditions of Lemma 4.1. How can we find an optimal solution to (Q) (or prove that none exists) in this case? In order to solve difficult (combinatorial) optimization problems, relaxations are often combined with enumerative techniques, based on the so called divide and conquer approach.divide and conquer Let S(u), u ∈ V be a partition of S, so these sets are pairwise disjoint and their union equals S. Then we have that v(Q) = max v(Q(u)) where (Q(u)) is the restricted problem max {w(x) : x ∈ S(u)}. By solving the restricted problems and comparing we can find an optimal solution to (Q). In 38 fact, an optimal solution to (Q) belongs to exactly one S(u), for u ∈ V . In any case, the optimal solution x∗(u) ∈ S(u) to (Q(u)) is also feasible to (Q), since S(u) ⊆ S. and it provides us with a lower bound w(x∗(u)) on v(Q). Since S ⊆ {0, 1}n, a natural way of partitioning S is to fix components to either 0 or 1. For instance, we can let S0 (S1) be those vectors in S with x1 = 0 (x1 = 1). Furthermore, one can establish partitions recursively by fixing new variables based on the previous partition. For instance, let S0,0 (S0,1) be those x ∈ S with x1 = 0 and x2 = 0 (x1 = 0 and x2 = 1). For simplicity in the presentation we shall assume that there is a predetermined ordering of the variables which is followed when variables are fixed. enumeration tree The recursive partitioning may be organized in an enumeration tree (or branching tree) with nodes corresponding to subsets of S and edges indicating a fixing. The nodes are partitioned into n layers; in the k’th layer the first k variables are fixed. Thus there are at most 2k nodes in the k’th layer correspond to all possible ways of fixing the k first variables. For each node the subset S(v) of S consists of those vectors in S that satisfy the variable fixing specified in that node. 
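The layered variable-fixing scheme just described can be written as a short recursion. The sketch below (names are ours) enumerates the tree for a two-variable toy instance, the same constraint system used later in Example 4.1, and uses a simple infeasibility test as a placeholder for the pruning criteria formalized just below in Proposition 4.2.

```python
# A sketch of the enumeration tree: a node is a partial 0-1 fixing, and layer k
# fixes the first k variables.  prune() stands in for the pruning criteria of
# Proposition 4.2; here it only discards fixings that already violate a
# constraint of the toy instance max 4x1 + 5x2, 3x1 + 4x2 <= 6, 5x1 + 2x2 <= 6.
best = {"x": None, "value": float("-inf")}

def feasible(x):
    return 3 * x[0] + 4 * x[1] <= 6 and 5 * x[0] + 2 * x[1] <= 6

def prune(fixing):
    # Infeasibility test on the fixed variables only (a weak form of criterion (i)).
    return sum((3, 4)[i] * v for i, v in enumerate(fixing)) > 6 or \
           sum((5, 2)[i] * v for i, v in enumerate(fixing)) > 6

def enumerate_tree(fixing, n=2):
    if prune(fixing):
        return
    if len(fixing) == n:                      # bottom node: all variables fixed
        if feasible(fixing):
            value = 4 * fixing[0] + 5 * fixing[1]
            if value > best["value"]:
                best.update(x=fixing, value=value)
        return
    for v in (0, 1):                          # the two children of the node
        enumerate_tree(fixing + (v,), n)

enumerate_tree(())
print(best)                                   # {'x': (0, 1), 'value': 5}
```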
The restricted optimization problem max{w(x) : x ∈ S(u)} associated with node u will be denoted by (Q(u)). The single node v_r in layer 0 is called the root node and the nodes in the n'th layer are called bottom nodes. When u is a bottom node, the problem (Q(u)) is trivial, as all variables are fixed. The edges go between two consecutive layers. More precisely, a "mother node" in layer k has two adjacent nodes ("children") in layer k + 1, and they are obtained from the variable fixing in the mother node by fixing x_{k+1} to either 0 or 1.

The bottom nodes represent a complete enumeration of all the feasible solutions of the problem (Q). Even for moderate values of n the enumeration tree is too large for practical purposes. The key idea is that we may only need to consider a small part of this tree in order to solve the problem. For a nonroot node u the tree contains a unique path p(u) between u and the root node. For each nonroot node w ∈ p(u) we say that u is below w, or that w is above u. Under certain conditions a node u may be pruned, which means that we need not solve any of the problems (Q(u')) for nodes u' that are below u. This happens when we know that these problems cannot improve on the current best solution. The following pruning criteria should be clear.

Proposition 4.2. A node u in the enumeration tree may be pruned if one of the following conditions holds.
(i) (Infeasibility.) S(u) is empty.
(ii) (Optimality.) An optimal solution of (Q(u)) is known.
(iii) (Value dominance.) v(Q(u)) ≤ v(Q).

[Figure 4.1: A (partial) enumeration tree]

Enumeration trees with pruning may be represented by partial binary trees, like the one of Figure 4.1. Note that the leaves of a partial binary tree correspond to the classes of a partition of S, each class being identified by the variable fixing associated to the unique path from the corresponding leaf to the root. The tree in Figure 4.1 represents the partition {S1, . . . , S5}, where S1 = {(0, 0, 0)}, S2 = {(0, 0, 1)}, S3 = {(0, 1, 0), (0, 1, 1)}, S4 = {(1, 0, 0), (1, 0, 1)}, S5 = {(1, 1, 0), (1, 1, 1)}. Solving the original problem (Q) is equivalent to solving the problems (Q1), . . . , (Q5) associated to S1, . . . , S5. Finally, observe that we may let different variables be fixed in the same layer of a partial enumeration tree. An example is given in Figure 4.2. These more general trees are called branching trees.

[Figure 4.2: A branching tree]

max 4x1 + 5x2
    3x1 + 4x2 ≤ 6
    5x1 + 2x2 ≤ 6     (R(u0))
    x2 = 0
    0 ≤ x ≤ 1

The optimal solution to (R(u0)) is x1(u0) = 1, x2(u0) = 0, and its value is z(u0) = 4. Since x(u0) ∈ S(u0) (all components are 0 or 1) and z(u0) > zL, we update the best solution x* = x(u0), we set zL = 4 and we prune u0. Now Vn = {u1}: we extract u1 and solve its relaxation

max 4x1 + 5x2
    3x1 + 4x2 ≤ 6
    5x1 + 2x2 ≤ 6     (R(u1))
    x2 = 1
    0 ≤ x ≤ 1

The optimal solution to (R(u1)) is x1(u1) = 0.667, x2(u1) = 1, and its value is z(u1) = 7.667. Since x(u1) ∉ S(u1) and z(u1) > zL, we resort to branching. The only fractional variable is x1, so we branch on x1 and generate two new nodes u10 and u11, corresponding to fixing x1 to 0 and 1, respectively. Now Vn = {u10, u11}: we extract u10 and solve its relaxation

max 4x1 + 5x2
    3x1 + 4x2 ≤ 6
    5x1 + 2x2 ≤ 6     (R(u10))
    x2 = 1
    x1 = 0
    0 ≤ x ≤ 1

The optimal solution to (R(u10)) is x1(u10) = 0, x2(u10) = 1, and its value is z(u10) = 5.
Since x(u10) ∈ S(u10) and z(u10) = 5 > 4 = zL, we let the best solution be x* = x(u10), set zL = 5 and prune u10. Now we extract u11 from Vn, (R(u11)) is infeasible, and u11 is pruned. Finally, Vn is empty and the algorithm terminates. The partial branching tree associated with the above application of branch-and-bound is shown in Figure 4.3.

[Figure 4.3: The (partial) branch-and-bound tree of Example 4.1]

Two main issues in the development of branch-and-bound algorithms are node selection and variable selection. Whenever we have solved the relaxation (R(u)), and we do not terminate, we have to select the next node to be processed (also called the next active node). For this node selection problem several strategies exist; some are based on a priori rules and others are adaptive (depending on the calculations). For instance, a common strategy is depth-first search plus backtracking, where one always chooses the next node as a child node and backtracks when the node is pruned. In the variable selection problem one decides how to make the partitioning that determines the (two or more) children problems. Empirically one knows that this choice affects the overall speed of the algorithm, but it is still hard to find good rules for selecting "critical variables". A useful strategy is to predetermine some ordering of the variables based on the coupling in the constraints. For instance, the variables may fall into two classes such that fixing all the variables in the first class makes the remaining ones integral. In such a case it makes sense to branch on fractional variables in the first class whenever possible. Other possible techniques are discussed in [15].

One crucial point is the strength of the LP relaxations, that is, the quality of the formulations and of the corresponding bounds. If the LP (optimal value) bounds are too far away from the optimal value, one cannot prune the enumeration tree and the algorithm becomes slow, often too slow for all practical purposes; see Exercise 4.2. So, we are interested in finding good formulations for the set S of 0-1 solutions of our combinatorial optimization problem. Following the discussion of Chapter 1, we would like to optimize over conv(S), but typically we need to content ourselves with much weaker formulations, such as the natural one. However, the initial formulation can be significantly strengthened in each node of the branching tree.

Cutting planes. Suppose we have solved the linear relaxation (R(u)) associated with the current formulation P(u), but node u cannot be fathomed. Now, instead of proceeding with branching, we may try to strengthen the current formulation. This is done by invoking a separation oracle which tries to find an inequality, belonging to a stronger formulation, which is violated by the current solution x(u). This inequality is called a cutting plane, since it can be seen as a hyperplane (or cut) separating x(u) from conv(S(u)). Typically, such a cut belongs to some predefined class of inequalities, and in this case we are within the so-called Template Paradigm. But we may also look for general inequalities, and we discuss an example later in this chapter. The method which combines branch-and-bound and cutting planes is called branch-and-cut.

4.2 Finding additional valid inequalities

We discuss some methods for finding classes of valid inequalities for a set of integral points or simply 0-1 points.
Ideally we would like to have methods for going from a polyhedron P (the initial formulation for a set of solutions S = P ∩ {0, 1}n) to its integer hull PI (or conv(S)). We shall mention a theoretical result which says that this is indeed possible, although the construction is far from practical. Thus we shall also discuss other general techniques that are applicable to any integer linear programming problem.

First we fix some terminology. Let S be some subset of IRn and let aT x ≤ α be an inequality with a ≠ 0. We say that aT x ≤ α is valid for S if S ⊆ H≤(a, α) = {x ∈ IRn : aT x ≤ α}, i.e., each point of S satisfies the inequality in question. We use the notation ai,. to denote the i'th row of A ∈ IRm,n viewed as a column vector. Similarly, a.,j denotes the j'th column of A.

The Chvátal-Gomory procedure. Let P = {x ∈ IRn : Ax ≤ b} be a given polyhedron in IRn with A, b rational. We are interested in the problem of finding valid inequalities for the integer hull PI. Clearly, each inequality in Ax ≤ b is valid for PI, and the purpose is to provide additional, stronger valid inequalities. The basic principle is to take a nonnegative combination λT Ax ≤ λT b of the given inequalities and round: whenever λ ≥ 0 and λT A is integral, the inequality λT Ax ≤ ⌊λT b⌋ is valid for PI.

As an illustration, let G = (V, E) be a graph and note that the set of 0-1 vectors satisfying 0 ≤ x ≤ 1 and x(δ(v)) ≤ 1 for each v ∈ V coincides with the set of incidence vectors of matchings in G. Let Ax ≤ b denote this linear system. The polyhedron P = {x ∈ IRE : Ax ≤ b} is a formulation for the set of incidence vectors of the matchings in G, and is called the fractional matching polytope. The matching polytope is defined as the convex hull of the incidence vectors of matchings. Let S ⊂ V consist of an odd number of nodes, say |S| = 2k + 1. Then the inequality x(E[S]) ≤ k is clearly valid for PI as each matching contains no more than k pairs of nodes in S. Thus, the validity is due to a simple combinatorial argument. However, this inequality may also be obtained using the Chvátal-Gomory procedure on the original system Ax ≤ b. Consider the inequalities x(δ(v)) ≤ 1 for v ∈ S and the inequalities −xe ≤ 0 for e ∈ δ(S). If we multiply each of these inequalities by 1/2 and add them together, we get the (valid) inequality x(E[S]) ≤ k + 1/2. By integer rounding we get the desired inequality x(E[S]) ≤ k, which proves that this inequality may be obtained by the Chvátal-Gomory procedure applied to the system consisting of the degree inequalities and simple bounds. The matching polytope is completely described by the degree inequalities, simple bounds and the odd set inequalities x(E[S]) ≤ ⌊|S|/2⌋ for S ⊆ V and S odd. Thus, all the facet defining inequalities for the matching polytope are obtained by adding "one round of cutting planes" and we say that the original polyhedron P has Chvátal-Gomory rank 1.

A feature of the Chvátal-Gomory procedure is that it may be affected by scaling of the inequalities. For instance, if P = {x ∈ IR : x ≤ 3/2} then PI = {x ∈ IR : x ≤ 1} and the inequality x ≤ 1 is obtained by integer rounding from x ≤ 3/2. However, P is also the solution set of the (scaled) inequality 2x ≤ 3, and rounding directly on this inequality does not change it. From this we realize that integer rounding should be preceded by a proper scaling of the inequality, i.e., dividing by the greatest common divisor of the left-hand-side coefficients. For a single inequality this produces the integer hull, as the next result says.

Proposition 4.4. Let P = {x ∈ IRn : ∑n j=1 ajxj ≤ α} where all the aj's are integers. Let d be the greatest common divisor of a1, . . . , an. Then PI = {x ∈ IRn : ∑n j=1 (aj/d)xj ≤ ⌊α/d⌋}.

Boolean implications.
Sometimes one can find new inequalities by detecting logical (boolean) implica- tions of one or more constraints of the original system.knapsack problem The problem max { ∑n j=1 cjxj : ∑n j=1 ajxj ≤ b, 0 ≤ x ≤ 1} where all the data are positive integers, is called the knapsack problem. Let P = {x ∈ IRn : ∑n j=1 ajxj ≤ b, 0 ≤ x ≤ 1}, S = P ∩ {0, 1}n and conv(S) is the knapsack polytope. As a 48 specific example let n = 3, a1 = 3, a2 = 3, a3 = 2 and b = 7. Let C = {1, 2, 3} and note that a(C) = ∑ j∈C aj = 8 > b. We call C a dependent set or cover. Since a(C) > b, no feasible integral solution in P can have all variables in C equal to 1. Thus x(C) ≤ |C| − 1 is a valid inequality for PI which is often called a cover inequality. These inequalities have proved to be very useful for solving several cover inequal- itydifferent combinatorial optimization problems since cover inequalities may be derived for individual constraints in the integer linear programming formulation. We also remark that for the polyhedron P above (the linear relaxation of the knapsack problem) all the vertices may be described in a simple way, see Problem 4.1. Note that any linear inequality in 0-1 variables may be transformed to the situ- ation just treated. A general linear inequality with rational data may be written (after suitable scaling) ∑ j∈J1 ajxj + ∑ j∈J2 ajxj ≤ b with aj, j ∈ J1 positive in- tegers and aj , j ∈ J2 negative integers (j’s with aj = 0 are not of interest for the analysis). We make the affine transformation zj = 1 − xj for j ∈ J2, and the transformed inequality is ∑ j∈J1 ajxj + ∑ j∈J2 (−aj)zj ≤ b− ∑ j∈J2 aj . Here all coefficients are positive, so we may establish (e.g.) cover inequalities as above and finally transform these back into new valid inequalities for the original problem. variable upper bound Another type of constraint that is often met in applications is (i) ∑ j∈N yj ≤ nx; (ii) 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; (iii) x ∈ {0, 1}. (4.2) The logical contents of the constraints is that x = 1 whenever the sum of the continuous variables yj, j ∈ N is positive. Alternatively, all yj, j ∈ N must be 0 whenever x = 0. This type of constraints are called variable upper bounds. variable upper boundHowever, this means that all the inequalities yj ≤ x for j ∈ N are also valid for the solution set of (4.2). By adding these new constraints we cut off fractional solutions (from the continuous relaxation of (4.2)) like e.g., y1 = 1, yj = 0 for j ∈ N \ {1}, x = 1/n since the inequality y1 ≤ x is violated. For more about related mixed integer sets like the variable upper-bound flow models, confer [15]. Combinatorial implications. For combinatorial polyhedra, i.e., polyhedra with vertices corresponding to some class of combinatorial objects, one may find valid inequalities by exploiting these combinatorial properties. The example with the matching polytope given in the previous paragraph fits into this framework. Here we give some other examples. 49 First, we consider the set covering problem (see e.g., [4]). This problem is of relevance to many applications. For instance, in airline crew scheduling each flightset covering problem must be covered; in allocating student classes to rooms, each class must get a room, or in network design a number of capacity “bottlenecks” (like cuts) must be covered. Let I and J be the color classes of a bipartite graph G, so each edge in the edge set E joins some node in I to some node in J . Let cj for j ∈ J be given nonnegative weights. 
By a cover we understand a subset S of J such that each node in I is adjacent to at least one node in S. (Of course, we assume that G allows this to happen, i.e., each node in I is adjacent to some node in J). The set covering problem is to find a cover of minimum weight, where the weight w(S) of a cover S is defined as w(S) = ∑ j∈S wj . This problem is NP-hard. The polyhedral approach to this problem may start by introducing the set covering polytope PSC as the convex hull of vectors χS where S is a cover. An integer linear programming formulation of the set covering problem is obtained by letting xj indicate whether node j is in the cover to be determined. For i ∈ I we let Γ(i) denote the set of nodes in J that are adjacent to i. minimize ∑ j∈J cjxj subject to (i) ∑ j∈Γ(i) xj ≥ 1 for all i ∈ I; (ii) 0 ≤ xj ≤ 1 for all j ∈ J ; (iii) x is binary. (4.3) Thus the constraint (4.3)(i) assures that each node in I is covered. Let P be the so- lution set of (4.3)(i)–(ii), so the integer hull PI of this polytope is precisely the set covering polytope PSC. Due to the hardness of the set covering problem, it is too ambitious to find a complete linear description of PI . However, in order to numer- ically solve practical set covering problems one may need to find some of these in- equalities and add them to the description of P . In other words, the LP relaxation using P may give poor lower bounds on the true optimal value of the set covering instance of interest. For example, consider a graph with nodes I = {i1, i2, i3} and J = {j1, j2, j3}, and with the six edges [i1, j1], [i1, j2], [i2, j2], [i2, j3], [i3, j3], [i3, j1]. Note that each node in J covers two consecutive nodes in I. Assume that the objective function is c = (1, 1, 1). Then an (in fact, the) optimal solution of the LP relaxation is x̄ = (1/2, 1/2, 1/2) with objective value 3/2. The optimal value of the integer program, and therefore the set covering problem, is 2. Thus, some valid inequality for PSC is required to cut off this fractional vertex of P . Such an inequality may be deduced from a simple combinatorial argument: no single node in J can cover all the nodes in I, therefore at least two nodes in J must be chosen. Thus the inequality x1 + x2 + x3 ≥ 2 is valid for PSC (as it holds for all vertices and then, by convexity, for all points of PSC). Also, it is violated by x̄. If we add 50 Since this holds for all λ ∈ IRn +, the best upper bound obtained in this way is given by solving the so-called Lagrangian dual problem (LD) (w.r.t. A2x ≤ b2) min{v(LR(λ)) : λ ≥ 0} (4.6) and we get the important inequalities v(Q) ≤ v(LD) ≤ v(LR(λ)) for all λ ∈ IRn +. (4.7) The Lagrangian dual problem may be viewed as a nondifferentiable convex min- imization problem as v(LR(λ)) is a piecewise linear and convex function (it is the pointwise minimum of a finite number of affine functions). Algoritmically one tries to solve the Lagrangian dual problem by some kind of multiplier adjustment technique. The basic principle is to adjust the multiplier according to the current optimal solution x. If x violates the constraint, the penalty (multiplier) is in- creased, but if x satisfies the constraint, the penalty is decreased. Different ideas are used for deciding how much these adjustments should be, and for this good stategies are problem dependent. For a discussion of one such general technique, called the subgradient method, see [15]. We consider an application which illustrates the idea of Lagrangian relaxation. 
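Before turning to the application, the multiplier adjustment just described can be made concrete. The following Python sketch is one standard variant of the subgradient method for the Lagrangian dual (4.6); it is only an illustration under stated assumptions: solve_lagrangian is a placeholder for a problem-specific routine returning an optimal solution x of (LR(λ)) together with its value, and the diminishing step size step0/k is just one common choice.

def subgradient_method(solve_lagrangian, A2, b2, iterations=100, step0=1.0):
    # Approximately solves min{ v(LR(lam)) : lam >= 0 } by adjusting the
    # multipliers according to the slacks of the relaxed constraints A2 x <= b2.
    m = len(b2)
    lam = [0.0] * m
    best_bound = float("inf")                # best (smallest) upper bound on v(Q)
    for k in range(1, iterations + 1):
        x, value = solve_lagrangian(lam)     # optimal solution and value of (LR(lam))
        best_bound = min(best_bound, value)
        # componentwise violation of the relaxed constraints at x
        g = [sum(A2[i][j] * x[j] for j in range(len(x))) - b2[i] for i in range(m)]
        t = step0 / k                        # diminishing step size
        # raise the penalty where a constraint is violated (g[i] > 0),
        # lower it where there is slack (g[i] < 0), keeping lam nonnegative
        lam = [max(0.0, lam[i] + t * g[i]) for i in range(m)]
    return best_bound, lam

With a suitable step-size rule the values best_bound approach v(LD). We now turn to the promised application.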
The degree-constrained spanning tree problem (DCST) is to find a minimum weight spanning tree satisfying given degree constraints. More specifically, let w be a nonnegative weight function defined on the edges of a graph G = (V, E) and let bv for v ∈ V be given positive integers. We want to find a spanning tree T satisfying the degree constraints dT (v) ≤ bv for each v ∈ V and with minimum total weight w(T ) = ∑ e∈T we. (Of course, this problem is infeasible if the bv’s are “too small”). This problem is known to be NP-hard, see [8]. But we know that the spanning tree problem is tractable, i.e., polynomial, and this can be exploited as follows. It is possible to write down a linear system A1x ≤ b1 with 0-1 solutions that correspond to the incidence vectors of sets F ⊆ E such that (V, F ) contains a spanning tree. Finding such a system is left as an exercise, but here we only need to know that the system exists. Then our problem (DCST) may be written as (4.4) with c = −w and the system A2x ≤ b2 being x(δ(v)) ≤ bv for all v ∈ V . (4.8) The Lagrangian relaxation w.r.t. the degree constraints (4.8) is essentially a span- ning tree problem. The objective function to be minimized in this problem is wTx + ∑ v∈V λv(x(δ(v)) − bv). 53 This means that the weight of a spanning tree T becomes (use x = χT ) − ∑ v∈V λvbv + ∑ [u,v]∈T (wuv + λu + λv). This objective will therefore tend to give spanning trees having low degrees. The Lagrangian relaxation can for each λ be solved by, e.g., Kruskal’s algorithm. Thus, combined with a suitable multiplier technique we can solve the Lagrangian dual and obtain a lower bound on the optimal value of the DCST problem. Also, if we are lucky and find an optimal spanning tree in the final Lagrangian relaxation which satisfies all the degree constraints (4.8), then this solution is also an optimal solution of (4.8). Otherwise, one usually constructs a feasible spanning tree solution by some kind of heuristic method based on the last subproblem. This also produces a bound on the optimality gap. We return to the general theory of Lagrangian relaxation. Consider again the Lagrangian relaxation w.r.t. A2x ≤ b2 given in (4.5). The objective function cT x +λT (b2 −A2x) = (cT −λT A2)x +λT b2 is an affine function of x, i.e., a linear function c(λ)x (with c(λ) := cT − λT A2) plus some constant. Since the constant may be removed from the optimization, the problem (4.5) consists in maximizing a linear function over the 0-1 vectors in the polyhedron defined by A1x ≤ b1. As discussed in the introduction to this chapter we may convexify such a problem and obtain an equivalent LP problem max{c(λ)Tx : x ∈ P 1 I } where P 1 I is the integer hull of the polyhedron P 1 = {x ∈ IRn : A1x ≤ b1, 0 ≤ x ≤ 1}. Thus the Lagrangian relaxation corresponds to “integralization” with respect to the system A1x ≤ b1, 0 ≤ x ≤ 1 and translating the objective function with some linear combination of the row vectors in A2. To proceed, it follows from Motzkin’s theorem that P 1 I = conv({xk : k ∈ K}) where xk, k ∈ K is a finite set of vectors in IRn; these are the vertices of P 1 I . Therefore we obtain v(LD) = min{v(LR(λ)) : λ ≥ 0} = minλ≥0[maxx∈P 1 I (cT − λT A2)x + λT b2] = minλ≥0[maxk∈K(cT − λT A2)xk + λT b2] = minλ≥0[min{η : η ≥ (cT − λT A2)xk + λT b2, for all k ∈ K}] = min{η : λ ≥ 0, η + λT (A2xk − b2) ≥ cT xk, for all k ∈ K} = max{cT ∑ k∈K µkxk : A2 ∑ k∈K µkxk ≤ b2 ∑ k∈K µkxk; ∑ k∈K µk = 1; µ ≥ 0} = max{cT x : A2x ≤ b2, x ∈ P 1 I }. 
54 The second last equality was obtained using linear programming duality. Note also the transformation used for converting the inner maximization problem into an LP problem of minimizing an upper bound. We have therefore shown the following result. Theorem 4.5. The Lagrangian dual problem ( 4.6) may be viewed as the dual of the LP problem max{cT x : A2x ≤ b2, x ∈ P 1 I }. In particular, the optimal values of these problems coincide. This result is the main result on Lagrangian relaxation. It says that the bound obtained from solving the Lagrangian dual equals the one obtained by the LP problem with feasible set based on integralization only w.r.t. the constraints that are not relaxed. Define the three polytopes P 1,2 I := {x ∈ IRn : 0 ≤ x ≤ 1, A1x ≤ b1, A2x ≤ b2}I ; (P 1 I )2 := {x ∈ IRn : 0 ≤ x ≤ 1, A1x ≤ b1}I ∩ {x ∈ IRn : A2x ≤ b2}; P 1,2 := {x ∈ IRn : 0 ≤ x ≤ 1, A1x ≤ b1, A2x ≤ b2}. (4.9) Maximizing cT x over P 1,2 I correponds to the original integer program (more pre- cisely, transformed into an LP); maximizing over (P 1 I )2 corresponds to solving the Lagrangian dual and, finally, maximizing over P 1,2 is simply the LP relaxation of the integer program (4.4). Let LP denote the last program. Since we have P 1,2 I ⊆ (P 1 I )2 ⊆ P 1,2 we get the following ordering of the optimal values in these optimization problems v(Q) = max{cT x : P 1,2 I } ≤ max{cT x : (P 1 I )2} = v(LD) ≤ max{cT x : P 1,2} = v(LP ). This means that the Lagrangian bound may improve on the bound coming from the LP relaxation. Note, however, an important consequence of Theorem 4.5 concerning the bounds. Corollary 4.6. If the polyhedron P 1 is integral, i.e., has integral vertices, then v(LD) = v(LP ). Thus, if integrality may be dropped in the Lagrangian subproblems (i.e., {x ∈ IRn : A1x ≤ b1} is integral), then we will not improve compared to the bound obtained by solving the original LP relaxation. Usually, in such cases, Lagrangian relaxation is not used unless it is viewed as more practical than solving the LP. 55 4 23 1 5 1 2 4 2 2 4 23 1 5 1 2 4 2 2 4 23 1 5 1 2 4 2 2 s Figure 4.5: From sequencing to TSP machines. The target here is to minimize the distance run by the drilling machine. This problem is reduced to a TSP problem by associating a node with every hole and by associating with every edge (u, v) a cost equal to the distance between hole u and hole v on the chip. A second important application is in cutting connections between logic gates on (customized) chips. This is done by a laser, which draws a Hamilton path through the gates. Minimizing the overall (Manhattan) distance run by the laser is a crucial issue for reducing production costs and times. Another important issue in chip manufacturing is testing. To do this, a number of scan points are established on the chip. Scan points must then be connected by a scan chain to allow test data to be loaded into the scan points through some input end. Clearly, the problem of finding the correct sequencing of scan points is again a TSP problem. • Various applications. Several other applications concern aiming telescopes and x-rays, data mining, machine scheduling, picking items in warehouses, cutting problems in glass industry, printing circuit boards, etc. A formulation for the TSP. Let G(V, E) be a graph, with a cost ce ∈ IR+ associated to every e ∈ E. 
In Chapter 1 we have shown that x ∈ {0, 1}E is the incidence vector of a Hamilton cycle in G if and only if it satisfies (i) x(δ(v)) = 2 for all v ∈ V ; (ii) x(δ(W )) ≥ 2; for all W ⊂ V , W 6= ∅, W 6= V ; (iii) 0 ≤ x ≤ 1; (4.10) 58 The constraints (i), and (iii) ensure that a solution x ∈ {0, 1}E is of the form x = χF where F ⊆ E and d(V,F )(v) = 2 for each v ∈ V ; such a set F is called a 2−matching (or 2-factor), since each node has degree 2 in the subgraph induced 2-matching in G by F . Clearly, every tour is a 2-matching, so all these inequalities are valid. However, in general a 2-matching is a union of disjoint cycles, so it may not be a tour. The 2-connectivity inequalities (ii) eliminate such a possibility, i.e., a 2- matching that satisfies (ii) is a tour (otherwise we could let W be the node set of one subtour, and we would get a violated inequality). From this it is not difficult to see that the feasible 0-1 solutions in (4.10) are precisely the incidence vectors of tours, see Problem 4.3. An equivalent set of constraints that can replace (4.10)(ii) is the set of subtour elimination constraints : subtour elimination constraint x(E[S]) ≤ |S| − 1 for all S ⊆ V , S 6= ∅, S 6= V . (4.11) These constraints were introduced in the 1954 paper by Dantzig, Fulkerson and Johnson [5]. Note that the number of 2-connectivity inequalities, or subtour elim- ination constraints, grows exponentially as a function of the number of nodes. From the above discussion, it follows that the following model is a valid 0-1 linear program of the TSP: min ∑ e∈E cexe subject to (i) x(δ(v)) = 2 for all v ∈ V ; (ii) x(δ(W )) ≥ 2; for all W ⊂ V , W 6= ∅, W 6= V ; (iii) 0 ≤ x ≤ 1; (iv) x is binary. (4.12) Relaxations. A standard relaxation of (4.12) is obtained by removing the binary stipulation (iv) on the x variables. This approach and the resulting branch-and-cut will be discussed more in detail in the following sections. Here we want to mention an alternative relaxation, which is obtained by removing the 2-connectivity con- straints (4.12)(ii) (but leaving the binary stipulation). This relaxation can be readily transformed into a matching problem in a bipartite graph (the assign- ment problem) and therefore solved efficiently. This relaxation may be combined with a suitable branch-and-bound scheme with partitioning that assures the 2- connectivity. This approach is called the assignment problem/branch-and-bound algorithm. 59 Consider the graph G and choose a node, say node 1. A 1-tree in G is a set1-tree F ∪ {e, f} where (V \ {1}, F ) is a spanning tree and e and f are two edges incident to node 1. 5 34 2 1 5 34 2 1 Figure 4.6: A 1-tree: the solid edges form a spanning tree F in G[V \ {1}] Observe that each Hamilton tour is a 1-tree, but the converse inclusion is false. The important property of 1-trees is that they are connected, and that the fol- lowing characterization of tours is valid: F is a tour if and only if it is both a 2-matching and a 1-tree. Now, the incidence vectors of 1-trees are the feasible solutions of the following set of constraints: (i) x(δ(v)) = 2 for v = 1; (ii) x(E[W ]) ≤ |W | − 1 for all W ⊂ V \ {1}, |W | ≥ 3; (iii) x(E) = |V |; (iv) 0 ≤ x ≤ 1; (v) x is integral. (4.13) If we here add the degree constraints x(δ(v)) = 2 for v ∈ V \ {1}, we get the (in- cidence vectors of) tours as solutions. Instead of doing this, we relax these degree constraints using Lagrangian relaxation. 
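For fixed edge weights, a minimum weight 1-tree is easy to compute: find a minimum spanning tree in G[V \ {1}] (e.g., by Kruskal's algorithm) and add the two cheapest edges incident to node 1. The following Python sketch, written from scratch for a complete graph given by a symmetric weight matrix, is only an illustration; the example weights at the end are made up (the figures in the text do not specify any).

def min_one_tree(w):
    # Minimum weight 1-tree of the complete graph on nodes 0, 1, ..., n-1,
    # where node 0 plays the role of the special node "1" in the text.
    # Returns (total weight, list of edges).
    n = len(w)
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    # 1. Kruskal's algorithm on the nodes {1, ..., n-1}
    edges = sorted((w[u][v], u, v) for u in range(1, n) for v in range(u + 1, n))
    tree, total = [], 0
    for cost, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
            total += cost
    # 2. add the two cheapest edges incident to node 0
    for v in sorted(range(1, n), key=lambda r: w[0][r])[:2]:
        tree.append((0, v))
        total += w[0][v]
    return total, tree

# Made-up weights on five nodes, just to exercise the function:
w = [[0, 2, 3, 4, 3],
     [2, 0, 1, 5, 4],
     [3, 1, 0, 2, 6],
     [4, 5, 2, 0, 3],
     [3, 4, 6, 3, 0]]
print(min_one_tree(w))

In the Lagrangian scheme described next, exactly this computation is repeated, with the weights cuv replaced by cuv − λu − λv.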
We introduce Lagrangian multipliers λv (that are unrestricted in sign, as we have equalities) where λ1 = 0 and get the following Lagrangian relaxation: 2 ∑ v∈V λv + min{ ∑ uv∈E (cuv − λu − λv)xuv : x is the incidence vector of a 1-tree}. Let v1T (λ) be the optimal value of this problem. The Lagrangian dual is then (LD) max {v1T (λ) : λ ∈ IRV , λ1 = 0} with optimal value v(LD). Based on Corollary 4.6 60 forming a sequence of standard shrinkings. Then the resulting graph, denoted by G|S, have a single vertex z instead of S and the new weights w|S are defined as the following mapping φ(w, S): (w|S)uv = wuv for all u, v ∈ V \ S (w|S)zr = w(S, {r}) for all r ∈ V \ S (4.14) where, for any w ∈ IRE , and sets S, T ⊂ V , S ∩ T = ∅, we let w(S, T ) = ∑ u∈S,v∈T wuv. Let’s go back to our original task of computing a global minimum weight cut S∗ in G. Consider any pair s, t ∈ V , with s 6= t. Then, either s ∈ S∗, t /∈ S∗, that is S∗ is an st-cut, or both s and t are in S∗ (the case with S∗ and V \ S∗ interchanged is equivalent and need not to be considered). Now, if s, t ∈ S∗, then it is easy to see that S ′ = S∗ \ {u, v} ∪ {z} is a global minimum weight cut in G|uv (where z is the node replacing u and v). This observation is the basis of the following algorithm to compute (the weight of) a global minimum weight cut. Global Minimum Weight Cut Algorithm Step 1. Let Ḡ(V̄ , Ē) = G(V, E), w̄ = w. Let UB = +∞. Step 2. If |V̄ | ≤ 1, STOP. Step 3. Select a pair s, t ∈ V̄ . Find a minimum st-cut S̄ in Ḡ, with weights w̄. If w̄(S̄) < UB, set UB = w̄(S̄). Step 4. Set Ḡ = Ḡ|st, w̄ = w|st. Go to Step 2. At termination, UB is the weight of a global minimum weight cut. Since at each iteration the number of nodes of the current graph decreases by one unit, the number of calls to a min st-cut solution algorithm at Step 3 is equal to |V | − 1. Actually the above algorithm only finds the weight of a minimum cut, but it is trivial to modify it in order to store the nodes of a minimum cut as well (by suitably handling the new nodes appearing in the shrinkings). Safe shrinking. Shrinking is a powerful tool to effectively separating violated safe shrinking 2-connectivity constraints. To clarify this concept, let G′(V ′, E ′) = G|uv and let x̄ ∈ IRE . Define x̄′ = x̄|uv. Suppose now that x̄ violates a subtour elimination constraint associated to graph G. Then we may wonder if x̄′ also violates a subtour elimination constraint associated to graph G′ (in this case the shrinking is said to be safe). Unfortunately, this is not always the case. However, we can give sufficient 63 conditions for this to happen, like the simple following one due to Padberg and Rinaldi (see, e.g. [1]). Proposition 4.7. If x̄uv = 1 and there exists a vertex t such that x̄({t}, {u, v}) = 1, then it is safe to shrink {u, v}. The above rule and many others can be applied to search for safe shrinkings in G, and shrink the corresponding edges. Also, this technique can be applied recursively to the new graphs, until no safe shrinks can be done. Most often, the final graph is considerably smaller then the original one and the minimum weight cut computations can be performed more efficiently. Finally observe that at each iteration of the global minimum weight cut algorithm we deal with a new graph (a shrinking), and safe shrinks may re-appear. Template valid inequalities: combs. If n is either 4 or 5, the Traveling Salesman Polytope is completely described by the trivial bounds, degree constraints and subtour elimination constraints. 
For larger number of nodes, other facets come into play. One such large class of inequalities is described in what follows. H T1 T2 T3 Figure 4.8: A comb A comb in G is a class of sets H , Ti for i ≤ k all being subsets of the node set Vcomb inequal- ity and satisfying • H ∩ Ti is nonempty for i = 1, . . . , k; • Ti \ H is nonempty for i = 1, . . . , k; • the sets Ti for i ≤ k are pairwise disjoint; • k ≥ 3 is an odd number. 64 The set H is called the handle of the comb, and the Ti’s are the teeth. Associated with each comb is a valid inequality for PTSP which may be derived using the Chvátal-Gomory procedure as follows. Consider the valid inequalities (i) x(δ(v)) = 2 for all v ∈ H; (ii) x(E[Ti]) ≤ |Ti| − 1 for i = 1, . . . , k; (iii) x(E[Ti \ H ]) ≤ |Ti \ H| − 1 for i = 1, . . . , k; (iv) x(E[Ti ∩ H ]) ≤ |Ti ∩ H| − 1 for i = 1, . . . , k; (iv) −xe ≤ 0 for e ∈ δ(H) \ ∪iE[Ti]. (4.15) If we add all these inequalities and divide the result by 2, we get the inequality x(E[H ]) + k ∑ i=1 x(E[Ti]) ≤ |H| + k ∑ i=1 (|Ti| − 1) − k/2. Now, due to integrality since k is odd, we may round the right-hand-side down and still have a valid inequality for PTSP . This gives the comb inequality comb inequal- ity x(E[H ]) + k ∑ i=1 x(E[Ti]) ≤ |H| + k ∑ i=1 (|Ti| − 1) − (k + 1)/2. (4.16) These inequalities were introduced by Grötschel and Padberg (1979) as a gen- eralization of the simple comb inequalities found by Chvátal; simple combs are combs where each H ∩ Ti consists of one node. It was shown by Edmonds and Johnson that the solution set of (i) the simple bound inequalities (0 ≤ x ≤ 1), (ii) the degree constraints and (iii) the comb inequalities for which each tooth has cardinality two is precisely the convex hull of the incidence vectors of 2-matchings in G. This 2-matching polytope is actually an interesting relaxation of PTSP . The comb inequalities may be generalized into the so-called clique-tree inequali- ties where more handles are allowed and these are organized in a tree-like fashion, see [11]. Furthermore, the clique inequalities is a subset of the bipartition inequal- ities; other inequalities are called star inequalities, binested inequalities and so on. In fact, a lot of work has been done on understanding the facial structure of Traveling Salesman Polytopes and a main goal is to get a unifying understanding of the structure, not just many strange classes of inequalities. At present no polynomial separation algorithm is known for the comb or clique inequalities. However, for a fixed number of teeth, a polynomial algorithm was found recently. There is also a polynomial algorithm for a special subclass of the comb inequalities where |H ∩ Ti| = 1 and |Ti \ H| = 1 for each i (assuming that the point to be separated satisfies 0 ≤ x ≤ 1 and all the degree constraints). This 65 it directly to G. Indeed, Problem (4.19) is already as difficult as the original TSP! However, this methodology is widely used (also in different contexts) for generating local cuts. Let us briefly discuss how it is applied to the solution of the TSP, in particular in [1]. The idea is to map the original set V into some smaller set V̄ , for example by shrinking. This mapping induces a linear mapping φ(x) : E(V ) → E(V̄ ), namely (4.14). 
One can show that, if x is the incidence vector of a Hamilton tour in G, then x̄ = φ(x) is an integer vector representing a graphical Hamilton tour, thatgraphical Hamilton tour is a tour which passes through every node a strictly positive and even number of times (the same edge can appear several times). z u v s t vu s t G G|uv G|stz Figure 4.9: Shrinking edge uv (st) maps an Hamilton tour of G into a graphical tour of G|uv (G|st), In Figure 4.9, shrinking edge uv maps the original Hamilton tour into the graphi- cal tour of G|uv, while shrinking edge st produces the graphical tour of G|st. Let us denote by T̄ the set of integer vectors representing graphical Hamilton tours of the complete graph Ḡ(V̄ , Ē) on V̄ . Consider now a point x∗ ∈ IRE and its map- ping φ(x∗) ∈ RĒ. If V̄ is reasonably small, we can generate a general cut which is valid for conv(T̄ ) and violated by φ(x∗). Then this cut can be ”lifted” into the original space to produce a valid inequality violated by x∗. There are several, complicated issues which have to be addressed in this process. First, we need to ensure that, if x∗ /∈ conv(T ), then φ(x∗) /∈ conv(T̄ ) (that is, the shrinking is safe for the general cut). Second, we need a way to lift up the violated inequality into the original space. Third, we would like such inequalities to be facet defining for conv(T ). For a detailed discussion of all these issues, see [1]. 68 4.5 Exercises Exercise 4.1. Consider the knapsack problem max { ∑n j=1 cjxj : ∑n j=1 ajxj ≤ b, 0 ≤ x ≤ 1} where all the data are positive integers. Define the knapsack relaxation polyope by P = {x ∈ IRn : ∑n j=1 ajxj ≤ b, 0 ≤ x ≤ 1}. Assume that the variables have been ordered such that c1/a1 ≥ c2/a2 ≥ . . . ≥ cn/an. Try to guess an optimal solution and prove the optimality by considering the dual problem. Use this result to characterize all the vertices of P . What about the cover inequalities in relation to these vertices? Exercise 4.2. Consider the branch-and-bound algorithm. Consider a node u in the enumeration tree with v(R(u)) > v(Q) where Q is the integer program and R(u) is the LP relaxation in node u. Can we prune node u? Exercise 4.3. Prove, in detail, that ( 4.12) is a valid integer linear programming formulation of the TSP problem. Then do the same for the model obtained by replacing the cut inequalities by the subtour inequalities. Exercise 4.4. Try to figure out what the odd cover inequalities might be based on the example given for the set covering problem. Exercise 4.5. Consider the degree-constrained spanning tree problem. Find a valid integer linear programming formulation of this problem. Exercise 4.6. Consider the following problem. We shall decide location of sevice centers among a finite set of possible locations I. There is given a (finite) set J of customers, and each shall be connected to exactly one service centre. The cost of building a service centre at location i ∈ I is ci and the cost of connecting customer j to centre location i is di,j. The simple plant location problem is to decide in which locations service centres should be built and the connection of customers to centres so as to minimiize the total cost (design + connection cost). Model this problem as an integer linear programming problem. Figure out some data for a small example and solve the LP relaxation as well as the ILP on a computer using an optimization package (e.g., CPLEX). Exercise 4.7. Consider again the simple plant location problem from the previous problem. 
Suggest a Lagrangian relaxation algorithm for this problem. Discuss its properties (e.g., integrality).

Exercise 4.8. Develop some simple heuristics for the simple plant location problem.

Bibliography

[1] D. Applegate, R.E. Bixby, V. Chvátal and W. Cook. The Travelling Salesman Problem. Princeton University Press, 2006.

[2] R.K. Ahuja, T.L. Magnanti and J.B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, New Jersey, 1993.

[3] N. Christofides. Worst-case analysis of a new heuristic for the travelling salesman problem. Report 388, Graduate School of Industrial Administration, Carnegie Mellon University, Pittsburgh, 1976.

[4] G. Cornuejols and A. Sassano. On the 0,1 facets of the set covering polytope. Mathematical Programming, 43:45–55, 1989.

[5] G.B. Dantzig, D.R. Fulkerson and S.M. Johnson. Solution of a large-scale traveling-salesman problem. Operations Research, 7:58–66, 1954.

[6] G. Dahl. An introduction to convexity. Lecture Notes, University of Oslo, 2009.

[7] J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:127–136, 1971.

[8] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.

[9] M. Grötschel. On the symmetric traveling salesman problem: solution of a 120-city problem. Mathematical Programming Study, 21:61–77, 1980.

[10] M. Grötschel, L. Lovász and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.

[11] M. Grötschel and W.R. Pulleyblank. Clique tree inequalities and the symmetric travelling salesman problem. Mathematics of Operations Research, 11:537–569, 1986.

[12] S. Lin and B. Kernighan. An effective heuristic algorithm for the traveling salesman problem. Operations Research, 21:498–516, 1973.

[13] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys, editors. The Traveling Salesman Problem. Wiley, 1985.

[14] L. Lovász and M.D. Plummer. Matching Theory. North-Holland, 1986.

[15] G.L. Nemhauser and L.A. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, 1988.