Solution of the Airline ToD Problem using Severely Limited Subsequence

James Priestley
Department of Engineering Science
University of Auckland
New Zealand
j.priestley@aucland.ac.nz

Abstract

The minimum-cost Tour-of-Duty (ToD) Problem is extremely well known in the airline industry. ToDs are typically constructed by considering the complete set of subsequent flights (subsequences) that may be operated by a crewmember after they operate a given flight. The column dimension of the resulting model is generally extremely large, so column generation is used. Limiting the set of subsequences that may be considered is beneficial in terms of reducing the number of columns and improving the integer properties of the LP relaxation. The downside to this approach is that solution quality can be compromised if the limited subsequence set is not carefully selected. We investigate how to dynamically alter the limited subsequence set throughout the solution process so as not to compromise solution quality, while still maintaining the benefits of limiting subsequence. The use of dual stabilisation and constraint aggregation techniques in this context will be discussed. We present comparative results from real airline problems to demonstrate the efficacy of this technique.

1 Introduction

Major airlines face multi-million dollar operating costs and one of the biggest components of these is the costs associated with aircrew. Crew costs can be controlled by carefully planning the assignment of crew to scheduled flights. This assignment is known as the Tour-of-Duty (ToD) problem, or alternatively as the Pairings problem, and has a long history in the airline industry. Arabeyre et al. (1969) give a comprehensive summary of the early work in airline crew scheduling. Goldie (1996) gives a more recent summary and outlines details of the software on which our research is based. This paper describes and demonstrates an improved method for solving the ToD problem.
The mathematical models formulated to describe the ToD Problem are generally difficult to solve. There are two reasons for this difficulty. First, the formulations result in massive zero-one integer programmes, especially in the column dimension which can number in the billions. The large number of columns makes a priori variable generation

impractical so column generation is used. However, the resulting resource constrained shortest path sub-problems are time consuming to solve. Second, the integer properties of the resulting LP-Relaxation are generally poor, resulting in very fractional solutions. Consequently, a large amount of Branch and Bound needs to be performed in order to obtain integer solutions.

Rather than attempting to solve these massive zero-one integer programmes, a number of simplified approaches have been proposed in order to obtain good quality solutions in a reasonable time. Some authors have concentrated on the development of heuristic methods, for example Baker and Fisher (1981), while others have attempted to use decomposition techniques to reduce the size of the problem, for example Wren, Smith, and Miller (1985). Another approach, proposed by Ryan and Falkner (1988), aims to reduce the number of ToDs and obtain less fractional LP solutions through a technique called Subsequence Limitation. This technique is further described by Shah (2000) in the context of domestic operations by Ansett Australia. While effective, the fundamental drawback to this approach is that solutions can be sub-optimal if the limitation is not performed carefully.

This paper begins with an introduction to the ToD problem and to the concept of subsequence. We describe Subsequence Limitation and outline its benefits and its drawbacks. We show how constraint aggregation, Elhallaoui et al. (2004), can be used to improve the performance of this technique. We then describe a new technique for solving the ToD Problem called Subsequence Identification and outline its similarities and differences to Subsequence Limitation and column generation. We show how dual stabilisation, du Merle et al. (1999), can be used to improve the performance of this technique. Preliminary computational results for solutions of real world ToD Problems using Subsequence Identification are presented.
This paper concludes with a summary and outlines the future directions of the research that we are conducting.

2 The ToD Problem

2.1 The Larger Problem

The ToD Problem is part of a series of optimisations that airlines must solve. This process can be divided into three stages: schedules planning, crew scheduling, and day of operation. In the schedules planning stage the airline decides where it will be flying to, when these flights will occur, and which aircraft will be operating each flight. The crew scheduling stage is divided into two parts: ToD planning and rostering. ToD planning involves assigning a particular class of crew to every flight and is the focus of this paper. Rostering involves assigning the ToDs created during the ToD planning stage to individual crew members. Finally, the day of operation stage is where disruptions to the optimised plan get fixed.

2.2 The Composition of a ToD

Conditions imposed by employment contracts and Civil Aviation Authority rules limit the maximum length of time that a crew can safely work for and the minimum amount of rest that they must have after work. Since crew cannot work as long as aircraft, the assignment of crew to flights must be different from the assignment of aircraft to flights. A ToD consists of a series of duty periods, where the crew is working, separated by rest periods where they are not. A duty period consists of one or more flights where the crew is either operating (i.e. carrying out the duties necessary for the operation of an aircraft) or

paxing (i.e. travelling on an aircraft as passengers to move to another location). A ToD must start and end at the same crew base so that the crew can return home after work. A ToD is generic to a class of crew because it is not assigned to an individual until later in the rostering problem.

2.3 The ToD Model

Given a schedule containing m flights we require that each flight be operated by exactly one crew. A crew schedule is a set of ToDs that combine to cover all the flights in the schedule. We wish to find the minimum cost crew schedule. The cost of a ToD is made up of several components. There is the cost of paying the crew for the hours that they work, including overtime if a ToD is especially long. Then there are additional expenses such as allowances for meals, accommodation costs, and paxing. Given a set of n possible ToDs the ToD Problem can be formulated as a set partitioning problem.

Indices

i = Flight: 1,...,m
j = ToD: 1,...,n

Parameters

$$c_j = \text{cost of the } j\text{'th ToD}$$

$$a_{ij} = \begin{cases} 1, & \text{if the } j\text{'th ToD covers flight } i \\ 0, & \text{otherwise} \end{cases}$$

Decision Variables

$$x_j = \begin{cases} 1, & \text{if the } j\text{'th ToD is included in the crew schedule} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Model ToD

$$\text{Minimise} \quad \sum_{j=1}^{n} c_j x_j \tag{2}$$

$$\sum_{j=1}^{n} a_{ij} x_j = 1, \quad i = 1,\dots,m \tag{3}$$

Explanation

1) Ensure that each ToD is either used once in the crew schedule or not at all
2) Objective is to minimise the total cost of the ToDs used in the crew schedule
3) Ensure that each flight in the flight schedule is operated by one crew

3 Subsequences

3.1 Definition of Subsequence

The subsequences of a flight F are the set of subsequent flights S that can legally follow F in a ToD. Figure 1 (below) shows flight F arriving at an airport followed by a series of flights departing from the same airport. The flights that are subsequences of F are labelled as S1 through to S4.

[Figure 1: The subsequences of a flight F. A timeline at an airport shows the arrival of F, the minimum and maximum ground times, a night period, and the departing flights, each marked either as a subsequence of F (S1 to S4) or as not a subsequence of F.]

The amount of time required between two flights in a ToD is called the minimum ground time and is typically between 30 and 45 minutes. Any flight that leaves before this time has elapsed cannot be a subsequence of flight F. Similarly, any flight that leaves the airport after a maximum ground time has elapsed, typically at least a day, cannot be a subsequence of flight F.

3.2 Classification of Subsequence

Subsequences can belong to one of the following four classes:

1. Follow-on subsequence. The crew do not change aircraft and operate the subsequent flight.
2. Same-day subsequence. The crew change aircraft and operate the subsequent flight which leaves on the same day as the first flight arrives.
3. Overnight subsequence. The crew change aircraft and operate the subsequent flight which leaves on the day after the first flight arrives.
4. Paxing subsequence. The crew is paxing on the subsequent flight.

3.3 Subsequence Counts

The subsequence count for a flight F is the number of different subsequent flights that can legally follow F in a ToD. Assuming that the flights i = 1,...,m are ordered in increasing order of departure time, the subsequence count for a flight F, denoted SC(F), can be calculated using the following formula:

$$SC(F) = \left| \left\{ t : a_{Fj} = 1,\; a_{ij} = 0 \text{ for } F < i < t,\; a_{tj} = 1 \text{ for some } j \in \{1,\dots,n\} \right\} \right| \tag{4}$$

In Figure 1 the subsequence count of flight F is four. In a typical problem the subsequence counts for a flight range between twenty and forty. Generally the busier the airport the more subsequences a flight will have. A flight is said to have unique subsequence if its subsequence count is equal to one. An integer solution to the ToD problem defines a unique subsequence for every flight. The subsequences defined by the optimal integer solution are called the optimal subsequences.
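The subsequence count can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation: the function name `subsequence_counts` and the representation of each ToD as a list of flight indices are assumptions made here, with flights numbered in increasing order of departure time as in formula (4).

```python
from collections import defaultdict


def subsequence_counts(tods):
    """Count the distinct flights that directly follow each flight.

    tods: iterable of ToDs, each a collection of flight indices
          (hypothetical representation; flights are assumed to be
          numbered in increasing order of departure time).
    Returns a dict mapping flight F -> SC(F). Flights that only ever
    appear last in a ToD have no entry.
    """
    successors = defaultdict(set)
    for tod in tods:
        flights = sorted(tod)  # order by departure time
        # Consecutive flights in a ToD form a (flight, subsequence) pair.
        for f, t in zip(flights, flights[1:]):
            successors[f].add(t)
    return {f: len(ts) for f, ts in successors.items()}


# Four illustrative ToDs over five flights.
example_tods = [[1, 2, 5], [1, 3, 5], [1, 2, 4], [3, 4]]
counts = subsequence_counts(example_tods)
```

In this toy instance flight 1 is followed by flights 2 and 3, so its subsequence count is two; a flight with a count of one would have unique subsequence.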

4 Subsequence Limitation

4.1 The Concept of Subsequence Limitation

Subsequence Limitation involves solving the ToD problem while allowing only a reduced set of subsequences, called the Limited Subsequence Set (LSS), to be used rather than all of the legal subsequences. If the LSS contains all of the optimal subsequences then the cost of the solution obtained will be identical to that obtained when solving with all of the subsequences. Reducing the size of the LSS is beneficial and will result in faster solution times for reasons explained below. Consequently, the aim when selecting the LSS is to reduce subsequence counts for all flights by as much as possible while still retaining the optimal subsequences as part of the set. This is done by an a priori heuristic that places subsequences thought likely to be optimal into the LSS.

4.2 The Benefits of Subsequence Limitation

There are three benefits of reducing the size of the LSS. First, reducing the number of subsequences reduces the number of variables (n) because any ToDs containing subsequences that are not in the LSS are not allowed to be part of the solution. This makes both pricing and column generation faster, and if the LSS is small enough it may even be possible to use a priori variable generation.

Second, reducing the number of subsequences improves the integer properties of the LP-Relaxation. Berge (1972) demonstrated that if a matrix is balanced then all of the extreme points of the corresponding LP-Relaxation are naturally integer. A zero-one matrix is balanced if and only if it contains no odd order two-cycle submatrices (i.e. an odd order submatrix with all its row and column sums equal to two). Unfortunately the real world rarely produces balanced matrices. It is the prevalence of odd order cycles that causes fractional solutions in set partitioning problems and makes integrality difficult to obtain. However, if the subsequence counts for almost all flights are small then the number of odd order cycles will also be small.
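As a small worked illustration of why an odd order cycle produces fractions, consider a hypothetical instance (not taken from the airline data) with three flights and three ToDs, where each ToD covers exactly two flights. The constraint matrix is then itself an odd order submatrix with all row and column sums equal to two:

```latex
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix},
\qquad
A x = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\;\Longrightarrow\;
\begin{aligned}
x_1 + x_2 &= 1 \\
x_2 + x_3 &= 1 \\
x_1 + x_3 &= 1
\end{aligned}
\;\Longrightarrow\;
x_1 = x_2 = x_3 = \tfrac{1}{2}.
```

Subtracting the second equation from the first gives x_1 = x_3, and the third then forces x_1 = 1/2. The only solution to the LP-Relaxation of this toy instance is therefore fractional, and no zero-one solution exists.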
Fewer subsequences mean fewer odd order cycles, which in turn means fewer fractions, thus reducing the amount of branch and bound required to obtain integrality.

Third, reducing the number of subsequences makes it easier to aggregate constraints. Constraints can be aggregated if the flights they represent must be operated together and cannot be operated individually in a ToD. Reducing the number of constraints reduces the cost of a simplex iteration, O(m^3) for a sparse Bartels-Golub method, which in turn leads to shorter solution times. The more flights that are limited to unique subsequence the more opportunities for constraint aggregation arise. To obtain individual dual variables for the aggregated constraints a dual disaggregation strategy proposed by Elhallaoui et al. (2004) can be used.

4.3 How to Select the Limited Subsequence Set

There are two ways to select the LSS in an intelligent manner. The first is to use knowledge about the relative likelihoods of the four subsequence classes being present in the optimal solution. Typically around two thirds of the optimal subsequences are of the follow-on class. Since there is only one of these subsequences for each flight this makes an excellent starting point for an LSS. There is a strong objective pressure against long time gaps between same-day subsequences. The same is true for overnights but to a lesser extent. The likelihood of an overnight subsequence being used varies with the time of day: less likely in

the morning, much more likely late at night. Paxing subsequences are the hardest to predict. This is because there are generally many of them but only a few are used in the optimal solution and there is no real indication of which, if any, will be required.

The second approach is to use previous solutions to make predictions about the likely solution to the current problem. Airlines solve the ToD problem on a regular basis, often weekly. Schedules, on the other hand, tend to stay roughly constant for longer periods for customer convenience. Consequently, solutions to two similar schedules will often share common subsequences.

4.4 The Drawback to Subsequence Limitation

Unfortunately, no a priori selection heuristic is perfect and eventually one or more optimal subsequences will be accidentally excluded from the LSS. The result of such an exclusion is that the optimal integer solution can no longer be obtained using the LSS and the resulting integer solution will have a higher objective value. The amount of objective degradation depends on the number and class of the subsequences that are excluded.

Fortunately, ToD problems tend to have several possible integer solutions with similar objective values. For a typical problem excluding a small number of optimal subsequences has a negligible effect on the objective function. In our experiments we needed to randomly exclude, on average, 4.8% of the optimal subsequences to damage the LP objective by 0.1% (a fifth of the IP bound-gap in these problems). However, as the number of excluded subsequences grows the cost of the crew schedule increases as more expensive ToDs are required to cover the flights. If enough optimal subsequences are excluded the problem may eventually become infeasible.

The fundamental drawback to Subsequence Limitation is that it is impossible to tell whether an integer solution is of good quality or not.
The reason for this is that it is always possible that a slightly different LSS would have resulted in a much cheaper solution. In order to check the solution quality a comparison with a solution obtained when allowing all of the subsequences is required, but this would clearly defeat the purpose of performing subsequence limitation. Consequently, the limitation heuristic needs to be liberal when selecting the LSS to ensure that all, or almost all, of the optimal subsequences are included. Typically a subsequence count of at least ten per flight is required to obtain good quality solutions. The downside to this is that the larger the LSS becomes, the less pronounced the benefits of limiting subsequence are.

5 Subsequence Identification

5.1 The Concept of Subsequence Identification

The aim of our Subsequence Identification technique is to exploit all of the benefits of limiting subsequence while avoiding the drawback of Subsequence Limitation outlined above. Subsequence Identification also involves solving the ToD problem using a reduced set of subsequences. The key difference between the two methods is that for Subsequence Identification this set is dynamic, as opposed to the static LSS used in the Subsequence Limitation technique. This allows for the progressive correction of any mistakes made by the initial selection heuristic throughout the solution process. Consequently, possible exclusion of optimal subsequences by the initial selection heuristic is not the critical problem it is for Subsequence Limitation. This allows the employment of a far more aggressive limitation strategy. We refer to this as solving using a

Severely Limited Subsequence Set (SLSS). We want the SLSS to have subsequence counts of no more than three or four for all flights and ideally to have unique subsequence wherever we can be sure of our decision.

5.2 Modifying the Severely Limited Subsequence Set

The SLSS can be modified by the addition or removal of subsequences. Decisions to modify the limited subsequence set are made by the subsequence selector. Any missing optimal subsequences are identified and added to the SLSS. Similarly, any non-optimal subsequences are identified and removed from the SLSS in order to keep the set small.

The subsequence selector can add subsequences from a set called the Candidate Subsequence Set (CSS). The CSS is a superset of the SLSS and is also a dynamic set. The subsequence selector uses dual variable information to determine which, if any, subsequences should be added to the SLSS from the CSS. In a process similar to column generation the current dual vector is passed to a ToD generator which finds ToDs with negative reduced costs. However, unlike column generation, the columns representing the negative reduced cost ToDs are not added to the LP. Instead the generated ToDs are analysed by the subsequence selector in order to make decisions about the SLSS.

The subsequence selector gathers statistics about the ToDs it receives and these form the basis for its decisions. The statistics fall into three broad categories. First, the number of ToDs that contain a subsequence is counted. Second, the number of different dual vectors that produce ToDs containing a subsequence is counted. Third, the reduced costs of those ToDs and the contribution of individual subsequences to those reduced costs are analysed. The threshold for adding a subsequence is variable. It depends on the class of the subsequence being considered and on an a priori assessment of its likelihood of being an optimal subsequence.
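The bookkeeping behind these statistics might look like the following Python sketch. It is an illustration under stated assumptions, not the selector used in our software: the class `SubsequenceStats`, its method names, the representation of a ToD as an ordered list of flight indices, and the simple count-based thresholds are all hypothetical. In particular, the third statistic is simplified here to a running sum of the reduced costs of the ToDs containing each subsequence.

```python
from collections import defaultdict


class SubsequenceStats:
    """Hypothetical statistics store for a subsequence selector."""

    def __init__(self):
        self.tod_count = defaultdict(int)           # ToDs containing the subsequence
        self.dual_vectors = defaultdict(set)        # dual vectors that produced such ToDs
        self.reduced_cost_sum = defaultdict(float)  # summed reduced cost of those ToDs

    def record(self, dual_id, tod, reduced_cost):
        """Record one negative reduced cost ToD generated from dual vector dual_id."""
        for subseq in zip(tod, tod[1:]):  # consecutive (flight, next flight) pairs
            self.tod_count[subseq] += 1
            self.dual_vectors[subseq].add(dual_id)
            self.reduced_cost_sum[subseq] += reduced_cost

    def candidates(self, min_tods, min_duals):
        """Subsequences seen in enough ToDs and under enough dual vectors."""
        return sorted(
            s for s in self.tod_count
            if self.tod_count[s] >= min_tods
            and len(self.dual_vectors[s]) >= min_duals
        )


stats = SubsequenceStats()
stats.record(dual_id=0, tod=[1, 2, 5], reduced_cost=-3.0)
stats.record(dual_id=1, tod=[1, 2, 4], reduced_cost=-1.5)
stats.record(dual_id=1, tod=[1, 3, 5], reduced_cost=-0.5)
to_add = stats.candidates(min_tods=2, min_duals=2)
```

Here only the subsequence (1, 2) appears in two ToDs produced by two different dual vectors, so it is the only candidate. The fixed values of `min_tods` and `min_duals` are placeholders; in practice the threshold varies with the subsequence.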
For instance, a higher qualification standard is required for paxing subsequences than for same-day subsequences. The current subsequence count for the flight is also taken into consideration. The more subsequences a flight has in the SLSS the less likely we are to add additional subsequences for that flight.

Similar considerations apply to the removal of subsequences. If a ToD containing a subsequence is in the basis at positive value then the subsequence is in use, otherwise it is unused. The use of all subsequences in the SLSS is monitored on a regular basis. The less a subsequence is used the more likely it is to be removed, especially if the subsequence count for the flight is high or the subsequence has been in the SLSS for many iterations.

Adding a subsequence to the SLSS results in the addition of a set of columns to the LP. These columns represent all the ToDs that contain the subsequence to be added but with the restriction that all of their other subsequences must be members of the SLSS. Removing a subsequence from the SLSS results in the removal of all columns representing ToDs that contain that subsequence from the LP.

5.3 Assessing Solution Quality

Once an integer solution has been found using Subsequence Identification the quality of that solution is assessed. This ensures that the SLSS contains all, or at least nearly all, of the optimal subsequences and that consequently the solution quality has not been harmed. A fractional solution provides a lower bound on the objective value of an integer solution and is used as a measure of solution quality. The CSS is dynamic and can be expanded to include all legal subsequences for all flights. If no negative reduced cost ToDs

can be generated from the expanded CSS using the dual variables from the LP solution then the integer solution obtained is optimal. If negative reduced cost ToDs are found it will be because they contain subsequences that were not part of the SLSS. These subsequences can then be added to the SLSS and the LP can continue to solve. So long as the LP lower bound does not significantly decrease with the continued addition of subsequences then the integer solution is of good quality. If a significant objective decrease does occur after the addition of subsequences then branch and bound can be restarted from the new LP solution using the expanded SLSS. In most cases we do not expect the objective to improve significantly because the subsequence selector should ensure that the SLSS contains all of the optimal subsequences.

5.4 Dual Stabilisation

Dual variable information is crucial because it determines the content of the ToDs examined by the subsequence selector and thus the quality of the SLSS. Unfortunately the dual variables become unstable whenever the SLSS is altered. This is problematic because alterations to the SLSS occur frequently when performing Subsequence Identification. To combat this problem we perform dual stabilisation. The following method is an adaptation of the stabilisation technique proposed by du Merle et al. (1999).

$$\min \; c^{T} x - \delta_{-}^{T} y_{-} + \delta_{+}^{T} y_{+} \tag{5}$$

$$\text{s.t.} \quad A x - y_{-} + y_{+} = e_m \tag{6}$$

$$x,\; y_{-},\; y_{+} \ge 0 \tag{7}$$

Here e_m is the m-vector of ones. Slack (y_+) and surplus (y_-) variables are added to every constraint in the model (6) and are bounded below by zero (7) and above by one. The costs on these variables, \delta_+ and \delta_- (5), provide an upper and lower bound respectively on the value of each dual variable. For a feasible solution all stabilisation variables must be at the value zero. Progressively increasing the costs of the stabilisation variables will force them from the basis. By adjusting the bounds throughout the solution process the variation of the dual variables can be controlled.
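To make the structure of the stabilised model (5) to (7) concrete, the following Python sketch assembles its data from a small constraint matrix. The function names, the dense list-of-lists representation, and the toy numbers are assumptions for illustration only; no LP solver is called, and this is not the implementation used with the ZIP solver.

```python
def build_stabilised_lp(A, c, delta_minus, delta_plus):
    """Assemble costs, rows, right-hand side and bounds of the stabilised LP.

    Columns are ordered as [x | y_minus | y_plus], so each row reads
    A x - y_minus + y_plus = 1. (Hypothetical helper, dense lists.)
    """
    m, n = len(A), len(c)
    cost = list(c) + [-d for d in delta_minus] + list(delta_plus)
    rows = []
    for i, row in enumerate(A):
        surplus = [-1 if k == i else 0 for k in range(m)]  # -y_minus
        slack = [1 if k == i else 0 for k in range(m)]     # +y_plus
        rows.append(list(row) + surplus + slack)
    rhs = [1] * m
    # x >= 0; stabilisation variables lie between zero and one.
    bounds = [(0, None)] * n + [(0, 1)] * (2 * m)
    return cost, rows, rhs, bounds


def stabilisation_reduced_costs(pi, delta_minus, delta_plus):
    """Reduced costs of y_minus and y_plus for a dual vector pi.

    Both are non-negative exactly when delta_minus <= pi <= delta_plus,
    which is how the costs act as bounds on the duals.
    """
    rc_minus = [p - d for p, d in zip(pi, delta_minus)]  # -delta_minus + pi
    rc_plus = [d - p for p, d in zip(pi, delta_plus)]    #  delta_plus - pi
    return rc_minus, rc_plus


# Toy instance: two flights, three ToDs.
A = [[1, 1, 0], [0, 1, 1]]
c = [3, 4, 2]
cost, rows, rhs, bounds = build_stabilised_lp(A, c, [1, 1], [5, 5])
rc_minus, rc_plus = stabilisation_reduced_costs([2, 6], [1, 1], [5, 5])
```

In the toy instance the second dual value (6) exceeds its upper bound (5), so the corresponding y_+ variable has a negative reduced cost and would enter the basis, pulling that dual back towards the bound.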
Our experiments show that using dual stabilisation improves the performance of Subsequence Identification by allowing missing optimal subsequences to be identified earlier.

6 Computational Results

All computational tests were carried out on a desktop computer with an Intel 3 GHz Pentium IV processor, 768 megabytes of RAM, Microsoft Windows 2000 operating system, and with the programs written and compiled in Microsoft Visual C++ Version 6.0. The Global Pairings Optimiser was used in conjunction with the ZIP simplex solver, Ryan (1980).

We compare our results, in Table 1 (below), using Subsequence Identification to a regular solution process across a selection of four different ToD problems taken from two different airlines. The relative sizes of the problems are indicated by the number of constraints. The composition of the initial SLSS was limited to follow-on subsequences only.

All four problems show a reduction in the total amount of time required to solve the problem when using Subsequence Identification. The integer objective values are identical for two of the problems and are well within the 0.5% bound tolerance for the other two

problems. Small reductions in the percentage of fractional variables in the LP solutions are observed for three of the problems, with the percentage unchanged for problem 3. A reduction in the number of branch and bound nodes required to obtain an integer solution was also observed in three of the problems.

Optimisation                            1        2        3        4
Number of constraints                 354      793      459     1341

Regular Solve
  Number of LP Iterations           48104   219768    17839    68957
  LP Solution Time (s)                188      975      579      131
  % of Basic Variables Fractional    16.5     35.8      2.6     22.6
  Branch and Bound Nodes              191      398      287      462
  Branch and Bound Time (s)             1       23        1       25
  Integer Objective                1331.3   3044.2   5522.6     3924
  Total Time (s)                      199     1026      584      172

Subsequence Identification
  Number of LP Iterations           21505   106430     9514    28552
  LP Solution Time (s)                 87      482      309       56
  % of Variables Fractional          14.2     30.7      2.6     18.2
  Branch and Bound Nodes              178      369      287      435
  Branch and Bound Time (s)             1       19        1       21
  Integer Objective                1331.3   3045.5   5522.6   3923.6
  Total Time (s)                       98      531      314       93

Total Time Saved (s)                  101      495      270       79

Table 1: Computational results for four problems comparing the Subsequence Identification technique to a regular solve.

7 Conclusions

The results demonstrate that Subsequence Identification can be successfully used to reduce the solution time of the ToD problem without harming the quality of the solution. The majority of the time saving in these problems is obtained during the LP solution rather than the branch and bound. However, this is not particularly surprising given that these problems tended not to be very fractional and thus did not require much branch and bound to achieve integrality. The reductions in the percentage of fractions and branch and bound times are encouraging and we expect these reductions to become more pronounced on more fractional problems.

A well chosen SLSS can greatly improve the performance of Subsequence Identification. Our experiments show that the more optimal subsequences that are initially included the faster a solution can be obtained.
We are continuing to develop a selection heuristic for seeding the SLSS and will be investigating how to use previous solutions to aid our predictions. We are investigating how to better integrate Subsequence Identification with the decisions made during branch and bound. We intend to implement and investigate the constraint aggregation technique proposed by Elhallaoui et al. (2004). We also have a preliminary parallel implementation of the Subsequence Identification process.

Acknowledgements

I wish to thank my supervisor Prof. David M Ryan for all of his support and encouragement. I also wish to thank Optimal Decision Technologies (www.odt.biz), and in particular Dr. Paul Day, for use of the Global Pairings Optimiser software and for their assistance.

References

Arabeyre, A., Fearnley, J., Steiger, F., and W. Teather. 1969. The airline crew scheduling problem: a survey. Transportation Science 3. pp. 140-163.

Baker, E. and M. Fisher. 1981. Computational results for very large air crew scheduling problems. Omega 9. pp. 613-618.

Berge, C. 1972. Balanced Matrices. Mathematical Programming 2. pp. 19-31.

Day, P. 1996. Flight Attendant Rostering for Short-haul Airline Operations. The University of Auckland.

Elhallaoui, I., Villeneuve, D., Soumis, F., and G. Desaulniers. 2004. Dynamic Aggregation of Set Partitioning in Column Generation. Les Cahiers du GERAD. pp. 5-25.

Goldie, A. 1996. Optimal Airline Crew Scheduling Using Dynamic Column Generation. The University of Auckland.

du Merle, O., Villeneuve, D., Desrosiers, J., and P. Hansen. 1999. Stabilized Column Generation. Discrete Mathematics 194.

Ryan, D. 1980. ZIP: A zero-one integer programming package for scheduling. Report C.S.S. 85, A.E.R.E. Harwell, Oxfordshire.

Ryan, D. and J. Falkner. 1988. On the Integer Properties of Scheduling Set Partitioning Models. European Journal of Operational Research 35. pp. 442-456.

Shah, S. 2000. Improving the efficiency of the ToD Optimiser for very Large Flight Schedules. The University of Auckland. pp. 12-35.

Wren, A., Smith, B., and A. Miller. 1985. Complementary approaches to crew scheduling. Computer Scheduling of Public Transport 2. pp. 263-278.