Sequence Alignment: Linear Space

Alignment in linear space
Chapter 7 of Jones and Pevzner

Q. Can we avoid using quadratic space?

Easy. Optimal value in O(m + n) space and O(mn) time.
Compute OPT(i, ·) from OPT(i-1, ·).
No easy way to recover the alignment itself.

Optimal longest common subsequence in O(m + n) space and O(mn) time [Hirschberg (1975)].
Clever combination of divide-and-conquer and dynamic programming.
Application to sequence alignment: E.W. Myers and W. Miller. Optimal alignments in linear space. Computer Applications in Biosciences, 4:11-17, 1988.

Divide and Conquer Algorithms
Divide the problem into sub-problems.
Conquer by solving the sub-problems recursively. If the sub-problems are small enough, solve them in brute-force fashion.
Combine the solutions of the sub-problems into a solution of the original problem.

Sorting
Given: an unsorted array
Goal: sort it
Sorted: 1 2 2 3 4 5 6 7

Mergesort: Divide
log(n) divisions to split an array of size n into single-element arrays.

Mergesort: Conquer
2 5 | 4 7 | 1 3 | 2 6
2 4 5 7 | 1 2 3 6
1 2 2 3 4 5 6 7
log(n) iterations, each iteration takes O(n) time.
Total time: O(n log n)
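The divide, conquer and combine steps of mergesort can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the names `merge` and `merge_sort` are chosen here, and the base case is written as `n <= 1` (an assumption) so that an empty array is also handled.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list in O(len(left) + len(right)) time."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(c):
    """Sort a list by divide and conquer; returns a new list."""
    n = len(c)
    if n <= 1:                  # small enough: nothing left to sort
        return list(c)
    left = c[: n // 2]          # divide: first n/2 elements
    right = c[n // 2:]          # divide: last n - n/2 elements
    return merge(merge_sort(left), merge_sort(right))  # conquer, then combine


print(merge_sort([2, 5, 4, 7, 1, 3, 2, 6]))
```

Each level of the recursion touches every element once inside `merge`, and there are about log(n) levels, which is where the O(n log n) total comes from.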
Mergesort: Merge Step
Merging 2 4 5 7 and 1 2 3 6:
1
1 2
1 2 2
1 2 2 3
1 2 2 3 4
Etcetera
1 2 2 3 4 5 6 7
2 sorted arrays of size n and m can be merged in O(n+m) time to form a sorted array of size n+m.

Mergesort: Example
Divide, then conquer:
4 20 | 6 7 | 1 3 | 5 9
4 6 7 20 | 1 3 5 9
1 3 4 5 6 7 9 20

MergeSort Algorithm
MergeSort(c)
    n ← size of array c
    if n = 1
        return c
    left ← list of first n/2 elements of c
    right ← list of last n - n/2 elements of c
    sortedLeft ← MergeSort(left)
    sortedRight ← MergeSort(right)
    sortedList ← Merge(sortedLeft, sortedRight)
    return sortedList

MergeSort: Running Time
In the ith iteration we do O(n) work.
The number of iterations is O(log n).
Running time: O(n log n)

Back to sequence alignment

The Problem: Computing Alignment Path Requires Quadratic Memory
Alignment Path
Space complexity for computing the alignment path for sequences of length n and m is O(nm).
We need to keep all backtracking references in memory to reconstruct the path (backtracking).
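To make the quadratic memory requirement concrete, here is a minimal sketch of the standard dynamic program that stores one backtracking reference per cell. It assumes an edit-distance style scoring that is minimized (gap cost 1, mismatch cost 1, match cost 0); the function name `align_quadratic` and the cost parameters are illustrative choices, not taken from the slides.

```python
def align_quadratic(x, y, gap=1, mismatch=1):
    """Return (cost, aligned_x, aligned_y) using O(len(x) * len(y)) memory."""
    n, m = len(x), len(y)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]  # one reference per cell
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else mismatch
            cost[i][j], back[i][j] = min(
                (cost[i - 1][j - 1] + sub, "diag"),
                (cost[i - 1][j] + gap, "up"),
                (cost[i][j - 1] + gap, "left"),
            )
    # Follow the stored references from (n, m) back to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(align_quadratic("kitten", "sitting"))
```

The `back` table is what makes the path recoverable, and it is also what costs O(nm) space.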
Computing Alignment Score using Linear Memory
Space complexity of computing just the score itself is O(n).
We only need the previous column to calculate the current column.

[Figure slides: Finding the Middle Point; And Again; And Again]

Crossing the Middle Line
Define: score(i) - the score of the optimal path from (0,0) to (n,m) that passes through (i, m/2).
(mid, m/2): the position where the optimal path crosses the middle column.
mid = argmin over 0 ≤ i ≤ n of score(i)
(argmin when score(i) is a cost to be minimized; argmax if the scoring is maximized.)
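The linear-memory score computation described above can be sketched as follows. The sketch assumes a minimized edit-distance style cost (gap 1, mismatch 1, match 0); `last_column` and `alignment_cost` are hypothetical names introduced for this example. Rows are indexed by x (length n) and columns by y (length m), so each stored column has n + 1 entries.

```python
def last_column(x, y, gap=1, mismatch=1):
    """Return col, where col[i] is the minimum cost of aligning x[:i] with all of y.

    Only two columns of length len(x) + 1 are alive at any time.
    """
    n = len(x)
    prev = [i * gap for i in range(n + 1)]   # column 0: x[:i] against the empty string
    for j in range(1, len(y) + 1):
        curr = [j * gap] + [0] * n           # row 0: the empty string against y[:j]
        for i in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else mismatch
            curr[i] = min(prev[i - 1] + sub,  # align x[i-1] with y[j-1]
                          prev[i] + gap,      # y[j-1] against a gap
                          curr[i - 1] + gap)  # x[i-1] against a gap
        prev = curr                           # the older column is discarded
    return prev


def alignment_cost(x, y):
    """Optimal alignment cost only; the path itself is not recoverable from this."""
    return last_column(x, y)[len(x)]


print(alignment_cost("kitten", "sitting"))
```

Returning the whole last column rather than a single number is a deliberate choice: the entries for every i are exactly what the middle-column computation needs.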
Crossing the Middle Line
score(i) = prefix(i) + suffix(i)
prefix(i): score of the optimal alignment of the length-m/2 prefix of y to the length-i prefix of x (takes a path from (0,0) to (i, m/2)).
suffix(i): score of the optimal alignment of the length-m/2 suffix of y to the suffix of x that starts after position i (takes a path from (i, m/2) to (n,m)).

Computing prefix(i)
prefix(i): score of the optimal path from (0,0) to (i, m/2).
Compute prefix(i) by dynamic programming in the left half of the matrix (columns 0 to m/2).
Store the prefix(i) column.

Computing suffix(i)
suffix(i): score of the optimal alignment from (i, m/2) to (n,m).
Can be computed by going in reverse from (n,m) to (i, m/2), in the right half of the matrix (columns m/2 to m).
Store the suffix(i) column.

[Figure slides: Finding the Middle Point; And Again; And Again]
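The computation of prefix(i), suffix(i) and the crossing point can be sketched as below. As an assumption, scores are treated as costs to be minimized (gap 1, mismatch 1, match 0); `last_column` and `find_mid` are illustrative names. The reverse pass is done by running the same forward routine on the reversed strings.

```python
def last_column(x, y, gap=1, mismatch=1):
    """col[i] = minimum cost of aligning x[:i] with all of y, keeping two columns."""
    n = len(x)
    prev = [i * gap for i in range(n + 1)]
    for j in range(1, len(y) + 1):
        curr = [j * gap] + [0] * n
        for i in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else mismatch
            curr[i] = min(prev[i - 1] + sub, prev[i] + gap, curr[i - 1] + gap)
        prev = curr
    return prev


def find_mid(x, y):
    """Return (mid, cost): the row where an optimal path crosses column len(y) // 2."""
    n, half = len(x), len(y) // 2
    # prefix[i]: cost of a best path from (0, 0) to (i, half).
    prefix = last_column(x, y[:half])
    # rev[k]: cost of aligning the last k characters of x with y[half:].
    rev = last_column(x[::-1], y[half:][::-1])
    # suffix[i]: cost of a best path from (i, half) to (n, m).
    suffix = [rev[n - i] for i in range(n + 1)]
    score = [prefix[i] + suffix[i] for i in range(n + 1)]
    mid = min(range(n + 1), key=score.__getitem__)
    return mid, score[mid]


print(find_mid("ACGT", "ACGT"))  # → (2, 0)
```

Because every optimal path must cross the middle column somewhere, the smallest score(i) equals the cost of the full optimal alignment.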
Time = Area: First Pass
On the first pass, the algorithm covers the entire area.
Area = nm (left half: computing prefix(i); right half: computing suffix(i))

Time = Area: Second Pass
On the second pass, the algorithm covers only 1/2 of the area (Area/2).

Time = Area: Third Pass
On the third pass, only 1/4 is covered (Area/4).

Geometric Reduction At Each Iteration
1 + 1/2 + 1/4 + ... + (1/2)^k ≤ 2
first pass: 1
2nd pass: 1/2
3rd pass: 1/4
4th pass: 1/8
5th pass: 1/16
Runtime: O(Area) = O(nm)

Run Time Analysis
Let T(n, m) = max running time of the algorithm on strings of length n and m.
O(nm) time to compute prefix(i) and suffix(i) for all i at column m/2 and to find the midpoint q.
T(q, m/2) + T(n - q, m/2) time for the two recursive calls.
Choose a constant c so that:
T(n, 2) ≤ 2cn
T(2, m) ≤ 2cm
T(n, m) ≤ cnm + T(q, m/2) + T(n - q, m/2)
Claim: T(n, m) ≤ 2cnm (proof by induction)
Inductive step: T(n, m) ≤ cnm + 2cq(m/2) + 2c(n - q)(m/2) = cnm + cqm + c(n - q)m = 2cnm.
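Putting the passes together, the whole divide-and-conquer alignment can be sketched as below. This is an illustrative version under stated assumptions, not the code of Hirschberg or of Myers and Miller: costs are minimized with gap 1 and mismatch 1 (the constants `GAP` and `MISMATCH` are hypothetical), the single-character base case relies on a mismatch costing no more than two gaps, and the names `last_column` and `linear_space_align` are chosen here.

```python
GAP = 1       # assumed cost of aligning a character with a gap
MISMATCH = 1  # assumed cost of aligning two different characters


def last_column(x, y):
    """col[i] = minimum cost of aligning x[:i] with all of y, keeping two columns."""
    prev = [i * GAP for i in range(len(x) + 1)]
    for j in range(1, len(y) + 1):
        curr = [j * GAP] + [0] * len(x)
        for i in range(1, len(x) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else MISMATCH
            curr[i] = min(prev[i - 1] + sub, prev[i] + GAP, curr[i - 1] + GAP)
        prev = curr
    return prev


def linear_space_align(x, y):
    """Return (aligned_x, aligned_y) without storing a full backtracking table."""
    n, m = len(x), len(y)
    if n == 0:
        return "-" * m, y
    if m == 0:
        return x, "-" * n
    if m == 1:
        # Pair the single character of y with a matching character of x if
        # one exists, otherwise with x[0]; valid because MISMATCH <= 2 * GAP.
        k = max(x.find(y), 0)
        return x, "-" * k + y + "-" * (n - k - 1)
    half = m // 2
    prefix = last_column(x, y[:half])               # left half of the matrix
    rev = last_column(x[::-1], y[half:][::-1])      # right half, in reverse
    # mid = argmin of score(i) = prefix(i) + suffix(i), with suffix(i) = rev[n - i].
    mid = min(range(n + 1), key=lambda i: prefix[i] + rev[n - i])
    left_top, left_bottom = linear_space_align(x[:mid], y[:half])
    right_top, right_bottom = linear_space_align(x[mid:], y[half:])
    return left_top + right_top, left_bottom + right_bottom


top, bottom = linear_space_align("kitten", "sitting")
print(top)
print(bottom)
```

The two recursive calls work on the upper-left and lower-right sub-rectangles, whose areas add up to half of the current one, which is the geometric reduction described above. String slicing keeps the sketch short but copies data at each recursion level; an index-based version would be needed to keep the working space strictly linear.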
Is it Possible to Align Sequences in Subquadratic Time?
Dynamic programming takes O(n^2) for various alignment methods.
Can we do better?
Yes: the Four-Russians Speedup (works for LCS but not for the general sequence alignment problem): O(n^2 / log n)