Slide 1 of 18
Data Mining the Online Encyclopedia of Integer Sequences for New Identities
Hieu Nguyen
Rowan University
MAA-NJ Section Spring Meeting
March 31, 2012

MAA-NJ Spring Meeting - Data Mining OEIS.nb

Slide 2 of 18
• Acknowledgements
  Doug Taggart (Undergraduate Research Assistant)

Slide 3 of 18
OEIS Deluge

• Online Encyclopedia of Integer Sequences (OEIS)
  1. Searchable online database - http://oeis.org
  2. Contains over 200,000 integer sequences
  3. Created by Neil Sloane (AT&T Bell Labs), currently maintained by the OEIS Foundation
  4. Example: $F_n = 0, 1, 1, 2, 3, 5, 8, 13, 21, \dots$

Slide 4 of 18
Mining the OEIS

• Data Mining (Large-Scale Pattern Recognition)
  Process of extracting patterns from large data sets using computer science, mathematics, and statistics.

• Mine OEIS for Integer Sequence Identities
  1. Enlarge the OEIS database to include transformations of integer sequences
  2. Find matches between sequence transformations (experimental conjectures)
  3. Prove the experimental conjectures that are interesting to obtain new identities
  4. GOAL: Discover interesting connections between different areas of mathematics

Slide 5 of 18
Experimental Pattern Matching

• Example 1

• A000045: Fibonacci sequence; $F(n) = F(n-1) + F(n-2)$, $F(0) = 0$, $F(1) = 1$
  $F(n) = 0, 1, 1, 2, 3, 5, 8, 13, 21, \dots, 39088169$ (39 terms); $n \ge 0$

  1. A000045S1T3: Sums of Squares Transformation
     $G(n) = \sum_{k=0}^{n} F(k)^2 = 0, 1, 2, 6, 15, 40, 104, \dots, 2472169789339634$; $n \ge 0$
  2. A000045S1T8: Product of Consecutive Terms Transformation
     $H(n) = F(n) \cdot F(n+1) = 0, 1, 2, 6, 15, 40, 104, \dots, 2472169789339634$; $n \ge 0$
  3. Identical Match: $G(n) = H(n)$

  EXPERIMENTAL CONJECTURE: $\sum_{k=0}^{n} F_k^2 = F_n \cdot F_{n+1}$
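
The identical match in Example 1 can be reproduced in a few lines. The following is a minimal Python sketch, not the talk's Mathematica code; the function names `fibonacci`, `partial_sums_of_squares`, and `consecutive_products` are illustrative choices and do not come from the slides.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers F(0), F(1), ..."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def partial_sums_of_squares(a):
    """Partial sums of squares: G(n) = a(0)^2 + ... + a(n)^2."""
    total, out = 0, []
    for x in a:
        total += x * x
        out.append(total)
    return out


def consecutive_products(a):
    """Products of consecutive terms: H(n) = a(n) * a(n+1)."""
    return [a[n] * a[n + 1] for n in range(len(a) - 1)]


F = fibonacci(40)                    # F(0)..F(39); F(39) is needed for H(38)
G = partial_sums_of_squares(F[:39])  # 39 terms, as on the slide
H = consecutive_products(F)          # 39 terms
print(G[:7], G == H)
```

Agreement on 39 terms is only experimental evidence; the identity itself still needs a proof (for example by induction on n).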

Slide 6 of 18
• Example 2

• A000295: Eulerian numbers (number of permutations of {1, 2, ..., n} with exactly one descent).
  $a(n) = 0, 0, 1, 4, 11, 26, 57, 120, 247, 502, \dots, 8589934558$; $n \ge 0$ (34 terms)

  1. A000295S1T9: Cassini Transformation
     $G(n) = a(n+1)\,a(n-1) - a(n)^2 = 0, -1, -5, -17, \dots, -3489660929$

• A031878: Maximal number of edges in Hamiltonian path in complete graph on n nodes.
  $b(n) = 0, 1, 3, 5, 10, 13, 21, 25, 36, \dots, 1378$; $n \ge 1$ (53 terms)

  2. A031878S1T4: Binomial Transform of $b(n)$ (pad $b(0) = 0$)
     $H(n) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} b(k) = 0, 0, 1, 0, -1, -5, -17, \dots, -3489660929, \dots, -55169095435288577$
  3. Partial Match: $G(n) \equiv H(n+3)$

  EXPERIMENTAL CONJECTURE: $a(n)^2 - a(n+1)\,a(n-1) = -\sum_{k=0}^{n+2} (-1)^k \binom{n+2}{k} b_k = (n-2)\,2^{n-1} + 1 \quad (n \ge 1)$
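
The partial match in Example 2 can be checked on the terms listed on the slide. This is a minimal Python sketch with illustrative function names (`cassini`, `binomial_transform`), using only the first ten terms of each sequence. It assumes H is indexed from n = 0 with the padded term b(0) = 0; under that assumption the shift comes out as G(n) = H(n+2), the same shift as the upper limit n+2 in the conjecture. The slide's H(n+3) presumably counts positions differently.

```python
from math import comb

# Terms copied from the slide.
a = [0, 0, 1, 4, 11, 26, 57, 120, 247, 502]  # A000295, a(0)..a(9)
b = [0, 0, 1, 3, 5, 10, 13, 21, 25, 36]      # A031878, padded b(0) = 0, then b(1)..b(9)


def cassini(seq):
    """Cassini transformation G(n) = a(n+1) a(n-1) - a(n)^2 for n = 1..len(seq)-2."""
    return [seq[n + 1] * seq[n - 1] - seq[n] ** 2 for n in range(1, len(seq) - 1)]


def binomial_transform(seq):
    """Binomial transform H(n) = sum_{k=0}^{n} (-1)^k C(n,k) seq(k) for n = 0..len(seq)-1."""
    return [
        sum((-1) ** k * comb(n, k) * seq[k] for k in range(n + 1))
        for n in range(len(seq))
    ]


G = cassini(a)             # G(1)..G(8)
H = binomial_transform(b)  # H(0)..H(9)

# Compare G(1)..G(7) with H(3)..H(9), i.e. G(n) against H(n+2).
print(G[:7] == H[3:10])
```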

Slide 7 of 18
Hunting for Identities

• Classical Approach
  Tools: Paper and pencil, good bookkeeping
  Great bookkeepers: John Wallis, Isaac Newton, Leonhard Euler

• Modern Approach
  Tools: Computers, computer algebra systems (e.g. Maple, Mathematica, Matlab, Sage)
  Small-scale: Search for identities one at a time using OEIS
  Large-scale: Mine for clusters of identities (EUREKA)

Pattern Matching Algorithm for Integer Sequences
  1. Start from two sequences $a(n)$ and $b(m)$.
  2. Apply transformations to obtain $T_1(a(n_k))$ and $T_2(b(m_k))$.
  3. Compute the distance $d$ between $T_1(a(n_k))$ and $T_2(b(m_k))$.
  4. If $d \le d_{\max}$, match found: $T_1(a(n_k)) = T_2(b(n_k))$.
     If $d > d_{\max}$, match not found.

Slide 8 of 18
Database of Sequence Transformations

• Raw Source Data - Sequences $\{a_n\}$ from OEIS

• Set of Transformations

  LABEL  TRANSFORMATION                   FORMULA
  T1     Identity                         $a(n)$
  T2     Partial Sums                     $\sum_{k=0}^{n} a(k)$
  T3     Partial Sums of Squares          $\sum_{k=0}^{n} a(k)^2$
  T4     Binomial Transform               $\sum_{k=0}^{n} (-1)^k \binom{n}{k} a(k)$
  T5     Self-Convolution                 $\sum_{k=0}^{n} a(k)\, a(n-k)$
  T6     Linear Weighted Partial Sums     $\sum_{k=1}^{n} k\, a(k)$
  T7     Binomial Weighted Partial Sums   $\sum_{k=0}^{n} \binom{n}{k} a(k)$
  T8     Product of Consecutive Elements  $a(n)\, a(n+1)$
  T9     Cassini                          $a(n-1)\, a(n+1) - a(n)^2$
  T10    First Stirling                   $\sum_{k=0}^{n} s(n,k)\, a(k)$
  T11    Second Stirling                  $\sum_{k=0}^{n} S(n,k)\, a(k)$
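
Most of these transformations are one-line sums; the two Stirling transforms are the least obvious, so here is a minimal Python sketch of them. It assumes s(n,k) denotes the signed Stirling numbers of the first kind and S(n,k) the Stirling numbers of the second kind (the slide does not say whether s is signed), and the names `stirling_tables` and `stirling_transform` are illustrative.

```python
def stirling_tables(n_max):
    """Return (s, S): signed first-kind and second-kind Stirling numbers for 0 <= k, n <= n_max."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = S[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            s[n + 1][k] = s[n][k - 1] - n * s[n][k]  # s(n+1,k) = s(n,k-1) - n s(n,k)
            S[n + 1][k] = S[n][k - 1] + k * S[n][k]  # S(n+1,k) = S(n,k-1) + k S(n,k)
    return s, S


def stirling_transform(a, table):
    """Transform a(0..N) into sum_{k=0}^{n} table[n][k] * a(k) for n = 0..N."""
    return [sum(table[n][k] * a[k] for k in range(n + 1)) for n in range(len(a))]


s, S = stirling_tables(5)
ones = [1] * 6
second = stirling_transform(ones, S)   # second-kind transform of 1, 1, 1, ... gives the Bell numbers
first = stirling_transform(second, s)  # the first-kind transform undoes the second-kind one
print(second, first)
```

Under this signed convention the two transforms are inverses of each other, which is one way a pair of stored transformations can produce a trivial match.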

Slide 9 of 18
• Create MySQL Database of Sequence Transformations

  ID   Label        Subsequence  Transformation  Position  Entry1    Entry2    Entry3
  1    A000045S1T1  1            1               0         0         1         1
  2    A000045S1T1  1            1               1         1         1         2
  3    A000045S1T1  1            1               2         1         2         3
  4    A000045S1T1  1            1               3         2         3         5
  ...  ...          ...          ...             ...       ...       ...       ...
  38   A000045S1T1  1            1               37        24157817  39088169  Null
  39   A000045S1T1  1            1               38        39088169  Null      Null

  1. Contains over 77 million rows (each row stores a window of 3 terms of a sequence) - 5 GB file
  2. Contains extremely large numbers (up to 100 digits long)
  3. Indexed to perform fast searches
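
The row layout above can be sketched as follows: each term of a transformed sequence starts one row holding a window of three consecutive terms, padded at the end of the sequence. This is a minimal Python sketch under that reading of the table; the function name `window_rows` is hypothetical, `None` stands in for SQL `Null`, and the `ID` column is left out on the assumption that the database assigns it.

```python
def window_rows(label, subsequence, transformation, terms, width=3):
    """Build one row per position: (label, subsequence, transformation, position, entry1..entry3)."""
    rows = []
    for position in range(len(terms)):
        window = list(terms[position:position + width])
        window += [None] * (width - len(window))  # pad past the end of the sequence
        rows.append((label, subsequence, transformation, position, *window))
    return rows


# The 39 Fibonacci terms F(0)..F(38) stored under the label A000045S1T1.
fib = [0, 1]
while len(fib) < 39:
    fib.append(fib[-1] + fib[-2])

rows = window_rows("A000045S1T1", 1, 1, fib)
print(rows[0])
print(rows[-1])
```

A plausible reason for this layout is that a three-term window can be covered by an ordinary database index, so candidate alignments between two sequences can be found by lookup rather than by scanning; the slides do not spell this out.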

Slide 10 of 18
Matching Integer Sequences

• Main Assumption: Perfect data set - no errors in the terms of each integer sequence

• Challenges
  1. Sequences vary in length (4 to 100 terms)
  2. A high proportion of sequences begin with 0's and 1's
  3. Find an effective similarity measure (i.e. distance function) to minimize false matches

• Overlapping Run
  1. {1, 1, 2, 3, 5, 8, 13, 21, 47, 55}
     {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     NO MATCH (Worst)
  2. {55, 89, 144, 233, 377, 610}
     {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     MATCH
  3. {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}
     {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     MATCH
  4. {2, 3, 5, 8, 13, 21, 34, 55}
     {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34}
     MATCH (Best?)

Slide 11 of 18
Head Bites Tail Overlap

• What qualifies as a match between two finite sequences?

  Head  {a(1), a(2), ..., a(N-1), a(N)}  Tail
  Head  {b(1), b(2), ..., b(M-1), b(M)}  Tail

  We will say that two sequences likely match or are similar (in the sense that there is a chance that both finite sequences are part of the same infinite sequence) if the head (beginning) of one sequence bites (overlaps with) the tail (end) of the other sequence.

• Head-Bites-Tail Overlap
  We say that two finite sequences contain a head-bites-tail (HBT) overlap if there is an overlapping run which starts at the beginning of one sequence and stops at the end of either sequence.

  CASE 1:
    a(1), a(2), ..., a(n_0), ..., a(N)
                     b(1),   ..., b(L), ..., b(M)

  CASE 2:
    a(1), a(2), ..., a(n_0), ..., a(n_0+M-1), ..., a(N)
                     b(1),   ..., b(M)

Slide 12 of 18
HBT Distance

• DEFINITION
  We define $L_{\max}$ to be the maximum HBT overlap, i.e. the length of the longest HBT overlap, between $\{a(n)\}_{1}^{N}$ and $\{b(n)\}_{1}^{M}$. If no HBT overlap exists, then we set $L_{\max} = 0$.

• DEFINITION
  We define the head-bites-tail (HBT) distance $d$ between $\{a(n)\}_{1}^{N}$ and $\{b(n)\}_{1}^{M}$ to be

  $d := d(a(n), b(n)) = N + M - 2 L_{\max}$

  where $L_{\max}$ is the maximum HBT overlap between $a(n)$ and $b(n)$.

  Intuition: $d$ can also be thought of as specifying the number of remaining elements in $a(n)$ and $b(n)$ that DO NOT overlap.

• Examples
  1. {a(n)} = {55, 89, 144, 233, 377, 610}
     {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     $d = 6 + 10 - 2(1) = 14$
  2. {a(n)} = {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}
     {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     $d = 11 + 10 - 2(7) = 7$
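
The two definitions translate directly into a brute-force search. The following Python sketch is one straightforward reading of the HBT definition, not the EUREKA implementation (which the talk describes as written in Mathematica); the names `hbt_overlap` and `hbt_distance` are illustrative. The head of one sequence is slid along the other, and an alignment counts only if the run continues until one of the two sequences ends.

```python
def hbt_overlap(a, b):
    """Length L_max of the longest head-bites-tail overlap between lists a and b (0 if none)."""

    def longest(x, y):
        # Align the head of y with x[start]; the run must last until x or y ends.
        best = 0
        for start in range(len(x)):
            run = min(len(x) - start, len(y))
            if x[start:start + run] == y[:run]:
                best = max(best, run)
        return best

    # Either sequence may be the one whose head does the biting.
    return max(longest(a, b), longest(b, a))


def hbt_distance(a, b):
    """HBT distance d = N + M - 2 * L_max."""
    return len(a) + len(b) - 2 * hbt_overlap(a, b)


b = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(hbt_distance([55, 89, 144, 233, 377, 610], b))                    # Example 1
print(hbt_distance([3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377], b))    # Example 2
```

This version costs on the order of N·M comparisons per pair of sequences, which is fine for illustration but would presumably need indexing or pruning at the scale of the full database.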

Slide 13 of 18
Relative HBT Distance

• DEFINITION
  We define the relative HBT distance $d_r$ between $\{a(n)\}_{1}^{N}$ and $\{b(n)\}_{1}^{M}$ to be

  $d_r := d_r(a(n), b(n)) = \dfrac{d}{N+M} = \dfrac{N + M - 2 L_{\max}}{N+M} = 1 - \dfrac{2 L_{\max}}{N+M}$

  NOTE: $0 \le d_r \le 1$

• Examples
  1. {a(n)} = {55, 89, 144, 233, 377, 610}
     {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     $d_r = \dfrac{6 + 10 - 2(1)}{6 + 10} = \dfrac{14}{16} = \dfrac{7}{8}$
  2. {a(n)} = {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}
     {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
     $d_r = \dfrac{11 + 10 - 2(7)}{11 + 10} = \dfrac{7}{21} = \dfrac{1}{3}$
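
The arithmetic in the two examples can be reproduced with exact fractions. This is a small Python sketch that takes the lengths N and M and the overlap length as inputs; the function name `relative_hbt_distance` and its parameter names are illustrative.

```python
from fractions import Fraction


def relative_hbt_distance(n_len, m_len, l_max):
    """Relative HBT distance d_r = (N + M - 2 * L_max) / (N + M) as an exact fraction."""
    return Fraction(n_len + m_len - 2 * l_max, n_len + m_len)


print(relative_hbt_distance(6, 10, 1))   # Example 1
print(relative_hbt_distance(11, 10, 7))  # Example 2
```

The bounds in the NOTE follow from 0 ≤ L_max ≤ min(N, M): no overlap gives d_r = 1, and two identical sequences (N = M = L_max) give d_r = 0.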

Slide 14 of 18
EUREKA Project

• Implementation
  i. Mathematica - generate sequence transformations and perform pattern matching ($d_r \le 1/2$, $L_{\max} \ge 4$)
  ii. MySQL - store sequence transformations and matches in a database

• Scope
  i. First 170,000 sequences in OEIS (A000001-A170000)
  ii. Over one million sequence transformations (T1-T11)

• Search Results
  i. Over 300,000 matches found so far
  ii. Preliminary analysis shows:
     - Most matches are trivial or already mentioned in OEIS (> 99%)
     - Small fraction of false positives (> 0.9%)

Slide 15 of 18
Three Experimental Conjectures

• EUREKA Database Website
  1. 1563: A000129S1T3 = A041011S1T8
  2. 2010: A000240S1T7 = A006882S1T8
  3. 2443: A000295S1T9 = A031878S1T4

Slide 16 of 18
Next Steps

• Current Status
  - The Eureka Database contains more integer sequences than OEIS, but it is not as smart

• Scale up processing power and memory
  - Perform search on a cluster of computers
  - Implement parallel/distributed computing (Linux cluster)

• Improve sequence matching algorithms
  - Reduce search times
  - Reduce trivial matches and false positives

• Expand Scope of Search
  - Enlarge collection of sequence transformations
  - Compositions of sequence transformations
  - Extend search to 2-D sequences (e.g. Pascal's triangle) and rational sequences (e.g. Bernoulli numbers)

Slide 17 of 18
• Disseminate Work
  - Create database website
  - Make database website accessible to the public
  - Publish new interesting (non-trivial) proofs of experimental conjectures

• Seek Help
  - Need good programmers (recruit students!)
  - Need collaborators (faculty and students) to analyze and prove experimental conjectures (suitable as student research projects)

Slide 18 of 18
The End