Data Mining the Online Encyclopedia of Integer Sequences for New Identities Hieu Nguyen


Slide 1 of 18
Data Mining the Online Encyclopedia of Integer Sequences for New Identities
Hieu Nguyen
Rowan University
MAA-NJ Section Spring Meeting
March 31, 2012

Slide 2 of 18
MAA-NJ Spring Meeting, Data Mining OEIS.nb
• Acknowledgements
Doug Taggart (Undergraduate Research Assistant)

Slide 3 of 18
• Online Encyclopedia of Integer Sequences (OEIS)
1. Searchable online database - http://oeis.org
2. Contains over 200,000 integer sequences
3. Created by Neil Sloane (AT&T Bell Labs), currently maintained by the OEIS Foundation
4. Example: $F_n$ = 0, 1, 1, 2, 3, 5, 8, 13, 21, ...
OEIS Deluge

Slide 4 of 18
Mining the OEIS
• Data Mining (Large-Scale Pattern Recognition)
Process of extracting patterns from large data sets using computer science, mathematics, and statistics.
• Mine OEIS for Integer Sequence Identities
1. Enlarge the OEIS database to include transformations of integer sequences
2. Find matches between sequence transformations (experimental conjectures)
3. Prove the experimental conjectures that are interesting, to obtain new identities
4. GOAL: Discover interesting connections between different areas of mathematics

Slide 5 of 18
Experimental Pattern Matching
• Example 1
• A000045: Fibonacci sequence; $F(n) = F(n-1) + F(n-2)$, $F(0) = 0$, $F(1) = 1$
F(n) = 0, 1, 1, 2, 3, 5, 8, 13, 21, ..., 39088169 (39 terms); n ≥ 0
1. A000045S1T3: Sums of Squares Transformation
$G(n) = \sum_{k=0}^{n} F(k)^2$ = 0, 1, 2, 6, 15, 40, 104, ..., 2472169789339634; n ≥ 0
2. A000045S1T8: Product of Consecutive Terms Transformation
$H(n) = F(n) \cdot F(n+1)$ = 0, 1, 2, 6, 15, 40, 104, ..., 2472169789339634; n ≥ 0
3. Identical Match: $G(n) = H(n)$
EXPERIMENTAL CONJECTURE:
$\sum_{k=0}^{n} F_k^2 = F_n \cdot F_{n+1}$
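The identical match in Example 1 can be reproduced in a few lines. The following is a minimal Python sketch, not the Mathematica code used in the project; the helper name `fibonacci` and the choice to generate one extra term (so that F(n+1) exists for the last index) are assumptions made for illustration.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from F(0) = 0."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

# F(0)..F(39): the 39 terms from the slide plus one extra, so F(n+1) exists for n = 38.
F = fibonacci(40)

# G(n): running sum of squares F(0)^2 + ... + F(n)^2.
G = []
running = 0
for n in range(39):
    running += F[n] ** 2
    G.append(running)

# H(n): product of consecutive terms F(n) * F(n+1).
H = [F[n] * F[n + 1] for n in range(39)]

print(G[:7])   # leading terms of the transformed sequence
print(G == H)  # True when the two transformations agree on every computed term
```

Agreement on finitely many terms is only experimental evidence; the identity itself still has to be proved separately.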

Slide 6 of 18
• Example 2
• A000295: Eulerian numbers (number of permutations of {1, 2, ..., n} with exactly one descent).
a(n) = 0, 0, 1, 4, 11, 26, 57, 120, 247, 502, ..., 8589934558; n ≥ 0 (34 terms)
1. A000295S1T9: Cassini Transformation:
$G(n) = a(n+1)\,a(n-1) - a(n)^2$ = 0, -1, -5, -17, ..., -3489660929
• A031878: Maximal number of edges in Hamiltonian path in complete graph on n nodes.
b(n) = 0, 1, 3, 5, 10, 13, 21, 25, 36, ..., 1378; n ≥ 1 (53 terms)
2. A031878S1T4: Binomial Transform of b(n) (pad b(0) = 0):
$H(n) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} b(k)$ = 0, 0, 1, 0, -1, -5, -17, ..., -3489660929, ..., -55169095435288577
3. Partial Match: $G(n) \equiv H(n+3)$
EXPERIMENTAL CONJECTURE:
$a(n)^2 - a(n+1)\,a(n-1) = -\sum_{k=0}^{n+2} (-1)^k \binom{n+2}{k} b_k = (n-2)\,2^{n-1} + 1 \quad (n \ge 1)$
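The partial match in Example 2 can be checked on the leading terms quoted on the slide. This is a hedged Python sketch that uses only those listed terms; the function names are illustrative, and both lists are indexed from 0 here (with the padded b(0) = 0). Under that indexing the two transformed sequences line up with a shift of 2; the shift of 3 stated on the slide may reflect a different index origin.

```python
from math import comb

# Leading terms copied from the slide.
a = [0, 0, 1, 4, 11, 26, 57, 120, 247, 502]  # A000295: a(0)..a(9)
b = [0, 0, 1, 3, 5, 10, 13, 21, 25, 36]      # A031878 with pad b(0) = 0, then b(1)..b(9)

def cassini(seq, n):
    """Cassini transformation: seq(n+1) * seq(n-1) - seq(n)^2."""
    return seq[n + 1] * seq[n - 1] - seq[n] ** 2

def binomial_transform(seq, n):
    """Alternating binomial transform: sum over k of (-1)^k * C(n, k) * seq(k)."""
    return sum((-1) ** k * comb(n, k) * seq[k] for k in range(n + 1))

G = {n: cassini(a, n) for n in range(1, 8)}            # needs a(n-1) and a(n+1)
H = {n: binomial_transform(b, n) for n in range(10)}

print([G[n] for n in range(1, 8)])
print([H[n] for n in range(10)])
print(all(G[n] == H[n + 2] for n in range(1, 8)))
```

Only seven overlapping terms are compared here, far fewer than the full sequences stored in the database, so this illustrates the match rather than confirming it.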

Slide 7 of 18
Hunting for Identities
• Classical Approach
Tools: Paper and pencil, good book-keeping
Great bookkeepers: John Wallis, Isaac Newton, Leonhard Euler
• Modern Approach
Tools: Computers, computer algebra systems (e.g. Maple, Mathematica, Matlab, Sage)
Small-scale: Search for identities one at a time using OEIS
Large-scale: Mine for clusters of identities (EUREKA)
Pattern Matching Algorithm for Integer Sequences
1. Start from two sequences a(n) and b(m).
2. Apply transformations to obtain $T_1(a(n_k))$ and $T_2(b(m_k))$.
3. Compute the distance d between $T_1(a(n_k))$ and $T_2(b(m_k))$.
4. If $d \le d_{\max}$, match found: $T_1(a(n_k)) = T_2(b(n_k))$. If $d > d_{\max}$, match not found.

Slide 8 of 18
Database of Sequence Transformations
• Raw Source Data - Sequences $\{a_n\}$ from OEIS
• Set of Transformations

LABEL  TRANSFORMATION                   FORMULA
T1     Identity                         $a(n)$
T2     Partial Sums                     $\sum_{k=0}^{n} a(k)$
T3     Partial Sums of Squares          $\sum_{k=0}^{n} a(k)^2$
T4     Binomial Transform               $\sum_{k=0}^{n} (-1)^k \binom{n}{k} a(k)$
T5     Self-Convolution                 $\sum_{k=0}^{n} a(k)\,a(n-k)$
T6     Linear Weighted Partial Sums     $\sum_{k=1}^{n} k\,a(k)$
T7     Binomial Weighted Partial Sums   $\sum_{k=0}^{n} \binom{n}{k} a(k)$
T8     Product of Consecutive Elements  $a(n)\,a(n+1)$
T9     Cassini                          $a(n-1)\,a(n+1) - a(n)^2$
T10    First Stirling                   $\sum_{k=0}^{n} s(n,k)\,a(k)$
T11    Second Stirling                  $\sum_{k=0}^{n} S(n,k)\,a(k)$
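Most of the transformations in the table translate directly into short list operations. The sketch below is an illustrative Python version of T2 through T9, assuming each sequence is a list indexed from n = 0; the function names and the `TRANSFORMS` dictionary are hypothetical, and the two Stirling transforms (T10, T11) are left out to keep it short.

```python
from math import comb

def partial_sums(a):                      # T2
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out

def partial_sums_of_squares(a):           # T3
    return partial_sums([x * x for x in a])

def binomial_transform(a):                # T4
    return [sum((-1) ** k * comb(n, k) * a[k] for k in range(n + 1))
            for n in range(len(a))]

def self_convolution(a):                  # T5
    return [sum(a[k] * a[n - k] for k in range(n + 1)) for n in range(len(a))]

def linear_weighted_partial_sums(a):      # T6 (the k = 0 term contributes nothing)
    return partial_sums([k * x for k, x in enumerate(a)])

def binomial_weighted_partial_sums(a):    # T7
    return [sum(comb(n, k) * a[k] for k in range(n + 1)) for n in range(len(a))]

def consecutive_products(a):              # T8, defined while a(n+1) exists
    return [a[n] * a[n + 1] for n in range(len(a) - 1)]

def cassini(a):                           # T9, defined for 1 <= n <= len(a) - 2
    return [a[n - 1] * a[n + 1] - a[n] ** 2 for n in range(1, len(a) - 1)]

TRANSFORMS = {
    "T2": partial_sums, "T3": partial_sums_of_squares, "T4": binomial_transform,
    "T5": self_convolution, "T6": linear_weighted_partial_sums,
    "T7": binomial_weighted_partial_sums, "T8": consecutive_products, "T9": cassini,
}

fib = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
for label, transform in TRANSFORMS.items():
    print(label, transform(fib))
```

Note that T8 and T9 return fewer terms than the input, because they need neighbouring terms that do not exist at the ends of a finite list.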

Slide 9 of 18
• Create MySQL Database of Sequence Transformations

ID   Label        Subsequence  Transformation  Position  Entry1    Entry2    Entry3
1    A000045S1T1  1            1               0         0         1         1
2    A000045S1T1  1            1               1         1         1         2
3    A000045S1T1  1            1               2         1         2         3
4    A000045S1T1  1            1               3         2         3         5
...  ...          ...          ...             ...       ...       ...       ...
38   A000045S1T1  1            1               37        24157817  39088169  Null
39   A000045S1T1  1            1               38        39088169  Null      Null

1. Contains over 77 million rows (each row stores a window of 3 terms of a sequence) - 5 GB file
2. Contains extremely large numbers (up to 100 digits long)
3. Indexed to perform fast searches
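The row layout above, one sliding window of three terms per position with Null padding at the end, can be sketched as follows. This is a hypothetical Python illustration of the windowing only; `window_rows` and the in-memory dictionary index are assumptions standing in for the MySQL table and its index, whose actual schema is not given in the talk.

```python
from collections import defaultdict

def window_rows(label, terms, width=3):
    """Return one (label, position, entry1, ..., entry_width) tuple per term."""
    rows = []
    for position in range(len(terms)):
        window = list(terms[position:position + width])
        window += [None] * (width - len(window))  # pad like the Null entries in the table
        rows.append((label, position, *window))
    return rows

# The 39 Fibonacci terms F(0)..F(38) used in the table.
fib = [0, 1]
while len(fib) < 39:
    fib.append(fib[-1] + fib[-2])

rows = window_rows("A000045S1T1", fib)

# A dictionary keyed by the window plays the role of a database index:
# looking up a window returns every (label, position) where it occurs.
index = defaultdict(list)
for label, position, *window in rows:
    index[tuple(window)].append((label, position))

print(rows[0])
print(rows[-1])
print(index[(2, 3, 5)])
```

Keying on a short window means a lookup finds candidate alignments quickly, which then still need to be checked along the full overlapping run.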

Slide 10 of 18
Matching Integer Sequences
• Main Assumption: Perfect data set - no errors in the terms of each integer sequence
• Challenges
1. Sequences vary in length (4 to 100 terms)
2. High proportion of sequences begin with 0's and 1's.
3. Find an effective similarity measure (i.e. distance function) to minimize false matches.
• Overlapping Run
1. {1, 1, 2, 3, 5, 8, 13, 21, 47, 55} vs. {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}: NO MATCH (Worst)
2. {55, 89, 144, 233, 377, 610} vs. {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}: MATCH
3. {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377} vs. {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}: MATCH
4. {2, 3, 5, 8, 13, 21, 34, 55} vs. {a(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34}: MATCH (Best?)

Slide 11 of 18
Head Bites Tail Overlap
• What qualifies as a match between two finite sequences?
{a(1), a(2), ..., a(N-1), a(N)}   (head: a(1), tail: a(N))
{b(1), b(2), ..., b(M-1), b(M)}   (head: b(1), tail: b(M))
We will say that two sequences likely match or are similar (in the sense that there is a chance that both finite sequences are part of the same infinite sequence) if the head (beginning) of one sequence bites (overlaps with) the tail (end) of the other sequence.
• Head-Bites-Tail Overlap
We say that two finite sequences contain a head-bites-tail (HBT) overlap if there is an overlapping run which starts at the beginning of one sequence and stops at the end of either sequence.
CASE 1 (b(1) is aligned with $a(n_0)$ and the run stops at the end of a):
a(1), a(2), ..., $a(n_0)$, ..., a(N)
                 b(1), ..., b(L), ..., b(M)
CASE 2 (b(1) is aligned with $a(n_0)$ and the run stops at the end of b):
a(1), a(2), ..., $a(n_0)$, ..., $a(n_0 + M - 1)$, ..., a(N)
                 b(1), ..., b(M)
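One direct way to find the longest head-bites-tail overlap is to try every alignment of one sequence's head against the other sequence and keep the longest run that reaches the end of either sequence. The Python sketch below is a brute-force illustration of that definition, not the project's implementation; the function name `hbt_overlap` is an assumption.

```python
def hbt_overlap(a, b):
    """Length of the longest head-bites-tail overlap between lists a and b (0 if none)."""
    best = 0
    # Try the head of y against every position of x, in both role assignments.
    for x, y in ((a, b), (b, a)):
        for start in range(len(x)):
            # The run must continue until x or y is exhausted, whichever comes first.
            length = min(len(x) - start, len(y))
            if length > best and x[start:start + length] == y[:length]:
                best = length
    return best

fib10 = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

print(hbt_overlap([1, 1, 2, 3, 5, 8, 13, 21, 47, 55], fib10))               # broken run
print(hbt_overlap([55, 89, 144, 233, 377, 610], fib10))                     # one shared term
print(hbt_overlap([3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377], fib10))     # long run
print(hbt_overlap([2, 3, 5, 8, 13, 21, 34, 55], fib10[:9]))                 # long run
```

The first example shows why a single wrong term (47 instead of 34) destroys the match: the run starting at the shared head no longer reaches the end of either sequence.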

Slide 12 of 18
HBT Distance
• DEFINITION
We define $L_{\max}$ to be the maximum HBT overlap, i.e. the length of the longest HBT overlap, between $\{a(n)\}_{n=1}^{N}$ and $\{b(n)\}_{n=1}^{M}$. If no HBT overlap exists, then we set $L_{\max} = 0$.
• DEFINITION
We define the head-bites-tail (HBT) distance d between $\{a(n)\}_{n=1}^{N}$ and $\{b(n)\}_{n=1}^{M}$ to be
$d := d(a(n), b(n)) = N + M - 2 L_{\max}$
where $L_{\max}$ is the maximum HBT overlap between a(n) and b(n).
Intuition: d can also be thought of as specifying the number of remaining elements in a(n) and b(n) that DO NOT overlap.
• Examples
1. {a(n)} = {55, 89, 144, 233, 377, 610}, {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
$d = 6 + 10 - 2(1) = 14$
2. {a(n)} = {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}, {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
$d = 11 + 10 - 2(7) = 7$

Slide 13 of 18
Relative HBT Distance
• DEFINITION
We define the relative HBT distance $d_r$ between $\{a(n)\}_{n=1}^{N}$ and $\{b(n)\}_{n=1}^{M}$ to be
$d_r := d_r(a(n), b(n)) = \frac{d}{N+M} = \frac{N + M - 2 L_{\max}}{N+M} = 1 - \frac{2 L_{\max}}{N+M}$
NOTE: $0 \le d_r \le 1$
• Examples
1. {a(n)} = {55, 89, 144, 233, 377, 610}, {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
$d_r = \frac{6 + 10 - 2(1)}{6 + 10} = \frac{14}{16} = \frac{7}{8}$
2. {a(n)} = {3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}, {b(n)} = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55}
$d_r = \frac{11 + 10 - 2(7)}{11 + 10} = \frac{7}{21} = \frac{1}{3}$
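Both distances depend only on the two sequence lengths and the maximum overlap, so they can be computed without touching the terms themselves. The Python sketch below reproduces the two worked examples with exact fractions; the function and parameter names are illustrative, and the overlap length is passed in as a number rather than recomputed.

```python
from fractions import Fraction

def hbt_distance(n_len, m_len, l_max):
    """d = N + M - 2 * L_max: the number of terms that do not overlap."""
    return n_len + m_len - 2 * l_max

def relative_hbt_distance(n_len, m_len, l_max):
    """d_r = d / (N + M), returned as an exact fraction between 0 and 1."""
    return Fraction(hbt_distance(n_len, m_len, l_max), n_len + m_len)

# Example 1: N = 6, M = 10, L_max = 1.
print(hbt_distance(6, 10, 1), relative_hbt_distance(6, 10, 1))

# Example 2: N = 11, M = 10, L_max = 7.
print(hbt_distance(11, 10, 7), relative_hbt_distance(11, 10, 7))
```

Dividing by N + M makes the measure comparable across sequence pairs of very different lengths: two identical sequences give 0 and two sequences with no overlap give 1.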

Slide 14 of 18
EUREKA Project
• Implementation
i. Mathematica - generate sequence transformations and perform pattern matching ($d_r \le 1/2$, $L_{\max} \ge 4$)
ii. MySQL - store sequence transformations and matches to a database
• Scope
i. First 170,000 sequences in OEIS (A000001-A170000)
ii. Over one million sequence transformations (T1-T11)
• Search Results
i. Over 300,000 matches found so far
ii. Preliminary analysis shows:
- Most matches are trivial or already mentioned in OEIS (> 99%)
- Small fraction of false positives (> 0.9%)
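The matching criterion used in the search can be written as a single predicate. The comparison symbols are missing from the transcript, so this Python sketch assumes the thresholds read as a relative HBT distance of at most 1/2 together with a maximum overlap of at least 4; the constant and function names are hypothetical.

```python
from fractions import Fraction

D_R_MAX = Fraction(1, 2)  # assumed upper bound on the relative HBT distance
L_MIN = 4                 # assumed lower bound on the maximum HBT overlap

def is_candidate_match(n_len, m_len, l_max):
    """Apply both thresholds to a pair of sequences of lengths N and M with overlap L_max."""
    d_r = Fraction(n_len + m_len - 2 * l_max, n_len + m_len)
    return d_r <= D_R_MAX and l_max >= L_MIN

print(is_candidate_match(11, 10, 7))  # d_r = 1/3, overlap 7
print(is_candidate_match(6, 10, 1))   # d_r = 7/8, overlap 1
```

Requiring a minimum overlap as well as a small relative distance guards against very short sequences that agree on only a few common leading terms such as 0's and 1's.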

Slide 15 of 18
Three Experimental Conjectures
• EUREKA Database Website
1. 1563: A000129S1T3 = A041011S1T8
2. 2010: A000240S1T7 = A006882S1T8
3. 2443: A000295S1T9 = A031878S1T4

Slide 16 of 18
Next Steps
• Current Status
- Eureka Database contains more integer sequences than OEIS but is not as smart
• Scale up processing power and memory
- Perform search on a cluster of computers
- Implement parallel/distributed computing (Linux cluster)
• Improve sequence matching algorithms
- Reduce search times
- Reduce trivial matches and false positives
• Expand Scope of Search
- Enlarge collection of sequence transformations
- Compositions of sequence transformations
- Extend search to 2-D sequences (e.g. Pascal's triangle) and rational sequences (e.g. Bernoulli numbers)

Slide 17 of 18
• Disseminate Work
- Create database website
- Make database website accessible to the public
- Publish new interesting (non-trivial) proofs of experimental conjectures
• Seek Help
- Need good programmers (recruit students!)
- Need collaborators (faculty and students) to analyze and prove experimental conjectures (suitable as student research projects)

Slide 18 of 18
The End