International Journal of Arts and Commerce Vol. 3 No. 8 October, 2014
International Journal of Arts and Commerce ISSN 1929-7106 www.ijac.org.uk

A study of patent numbers forecasting by linear regression on cloud storage technology

Liu, Kuotsan
Associate Professor
Graduate Institute of Patent
National Taiwan University of Science and Technology

Chen, Yingching
Graduate student of master's degree
Graduate Institute of Patent
National Taiwan University of Science and Technology

Abstract
This paper presents a method of forecasting patent numbers by linear regression. A popular software technology with a short lifecycle, sharing links on cloud storage, was selected to demonstrate the method and its results on the main patentee diagram and the technology-function matrix. The results show that a linear model based on the number of inventors has a high coefficient of determination. For a company's research and development proposal, the number of patents that should be filed can be easily determined from the forecast main patentee diagram and the forecast technology-function matrix.

Keywords: patent map, patent analysis, patent forecasting, cloud storage.

1. Introduction
Patents are powerful tools for stopping competitors from entering the claimed scope, based on their exclusive rights. It is normal in modern industry for a company to own a large number of patents. Famous companies, for example International Business Machines Corporation (IBM) and Microsoft Corporation (Microsoft), each own more than 100 thousand patents. Accumulating a sufficient number of patents and occupying a high rank among the main patentees in a specific technical field is important for gaining a large market share. It is necessary to carry out patent analysis before a research and development (R&D) project to ensure that it is not blocked by competitors' patents. However, this work is difficult and complex because there are millions of patents in the databases. According to statistics of the World Intellectual Property Organization, there were 2.35 million patent publications worldwide in 2012, a growth of 9.2% over 2011.

Patent analysis itself has become a professional and research field because of its complexity and difficulty. In patent analysis, patent maps are useful tools that use statistical charts or diagrams to visualize the distribution of patents, monitor trends of technological change, infer the strategy of patent portfolios, and compare competitors. Patent maps show a macroscopic view of patents and help a company determine the direction of its R&D. For example, a main patentees diagram shows the main competitors and their patent numbers, and a technology-function matrix shows the patent density over technical problems and solutions. All patent maps give a view of the patents as of the date they are drawn, but R&D objectives are set for a couple of years later. A forecast patent map is therefore more helpful for determining the budget and the intended number of patent applications for an R&D proposal.

This paper focuses on forecasting patent numbers, and a linear regression model was employed to do this work. A popular software technology, sharing links on cloud storage, was selected to demonstrate the method and its results on the main patentee diagram and the technology-function matrix. Cloud storage has become a focus of the IT industry. The huge and growing amount of information keeps challenging the capacity of computer systems, so companies put their data in cloud storage instead of keeping it on hand, and file-sharing links are necessary for them. The IT industry is distinctive for its short lifecycle: a new company may accumulate enough patents to become a main patentee in a short time. Patent numbers forecasting is therefore even more important and useful for this technology.

2. Methodology and data
2.1 Patent pool by search queries
The technical topic employed for this study is sharing links in cloud storage. The search queries on the US patent publication database were as follows (search date: Dec. 26, 2013):

S1 = (shar* adj3 (link* or URL or URI or hyperlink*)).dsc.    12,154 hits
S2 = S1 not (vehicle* or GPS or sensor*).dsc.    8,975 hits
S3 = S2 and (707* or 709*).UCM.    1,996 hits
S4 = S3 not (adverti*).dsc.    1,423 hits

where USPC (United States Patent Classification) class 707 is "data processing: database, data mining, and file management or data structures", and USPC class 709 is "multicomputer data transferring". Both are the most important classes in cloud storage technology.
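The proximity condition in query S1 can be illustrated with a small regular expression. This is only a sketch under stated assumptions: it assumes that "adj3" means the second term follows the first within three words, in that order, and that "*" is a right-hand wildcard. The function name matches_s1 is hypothetical, and the code is not the search engine of the patent database.

```python
import re

# Assumed reading of S1: a word starting with "shar", followed within three
# words (at most two words in between) by link*, URL, URI or hyperlink*.
S1_PATTERN = re.compile(
    r"\bshar\w*\W+(?:\w+\W+){0,2}(?:link\w*|url|uri|hyperlink\w*)\b",
    re.IGNORECASE,
)


def matches_s1(text: str) -> bool:
    """Return True if the text satisfies the assumed S1 proximity condition."""
    return S1_PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in (
        "sharing a link to the stored file",
        "shared hyperlinks",
        "share the public download URL",
        "link sharing",
    ):
        print(sample, "->", matches_s1(sample))
```

Under this reading, word order matters, which is why a phrase such as "link sharing" would not be retrieved by S1 alone.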
2.2 An overview of main patentees and time evolution of patent publications
Table 1 lists the top 20 patentees in the technical scope of search query S4. Microsoft Corporation and IBM occupy the top two positions in this software technology and together own more than 50% of the patents held by the top 20 patentees. Google, Yahoo, and Facebook are world-famous companies, but they lag far behind the top two. The rightmost column shows the number of patents after 2006 as a percentage of the total. It shows that 11 patentees (those at 100%) entered this technical field after 2006, so we further limited the pool to publications after 2006, which is the starting year for forming the technology-function matrices in this paper.

Table 1 Patent numbers of top 20 patentees

Rank  Patentee                                      Total  After 2006  New rank  Percentage
1     Microsoft Corporation                         121    82          1         67.8%
2     International Business Machines Corporation   119    51          2         42.9%
3     Cisco Technology, Inc.                        25     15          5         60.0%
4     Google Inc.                                   17     16          4         94.1%
5     PatentVC Ltd.                                 16     16          4         100.0%
6     Yahoo! Inc.                                   14     12          8         85.7%
7     Salesforce.com, Inc.                          13     13          6         100.0%
9     Nortel Networks Limited                       12     12          8         100.0%
9     Fujitsu Limited                               12     7           16        58.3%
11    ACCENTURE GLOBAL SERVICES LIMITED             10     7           16        70.0%
11    AOL Inc.                                      10     10          9         100.0%
13    Sprint Communications Company L.P.            9      9           10        100.0%
13    Juniper Networks, Inc.                        9      8           12        88.9%
14    NetApp, Inc.                                  8      8           12        100.0%
17    NEC CORPORATION                               7      5           19        71.4%
17    Nokia Corporation                             7      7           16        100.0%
17    SWsoft Holdings, Ltd.                         7      7           16        100.0%
19    FACEBOOK, INC.                                5      5           19        100.0%
19    Actifio                                       5      5           19        100.0%
20    DROPBOX, INC.                                 3      3           20        100.0%

Fig.1 shows the time evolution of patent publications from 2006 to 2013. Light bubbles are the patent numbers for data processing, and dark bubbles are those for multicomputer data transferring. Both increase nearly linearly, and multicomputer data transferring has the higher positive slope.
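The Percentage column of Table 1 is the number of publications after 2006 divided by the total. The short sketch below recomputes it for a few rows copied from the table and flags patentees at 100% as entrants after 2006. The names table1 and share_after_2006 are illustrative only.

```python
# (total, after_2006) pairs copied from Table 1 for a few patentees.
table1 = {
    "Microsoft Corporation": (121, 82),
    "International Business Machines Corporation": (119, 51),
    "Cisco Technology, Inc.": (25, 15),
    "Google Inc.": (17, 16),
    "Yahoo! Inc.": (14, 12),
    "Salesforce.com, Inc.": (13, 13),
}


def share_after_2006(total: int, after: int) -> float:
    """Percentage of a patentee's publications that appeared after 2006."""
    return 100.0 * after / total


if __name__ == "__main__":
    for name, (total, after) in table1.items():
        pct = share_after_2006(total, after)
        note = " (entered after 2006)" if after == total else ""
        print(f"{name}: {pct:.1f}%{note}")
```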
[Figure: chart titled "The numbers of patent publications from 2006 to 2012"]
Fig.1 Time evolution of patent publications

2.3 Linear regression model
A linear regression model was used to forecast patent numbers in this paper. The linear model, based on the numbers of inventors, is

$$y_i = \alpha x_i + \beta \tag{1}$$

where $y_i$ is the number of patent publications in year $i$, $x_i$ is the number of inventors in year $i$, and α and β are the two coefficients of the linear model.

3. Technology-function matrices for two segments on 2013
Fig.2 is the technology-function matrix at 2013 for data processing. We employed USPC subclasses as the technologies, or technical solutions, on the x-axis, including: database and file access, database design, file or database maintenance, collaborative document database and workflow, data integrity, and file management.
Fig.3 is the technology-function matrix at 2013 for multicomputer data transferring. Its USPC subclasses include: distributed data processing, computer conferencing, multicomputer data transferring via shared memory, remote data accessing, network computer configuring, computer network managing, computer-to-computer (c-to-c) session/connection establishing, c-to-c protocol implementing, and c-to-c data routing.
Fig.2 A technology-function matrix for data processing on 2013

We randomly took a 10% sample and read it manually to identify the problems addressed by the technological development. The problems on the y-axis of both matrices are: improved efficiency, providing a flexible system, simplified operations, enhanced security, tracking or monitoring, and enhanced system consistency and reliability. A search query for each problem could be organized at the same time. For example, the query for improved efficiency is
(optim* or efficien* or effective* or (improve* near3 (performance* or congest*)) or accelera*).dsc.
We got 653 patent publications under this query, and then got the numbers of publications for the USPC subclasses, i.e., the nodes of the matrix. One publication may fall into more than one subclass.
Fig.3 A technology-function matrix for multicomputer data transferring on 2013

In order to check the reliability of the matrix, we again took a random 10% sample for each query and read the samples manually to check whether their technical problems were consistent with the goal of the query. The percentages of correct results for the six problems are 75.4%, 78.4%, 70.6%, 57.8%, 70.7%, and 66.0%. The overall quality falls between 62.18% and 77.46% at the 95% confidence level. An average quality of about 70% may not be excellent, but a 90% saving in labor cost is an important merit.

Fig.2 shows that in data processing, database and file access has become the popular technology for every technical problem, while collaborative document database and workflow has not yet been developed. Fig.3 shows that in multicomputer data transferring, computer network managing is the popular technology. If the patent numbers for database and file access or computer network managing are not detailed enough, we could employ lower-level USPC subclasses to expand the x-axis and quickly get the publication numbers for each node by this method.

The technology-function matrix clearly shows the patent numbers on all problem-solution nodes. If a company determines its R&D topics from the matrix, the next question is how many patent applications it should file to become a main patentee. Patent numbers forecasting is needed to answer this question.
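The paper does not state how the 95% confidence interval for the matrix quality was computed. As a hedged check, the sketch below assumes a two-sided Student t interval on the six reported percentages, using the standard critical value of about 2.571 for 5 degrees of freedom. Under that assumption the result comes out very close to the reported range of 62.18% to 77.46%. The function name t_interval is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Correct-classification percentages reported for the six problem queries.
correct_pct = [75.4, 78.4, 70.6, 57.8, 70.7, 66.0]

# Two-sided 95% Student t critical value for 5 degrees of freedom (n - 1).
T_CRIT_95_DF5 = 2.571


def t_interval(values, t_crit):
    """Return (low, high) of the t confidence interval for the mean."""
    center = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = t_interval(correct_pct, T_CRIT_95_DF5)
    print(f"mean = {mean(correct_pct):.2f}%")
    print(f"95% interval = {low:.2f}% to {high:.2f}%")
```

With only six queries the interval is wide, which is consistent with the paper's cautious reading of an average quality of about 70%.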
4. Patent numbers forecasting by linear regressions
4.1 Linear regressions of three patentees
The patent numbers forecast for every patentee can be obtained by using the linear regression model of formula (1). Three main patentees, Microsoft, IBM, and Yahoo, were selected to demonstrate the results and check their reliability.

Fig.4 is the linear regression diagram for Microsoft. We put the numbers of inventors and patent publications from 1999 to 2013 into formula (1) and obtained α=0.2898, β=-0.3939. In this linear model, the coefficient of determination is $R^2$=0.9342. $R^2$ indicates how well the data points fit a statistical model, with $0 \le R^2 \le 1$; the higher the value of $R^2$, the stronger the explanatory power of the linear model. This $R^2$ value shows high reliability.

Fig.4 Linear regression diagram on patent numbers of Microsoft (legend: Sample, Predictive Value)

Fig.5 is the linear regression diagram for IBM. The coefficients in formula (1), determined by the same method, are α=0.3612, β=-0.3478 for IBM. In this linear model the coefficient of determination is $R^2$=0.9133, which also shows high reliability.
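The coefficients α and β of formula (1) and the coefficient of determination can be obtained by ordinary least squares. The following is a minimal sketch, assuming a plain least-squares fit, which the paper does not spell out. The inventor and publication counts are hypothetical placeholders, not the paper's data, and fit_line is an illustrative name.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = alpha * x + beta; returns (alpha, beta, r_squared)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = y_bar - alpha * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (alpha * x + beta)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return alpha, beta, r_squared


if __name__ == "__main__":
    # Hypothetical yearly counts for one patentee (NOT taken from the paper).
    inventors = [3, 8, 12, 20, 27, 35]
    publications = [1, 2, 3, 6, 7, 10]
    alpha, beta, r2 = fit_line(inventors, publications)
    print(f"alpha={alpha:.4f} beta={beta:.4f} R^2={r2:.4f}")
```

Feeding the yearly inventor and publication counts of a patentee into such a routine would yield coefficients of the kind reported for Microsoft, IBM, and Yahoo.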
Fig.5 Linear regression diagram on patent numbers of IBM (legend: Sample, Predictive Value)

Fig.6 is the linear regression diagram for Yahoo. The coefficients in formula (1) are α=0.3512, β=0.1045. In this linear model the coefficient of determination is $R^2$=0.9451, which again indicates high reliability.

Fig.6 Linear regression diagram on patent numbers of Yahoo (legend: Sample, Predictive Value)

4.2 Patent numbers forecasting in the next three years
The patent numbers forecast for thirteen patentees in the next three years can be easily calculated once the coefficients have been determined. We regarded the average number of inventors in the latest five years as $x_i$ to
get the number of patents $y_i$. Fig.7 shows the patent publication forecast for 2014-2016 and the total patents in 2016. In Fig.7, there would be 8.82, 9.05, and 8.04 publications in the next three years by Microsoft, and 6.80, 6.65, and 7.16 publications by IBM. It shows that the top two patentees are difficult to surpass. However, the others are not difficult to catch up with: the third patentee will accumulate approximately 20 publications, and the fourth patentee fewer than 20. If a company intends to become a top ten patentee in 2016, it should file at least 9 applications in the next three years. If it wishes to become a top five patentee, it will need 18 applications. The R&D budget can be determined from the number of patent applications. A patentee can plan for a higher rank with the help of the forecast diagram.

[Figure: bar chart titled "Estimated applicants distribution of U.S. publications in 2016"; series: 2006~2013 publications, estimated 2014, estimated 2015, estimated 2016; patentees shown: Microsoft Corporation, IBM, Google, Cisco, salesforce.com, Yahoo, AOL Inc., Sprint Comm., Juniper Networks, ACCENTURE GLOBAL, NEC CORPORATION, FACEBOOK, DROPBOX]
Fig.7 Patent publication forecasting in 2014-2016

4.3 Patent numbers forecasting on interested technology and function
A forecast technology-function matrix can be formed by using the linear model of formula (1). Fig.8 is a forecast matrix; three nodes of interest were selected to illustrate the results.
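Whether it is applied to a patentee or to a node of the matrix, the forecasting step is the same: take the average number of inventors over the latest five years as x and evaluate formula (1). The sketch below illustrates one such step with the Microsoft coefficients from Section 4.1. The inventor counts are hypothetical, since the paper does not list them, and the way the five-year window is moved forward for the second and third forecast years is not stated, so only a single step is shown. The function names are illustrative.

```python
from statistics import fmean


def forecast_publications(alpha, beta, inventor_counts, window=5):
    """Evaluate y = alpha * x + beta with x = mean of the latest `window` counts."""
    recent = inventor_counts[-window:]
    if not recent:
        raise ValueError("need at least one inventor count")
    return alpha * fmean(recent) + beta


def implied_inventors(alpha, beta, publications):
    """Invert formula (1): the average inventor count x that gives forecast y."""
    return (publications - beta) / alpha


# Microsoft coefficients reported in Section 4.1.
ALPHA_MS, BETA_MS = 0.2898, -0.3939

if __name__ == "__main__":
    # Hypothetical inventor counts for the latest five years (not from the paper).
    inventors = [30, 34, 29, 33, 33]
    print(round(forecast_publications(ALPHA_MS, BETA_MS, inventors), 2))
    # Average inventor count implied by the reported forecast of 8.82 publications.
    print(round(implied_inventors(ALPHA_MS, BETA_MS, 8.82), 2))
```

Inverting the model in this way gives a quick plausibility check: a forecast of 8.82 publications corresponds to an average of roughly 32 inventors per year under the reported coefficients.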
Fig.8 A forecast technology-function matrix

In Fig.8, the first node is database and file access for solving improved-efficiency problems. On this node, there will be 19, 20, and 19 patents in the next three years, reaching 212 in 2016. The second node is computer conferencing for tracking or monitoring; this node has a higher rate of increase and will reach 126 in 2016. The third node is computer network managing for enhanced security; this node has a lower rate of increase, with 66 patents in 2016. The other nodes are not difficult to forecast with the same model. The year-by-year forecast matrices visualize not only patent densities but also patent growth rates. A quickly growing bubble indicates a hot topic of technology and function.

5. Conclusions
Patent numbers forecasting is important for research and development. A simple but highly reliable linear regression method was illustrated in this paper. It is very helpful for determining how many patent applications should be filed in a couple of years, and further for determining the R&D budget. A linear regression model based on the numbers of inventors of the main patentees predicts the new ranking in the coming years. The linear regression model is also applied to the nodes of the technology-function matrix to obtain a forecast matrix. Some nodes with low density this year may grow to high density in the coming years. The forecast matrix can lower the risk of R&D compared with relying only on a present matrix.
The IT industry is characterized by its short lifecycle. Both famous and unknown companies commonly appear in the main patentee diagram. Even an unknown company does not need a long time to become a main patentee. Patent numbers forecasting is therefore more powerful and necessary in this technical field.

Acknowledgement
This study was conducted under the "Cloud computing systems and software development project (3/3)" of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of the Republic of China.