A. Linden, J. Fenn
Strategic Analysis Report
30 May 2003

Hype Cycle for Advanced Analytics, 2003

Analytics is a vast space with broad applicability in many different business areas. To assess the maturity of any given analytics technology, it is necessary to look at specific applications.

Management Summary

Gartner defines analytics as the technology area that applies mathematical transformations to data and previous insights about all kinds of processes to produce new insights. Such insights can be analytical or predictive and may support decisions as well as automation.

Analytics is a very large and fragmented space. From a methodological perspective, its roots and many of its branches, such as statistics, operations research, pattern recognition, optimization and decision theory, are in mathematics (see "Analytics From 2003 to 2012," COM-18-9337). Analytics also encompasses many interdisciplinary schools, such as data mining, simulation, artificial intelligence, information retrieval and computational linguistics.

Many of the more advanced analytics technologies, such as genetic algorithms, Bayesian approaches and fuzzy logic, while often hyped by the press, have taken hold only in niche markets. Technologies such as neural nets, data mining and mathematical programming have already matured in some areas, but in others they still lack traction, which makes it difficult to place them precisely on the Hype Cycle. The reason is that analytical technologies touch on nearly every empirical or scientific field. Most importantly, in the business domain, analytics has a strong foothold in customer relationship management, business intelligence, business activity monitoring, risk management, quality monitoring, computer-aided design and computer-aided manufacturing, pharmaceutics, forensics and demand prediction.

This Hype Cycle shows the more advanced analytical technologies, their impact on business and anticipated adoption.
CONTENTS
1.0 The Hype Cycle
2.0 On the Rise
2.1 Information Extraction
2.2 Swarm Intelligence
2.3 Autonomous Systems
2.4 Video Mining
3.0 At the Peak
3.1 Audio Mining
3.2 Semantic Web
4.0 Sliding Into the Trough
4.1 Intelligent Agents
5.0 Climbing the Slope
5.1 Artificial Intelligence
5.2 Personalization (General)
5.3 Three-Dimensional Visualization
5.4 Genetic Algorithms
5.5 Automated Text Categorization
6.0 Entering the Plateau
6.1 Mathematical Programming
6.2 Data Mining
6.3 Neural Nets
7.0 Conclusion
Appendix A: Hype Cycle Definitions
FIGURES
Figure 1. Hype Cycle for Advanced Analytics, 2003
1.0 The Hype Cycle

[Figure 1. Hype Cycle for Advanced Analytics, 2003. Vertical axis: Visibility. Horizontal axis: Maturity, running through the phases Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity. Key (Time to Plateau): less than two years; two to five years; five to 10 years; more than 10 years. Technologies plotted: Semantic Web, Audio Mining, Video Mining, Autonomous Systems, Swarm Intelligence, Information Extraction, Personalization (General), Mathematical Programming, Data Mining, Neural Nets, Automated Text Categorization, Genetic Algorithms, Three-Dimensional Visualization, Artificial Intelligence and Intelligent Agents. As of May 2003. Source: Gartner Research (May 2003)]

2.0 On the Rise

2.1 Information Extraction
Definition: Information extraction culls concepts such as names, geographical entities and relationships from unstructured data (mostly text).
Time to Plateau/Adoption Speed: Five to 10 years.
Justification for Hype Cycle Position/Adoption Speed: Information extraction technologies have been offered commercially only recently. Most vendors are startups with limited traction, mainly in government and the life sciences.
Business Impact Areas: Information access, semantic Web, scientific literature and competitive intelligence.
Selected Vendors: ClearForest, Inxight Software, SRA International, Mohomine (recently acquired by Kofax), Microlanguage, Temis and IBM.
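As a toy illustration of the definition in Section 2.1, the sketch below culls candidate names and one kind of relationship from free text with regular expressions. The patterns, function names and sample sentence are hypothetical and chosen only for illustration; they do not represent the approach of any vendor listed above, whose products can be expected to rely on much richer linguistic and statistical models.

```python
import re

# Hypothetical pattern: two or more consecutive capitalized words
# are treated as a candidate name.
NAME_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)+\b")

# Hypothetical pattern for a single relationship type: "X acquired Y".
ACQUISITION_PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+)\s+(?:recently\s+)?acquired\s+(?P<target>[A-Z]\w+)"
)


def extract_names(text):
    """Return candidate multi-word names found in the text."""
    return NAME_PATTERN.findall(text)


def extract_acquisitions(text):
    """Return (buyer, target) pairs for each 'X acquired Y' phrase."""
    return [
        (match.group("buyer"), match.group("target"))
        for match in ACQUISITION_PATTERN.finditer(text)
    ]


sample = (
    "Kofax recently acquired Mohomine. "
    "Jackie Fenn and Alexander Linden wrote the analysis."
)
print(extract_names(sample))
print(extract_acquisitions(sample))
```

Hand-written patterns of this kind break down quickly on real text (single-word names, passive constructions, abbreviations), which is one reason the technology is positioned early on the Hype Cycle.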
2.2 Swarm Intelligence
Definition: Swarm intelligence (also called emergent computation) produces complex behavior from the interaction of many simple behaviors, as observed in societies of ants and bees. Uses include simulation and planning.
Justification for Hype Cycle Position/Adoption Speed: Mostly still at the research stage, with some limited commercial experimentation for simulation.
Business Impact Areas: Simulation and modeling.
Analysis by Jackie Fenn and Alexander Linden

2.3 Autonomous Systems
Definition: Systems that solve problems or robustly maintain a level of functionality, even under difficult situations such as environmental drift, damage or attacks, while limiting human intervention. Typically based on rules and statistical methods that have been developed for artificial intelligence.
Justification for Hype Cycle Position/Adoption Speed: Although IBM (with Autonomic Computing), Sun Microsystems and Microsoft will be able to improve autonomous systems during the next few years, the overall issue is complex. Certain questions will not be resolved in the short term; for example, how should human experts interact with these autonomous systems?
Business Impact Areas: More stable and robust system infrastructure. Decreased costs for systems maintenance over the long term.
Selected Vendors: Computer Associates International (CA), IBM, Microsoft and Sun Microsystems.

2.4 Video Mining
Definition: Applies data mining operations such as filtering, clustering, categorization, pattern matching and conceptual search to video streams, without necessarily requiring human indexing of the content.
Justification for Hype Cycle Position/Adoption Speed: Its use beyond niche applications will depend on broader usage and storage of video information within enterprises.
Business Impact Areas: Media industry, surveillance and education.
Selected Vendors: Convera, IBM, Virage and DataCrystal.
Analysis by Jackie Fenn and Alexander Linden

3.0 At the Peak

3.1 Audio Mining
Definition: Applies data mining operations such as filtering, clustering, categorization, pattern matching and conceptual search to audio streams, without necessarily requiring human indexing of the content.
Time to Plateau/Adoption Speed: Five to 10 years.
Justification for Hype Cycle Position/Adoption Speed: Commercial products are available primarily for audio search and retrieval, based on a transcription of the audio stream. The limited accuracy of the transcript renders it of marginal use for analytical purposes. Early interest is coming from call centers and from failure diagnostics teams in the manufacturing industry.
Business Impact Areas: Media industry, call centers, diagnostics, education and surveillance.
Selected Vendors: Utopy, Comverse, Eyretel (now part of Witness Systems) and Nice Systems.
Analysis by Jackie Fenn and Alexander Linden

3.2 Semantic Web
Definition: Extends the Web through semantic markup languages, such as the Resource Description Framework, OWL (the Web Ontology Language) and Topic Maps (see "Hype Cycle for XML, 2003," R-19-9727), that describe entities and their relationships (see "Innovative Approaches for Improving Information Supply," M-14-3517).
Time to Plateau/Adoption Speed: Five to 10 years.
Justification for Hype Cycle Position/Adoption Speed: So far, there is little deployment of the semantic Web and there is a significant skill shortage.
Business Impact Areas: Information access, systems interoperability, database integration and data quality.
Selected Vendors: Network Inference, intelligent views, empolis, ontoprise, Mondeca and Ontopia.

4.0 Sliding Into the Trough

4.1 Intelligent Agents
Definition: Software that exhibits a large degree of autonomy, decentralized authority and robustness in dynamically changing environments. Includes autonomous systems (see Section 2.3).
Justification for Hype Cycle Position/Adoption Speed: Intelligent agents are not a single technology, nor should they be considered desirable objectives in themselves. Increasingly, general software capability will become more autonomous, robust and possibly decentralized, but it is uncertain whether these entities will ever be called "intelligent agents"; similar naming problems arose several decades ago, as with "intelligent terminals."
Business Impact Areas: Network management, personalization and user interfaces; in the long term, most IT applications.
Analysis by Jackie Fenn and Alexander Linden
5.0 Climbing the Slope

5.1 Artificial Intelligence
Definition: Artificial intelligence is a branch of computer science, overlapping with cognitive science. It studies problems such as vision, speech recognition, perception, reasoning, planning and prediction, and the implementation of these capabilities in computer systems.
Justification for Hype Cycle Position/Adoption Speed: Artificial intelligence has been hyped continually, but has fallen into the trough again. Some subcategories, such as data mining, speech recognition and rules engines, have found mature deployment already. Others have found only niche deployment (for example, reasoning, vision and planning). This field is far from implementing its original ambitions, but Gartner is very positive about the medium- to long-term contribution to business of artificial intelligence.
Business Impact Areas: Diagnostics, data mining, optimization, simulation, knowledge management, customer relationship management and prediction.

5.2 Personalization (General)
Definition: Personalization gears a system's activities (the system can be a Web site, a call center or the entire enterprise) toward a user's specific information needs and preferences.
Time to Plateau/Adoption Speed: Five to 10 years.
Justification for Hype Cycle Position/Adoption Speed: To some extent, many applications contain aspects of personalization, but the concept has not lived up to expectations and is still overrated. The difficulty is in guessing a person's information needs at a particular time from just a few historical observations. For entire consumer segments, this is something that works quite well statistically (marketing campaigns), but on an individual level it is far too inaccurate. Only very simple schemes have been successful, as with Amazon.com. For example, a person who buys a particular book online will then be offered books of a similar genre that other customers have bought.
Business Impact Areas: Improved information access.
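The "very simple scheme" described in Section 5.2 can be sketched as a co-purchase count: for each item, tally which other items appear in the same customers' purchase histories and offer the most frequent ones. This is a minimal illustration under stated assumptions, not a description of Amazon.com's actual system; the function names, the basket data and the tie-breaking rule are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count, for every item, how often each other item was bought with it."""
    counts = {}
    for basket in baskets:
        # set() ignores repeat purchases of the same item within one basket.
        for item, other in permutations(set(basket), 2):
            counts.setdefault(item, Counter())[other] += 1
    return counts


def recommend(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(
        counts.get(item, Counter()).items(),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["mystery_a", "mystery_b"],
    ["mystery_a", "mystery_b", "cookbook"],
    ["mystery_a", "travel_guide"],
]
counts = build_co_purchase_counts(baskets)
print(recommend(counts, "mystery_a", top_n=2))
```

The scheme works because it aggregates over many customers rather than trying to infer one individual's needs from a handful of observations, which is the difficulty the section highlights.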

5.3 Three-Dimensional Visualization
Definition: Three-dimensional (3-D) visualization uses interactive graphics to represent and manipulate high-volume, multidimensional data as graphical objects. These objects can carry a wide range of characteristics, such as x, y or z axes, size, color, shape, vibration and movement.
Justification for Hype Cycle Position/Adoption Speed: Sophisticated tools are available for 3-D visualization of complex data. The open issue is whether this more complex interface adds sufficient value for the user beyond simpler, two-dimensional visualizations.
Business Impact Areas: Business intelligence, engineering and scientific discovery.
Analysis by Jackie Fenn and Alexander Linden

5.4 Genetic Algorithms
Definition: Class of stochastic optimization methods that uses principles found in natural genetic reproduction (crossover or mutations of DNA structures).
Justification for Hype Cycle Position/Adoption Speed: A poorly understood technology that, typically, is adopted by only a few aficionados after they become expert users. Nevertheless, success stories have been reported in application areas such as scheduling.
Business Impact Areas: Combinatorial problems such as those studied in operations research.

5.5 Automated Text Categorization
Definition: The process of using statistical models or hand-coded rules to rate a document's relevancy to a certain subject (see "Technology Update: Automatic Text Categorization," T-18-2059).
Time to Plateau/Adoption Speed: Two to five years.
Justification for Hype Cycle Position/Adoption Speed: The automated text categorization market is evolving rapidly. Large portal, search and content management vendors are integrating this technology into their suites as a basic component.
Business Impact Areas: Information access, alerts, question answering and overviews.
Selected Vendors: Autonomy Corporation, Inxight Software and Verity.

6.0 Entering the Plateau

6.1 Mathematical Programming
Definition: A framework of problem-solving techniques that searches for values of variables that optimize an objective subject to constraints expressed as equalities or inequalities. Some derivatives of this general framework, for example integer programming, constrain the types of variable. The simplest case is linear programming.
Time to Plateau/Adoption Speed: Two to five years.
Justification for Hype Cycle Position/Adoption Speed: In many domains, this technology has already reached maturity. But there are many more application areas, such as call-center scheduling and supply chain optimization, where it is underused. This will improve slowly as data becomes better and there is less operational confusion.
Business Impact Areas: Optimization.
Selected Vendors: ILOG, The MathWorks, SPSS and SAS Institute.

6.2 Data Mining
Definition: Data mining transforms raw data into higher-level constructs, such as predictive models, explanatory models, filters or summaries, by using algorithms from fields such as artificial intelligence and statistics. Techniques range from very simple models, such as arithmetic averages, through those of intermediate complexity (for example, linear regression, clustering, decision trees, case-based reasoning and k-nearest neighbor), to very complicated models, including neural networks and Bayesian networks.
Time to Plateau/Adoption Speed: Two to five years.
Justification for Hype Cycle Position/Adoption Speed: Many application areas of data mining, such as campaign management, have matured already. Variations such as audio and video mining have been triggered just recently through the availability of faster computing platforms.
Business Impact Areas: Optimization, diagnostics, marketing campaigns, pattern recognition and human behavior prediction.
Selected Vendors: IBM, SAS, SPSS, CA and Fair Isaac.

6.3 Neural Nets
Definition: Classes of mathematical or statistical models that allow sets of data to be clustered, or predictive and nonlinear regression models to be constructed.
Time to Plateau/Adoption Speed: Less than two years.
Justification for Hype Cycle Position/Adoption Speed: Neural nets have been used in many different application areas. There are many alternatives and derivatives, and generally it is still not well understood how to select between the different approaches.
Business Impact Areas: Optimization, diagnostics, marketing campaigns, pattern recognition and consumer behavior prediction.
Selected Vendors: IBM, SAS, SPSS, Knowledge Extraction Engines (Kxen), CA and Fair Isaac.

7.0 Conclusion
When deploying analytical technologies, users should understand how well the mathematical properties of the problem match the chosen approach. Case studies, pilots and validation tests in the field are essential to avoid risky undertakings.
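
One way to picture the validation tests recommended in the Conclusion is a holdout comparison: fit two candidate models on part of the data and measure their error on the remainder. The sketch below compares an arithmetic average with a linear regression, two of the techniques named in Section 6.2. The data is synthetic, and the function names, split fraction and seeds are assumptions made for illustration; a field validation would use real operational data and more than one split.

```python
import random


def fit_mean(xs, ys):
    """Very simple model: always predict the arithmetic average of ys."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_compare(xs, ys, test_fraction=0.3, seed=0):
    """Fit both models on a training split; report error on the held-out split."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]
    return {
        "mean": mean_squared_error(fit_mean(train_x, train_y), test_x, test_y),
        "linear": mean_squared_error(fit_linear(train_x, train_y), test_x, test_y),
    }


# Synthetic data with a linear trend plus noise (an assumption for this sketch).
noise = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
errors = holdout_compare(xs, ys)
print(errors)
```

On data with a genuinely linear trend, the regression should show a much lower held-out error than the average; on data without such structure the two would be close, which is the kind of mismatch between problem and approach that a pilot is meant to expose.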

Appendix A: Hype Cycle Definitions

Technology Trigger: A breakthrough, public demonstration, product launch or other event generates significant press and industry interest.

Peak of Inflated Expectations: During this phase of overenthusiasm and unrealistic projections, a flurry of well-publicized activity by technology leaders results in some successes, but more failures, as the technology is pushed to its limits. The only enterprises making money are conference organizers and magazine publishers.

Trough of Disillusionment: Because the technology does not live up to its overinflated expectations, it rapidly becomes unfashionable. Media interest wanes, except for a few cautionary tales.

Slope of Enlightenment: Focused experimentation and solid hard work by an increasingly diverse range of organizations lead to a true understanding of the technology's applicability, risks and benefits. Commercial, off-the-shelf methodologies and tools ease the development process.

Plateau of Productivity: The real-world benefits of the technology are demonstrated and accepted. Tools and methodologies are increasingly stable as they enter their second and third generations. The final height of the plateau varies according to whether the technology is broadly applicable or benefits only a niche market. Approximately 30 percent of the technology's target audience has, or is adopting, the technology as it enters the Plateau.

Time to Plateau/Adoption Speed: The time required for the technology to reach the Plateau of Productivity.