Hiding in Plain Sight: Pattern Based Analysis and What Your Current Data Are Trying to Tell You
23 April 2015
Donald A. Donahue, DHEd, MBA, FACHE
Associate Professor, University of Maryland University College
Managing Partner, Diogenec Group, LLP

The Legal Stuff
- This continuing education activity is managed and accredited by the National Association of Managed Care Physicians (NAMCP).
- Neither NAMCP nor UMUC supports or endorses any product or service mentioned in this presentation.
- I have no financial interest to disclose.
- Commercial support was not received for this activity.

Agenda
- The Evolving Environment
- Big Data
- The Cloud
- Opening the floodgates
- New Computational Power
- Standard Approaches, New Demands, New Capabilities
- The Healthcare Data Challenge
- Pattern Based Analytics
- Health Applications

Executive Summary
This session will identify emerging analytical capabilities and their application, both to new technologies and for use with existing data. During this session, case studies of identifying cost outliers and root causes for adverse outcomes are presented. Attendees will gain an understanding of advances in analytics and their application to current operations.

Learning Objectives
Following this session, attendees will:
- Be familiar with emerging concepts in data collection, processing, and analysis
- Understand emerging concepts in data collection, processing, analysis, and the applicability of novel data technologies
- Recognize ways to enhance processes, reduce costs, and improve quality

Framing the Environment
Introductions
[Image labels: 1946; 19,000; 2007]

It's All About Big Data
- What is big data?
- It's everywhere: Hx, Sx, Dx, Tx, Rx, procedure codes, billing, sensors used to gather climate information, social media posts, digital pictures and videos, census charts, lab results, purchase transaction records, genome mapping, and mobile phone GPS signals
- Spans 4 dimensions:
  o Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information.
  o Velocity: Sometimes 2 minutes is too late.
  o Variety: Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more.
  o Veracity: 1 in 3 business leaders don't trust the information they use to make decisions.
Source: http://www-01.ibm.com/software/data/bigdata/

And Big Data in Healthcare
- Everybody is talking about it
- In healthcare: structured data for analytics
  o Average length of stay = 3 days, so approx. 120 patients per occupied bed per year
  o Each patient record could be as much as ~10,000 characters
  o A 1,000 bed facility = data size ~1.2 GB per year
- Of the vast amount of data created, as much as 80 percent is unstructured (text, voice annotations, images)
- Structured data SIZE for individual providers is not a major problem in this context
- The key challenge is data sourcing, data extraction, data consolidation, data cleaning, and data transformation
Source: http://www-01.ibm.com/software/data/bigdata/
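
The sizing on the healthcare slide can be reproduced with a back-of-envelope calculation. The sketch below is a minimal Python version; it assumes one byte per character and fully occupied beds, and the function and constant names are illustrative rather than taken from the slides.

```python
# Back-of-envelope estimate of yearly structured data volume for one facility.
# Assumptions (not from the slides): 1 byte per character, every bed always occupied.
AVG_LENGTH_OF_STAY_DAYS = 3
CHARS_PER_RECORD = 10_000
BEDS = 1_000


def yearly_structured_bytes(beds, los_days, chars_per_record, bytes_per_char=1):
    """Estimate bytes of structured patient-record data generated per year."""
    patients_per_bed = 365 / los_days  # about 120 patients per bed per year
    return beds * patients_per_bed * chars_per_record * bytes_per_char


estimate = yearly_structured_bytes(BEDS, AVG_LENGTH_OF_STAY_DAYS, CHARS_PER_RECORD)
print(f"{estimate / 1e9:.2f} GB per year")
```

The result lands close to the ~1.2 GB quoted above, which supports the slide's point: for structured data, volume is not the hard part.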

And The Cloud

Cloud Computing
- Originally, all data and processes resided on a central platform (the mainframe)
- Advances in processing, speed, size, and storage drove functions to individual platforms
- These same continuing developments eventually overwhelmed the local platforms
- Cloud computing characteristics (NIST):
  o On-demand self-service
  o Broad network access
  o Resource pooling
  o Rapid elasticity
  o Measured service

Big Data and the Cloud
- Cloud computing has opened another door to conducting data science on a large scale. With cloud computing, initial costs are minimised, the scaling of capacity is flexible, and access is more open and widespread. These characteristics look likely to generate an explosion of new understanding.*
- Absent a robust analytical mechanism, the cloud is simply fog at a higher elevation.
*http://www.newscientist.com/cloudup/article/in426

Big Data Challenges
- Along with the many opportunities, data-intensive science will also bring complex challenges.
- Many scientists are concerned that the data deluge will make it increasingly difficult to find data of relevance and to understand the context of shared data.
- Today, it is difficult for a person even to track and maintain their own health records. Now envision the magnitude, diversity and dispersed nature of data generated by life sciences, genetics and bioinformatics over the next 10 years.*
*http://www.newscientist.com/cloudup/article/in426

Drowning in Data
The data deluge refers to the situation where the sheer volume of new data being generated is overwhelming the capacity of institutions to manage it and researchers to make use of it.
Source: President's Council of Advisors on Science and Technology, Leadership Under Challenge: Information Technology R&D in a Competitive World: An Assessment of the Federal Networking and Information Technology R&D Program 35 (Aug. 2007)

Elementary, my dear Watson
- Evolving clinical decision support system
- February 2013: first commercial application, utilization management decisions in lung cancer treatment
  o Memorial Sloan Kettering Cancer Center and WellPoint
- 90 IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 terabytes of RAM
- Hardware cost in the $ millions
- Widespread availability: when?

In the meantime
- We generate tremendous amounts of data
  o 90% of all the world's data has been generated over the last two years [total 3.2 zettabytes (3.2 × 10^21 bytes)]
  o In 5 years, the amount of digital information is expected to grow to 40 zettabytes
  o That's 40,000,000,000,000,000,000,000 bytes

The Challenge
- How to effectively use the data?
- How do we make sense of data to help drive our decisions?
[Figure label: Data Analysis Zone]
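
As a rough check on the scale quoted in "In the meantime", the short Python sketch below converts zettabytes to bytes and derives the annual growth rate implied by moving from 3.2 to 40 zettabytes in 5 years. The compound-growth assumption is ours, not the slides'; the variable names are illustrative.

```python
# Convert the quoted volumes to bytes and derive the implied yearly growth.
# Assumption (not stated in the slides): growth compounds evenly over the 5 years.
BYTES_PER_ZETTABYTE = 10**21

start_zb = 3.2
end_zb = 40
years = 5

end_bytes = end_zb * BYTES_PER_ZETTABYTE
implied_annual_growth = (end_zb / start_zb) ** (1 / years) - 1

print(f"{end_bytes:,} bytes")
print(f"implied growth: {implied_annual_growth:.0%} per year")
```

Under that assumption the data volume grows by roughly two thirds every year, which is the backdrop for the questions on "The Challenge" slide.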

Analytic Approaches: Who's driving?
- General purpose solutions (tools/platforms)
  o Excel, SAS, SPSS, Cognos, Tableau, QlikView
  o New offerings emerge on a regular basis
  o Data mining
  o Business Intelligence (BI)
  o Statistics/Advanced Analytics
  o Prediction/Forecasting
  o Visual Exploration
- Specialized solutions
  o Predictive applications
  o Forecasting
  o Scheduling
  o Process optimization
- Both have relevance in healthcare

The Goal is Prescriptive Use
- Levels: Prescriptive, Predictive, Descriptive
- REPORTING: WHAT happened?
- ANALYZING: WHY did it happen?
- PREDICTING: WHAT WILL happen?
- ACTING: MAKE IT happen!

For Example
- Descriptive
  o Patient population management
  o Clinical quality and efficacy
  o Outliers for providers/patients
  o Coding errors and fraud
- Predictive
  o Revenues in 30 and 90 days
  o Readmissions (e.g., whether a patient with CHF will be readmitted to the hospital within 30 or 90 days)
  o Patient groups for risk adjustment
- Prescriptive
  o Patient flow management
  o Accurate costing
  o Asset management
[Chart labels: Norm, Cost Forecast, Outliers, Median, Average; years 2011 to 2013, months 1 to 12]

A Common Shortcoming
"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."
Sir Arthur Conan Doyle (Sherlock Holmes, A Scandal in Bohemia), 25 June 1891

The Answer is often hiding in plain sight

Two Challenges of Big Data
- Knowing Where To Look
  o How to discover the regions in the data that are most information-rich to identify targets of interest?
- The Curse of Dimensionality
  o Exponential growth increases dimensionality of the data
  o Humans have difficulty visualizing, organizing, and analyzing data beyond 3 or 4 dimensions (data attributes)

Overcoming the Big Data Challenges: Cognitive Science and Patterns
- Pattern Based Analytics applies Shannon's Information Theory, combined with robust and scalable Machine Learning methods, to solve these two problems by finding the most information-rich, yet lower-dimensionality, regions in the data that are characterized by patterns

Why Patterns?
- Patterns are a means for organizing large volumes of data
- Patterns are shorthand for identifying complex, meaningful relationships involving multiple variables
- They provide information on the underlying entity
- The human brain functions in large part by identifying patterns
- "The decision-making processes of a human being are somewhat related to the recognition of patterns; for example, the next move in a chess game is based upon the present patterns on the board, and buying or selling stocks is decided by a complex pattern of information. The goal of pattern recognition is to clarify these complicated mechanisms of decision-making processes and to automate these functions."
Source: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. San Diego, CA: Academic Press, p. 1
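
To make the reference to Shannon's Information Theory concrete, the sketch below ranks attributes by information gain, i.e. how much knowing an attribute reduces the entropy of a target such as a cost band. This is a textbook measure offered for illustration only; it is not presented as the method behind any particular Pattern Based Analytics product, and the records and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of categorical labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical encounter records (illustrative values only).
rows = [
    {"quality": 1, "coverage": "Medicare", "cost": "high"},
    {"quality": 1, "coverage": "Private", "cost": "high"},
    {"quality": 3, "coverage": "Medicare", "cost": "low"},
    {"quality": 5, "coverage": "Private", "cost": "low"},
    {"quality": 4, "coverage": "Medicare", "cost": "low"},
    {"quality": 1, "coverage": "Private", "cost": "high"},
]

# Attributes that tell us most about cost come first.
ranking = sorted(
    ["quality", "coverage"],
    key=lambda a: information_gain(rows, a, "cost"),
    reverse=True,
)
print(ranking)
```

In this toy data, quality separates high-cost from low-cost encounters completely, so it ranks above coverage. On real data, attributes with many distinct values tend to score high on raw information gain, so a normalized variant may be preferable.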

Cognitive Science on Patterns
- "The ability to recognize patterns in the environment is critical for an organism's survival." (Sinha, 2002)
- "The ability to spot existing or emerging patterns is one of the most (if not the most) critical skills in intelligent decision making, though we're mostly unaware that we do it all the time." (Miemis, 2010)
- "Pattern recognition is the fundamental human cognition or intelligence." (Pi et al., 2008)

Searching for Patterns
We do this for fun, but not always for critical decision making.

Pattern Based Analytics: Expanding a Natural Function
Pattern Based Analytics amplify human capability by
- Automatically identifying patterns considering all dimensions
- Reducing the complexity of understanding the data
- Structuring results in prioritized, readily comprehensible graphic representation

Pattern Based Analytics: Utility of Patterns
Pattern Based Analytics identify patterns
- To understand what has happened, and why
- To predict what might happen, and understand why
- To explore alternative actions in the context of possible future scenarios

Discovery: Pattern Based Analytics
- Pattern Based Analytics point to what is meaningful in a large, complex data set
- Pattern Based Analytics is a "hypothesis generator"; it discovers the hypotheses that are not initially apparent
  o Encompassing all data points, including remote outliers
  o Eliminating process bias
  o Rapidly recognizing and prioritizing millions of patterns

Case Study: Identify poor providers
Compare costs versus satisfaction
- Collect and prepare data
- Identify outlier data with exploratory analysis
- Discover key patterns automatically with a discovery engine to understand outlier behavior: why the poor performance?
- Drill into the patterns to identify possible solutions: are too many unnecessary procedures behind the poor performance, or expensive discharge dispositions?
- Take action, monitor outlier behavior over time, and assess the impact of changes

Example Data Characteristics
- Data drawn from de-identified medical records
- Source data contain ~4 million records, collected over a period of 5 years from ~100 health care providers
- Data descriptors (183 attributes)
  o Person-specific information such as gender, age, ethnicity, etc.
  o Encounter information such as: provider ID; multiple diagnoses and codes; multiple procedures and codes; length of stay, total costs, disposition, medical coverage type, etc.
  o Patient satisfaction quality indicator
- Focus on cardiovascular disease

A Typical Management Tool
Hard to gain insight from this table!

But Viewed via Patterns
- Ranking attributes against cost
- Quality has highest correlation
  o Quality measured by Patient Satisfaction Level, 1 (lowest) to 5 (highest)
- Then examine the (Quality, Cost) relationship

Graphic Representation Illustrates Areas of Concern
- Low frequency, high impact outliers
- High cost outliers (blue) identified with low quality
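
One simple way to surface the low-frequency, high-cost outliers described above is to compare each encounter's cost with the other encounters at the same quality level. The sketch below uses an interquartile-range fence for that purpose. This is a common exploratory rule chosen for illustration, not necessarily the method used in the case study, and the records are hypothetical.

```python
import statistics
from collections import defaultdict


def flag_high_cost_outliers(records, k=1.5, min_group_size=4):
    """Return records whose cost exceeds Q3 + k * IQR within their quality level."""
    costs_by_quality = defaultdict(list)
    for rec in records:
        costs_by_quality[rec["quality"]].append(rec["cost"])

    # One upper fence per quality level; tiny groups are skipped.
    fences = {}
    for quality, costs in costs_by_quality.items():
        if len(costs) < min_group_size:
            continue
        q1, _median, q3 = statistics.quantiles(costs, n=4)
        fences[quality] = q3 + k * (q3 - q1)

    return [
        rec for rec in records
        if rec["quality"] in fences and rec["cost"] > fences[rec["quality"]]
    ]


# Hypothetical encounters: cost in $K, quality = patient satisfaction level (1 to 5).
records = (
    [{"quality": 1, "cost": c} for c in (40, 45, 48, 50, 52, 55, 60, 400)]
    + [{"quality": 5, "cost": c} for c in (30, 31, 32, 33, 34, 35)]
)
outliers = flag_high_cost_outliers(records)
print(outliers)
```

Grouping by quality level before computing the fence keeps a uniformly expensive group from hiding its own extreme cases.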

Delve into Contributing Factors
- Examining patterns for high cost, lowest quality
- Top Pattern: Number of Procedures = 10 or more
- Second Pattern: 2-4 procedures under Medicare
[Chart labels: 48 patients; 98 patients]

Further Exploration: The Who
Patterns start to emerge.
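
The idea of ranking patterns that characterize the high-cost, lowest-quality group can be sketched as a small frequency analysis: enumerate attribute-value combinations, then score each by how over-represented it is among the outliers (its lift). This is a deliberately simple stand-in, not the discovery engine used in the case study; the function name, attribute names, and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_patterns(records, is_target, attributes, max_size=2, min_support=2):
    """Rank attribute-value patterns by lift within the target (outlier) group."""
    if not records:
        return []

    def count_patterns(rows):
        counts = Counter()
        for row in rows:
            items = [(a, row[a]) for a in attributes]
            for size in range(1, max_size + 1):
                for combo in combinations(items, size):
                    counts[combo] += 1
        return counts

    target_rows = [r for r in records if is_target(r)]
    all_counts = count_patterns(records)
    target_counts = count_patterns(target_rows)
    base_rate = len(target_rows) / len(records)

    scored = []
    for pattern, n_target in target_counts.items():
        if n_target < min_support:
            continue
        # Lift: outlier rate inside the pattern relative to the overall outlier rate.
        lift = (n_target / all_counts[pattern]) / base_rate
        scored.append((lift, n_target, pattern))
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return scored


# Hypothetical encounters; "outlier" marks high cost with lowest quality.
data = [
    {"procedures": "10+", "coverage": "Medicare", "outlier": True},
    {"procedures": "10+", "coverage": "Private", "outlier": True},
    {"procedures": "10+", "coverage": "Private", "outlier": True},
    {"procedures": "2-4", "coverage": "Medicare", "outlier": True},
    {"procedures": "2-4", "coverage": "Medicare", "outlier": True},
    {"procedures": "2-4", "coverage": "Private", "outlier": False},
    {"procedures": "2-4", "coverage": "Private", "outlier": False},
    {"procedures": "1", "coverage": "Medicare", "outlier": False},
    {"procedures": "1", "coverage": "Private", "outlier": False},
    {"procedures": "1", "coverage": "Medicare", "outlier": False},
]
ranked = rank_patterns(data, lambda r: r["outlier"], ["procedures", "coverage"])
for lift, support, pattern in ranked[:3]:
    print(f"lift={lift:.1f} support={support} pattern={pattern}")
```

Exhaustive enumeration like this grows combinatorially with the number of attributes, so with 183 attributes a real engine would need pruning or a smarter search.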

Zooming In
- Top 4 providers (out of 111) account for 46% of poor outcomes within this pattern
- What are the key procedures involved in these outcomes?

Looking for Contributing Factors
- Percutaneous transluminal coronary angioplasty (PTCA) is the dominant principal procedure!
- How about secondary procedures?
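
Both drill-down questions above (which providers account for most poor outcomes, and which procedure dominates) reduce to counting values and measuring how concentrated they are. The sketch below shows one minimal way to do that in Python; the provider IDs are hypothetical and the helper name is illustrative.

```python
from collections import Counter


def top_k_share(values, k):
    """Return the k most common values and their combined share of all values."""
    if not values:
        return [], 0.0
    top = Counter(values).most_common(k)
    share = sum(n for _, n in top) / len(values)
    return top, share


# Hypothetical provider IDs attached to poor-outcome encounters within a pattern.
poor_outcome_providers = (
    ["A"] * 6 + ["B"] * 4 + ["C"] * 3 + ["D"] * 2 + ["E", "F", "G", "H", "I"]
)
top_providers, share = top_k_share(poor_outcome_providers, k=4)
print(top_providers, f"{share:.0%}")
```

The same helper applied to a list of principal procedure codes with k=1 would identify the dominant procedure.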

Contributing Factors
- Dominant secondary procedure: Ins drug elut coronary stent (insertion of drug-eluting coronary stent)
- Next: examine patient data to gain insight on the additional procedure

Contributing Factors
- Patient data for a more specific sub-population may suggest options...

Interesting Findings
- Analysis of high cost patients in top pattern
  o Average cost for 65 yrs+ PTCA females: $421K
  o Average cost for 65 yrs+ PTCA males: $212K
- Proactively monitor this sub-population at high cost providers as they enter the system to potentially reduce costs and increase patient satisfaction

Secondary Findings
- Post-discharge costs:
  o Home Health Service: $499,862.00
  o Skilled Nursing/Intermediate Care within admitting hospital: $282,779.50
- In-house cost is about 57% of the home health cost (a saving of about 43%)
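
The two post-discharge figures can be compared directly. The short Python check below computes the in-house cost as a fraction of the home health cost and the corresponding relative saving; it assumes the two dollar amounts are directly comparable, which the slides do not state explicitly.

```python
# Compare the two post-discharge cost figures quoted above.
# Assumption: both amounts cover comparable patients and periods.
home_health_cost = 499_862.00
in_house_snf_cost = 282_779.50

cost_ratio = in_house_snf_cost / home_health_cost  # in-house as a share of home health
relative_saving = 1 - cost_ratio                   # fraction saved by going in-house

print(f"in-house / home health: {cost_ratio:.1%}")
print(f"relative saving: {relative_saving:.1%}")
```

The ratio rounds to 57% and the saving to 43%, so the 57% figure reads most naturally as the in-house share of the home health cost.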

Summary of Example
- Data slicing and dicing did not reveal any insights
- Exploration identified an unusual and non-optimal (Quality, Cost) relationship for Quality Level 1
- Pattern Based Discovery identified two dominant outlier patterns that explain this relationship
- Drilling down into the top pattern provided a view into:
  o the dominant providers, principal and secondary procedures
  o the underlying data that provide insight into the role of additional procedures and other factors
- Monitor 65 yrs+ females undergoing PTCA at high cost providers to reduce costs and improve satisfaction
- What-if analysis on Discharge Disposition suggested an option of transferring patients to SN/IC facilities to reduce costs

Advanced Applications: Precision Medicine
Understanding Efficacy of Treatment
- What molecules in patient cells are important to specific treatment of a disease?
- What combination and concentration of molecules are important for
  o Patient triaging as to efficacy of treatment
  o Additional understanding of the disease mechanism for further improvement in drug development
- Dataset: patient molecular data related to treatment of inflammatory disease, as supplied by a biotech startup working with Big Pharma
- Results: patterns detected in 10 seconds with 88% accuracy
- Traditional analysis takes 2 days.

References
- Eastwood, B. (2013). Big Data Analytics Use Cases for Healthcare IT. CIO. http://www.cio.com/slideshow/detail/126493?goback=%252egde_2712281_member_5801855037156114432#slide1
- Eastwood, B. (2013). Can Healthcare Big Data Reality Live Up to Its Promise? CIO. http://www.cio.com/article/738121/can_healthcare_big_data_reality_live_up_to_its_promise_?page=3&taxonomyid=3147
- Hey, T. (n.d.). Big Data is Transforming Science. NewScientist. http://www.newscientist.com/cloudup/article/in426
- IBM (n.d.). IBM Big Data Platform. http://www-01.ibm.com/software/data/bigdata
- Miemis, V. (2010). Essential Skills for 21st Century Survival: Part I: Pattern Recognition. emergent by design. http://emergentbydesign.com/2010/04/05/essential-skills-for-21st-century-survival-part-i-pattern-recognition/
- Pi, Y., Liao, W., Liu, M., & Lu, J. (2008). Theory of Cognitive Pattern Recognition. In Peng-Yeng Yin (Ed.), Pattern Recognition Techniques, Technology and Applications. InTech. ISBN: 978-953-7619-24-4. http://www.intechopen.com/books/pattern_recognition_techniques_technology_and_applications/theory_of_cognitive_pattern_recognition
- President's Council of Advisors on Science and Technology (2007). Leadership Under Challenge: Information Technology R&D in a Competitive World: An Assessment of the Federal Networking and Information Technology R&D Program 35
- Sinha, P. (2002). Recognizing complex patterns. Nature Neuroscience Supplement, Vol. 5, pp. 1093-1097. doi: 10.1038/nn949

Contact
Donald A. Donahue, DHEd, MBA, FACHE
University of Maryland University College: donald.donahue@faculty.umuc.edu
Diogenec Group: donald.donahue@diogenec.com
202-701-6234