Applying Text Analytics to the Patent Literature to Gain Competitive Insight

Applying Text Analytics to the Patent Literature to Gain Competitive Insight
Gilles Montier, Strategic Account Manager, Life Sciences
TEMIS, Paris
www.temis.com

Lessons Learnt
- TEMIS has been working with Life Science and other industry clients for many years.
- Naturally, the requests, comments and suggestions made in these projects led us to gradually refine and extend our approaches.
- The following slides attempt to consolidate some of these aspects.
Copyright 2007 TEMIS - All Rights Reserved Slide 2

Text Mining for Life Sciences Organizations
Improve information discovery through the chain:
- Drug discovery & lead identification
- Patent analysis
- Safety & adverse event detection
- Competitive Intelligence
- Sentiment Analysis
Business process: Discovery and Research, Preclinical, Clinical, Manufacturing, Sales, Marketing and Service

Patent Analysis
Questions TEMIS solutions are asked to solve:
- Bibliometric questions
  - Who is active on a topic?
  - Who is rather product-oriented or process-oriented?
  - In which country is this company active?
- Specific and technical questions
  - Is this device effective against this disease?
  - Which metals from the family of rare earths are used?
  - What is the unique aspect of a given patent?
Patent literature deliberately uses new terms which are hard to find by classical means.
How can Text Analytics help?

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
  - TNF is a protein
  - Diabetes Mellitus type 2 is a disease
  - Aspirin is a chemical substance

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
- Recognition of variants
  - Tumor necrosis factor is the same thing as TNF
  - NIDDM is the same thing as Diabetes Mellitus type 2
  - Acetylsalicylic acid is the same thing as aspirin
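The two requirements above (semantic type and variant recognition) can be illustrated with a minimal lookup-based sketch. It assumes a hand-made variant table; the three variant pairs and their semantic types are the slide's own examples, while the names `VARIANTS`, `SEMANTIC_TYPE` and `annotate` are hypothetical and say nothing about how a Skill Cartridge is actually implemented.

```python
import re

# Hypothetical variant table built from the slide's examples:
# each surface form (lowercased) maps to one canonical term.
VARIANTS = {
    "tumor necrosis factor": "TNF",
    "tnf": "TNF",
    "niddm": "Diabetes Mellitus type 2",
    "diabetes mellitus type 2": "Diabetes Mellitus type 2",
    "acetylsalicylic acid": "aspirin",
    "aspirin": "aspirin",
}

# Semantic type of each canonical term, as stated on the slides.
SEMANTIC_TYPE = {
    "TNF": "protein",
    "Diabetes Mellitus type 2": "disease",
    "aspirin": "chemical substance",
}


def annotate(text):
    """Return (canonical term, semantic type) pairs in order of appearance."""
    hits = []
    lowered = text.lower()
    for variant, canonical in VARIANTS.items():
        # Whole-word, case-insensitive match of the surface form.
        for match in re.finditer(r"\b" + re.escape(variant) + r"\b", lowered):
            hits.append((match.start(), canonical, SEMANTIC_TYPE[canonical]))
    hits.sort()
    return [(canonical, sem_type) for _, canonical, sem_type in hits]


if __name__ == "__main__":
    print(annotate("Aspirin lowers tumor necrosis factor in NIDDM patients."))
```

A dictionary lookup like this only finds terms it already knows, which is exactly the limitation the later "Openness" and "Open Terms" slides address.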

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
- Recognition of variants
- Linking (canonical information)
  - Proteins: database identifiers
  - Chemical substances: structures
  - Disease terms: thesaurus identifiers

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
- Recognition of variants
- Linking (canonical information)
- Cross-linking between entities (proteins, chemical substances, disease terms), based on detailed syntactic analysis or just proximity
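The "just proximity" option for cross-linking can be sketched as sentence-level co-occurrence counting. This is an illustrative assumption, not the method the slides describe in detail: the function name `cooccurrence_links`, the naive sentence splitter and the substring matching are all hypothetical simplifications.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_links(text, entities):
    """Count unordered entity pairs that are mentioned in the same sentence."""
    links = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Case-insensitive substring match; sorting gives a stable pair order.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


if __name__ == "__main__":
    sample = ("Aspirin reduces TNF levels. "
              "TNF is elevated in diabetes. "
              "Aspirin is cheap.")
    for pair, count in cooccurrence_links(sample, ["aspirin", "TNF", "diabetes"]).items():
        print(pair, count)
```

Proximity alone cannot tell what kind of relation holds between two entities; that is what the syntactic-analysis alternative named on the slide would add.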

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
- Recognition of variants
- Linking (canonical information)
- Cross-linking between entities
- Relevance
  - So many hits! Which ones are interesting? Which ones are new?

Text Analytics Requirements
1. Language analysis
- Domain & scenario specific
- Recognition of the semantic type
- Recognition of variants
- Linking (canonical information)
- Cross-linking between entities
- Relevance
- Openness: black boxes won't do the job
  - No thesaurus or entity recognizer is complete: guessing semantic types
  - Foresee user-defined extensions
  - Recognition of new terms

Text Analytics Levels
- Entity relations: roles and relationships (a company in an acquisition event, a compound in a chemical equation)
- Entity extraction: the recognition of distinct entities (examples: proteins, chemical compounds, diseases, companies, person names)
- Morpho-syntactic analysis: general linguistic preprocessing (results: nouns, verbs, adjectives, noun phrases, etc.)

Semantic Knowledge Modeling
Building Skill Cartridges: each one defines a specific domain of interest through syntactic and semantic rules.
- Plug & Play Skill Cartridges: Competitive Intelligence; Biological Entities Relationships
- Concept & Meaning Extraction
  - Meaning = Acquisition: target & buyer, amount & date, ...
  - Meaning = Interactions: genes & proteins, inhibition, diseases, localization, ...
- Generic Word Extraction: words (any concept)
- Text (any kind, any format, 16 languages)

Life Sciences Skill Cartridges: Relationships

Current situation
- The Skill Cartridge concept is a powerful and successful model, allowing TEMIS to communicate and apply a known, flexible approach to a wide range of scenarios.
However:
- Skill Cartridges of the BER and CER type are costly to build and very specific.
- What if you want to let users analyze and explore content not only with predefined known terms but also with open terms across domains?
- Potentially any term can be of interest, but the most frequent item is not always the most interesting.

Solution no. 1: Open Terms
- Users need to analyze and explore content not only with predefined known terms but also with open terms.
- Important information can be discovered through the simple extraction of domain-independent term candidates.
- Open terms doesn't mean simple terms.
- By doing a little math we can restrict the analysis to only the (presumably) relevant information:
  - Make a statistically guided guess about which terms are relevant (keep only a small number)
  - Assign a confidence score to each term
  - Provide a set of parameters to customize the results

Solution no. 1: Open Terms with RTF
RelevantTermFinder (RTF)
- Works cross-domain, without the need to adapt manually to new domains
- Separates important from unimportant information
Advantages:
- The approach is very replicable: it can be applied in many different contexts with minor or no variation
- The approach is technically very simple: RTF is fast and multilingual
- The approach addresses exploratory scenarios: finding new information on issues that were not modeled before

Solution no. 1: Example with RTF
One real-world example on patent data (there are many others)
Scenario:
- Corpus of ~2000 patents on «Stents» (bioresorbing stents)
- Answer one specific question: why did people use «Yttrium» in their products in a set of patent documents?
- There is no specific Cartridge for this question and there never will be one, because the question came up ad hoc
Question: can Luxid guide me towards the really interesting issues?

Search
Search for documents concerning yttrium (a rare earth metal)

Analysis
The search returns 25 documents; let's now analyse them.

From Frequency to Relevance
- The analysis returns a frequency-sorted list.
- Observation: many terms are not informative.
- Can we do better? Yes, let's sort the terms according to the strength of their association with the term yttrium.
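The slides do not say which association measure RTF uses. As a minimal sketch of the idea, the following assumes a PMI-style score: the log ratio between how often a term appears in the documents that mention the target and how often it appears in the whole corpus. The function name `rank_by_association` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def rank_by_association(docs, target):
    """Rank terms by how strongly they are associated with `target`.

    docs: list of sets of terms, one set per document.
    Returns (term, score) pairs, strongest association first.
    """
    hits = [d for d in docs if target in d]
    if not hits:
        return []
    df_all = Counter(t for d in docs for t in d)   # document frequency, whole corpus
    df_hit = Counter(t for d in hits for t in d)   # document frequency, target docs only
    scores = {}
    for term, n in df_hit.items():
        if term == target:
            continue
        p_hit = n / len(hits)
        p_all = df_all[term] / len(docs)
        # Assumed measure: log of the ratio; 0 means "no more frequent than usual".
        scores[term] = math.log(p_hit / p_all)
    # Sort by descending score, ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    corpus = [
        {"yttrium", "stent", "corrosion resistance"},
        {"yttrium", "stent", "fatigue strength"},
        {"stent", "polymer"},
        {"stent", "polymer", "coating"},
    ]
    for term, score in rank_by_association(corpus, "yttrium"):
        print(f"{term}: {score:.2f}")
```

In this toy corpus "stent" is the most frequent term but scores zero, because it appears everywhere; the terms that appear mainly alongside yttrium rise to the top, which is the effect the slide describes.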

Discovering relevant topics
The relevance-sorted list suggests that yttrium may have something to do with very specific properties of the device, namely surface hardness, corrosion resistance and fatigue strength.

RTF applications:
- Similar documents: show documents similar to a given document
- Deduplication: remove real duplicates and near duplicates
- Categorization: classify documents automatically according to ontologies
- Clustering: classify documents automatically
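The similar-document and deduplication applications above can be illustrated with a generic technique: Jaccard similarity over word shingles. This is a common textbook approach chosen here for illustration, not a description of how RTF works; the function names and the default threshold of 0.8 are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (one shingle for short texts)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts, between 0 and 1."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def near_duplicates(docs, threshold=0.8):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if jaccard(docs[i], docs[j]) >= threshold
    ]


if __name__ == "__main__":
    texts = [
        "a stent with an yttrium coating",
        "a stent with an yttrium coating",
        "polymer tube for drug delivery",
    ]
    print(near_duplicates(texts))
```

Comparing every pair is quadratic in the number of documents, which is acceptable for a result set of a few dozen patents but would need an index or hashing scheme for a full corpus.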

Conclusion: RTF is a Skill Cartridge
- Use and deploy it like any other Skill Cartridge
- Self-contained, internal DB, no external dependency
- Working with open terms is a very useful complement to our existing Skill Cartridges
- Combined with appropriate sorting, it makes it possible to reach highly relevant results
- It improves replicability and broadens the range of contexts in which Luxid can be used
- RTF is available in Luxid

Solution no. 2: Easing the Skill Cartridge Model
Goals:
- Better support Patent Analysts in setting vocabularies
- Ease the customization
How?
- Improve our Skill Cartridge Development Studio
- Develop new customization tools/products
- Social Tagging
  - Make Knowledge Workers contribute to Skill Cartridge development
  - Keep centralized control & monitoring

Solution no. 2: Easing the Skill Cartridge Model
- 3 profiles: Skill Cartridge Builders; Solution administrators & customizers; Business Users
- 3 environments: Development; Customization & test; Production
- 3 product stacks: Development Studio; Lexicon Manager; Dynamic Mapping Editor
(Diagram labels: Skill Cartridge Builders, Luxid Administrators, Luxid Users)

Skill Cartridge Builders
Build Skill Cartridges
- Mix of rules, patterns & lexicon entries
- Based on low-level components (terms, entities, verbs, relations, ...)
- Define normalization & display rules
How? Development Studio
- Development environment (Edit/Debug/...)
- Import/Export taxonomies & lexicons
- Define, edit rules and check consistency
- Optimize & compile source code
Who? Information Professional
- TEMIS Product team
- TEMIS Professional Services team
- Certified partners
- Information Specialist at customer site

Luxid Administrators & Customizers
Enrich Skill Cartridge
- Tailor with project/customer taxonomy
How? Lexicon Manager
- Import taxonomy
- Adjust lexicons (add/remove/edit entries)
- Check consistency
- Re-compile
Who?
- TEMIS Professional Services team
- Certified partners
- IT & Information Specialist

Business Users
Fine-tune Luxid extractions
- Adjust extraction results
- Propose improvements to Skill Cartridge
How? Dynamic Mapping Editor
- Merge 2 entities (immediate). Ex: BASF = BASF Plant Sciences
- Change entity description (immediate). Ex: Carl Zeiss = Company (instead of person)
- Remove entity (immediate). Ex: BUT is not a company (although a French one)
- Add an entity (subject to reprocessing). Ex: XyyyZ is a protein
Profile: Business users at customer site
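The three "immediate" operations above (merge, change description, remove) can be pictured as a correction layer applied on top of extraction results. The sketch below is purely illustrative: the class `EntityMapping` and its methods are hypothetical names, not the Dynamic Mapping Editor's API, and the sample data reuses the slide's own examples.

```python
class EntityMapping:
    """Hypothetical overlay of user corrections applied to extraction results."""

    def __init__(self):
        self.merged = {}      # alias -> canonical name
        self.retyped = {}     # name -> corrected semantic type
        self.removed = set()  # names to drop from the results

    def merge(self, alias, canonical):
        self.merged[alias] = canonical

    def retype(self, name, new_type):
        self.retyped[name] = new_type

    def remove(self, name):
        self.removed.add(name)

    def apply(self, extractions):
        """Apply the corrections to a list of (name, type) pairs.

        Returns the corrected pairs, de-duplicated, in original order.
        """
        result = []
        for name, entity_type in extractions:
            if name in self.removed:
                continue
            name = self.merged.get(name, name)
            entity_type = self.retyped.get(name, entity_type)
            item = (name, entity_type)
            if item not in result:
                result.append(item)
        return result


if __name__ == "__main__":
    mapping = EntityMapping()
    mapping.merge("BASF Plant Sciences", "BASF")
    mapping.retype("Carl Zeiss", "Company")
    mapping.remove("BUT")
    raw = [
        ("BASF", "Company"),
        ("BASF Plant Sciences", "Company"),
        ("Carl Zeiss", "Person"),
        ("BUT", "Company"),
    ]
    print(mapping.apply(raw))
```

Because these corrections only rewrite existing results, they can take effect immediately; adding a brand-new entity is different, since the documents must be reprocessed to find its mentions, as the slide notes.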

Conclusion
- Patent literature uses terms which are hard to find by classical means.
- The powerful approach of building Skill Cartridges needs to be complemented by new approaches and tools:
  1. Using open terms, as RTF does, makes it possible to discover specific information and answer open questions.
  2. Allowing Knowledge Workers and Patent Analysts to easily set up new vocabularies increases productivity and serendipity.
Thank You

Beyond Search >> Luxid for Life Sciences Gracias! WWW.TEMIS.COM