Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database

Disambiguation and Co-authorship Networks of the U.S. Patent Inventor Database. Lee Fleming. Many thanks to Julia Lane and SciSIP 199704!

Will the real Matt Marx please stand up? Disambiguation example, Matt Marx: Plainview NY; Everett MA; Mt View CA; Class 704.

Many years and a cast of thousands: Ron Lai, Vetle Torvik, Alex D'Amour, Edward Sun, Amy Yu, David Doolin, Guan-Cheng Li, Lee Fleming. Almost literally a cast of thousands, if you count everyone who has helped with data and feedback (thank you!).

Agenda
- Overview of disambiguation process flow
- Peek under the hood
- Results
- Implications for science policy
- Coming attractions

Disambiguation process flow
- Public databases: weekly USPTO patent data (1975-2010)
- Data preparation: load and validate; clean and format; generate datasets
- Primary datasets: Assignee, Inventors, Classes, Patents
- Consolidated inventor dataset
- Inventor disambiguation algorithm
- Disambiguated inventor dataset
- Fung Institute servers and Dataverse Network platform

In the beginning
- Compare various fields across patents
- Weight each field and tune to a curated dataset
- Worked surprisingly well, but:
- Cannot predict insidious model interdependencies, e.g., technology field is more important in a large firm
- Small hand-curated datasets are inherently biased
- So let the machine learn a non-parametric model
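The early weighted field comparison described above can be sketched roughly as follows. This is only an illustration: the field names, the weights, the 0.7 threshold, and the example records (including the assignee "Acme") are all hypothetical and are not taken from the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One inventor-patent record; the fields are illustrative assumptions."""
    first: str
    last: str
    city: str
    assignee: str
    tech_class: str


# Hypothetical hand-tuned weights; in practice these would be tuned
# against a curated dataset.
WEIGHTS = {"first": 0.25, "last": 0.35, "city": 0.15,
           "assignee": 0.15, "tech_class": 0.10}


def field_agreement(a: str, b: str) -> float:
    """Return 1.0 when two non-empty field values agree exactly, else 0.0."""
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    return 1.0 if a_norm and a_norm == b_norm else 0.0


def match_score(r1: Record, r2: Record, weights=WEIGHTS) -> float:
    """Weighted share of agreeing fields, normalized to the range [0, 1]."""
    total = sum(weights.values())
    agreed = sum(w * field_agreement(getattr(r1, f), getattr(r2, f))
                 for f, w in weights.items())
    return agreed / total


def is_match(r1: Record, r2: Record, threshold: float = 0.7) -> bool:
    """Declare a match when the weighted score reaches the (assumed) threshold."""
    return match_score(r1, r2) >= threshold


# Hypothetical records for illustration only.
rec_a = Record("Matt", "Marx", "Everett", "Acme", "704")
rec_b = Record("Matt", "Marx", "Mt View", "Acme", "704")
rec_c = Record("Matt", "Marx", "Plainview", "Other Co", "123")
```

With these made-up weights, `rec_a` and `rec_b` agree on everything but city and clear the threshold, while `rec_a` and `rec_c` agree only on name and do not. Fixed weights like these cannot express interactions between fields, which is the limitation the slide points to.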

How does a machine learn?
1) Start with curated data. Assumes no bias.
2) Sample two sets of variables: name and patent. Given one set, learn how well the other set predicts a match/non-match. Assumes independent influences on match probability.
Not clear which is better; we use 2). After learning, estimate matches in the remaining dataset.
Training sets:
- To learn patent attributes (labels come from names): a match is a pair with a perfect full-name match of a rare name; a non-match is a pair of rare names whose full names differ.
- To learn name attributes (labels come from patents): a match is a pair that shares >1 common co-authors and >1 tech classes; a non-match is a pair of inventors from the same patent.
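One way to picture approach 2) is as learning likelihood ratios from the two training sets and combining them under the stated independence assumption. The sketch below is a simplified illustration, not the actual algorithm: the similarity profiles, the add-one smoothing, the prior, and the function names are all assumptions.

```python
from collections import Counter


def likelihood_ratios(match_profiles, nonmatch_profiles, smoothing=1.0):
    """Estimate r(x) = P(x | match) / P(x | non-match) for each similarity
    profile x seen in the training pairs, with additive smoothing."""
    m_counts = Counter(match_profiles)
    n_counts = Counter(nonmatch_profiles)
    profiles = set(m_counts) | set(n_counts)
    k = len(profiles)
    m_total = len(match_profiles) + smoothing * k
    n_total = len(nonmatch_profiles) + smoothing * k
    return {
        x: ((m_counts[x] + smoothing) / m_total)
           / ((n_counts[x] + smoothing) / n_total)
        for x in profiles
    }


def match_probability(r_name, r_patent, prior=0.5):
    """Combine the name and patent ratios, assuming they influence the match
    probability independently, and convert the resulting odds to a probability."""
    odds = prior / (1.0 - prior) * r_name * r_patent
    return odds / (1.0 + odds)


# Hypothetical patent-attribute profiles: (shares co-authors, shares tech class).
# Labels for these pairs would come from the name-based rules (rare names).
match_pairs = [(1, 1)] * 8 + [(0, 1)] * 2
nonmatch_pairs = [(0, 0)] * 7 + [(0, 1)] * 3
patent_ratios = likelihood_ratios(match_pairs, nonmatch_pairs)
```

In this toy data the profile `(1, 1)` is nine times more likely among matches than non-matches, so a pair with that profile and an uninformative name ratio of 1.0 gets a match probability of 0.9 at a prior of 0.5.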

Disambiguation is essentially a clustering challenge
- (10.4M)*(10.4M - 1) is a big number
- Block to reduce pairwise computation
- Truncate last and first names, e.g., all M. Marx's or L. Flem's
- Lends itself to parallel processing
- Relax and tighten blocking in a series of iterative improvements
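The blocking idea above can be sketched as follows. With 10.4M records the product on the slide is roughly 1.08 × 10^14, so comparisons are restricted to records that share a truncated-name key. The truncation lengths (one letter of the first name, four of the last) and the function names here are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def block_key(first, last, n_first=1, n_last=4):
    """Blocking key built from truncated first and last names."""
    return (first.strip().lower()[:n_first], last.strip().lower()[:n_last])


def build_blocks(records, n_first=1, n_last=4):
    """Group record indices by blocking key; records are (first, last) tuples."""
    blocks = defaultdict(list)
    for idx, (first, last) in enumerate(records):
        blocks[block_key(first, last, n_first, n_last)].append(idx)
    return blocks


def candidate_pairs(blocks):
    """Yield index pairs to compare: only pairs inside the same block."""
    for members in blocks.values():
        yield from combinations(members, 2)


records = [("Matt", "Marx"), ("Matthew", "Marx"),
           ("Lee", "Fleming"), ("L.", "Fleming"), ("Ken", "Younge")]
pairs = sorted(candidate_pairs(build_blocks(records)))
```

For these five records, blocking leaves 2 candidate pairs instead of the 10 unordered pairs a full comparison would need. Each block can be processed independently, which is what makes the approach easy to parallelize; changing `n_first` and `n_last` relaxes or tightens the blocks between iterations.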

Lumping vs. splitting
- Splitting = # records not in correct cluster / total records
- Lumping = # records in wrong cluster / total records
- You may choose one of two poisons. Upper bound: more likely to be split. Lower bound: more likely to be lumped.
- 2011 disambiguation results (based on updated Gu 2008 standard): 3.2%, 1.5% for lower bound; 3.6%, 1.5% for upper bound
- Run both if design is sensitive to cut-points, or design a better experiment
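The two error rates can be computed against a hand-verified standard along these lines. The slide does not say how the "correct" cluster is chosen, so this sketch assumes a plurality rule: an inventor's correct cluster is the predicted cluster holding most of that inventor's records, and a cluster's owner is the inventor with most records in it. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def splitting_and_lumping(true_ids, predicted_ids):
    """Return (splitting, lumping) as fractions of all records.

    Splitting counts records that fall outside their inventor's largest
    predicted cluster; lumping counts records sitting in a predicted
    cluster dominated by a different inventor (plurality rule, assumed).
    """
    if len(true_ids) != len(predicted_ids):
        raise ValueError("label lists must have the same length")
    total = len(true_ids)
    if total == 0:
        raise ValueError("no records to evaluate")

    by_true = defaultdict(Counter)   # inventor -> predicted cluster counts
    by_pred = defaultdict(Counter)   # predicted cluster -> inventor counts
    for t, p in zip(true_ids, predicted_ids):
        by_true[t][p] += 1
        by_pred[p][t] += 1

    split = sum(sum(c.values()) - max(c.values()) for c in by_true.values())
    lumped = sum(sum(c.values()) - max(c.values()) for c in by_pred.values())
    return split / total, lumped / total


# Toy example: inventor A is split across clusters 1 and 2,
# and cluster 2 lumps one A record together with inventor B.
true_ids = ["A", "A", "A", "B", "B", "C"]
predicted_ids = [1, 1, 2, 2, 2, 3]
splitting, lumping = splitting_and_lumping(true_ids, predicted_ids)
```

In the toy example one record out of six is split and one is lumped, so both rates are 1/6. Tightening the match threshold trades lumping for splitting, which is why the slide reports an upper and a lower bound.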

How to get the goods
- Harvard DataVerse Network (DVN): 2011 disambiguation and network variables; 12,000+ downloads
- Fung Institute @ GitHub, https://github.com/funginstitute/downloads: current disambiguations (Sept 4, 2012), working papers, code repository

Demographics and ethnicity. Kim Jones, David Doolin.

Regional disadvantage? Non-competes and inventor mobility
- Disambiguation enables a difference-in-differences (diffs-in-diffs) model
- MARA: Michigan's gift to noncompete research
- Marx, Strumsky, Fleming 2009: decreased intra-state mobility
- Marx, Singh, Fleming: brain drain from states that enforce non-competes, of best inventors and ideas
- Photo: the real Matt Marx

Results (also hold with econometric models and CEM matching)
                pre-MARA   post-MARA   relative risk
Michigan        0.24%      0.32%       1.353
Non-Michigan    0.20%      0.13%       0.677
Michigan % increase over non-Michigan: 99.9%
CITATIONS PER PATENT, median and below
                pre-MARA   post-MARA   relative risk
Michigan        0.20%      0.33%       1.625
Non-Michigan    0.13%      0.14%       1.112
Michigan % increase over non-Michigan: 46.1%
CITATIONS PER PATENT, above median
                pre-MARA   post-MARA   relative risk
Michigan        0.27%      0.31%       1.134
Non-Michigan    0.26%      0.10%       0.395
Michigan % increase over non-Michigan: 186.8%
DEGREE, median and below
                pre-MARA   post-MARA   odds ratio
Michigan        0.25%      0.22%       0.870
Non-Michigan    0.17%      0.11%       0.635
Michigan % increase over non-Michigan: 37.0%
DEGREE, above median
                pre-MARA   post-MARA   odds ratio
Michigan        0.21%      0.51%       2.388
Non-Michigan    0.29%      0.20%       0.710
Michigan % increase over non-Michigan: 236.3%
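The "Michigan % increase over non-Michigan" rows appear to be the ratio of the two relative risks, minus one. The short sketch below checks that reading against the figures on the slide; the function names are hypothetical. The displayed percentages are rounded, so recomputing a relative risk from them (0.32 / 0.24 ≈ 1.333) does not exactly reproduce the reported 1.353, and the check therefore uses the reported ratios directly.

```python
def relative_risk(pre_rate, post_rate):
    """Ratio of the post-MARA rate to the pre-MARA rate."""
    return post_rate / pre_rate


def pct_increase_over(rr_treated, rr_control):
    """Percentage by which the treated group's ratio exceeds the control's."""
    return (rr_treated / rr_control - 1.0) * 100.0


# Reported ratios from the slide: (Michigan, non-Michigan).
reported = {
    "overall": (1.353, 0.677),
    "citations, median and below": (1.625, 1.112),
    "degree, median and below": (0.870, 0.635),
    "degree, above median": (2.388, 0.710),
}
increases = {name: round(pct_increase_over(mi, other), 1)
             for name, (mi, other) in reported.items()}
```

Four of the five reported percentages are reproduced this way to one decimal place. For citations above the median, the same calculation from the rounded ratios 1.134 and 0.395 gives about 187.1% rather than the reported 186.8%, presumably because the slide worked from unrounded values.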

Figure: inventor emigration from MI, pre and post MARA (1985); panels for 1983, 1984, 1986, and 1987. Guan-Cheng Li and Laurent El-Ghaoui.

Best inventors piling up in states which do not enforce noncompetes. M. Marx and L. Fleming, 2012. "Noncompetes: Barriers to Exit and Entry?" National Bureau of Economic Research, Innovation Policy and the Economy, 12: 39-64. University of Chicago Press.

Noncompetes can be bad for firms too
- Bump in acquisitions, Tobin's q, post MARA
- NCs trap HK (human capital) in firms
- But firms go stale when they can't hire
- Younge, Tong, Fleming; Younge and Marx
- Photo: the real Ken Younge

Implications for Science and Innovation Policy
- NCs decrease diffusion of people, and ideas, within regions
- Drive best people, and ideas, to regions that do not enforce!
- Managers at incumbent firms like them at first
- Provides a hiring shield but takes away your sword!
- Firms fall behind the tech frontier because they cannot hire fresh blood
- Active policy controversy: MA considering weakening noncompetes; GA just strengthened; China just weakened

Coming attractions/discussion
- Torvik research group: linked PubMed and USPTO disambiguations!
- API for programmatic access
- Move beyond citations as a measure of value
- Community-built and validated curation standards
- How can we become a more cohesive and productive community?

http://abel.lis.illinois.edu/resources.html

News topics around patent #6,505,559. Joint with Laurent El-Ghaoui, UC Berkeley. Figure: after hits / before hits; panels: before patent filing, after Nobel announcement.

Verification standards
- Gold: personal validation
- Silver: friend-of-a-friend
- Bronze: scraped resumes
- Educated guess
- Synthetic: aka plastic
- Need public contribution and supported wiki!

Towards a cohesive and productive community
- Preserve precedence while providing ongoing and intermediate results
- Build community assets: source code, all of it; data, all of it; results, all of them (after publication)
- High standards in code development: revision control; test coverage to validate implementation
- And finally, collective effort and support!