How Preferred Networks has Defined Their Values: The Promise and Challenge of Deep Learning in Domains of Physical Control

Hiroshi Maruyama, PFN Fellow

About Myself
- 1983-2009: IBM Research, Tokyo Research Laboratory
  Worked on natural language processing, handwriting recognition, XML, distributed middleware, and security
  - 2003-2004: IBM Consulting Services
  - 2006-2009: Director of the lab
- 2009-2010: Canon, Inc., Deputy Group Executive of Digital Platform Dev. HQ
- 2011-2016: Institute of Statistical Mathematics (a government research institute)
- 2016-: Preferred Networks, Inc.

Preferred Networks, Inc. (PFN)
- Founded: March 2014 (spin-off from Preferred Infrastructure)
- Located: Tokyo, Japan; Berkeley, CA, US (PFN America)
- Number of employees: >180, mostly engineers and researchers
- Investors: NTT (2014), Fanuc (2015), Toyota (2015, 2017), Hakuhodo, Hitachi, Mizuho Bank, Mitsui & Co., Chugai Pharmaceutical, Tokyo Electron
- Our focus: distributed deep learning applied to the transportation, manufacturing, and healthcare industries

Agenda
1. What is AI?
2. What is Deep Learning?
3. Our Values

Approaches to Intelligence
- Neuroscience / Brain Science: observe brain activities
- Psychology: observe human behavior
- Computer Science (a.k.a. AI): mimic human intelligence
- Economics: mathematical model
  - Game theory
  - Optimization
- Mechanical Engineering: differential equation, control theory

Intelligence = Do Complex Math?
Computer algebra system MACSYMA, 1968-1982
Richard Pavelle and Paul S. Wang. 1985. MACSYMA from F to G. J. Symb. Comput. 1, 1 (March 1985), 69-100.

Intelligence = Understanding Language?
Natural language understanding system SHRDLU, 1971
Example command: "Pick up a big red block"
http://hci.stanford.edu/winograd/shrdlu/aitr-235.pdf

Intelligence = Play Games?
IBM Deep Blue, 1997
Google AlphaGo, 2016

Technology Focus of AI Research has Changed Over Time
1st Wave of AI (1956-1974)
  Symbol processing (LISP), means-ends analysis, language parsing
  - Garbage collection
  - Search algorithms
  - Formal language theory
  - ...
2nd Wave of AI (1980-1987)
  Knowledge representation, expert systems, ontology
  - Object-oriented languages
  - Modeling
  - Semantic Web
  - ...
3rd Wave of AI (2008-)
  Statistical machine learning, deep learning
  - Inductive programming

Artificial Intelligence is an Overloaded Term
1. For researchers, AI is a research activity (or field) that studies intelligence by simulating it with machines
   Search, inference, optimization, recognition, NLP, ...
2. For AI vendors, AI is ANY information system that utilizes ANY of the above research results
3. For the general public, AI is a human-like machine intelligence
   HAL 9000, Terminator, Cylon, Astro Boy, ...

Please make a clear distinction between generalized AI and specialized AI
"There's a distinction, which is probably familiar to a lot of your readers, between generalized AI and specialized AI."
GAI is Sci-Fi
https://www.wired.com/2016/10/president-obama-mit-joi-ito-interview/

Agenda
1. What is AI?
2. What is Deep Learning?
3. Our Values

What is Deep Learning? A (Stateless) Function
Y = f(X)
- X: very high-dimensional, any combination of continuous and categorical variables
- Y: low-dimensional for classification, very high-dimensional for generation

Example: Converting Celsius to Fahrenheit
- Requirements: input C, output F, where F is the Fahrenheit equivalent of C in Celsius
- A priori knowledge (model): F = 1.8 * C + 32
- Algorithm: double c2f(double c) { return 1.8*c + 32.0; }
The model must be known in advance, and the algorithm must be constructible.

Alternative Approach: Data-Driven, Inductive Programming
Find a model that represents this data set

Machine Learning (a.k.a. Statistical Modeling) does this!
Estimated model: F = 1.8 * C + 32 + e, where e ~ N(0, 10)
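The data-driven route can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the method used for the slide: the data is synthetic, the sample size and temperature range are arbitrary, and the second parameter of N(0, 10) is assumed here to be a standard deviation. The names `fit_line`, `celsius`, and `fahrenheit` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic observations (assumption): the true relation plus Gaussian noise
# with standard deviation 10.
rng = random.Random(0)
celsius = [rng.uniform(-20.0, 40.0) for _ in range(1000)]
fahrenheit = [1.8 * c + 32.0 + rng.gauss(0.0, 10.0) for c in celsius]

slope, intercept = fit_line(celsius, fahrenheit)
print(f"Estimated model: F = {slope:.2f} * C + {intercept:.2f}")
```

The estimated slope and intercept should land close to 1.8 and 32, without the formula ever being written into the program. The catch is that the model family (a straight line) was still chosen by hand.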

In traditional Statistical Modeling, the Model Family must be fixed in advance
- What is the function that represents this data?
- Too many parameters result in overfitting!
- Choosing the right model family is difficult

Deep Learning can approximate a function without too much overfitting (in many cases)
- Network: input, two hidden layers of 10 nodes each, output; 141 parameters in total
- Ramp function (ReLU) as the activation function
- Mean-square error is backpropagated
Result: good approximation without too much overfitting
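The figure of 141 parameters can be reproduced by counting weights and biases. The sketch below assumes a fully connected network with one scalar input, one scalar output, and a bias on every node; the slide does not spell out these details. The helper names and the random initialization are illustrative only.

```python
import random


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each fully connected layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def forward(layers, x):
    """Forward pass: ramp (ReLU) on hidden layers, linear output."""
    activations = [x]
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        activations = z if is_last else [max(0.0, v) for v in z]
    return activations[0]


# 1 input -> 10 hidden -> 10 hidden -> 1 output
net = init_network([1, 10, 10, 1])
print(count_parameters(net))  # (1*10 + 10) + (10*10 + 10) + (10*1 + 1) = 141
print(forward(net, 0.5))      # output of the untrained network
```

Training would adjust these 141 numbers by backpropagating the mean-square error; that step is omitted here.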

Deep Neural Net as a Universal Computing Mechanism
- Very large number of parameters
- Can approximate ANY high-dimensional function*
- Pseudo Turing Complete!
* G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303-314, 1989.
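For reference, the cited result can be stated informally as follows. This is a paraphrase under the usual assumptions (a continuous sigmoidal activation and a continuous target function on the unit cube), not a quotation from the paper.

```latex
% Cybenko (1989), informal statement.
% Let \sigma be a continuous sigmoidal function and I_n = [0,1]^n.
% For every continuous f : I_n \to \mathbb{R} and every \varepsilon > 0
% there exist N \in \mathbb{N}, \alpha_j, \theta_j \in \mathbb{R}, and w_j \in \mathbb{R}^n such that
\[
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
\qquad
\lvert G(x) - f(x) \rvert < \varepsilon \quad \text{for all } x \in I_n .
\]
```

Note that this is an existence result for a single hidden layer: it says neither how large N must be nor that training will find the right parameters.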

How Deep Learning Works, by Yann LeCun
https://code.facebook.com/pages/1902086376686983

Model is unknown (1): Image Segmentation for Autonomous Driving
https://www.youtube.com/watch?v=lgojchgdvqs

Model is unknown (2): Image Segmentation for Picking Robot
2nd place in the picking task in the Amazon Picking Challenge

Model is unknown (3): Flexible Voice Control of Robot

Algorithm is unknown (1): Auto Coloring Line Drawings

Algorithm is unknown (2): Reinforcement Learning for Autonomous Driving
Consumer Electronics Show (CES) 2016

https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/

Deep Learning Requires lots of Computation
(P: Peta, E: Exa, F: Flops)
- Image / Video Recognition: 10P (image) ~ 10E (video) Flops
  100 million images
- Life Science: 100P ~ 1E Flops
  10M SNPs per person; 100 PF for 1 million, 1 EF for 100 million
- Speech Recognition: 10P Flops
  5K hours of audio data from 10K people; 100K hours of synthetic audio data for training [Baidu 2015]
- Autonomous Driving: 1E ~ 100E Flops
  1 TB/day/autonomous car; 10~1000 cars, 100 days of data
- Robotics / Drone: 1E ~ 100E Flops
  1 TB/car/year; data from 1~100M cars
Machine-generated data is much bigger than human-generated data.
These estimates are based on the following rule: finishing training on 1 GB of data within 1 day requires 1 TFlops.
[Chart axis: 10PF, 100PF, 1EF, 10EF, 100EF]
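The autonomous-driving range can be checked with the slide's rule of thumb (1 GB trained within 1 day needs 1 TFlops). The sketch assumes that the required compute scales linearly with data volume, that training must finish in one day, and that 1 TB = 1000 GB; the function names are hypothetical.

```python
TFLOPS_PER_GB_PER_DAY = 1.0  # rule of thumb from the slide
GB_PER_TB = 1000.0           # decimal units (assumption)
TFLOPS_PER_EFLOPS = 1e6      # 1 exa = 10^6 tera


def required_tflops(data_gb, days=1.0):
    """Compute needed to train on data_gb within the given number of days,
    assuming linear scaling with data volume."""
    return TFLOPS_PER_GB_PER_DAY * data_gb / days


def driving_dataset_gb(cars, days_of_data, tb_per_car_per_day=1.0):
    """Total data collected by a fleet, in GB."""
    return cars * days_of_data * tb_per_car_per_day * GB_PER_TB


for cars in (10, 1000):
    tflops = required_tflops(driving_dataset_gb(cars, days_of_data=100))
    print(f"{cars} cars: {tflops / TFLOPS_PER_EFLOPS:g} EFlops")
# 10 cars: 1 EFlops
# 1000 cars: 100 EFlops
```

Under these assumptions, 10 to 1000 cars recording 1 TB per day for 100 days gives 1 to 100 EFlops, which matches the range quoted for autonomous driving.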

Chainer: Scalable Deep Learning Framework
- Open source
- Flexible: dynamic NN by Define-by-Run
- Supported by major platforms
- Multi-node scalability
http://chainer.org/general/2017/02/08/performance-of-distributed-deep-learning-using-chainermn.html

We Demonstrated the World's Fastest ImageNet Training
- ChainerMN broke the world record for DL training speed in Dec. 2017
- PFN's in-house supercomputer MN-1 was ranked 91st in the worldwide TOP500 supercomputer list, Nov. 2017

2nd Place at Google AI Open Images Challenge (Aug. 2018)
The difference from the leader was 0.023%

with MN-1b Cluster
- MN-1b GPU cluster: V100 (32GB) x512, InfiniBand; SW: ChainerMN
- Scalability results: 16-epoch training in 33 hours
- Scalability: 83% (compared to 8-GPU performance)
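One way to read the 83% figure is as parallel efficiency relative to the 8-GPU baseline, that is, measured throughput divided by the throughput of perfect linear scaling. The slide does not define the metric, so this interpretation, the function names, and the throughput numbers in the example are all assumptions.

```python
def scaling_efficiency(baseline_gpus, baseline_throughput, gpus, throughput):
    """Measured throughput divided by ideal linear scaling from the baseline."""
    ideal = baseline_throughput * gpus / baseline_gpus
    return throughput / ideal


def speedup_over_baseline(gpus, baseline_gpus, efficiency):
    """Speedup relative to the baseline at a given parallel efficiency."""
    return (gpus / baseline_gpus) * efficiency


# Hypothetical throughputs (samples per second), chosen only to illustrate the formula.
print(scaling_efficiency(8, 100.0, 512, 5312.0))

# With 512 GPUs at 83% efficiency relative to 8 GPUs:
speedup = speedup_over_baseline(512, 8, 0.83)
print(f"{speedup:.2f}x faster than 8 GPUs")
```

Under this reading, 512 GPUs deliver roughly 53 times the 8-GPU throughput instead of the ideal 64 times, so the 33-hour run would correspond to about 1,750 hours on 8 GPUs.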

Fundamental Limitation of ML (1)
Timeline: training data set -> training -> model -> inference (i.e., prediction) based on the trained model
- Data is sampled at some point in the past
- Statistical machine learning works only if the future is similar to the past

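Fundamental Limitation of ML (2)
Powerless on data in unseen regions?
- Interpolation: inputs inside the region covered by the training data set
- Extrapolation: inputs outside the region covered by the training data set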
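The gap between interpolation and extrapolation shows up even with a very simple model. The sketch below swaps the neural network for a straight-line fit to keep the example short; the target function y = x^2, the training range [0, 1], and the test points are arbitrary choices made for illustration.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Training data covers only the region 0 <= x <= 1.
train_x = [i / 100 for i in range(101)]
train_y = [x * x for x in train_x]
a, b = fit_line(train_x, train_y)


def abs_error(x):
    """Absolute difference between the fitted line and the true function."""
    return abs((a * x + b) - x * x)


print(f"interpolation error at x=0.5: {abs_error(0.5):.3f}")
print(f"extrapolation error at x=5.0: {abs_error(5.0):.3f}")
```

Inside the training range the error stays below 0.1, while at x = 5 it exceeds 20: the model has learned nothing about the region it never saw.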

Fundamental Limitation of ML (3)
Always works statistically
- Original distribution -> random sampling (i.i.d.) -> training data set -> trained model
- No guarantee of 100% correctness

What is Deep Learning: Recap
- A new way of programming
  - No prior knowledge of the model or algorithm
  - Preparing the training dataset is the key
  - Creative teacher signals allow innovative applications
- Requires a lot of computation
  - New workload opens opportunities for new architectures
- It's statistical modeling
  - Assumes i.i.d. (independent and identically distributed) data
  - Approximation only (no exact answers)

Agenda
1. What is AI?
2. What is Deep Learning?
3. Our Values

PFN's Business Domains (as of 2018)
[Chart. Axes: Cyber to Physical; Consumer to B2B. Domains shown: Speech, Text, Game, Photo, Healthcare, Humanoid Robot, Automotive, Infrastructure, Factory Robot, Industrial IoT]

Business Challenges
- Most deep learning technologies (incl. Chainer) are open-sourced
  - Hard to monetize
- Consulting / system integration services are easy money but do not scale
  - Revenue grows only in proportion to the number of deep learning experts
- Many engagements require common technologies only
  - Difficult to differentiate PFN from other AI vendors
- Time spent on engagements prevents our researchers / engineers from learning new technologies

Our Strategy
- Technologies speak for themselves
  - Demonstrating cutting-edge technologies is the best marketing tool
  - It is also the best way to attract the best talent worldwide
    - ~50% of new applicants to PFN are from outside of Japan
  - Heavy investment in human and computational resources
- Keep today's healthy business to maintain freedom of operation
- Joint research and development with strategic partners
  - New joint businesses that are only possible with cutting-edge technologies
- Capture the best business opportunity whenever it arises
  - How do we make ourselves ready for this?

We organize ourselves as Motivation-Driven
PFN's organizational culture is motivation-driven. Being motivated means we are serious about the output of our work. It also means teamwork, because everybody at PFN wants to contribute in one way or another. This culture is the key to enabling an extremely flat, flexible, and high-performance organization.

No organizational structure, heavy use of IT

Individually, we have to Learn or Die
Everybody at PFN strives to learn. We are in a very high-velocity industry; learning is the only way to continuously adapt and remain cutting edge. We do not stick to one idea, one technology, or one domain. Software engineers at PFN are willing to take on challenges in hardware, hardware researchers are happy to switch their fields to HCI design, and so on. This results in a true learning organization with diverse backgrounds.

To others, we are always Proud, but Humble
At the core, PFN is a technology company. We keep challenging ourselves, and to do so we attract the best people. At the same time, we understand that we cannot do everything by ourselves. We know there are things that we do not know ("known unknowns"), and we respect diverse ideas from diverse people.

And this is what we do: Boldly Do What No One Has Done Before
With our technology, PFN will change the world by providing new software and hardware, by creating new services and transforming businesses, and by making new markets for a better future. Our role in society is to do the things that only PFN can do. It is the raison d'être of PFN.

So, please keep watching us. We will be a one-of-a-kind company.
Thank You