|Brain-inspired hardware and algorithm co-design for low power online training on the edge
|Our digital society is shifting to an era of pervasive and specialized edge-computing systems. Deep learning (DL) supports this revolution by enabling unprecedented performance for a wide range of pattern classification and regression applications. However, the conventional von Neumann hardware architecture and training algorithms are not optimally suited for the low-power and real-time requirements of edge-computing devices. Event-based neuromorphic technologies aim to overcome this problem by removing computational abstractions and better exploiting the physics of the substrate, while running online algorithms that require only spatially and temporally local weight updates. In this talk, I will introduce a co-design approach between recent event-based algorithms, scalable emerging memory devices, and circuits that together holistically fulfill these requirements. Specifically, I will share our recent work that exploits the intrinsic properties of memristive devices to implement event-driven, gradient-based, quantization-aware local learning on-chip. We will cover methods to increase the bit resolution of memristive devices, implement scalable eligibility traces in hardware for solving temporal credit assignment, derive local error-based learning rules, and more. We will discuss applications of these technologies to real-time sensory processing, such as gesture recognition, pattern generation, and biomedical signal processing.
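To make the flavor of such an update concrete, here is a minimal software sketch of a trace-based local learning rule, assuming a simple decaying eligibility trace and a locally available error signal (the variable names and dynamics are illustrative, not the exact rule presented in the talk):

```python
def local_update(w, trace, pre_spikes, post_error, lr=0.01, decay=0.9):
    """One event-driven step of a spatially and temporally local learning rule."""
    # Eligibility trace: a decaying memory of presynaptic activity, which an
    # analog memristive state could store directly at the synapse.
    trace = [decay * t + s for t, s in zip(trace, pre_spikes)]
    # The weight update uses only quantities available at the synapse:
    # the local trace and an error signal at the postsynaptic neuron.
    w = [[w_ij - lr * post_error[i] * trace[j]
          for j, w_ij in enumerate(row)]
         for i, row in enumerate(w)]
    return w, trace
```

Because the trace decays between events, credit for an error can be assigned to spikes that occurred earlier in time, which is how eligibility traces address temporal credit assignment.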
|University of Zurich and ETH Zurich
|Melika Payvand is a senior research scientist at the Institute of Neuroinformatics, University of Zurich and ETH Zurich. She received her M.S. and Ph.D. degrees in electrical and computer engineering from the University of California, Santa Barbara, in 2012 and 2016, respectively. Her research interest is in understanding the organizing principles of biological nervous systems and employing them in building more efficient and intelligent artificial systems, following a co-design approach across devices, circuits, and algorithms. She is an active member of the neuromorphic community, co-coordinating the European NEUROTECH project, chairing the neuromorphic engineering track at the IEEE ISCAS conference, co-chairing the International Conference on Neuromorphic Systems (ICONS), and serving as a scientific committee member of the Capocaccia neuromorphic intelligence workshop. She has also co-organized the Women in Circuits and Systems (WiCAS) event at the IEEE ICECS conference and has served as a guest editor of Frontiers in Neuroscience and the IOP journal Neuromorphic Computing and Engineering. She won the best neuromorph award at the 2019 Telluride neuromorphic workshop.
|Memory-Centric Machine Learning
|When performing machine learning tasks, central and graphics processing units consume considerably more energy moving data between logic and memory units than doing actual arithmetic. Brains, by contrast, achieve superior energy efficiency by fusing logic and memory entirely, performing a form of “in-memory” computing. Until now, such an integration of logic and memory has been impossible at large scale using CMOS technology. However, companies such as Intel, Samsung, and TSMC have recently reached production status on new memory devices, such as (mem)resistive, phase-change, and magnetic memories, which give us an opportunity to achieve an extremely tight integration between logic and memory. Unfortunately, these new devices also come with important challenges due to their unreliable nature. In this talk, we will look to neuroscience for lessons on the design of in-memory computing systems with unreliable devices. We will first study the reliance of brains on approximate memory strategies, which can be reproduced for machine learning. We will give the example of a hardware binarized neural network relying on resistive memory. Based on measurements of a hybrid CMOS and resistive hafnium-oxide memory chip exploiting a differential approach, we will see that such systems can exploit the properties of emerging memories without the need for error-correcting codes, and achieve extremely high energy efficiency. Then, we will present a second approach in which the probabilistic nature of emerging memories, instead of being mitigated, is fully exploited to implement a type of probabilistic learning. We show that the inherent variability of hafnium-oxide memristors can naturally implement the sampling step of the Metropolis-Hastings Markov chain Monte Carlo algorithm, and we experimentally train an array of 16,384 memristors to recognize images of cancerous tissues using this technique. Finally, we will present prospects concerning the implementation of different learning algorithms with emerging memories.
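As a hedged illustration of the second approach, the sketch below models one Metropolis-Hastings step in plain Python, with a Gaussian proposal standing in for the stochastic programming of a memristor (the function names and noise model are assumptions for illustration, not the measured device behavior):

```python
import math
import random

def metropolis_hastings_step(w, loss, propose, temperature=1.0):
    """One MH step: in hardware, the proposal could come 'for free' from the
    stochastic write of a hafnium-oxide memristor; here it is modeled in software."""
    w_new = propose(w)  # stochastic device write plays the role of the proposal
    delta = loss(w_new) - loss(w)
    # Always accept improvements; accept degradations with Boltzmann probability.
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        return w_new
    return w

# Toy usage: the chain drifts toward the minimum of a quadratic loss.
random.seed(0)
w = 5.0
for _ in range(2000):
    w = metropolis_hastings_step(w, lambda x: (x - 1.0) ** 2,
                                 lambda x: x + random.gauss(0.0, 0.3))
```

After enough steps, the chain samples from a distribution concentrated around the loss minimum, which is the sense in which device variability can replace an explicit random number generator.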
|Damien Querlioz is a CNRS Researcher at the Centre de Nanosciences et de Nanotechnologies of Université Paris-Saclay. His research focuses on novel uses of emerging non-volatile memory and other nanodevices, in particular drawing inspiration from biology and machine learning. He received his predoctoral education at École Normale Supérieure, Paris, and his PhD from Université Paris-Sud in 2009. Before his appointment at CNRS, he was a Postdoctoral Scholar at Stanford University and at the Commissariat à l'Énergie Atomique. Damien Querlioz is the coordinator of the interdisciplinary INTEGNANO research group, with colleagues working on all aspects of nanodevice physics and technology, from materials to systems. He is a member of the bureau of the French Biocomp research network. He has co-authored one book, nine book chapters, and more than 100 journal articles and conference proceedings, and has given more than 50 invited talks at national and international workshops and conferences. In 2016, he received an ERC Starting Grant to develop the concept of natively intelligent memory. In 2017, he received the CNRS Bronze Medal. He is also a co-recipient of the 2017 IEEE Guillemin-Cauer Best Paper Award and the 2018 IEEE Biomedical Circuits and Systems Best Paper Award.
|How Computer Graphics Advances Hardware-Aware Efficient Training
|Both reinforcement learning and light transport simulation can be modeled by the same integral equation, which in computer graphics is solved by path tracing using Monte Carlo and quasi-Monte Carlo methods. The similarity of continuous representations of layers in neural networks and the integrals of graphics inspires sampling paths in neural networks. This results in efficient deterministic initialization, instant compression and quantization, and the generation of neural networks that are sparse from scratch and hence can be trained much more efficiently than their fully connected counterparts. Generating the paths and neural networks by low discrepancy sequences leads to efficient algorithms and hardware. Circling back to graphics, we introduce a spatial multiresolution hash encoding and explain its superior inference and training performance in real-time applications in combination with tiny neural networks when executed on massively parallel graphics processing units (GPUs).
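As a small, hedged illustration of the low-discrepancy idea, the sketch below uses a 2D Halton sequence to place the edges of a sparse layer more evenly than i.i.d. random sampling would (this toy construction is an assumption for illustration, not the generation scheme of the talk):

```python
def halton(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def sparse_edges(n_in, n_out, n_edges):
    """Sample edges of a sparse layer with a 2D Halton sequence (bases 2 and 3),
    so the connections cover the weight matrix more uniformly than i.i.d. draws."""
    return [(int(halton(i + 1, 2) * n_in), int(halton(i + 1, 3) * n_out))
            for i in range(n_edges)]
```

Because the network is sparse from scratch, only the sampled edges need to be stored and trained, which is the source of the training-efficiency gains mentioned above.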
|Alexander Keller is a Senior Director of Research at NVIDIA. Before that, he was Chief Scientist of mental images, where he was responsible for research and the conception of future products and strategies, including the design of the NVIDIA Iray light transport simulation and rendering system. Prior to industry, he was a full professor of computer graphics and scientific computing at Ulm University, where he co-founded the UZWR (Ulmer Zentrum für wissenschaftliches Rechnen) and received an award for excellence in teaching. Alexander Keller has more than three decades of experience in ray tracing, pioneered quasi-Monte Carlo methods for light transport simulation, and connected the domains of machine learning and rendering. His research interests are at the intersection of graphics, communications, and machine learning.
|Neural Network Design and Training for Efficient On-Device Learning
|Edge devices generate a large amount of data that can be used to improve the performance of neural networks. However, the private nature of such data prevents it from being uploaded to servers and requires on-device learning. This poses a challenge given the tight resource budgets of edge devices. To address this challenge, we will present approaches to both neural network design and training in this talk. To design efficient network architectures that require few resources to train, we propose a hardware-aware neural architecture search algorithm, NetAdaptV2. NetAdaptV2 automatically and rapidly discovers an efficient network architecture in terms of the given metrics on the target hardware. It uses empirical measurements to guide the search so that no hardware knowledge is required. In our experiments, NetAdaptV2 discovers network architectures with better accuracy-latency/accuracy-MAC trade-offs than related works and reduces the total search time by up to 5.8x on ImageNet. To improve the efficiency of training in the federated learning setting, we propose an efficient training algorithm, Partial Variable Training (PVT). PVT reduces memory usage and communication cost by training only a small subset of variables on edge devices. With PVT, we show that network accuracy can be maintained by utilizing more local training steps and devices, which is favorable for federated learning involving a large population of devices. In our experiments on state-of-the-art neural networks for speech recognition and two different datasets, PVT reduces memory usage by up to 1.9x and communication cost by up to 593x while attaining accuracy comparable to full network training.
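The core of PVT as described above can be sketched in a few lines: only a chosen subset of variables receives gradient updates (and would later be communicated to the server), while the rest stay frozen on the device. The function and variable names below are illustrative, not the actual implementation:

```python
def pvt_client_update(params, trainable_keys, grads, lr=0.1):
    """One local SGD step under Partial Variable Training: variables outside
    'trainable_keys' are frozen, so neither their gradients nor their updated
    values need to be stored or communicated."""
    return {k: (v - lr * grads[k]) if k in trainable_keys else v
            for k, v in params.items()}
```

In a federated round, only the entries named in `trainable_keys` would be sent back to the server, which is where the memory and communication savings come from.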
|Tien-Ju Yang is currently a Research Scientist at Google, focusing on efficient large-scale federated learning and on-device learning. He received his M.S. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT), Cambridge, MA, in 2018 and 2020, respectively. His research interests span deep learning, computer vision, image/video processing, speech recognition, machine learning, and VLSI system design. He co-authored the book “Efficient Processing of Deep Neural Networks” and co-taught the tutorial “Efficient Image Processing with Deep Neural Networks” at the IEEE International Conference on Image Processing 2019. He also won first place in the 2011 NTU Innovation Contest.
|DNN Quantization with Mixed Precision and Trained Lookup Tables
|Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In this talk, we will see how to parametrize the quantizers efficiently to learn the optimal bitwidth for each layer. We show that parametrizing the quantizer by step size and dynamic range is key to achieving stable training and good final performance; the bitwidth can then be inferred from these two quantities. In the second part, we will introduce look-up table quantization (LUT-Q), which learns a dictionary and assigns each weight to one of the dictionary’s values. We show that this method is very flexible and that many other techniques can be seen as special cases of LUT-Q. For example, we can constrain the dictionary trained with LUT-Q to generate networks with pruned weight matrices, or restrict the dictionary to powers of two to avoid the need for multiplications.
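To illustrate the assignment step of LUT-Q, the sketch below maps each weight to its nearest dictionary value; in actual LUT-Q training both the dictionary and the assignments are learned, so this is only the inference-side picture (function names are illustrative). The powers-of-two dictionary shows the multiplication-free special case mentioned above:

```python
def lutq_assign(weights, dictionary):
    """Assign each weight to its nearest dictionary entry.
    Storing only the (small) dictionary plus per-weight indices compresses
    the network; the dictionary values themselves are trained in LUT-Q."""
    return [min(dictionary, key=lambda d: abs(w - d)) for w in weights]

# Special case: a dictionary of zero and signed powers of two. Multiplying by
# such values reduces to bit shifts and sign flips, avoiding multipliers.
pow2_dict = [0.0] + [s * 2.0 ** -e for s in (1.0, -1.0) for e in range(4)]
```

Including 0.0 in the dictionary is what yields pruned weight matrices: every weight assigned to the zero entry is effectively removed.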
|Fabien Cardinaux is leading a R&D team at the Sony R&D Center Europe in Stuttgart (Germany). Prior to joining Sony in 2011, he has worked as a Postdoc at the University of the Sheffield (UK). In 2005, he obtained a PhD from EPFL (Switzerland) for his work on machine learning methods applied to face authentication. His current research interests lie in deep neural network footprint reduction, neural architecture search and audio content representation.
|Neural Bellman-Ford Networks: An Efficient and General Path-based Method for Link Prediction based on GNNs
|Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation being the generalized product of the edge representations along the path. Motivated by the Bellman-Ford algorithm for the shortest-path problem, we show that this path formulation can be solved efficiently by a generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. NBFNet parameterizes the generalized Bellman-Ford algorithm with three neural components, namely the INDICATOR, MESSAGE, and AGGREGATE functions, which correspond to the boundary condition, multiplication operator, and summation operator, respectively. NBFNet is very general, covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on homogeneous graphs and knowledge graphs show that NBFNet outperforms existing methods by a large margin in both settings, achieving new state-of-the-art results.
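The generalized Bellman-Ford recursion described above can be sketched with MESSAGE and AGGREGATE operators and an INDICATOR boundary condition; instantiating them with addition, minimum, and 0/∞ recovers ordinary shortest paths, whereas NBFNet replaces them with learned neural functions (this sketch is the classical skeleton, not the neural model itself):

```python
def generalized_bellman_ford(n, edges, source, message, aggregate, indicator, iters):
    """Iteratively compute representations h[v] of the node pairs (source, v).
    INDICATOR sets the boundary condition, MESSAGE plays the role of the
    generalized product along an edge, AGGREGATE the generalized sum."""
    h = [indicator(source, v) for v in range(n)]
    for _ in range(iters):
        new_h = list(h)
        for u, v, w in edges:
            new_h[v] = aggregate(new_h[v], message(h[u], w))
        h = new_h
    return h

# Shortest-path instance: message = add, aggregate = min, indicator = 0 / infinity.
INF = float("inf")
dist = generalized_bellman_ford(
    4, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)], 0,
    message=lambda x, w: x + w, aggregate=min,
    indicator=lambda s, v: 0.0 if v == s else INF, iters=4)
# dist = [0.0, 1.0, 3.0, 4.0]: distances from node 0 to every node
```

Swapping the three operators for learned functions that act on edge and pair representations, rather than scalars, gives the NBFNet message-passing scheme.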
|Mila and HEC Montreal
|Jian Tang is currently an assistant professor at Mila-Quebec AI Institute and at the Computer Science Department and Business School of the University of Montreal. He is a Canada CIFAR AI Research Chair. His main research interests are graph representation learning, graph neural networks, geometric deep learning, deep generative models, knowledge graphs, and drug discovery. During his PhD, he received the best paper award at ICML 2014; in 2016, he was nominated for the best paper award at the top data mining conference World Wide Web (WWW); in 2020, he received the Amazon and Tencent Faculty Research Awards. He is one of the most representative researchers in the growing field of graph representation learning and has published a set of representative works in this field, such as LINE and RotatE. His work LINE on node representation learning has been widely recognized and is the most cited paper at the WWW conference between 2015 and 2019. Recently, his group released an open-source machine learning package, called TorchDrug, aiming at making AI drug discovery software and libraries freely available to the research community. He is an area chair of ICML and NeurIPS.