Title: Scalable and Efficient AI: From Supercomputers to Smartphones
Abstract: Billion-parameter artificial intelligence models have shown exceptional performance in a large variety of tasks ranging from natural language processing, computer vision, and image generation to mathematical reasoning and algorithm generation. Those models usually require large parallel computing systems, often called "AI Supercomputers", for their initial training. We will outline several techniques, ranging from data ingestion and parallelization to accelerator optimization, that improve the efficiency of such training systems. Yet, training large models is only a small fraction of practical artificial intelligence computations. Efficient inference is even more challenging: models with hundreds of billions of parameters are expensive to use. We continue by discussing model compression and optimization techniques, such as fine-grained sparsity and quantization, that reduce model size and significantly improve efficiency during inference. These techniques may eventually enable inference with powerful models on hand-held devices.
Bio: Torsten Hoefler is a Professor of Computer Science at ETH Zurich, a member of Academia Europaea, and a Fellow of the ACM and IEEE. His research interests revolve around the central topic of "Performance-centric System Design" and include scalable networks, parallel programming techniques, and performance modeling. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference SC10, SC13, SC14, SC19, SC22, EuroMPI'13, HPDC'15, HPDC'16, IPDPS'15, and other conferences. He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. He received the IEEE CS Sidney Fernbach Award, the ACM Gordon Bell Prize, the ISC Jack Dongarra award, the Latsis prize of ETH Zurich, as well as both ERC starting and consolidator grants. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.
Robert W. Wisniewski
Title: Attacking the Memory and Communication Wall with Memory Coupled Compute
Abstract: The notion of a "Memory Wall" was identified almost three decades ago. Since then, memory has continued to get faster, HBM has been introduced, and new computing paradigms have been explored; nevertheless, the memory wall is higher than it was three decades ago. A significant number of classical HPC applications (modeling and simulation applications) are bottlenecked by insufficient memory bandwidth. More recently, the communication wall has received attention because AI applications, which form an increasingly important part of HPC and of computing in general, are often bottlenecked by insufficient node-to-node communication bandwidth. In this talk I will discuss the research we are undertaking to design the hardware and software architecture for HPC and AI applications to tackle these challenges. I will suggest a path forward based on tightly integrating memory and compute, called Memory Coupled Compute (MCC), and describe the interesting design space that needs to be considered to make this architecture a reality. The architectural space is broad, so a key aspect of our investigation involves codesign among application developers, system software, and hardware with key users. A successful effort on this front will produce an MCC capability that has the potential to be the next discontinuity in HPC and AI.
Bio: Dr. Robert W. Wisniewski is a Senior Vice President, Chief Architect of HPC, and the Head of Samsung's SAIT Systems Architecture Lab. He is an ACM Distinguished Scientist and IEEE Senior Member. The Systems Architecture Lab is innovating technology to overcome the memory and communication walls for HPC and AI applications. He has published over 80 papers in the area of high performance computing, computer systems, and system performance, has filed over 60 patents with 46 issued, has an h-index of 41 with over 7300 citations, and has given over 82 external invited presentations. Prior to joining Samsung, he was an Intel Fellow and CTO and Chief Architect for High Performance Computing at Intel. He was the technical lead and PI for Aurora, the supercomputer to be delivered to Argonne National Laboratory that will achieve greater than an exaflop of computation. He was also the lead architect for Intel's cohesive and comprehensive software stack that was used to seed OpenHPC, and served on the OpenHPC governance board as chairman. Before Intel, he was the chief software architect for Blue Gene Research and manager of the Blue Gene and Exascale Research Software Team at the IBM T.J. Watson Research Facility, where he was an IBM Master Inventor and led the software effort on Blue Gene/Q, which received the National Medal of Technology and Innovation, was the most powerful computer in the world in June 2012, and occupied 4 of the top 10 positions on the Top 500 list.
Katherine A. Yelick
Title: Beyond Exascale Computing
Abstract: The first generation of exascale computing systems is coming online along with powerful new application capabilities and system software. At the same time, demand for high performance computing continues to grow, driven by more powerful simulations, adoption of machine learning methods, and huge data analysis problems arising from new instruments and increasingly ubiquitous devices. With chip technology facing scaling limits and diminishing benefits of weak scaling, it will be increasingly difficult to meet these new demands. Disruptions in the computing marketplace, which include supply chain limitations, a shrinking set of system integrators, and the growing influence of cloud providers, are changing underlying assumptions about how to acquire and deploy future supercomputers. At the same time, there are discussions around the role of AI and quantum computing. In this talk I’ll present some of the findings of a US National Academies consensus report on the future of post-exascale computing and give my own perspectives on some of the specific challenges and opportunities faced by the research community.
Bio: Katherine Yelick is the Vice Chancellor for Research at the University of California, Berkeley, where she is also the Robert S. Pepper Distinguished Professor of Electrical Engineering and Computer Sciences. She is also a Senior Faculty Scientist at Lawrence Berkeley National Laboratory. Yelick was Director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and led the Computing Sciences Area at Lawrence Berkeley National Laboratory from 2010 through 2019. She is a member of the National Academy of Engineering and the American Academy of Arts and Sciences, and is a Fellow of the Association for Computing Machinery (ACM) and the American Association for the Advancement of Science (AAAS).
Daniel Reed - Presidential Professor, University of Utah
Title: The Future of HPC
Abstract: Our current model for configuring, procuring, and constructing leading edge HPC systems is predicated on a vibrant commercial computing market whose interests and products align with scientific computing needs. Alas, this model is increasingly problematic. First, the PC ecosystem that birthed the “attack of the killer micros” and today’s large-scale HPC clusters is increasingly stagnant, which is in stark contrast to the rapid growth and hardware innovation that is taking place in the hyperscaler cloud and AI markets. Meanwhile, reflecting the technical and financial challenges of a post-Moore environment, the semiconductor industry is shifting rapidly to multiple chip packaging – chiplets that integrate multiple, heterogeneous chips via a high-bandwidth interconnect and package. Finally, AI advances are reshaping how we think about the nature of scientific computation and how we pursue scientific breakthroughs via hybrid computations and data analytics.
Simply put, the scientific computing world now lacks the financial leverage to dictate HPC product specifications at the very high end. The leading edge HPC market is too small, the procurements are too infrequent, the funding is too small, and the financial risk to vendors is too high, while the size and scale of the hyperscaler and deep learning markets are too large to ignore. The message is clear. We must again adapt, just as we did during the transitions from vector systems and shared memory parallel processors. Despite the challenges, there are promising ways forward, and this talk will discuss both the history of HPC and possible directions for the future.
David Abramson
Title: Translational Computer Science
Abstract: Given the increasingly pervasive role and growing importance of computing and data in all aspects of science and society, fundamental advances in computer science and their translation to the real world have become essential. Consequently, there may be benefits to formalizing Translational Computer Science (TCS) to complement the traditional foundational and applied modes of computer science research, as has been done for translational medicine. TCS has the potential to accelerate the impact of computer science research overall. In this talk I discuss the attributes of TCS and formally define it. I enumerate a number of roadblocks that have limited its adoption to date and sketch a path forward. Finally, I will provide some specific examples of translational research and illustrate the advantages to both computer science and the application domains.
Bio: David is a Professor of Computer Science, and currently heads the University of Queensland Research Computing Centre. He has been involved in computer architecture and high performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT and Monash University. From 2007 to 2011 he was an Australian Research Council Professorial Fellow. David has expertise in High Performance Computing, distributed and parallel computing, computer architecture and software engineering. David is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), the Australian Academy of Technology and Engineering (ATSE), and the Australian Computer Society (ACS). His hobbies include recreational cycling, photography and making stained glass windows. He is also an amateur playwright, and author of Purely Academic.