People of ACM - Gabriel Loh

November 13, 2018

Will you describe your role at AMD?

My current primary role is Principal Investigator (PI) for AMD’s PathForward program. PathForward is an advanced research program within the US Department of Energy’s Exascale Computing Project, which focuses on accelerating the development and deployment of exascale computing systems. I oversee research activities conducted by several teams that are developing new technologies for CPU and GPU microarchitecture, memory systems, component integration, high-speed interconnects, and more. My role also includes coordinating among various groups, including our collaborators within the DOE, our own AMD Research teams, AMD product planning and technical leadership, and AMD product teams, so that we move our research rapidly into advanced development and products. This has been a very exciting and satisfying role: I have personally learned a great deal along the way, and I get to help shape the future of technology at AMD and beyond.

As a Fellow within AMD Research, I have roles and responsibilities that extend beyond my duties as PI for PathForward. Apart from maintaining an active role in technology innovation, I am involved in a variety of forward-looking projects to research and analyze new technology trends. These analyses provide insights and guidance to AMD’s technical leadership and executives on our rapidly changing industry. I have also helped to oversee our lab’s patent generation activities, which includes training our researchers on how to better develop their inventions and successfully file for patents; serving on our internal committees that evaluate and select the inventions for which we should pursue patents; and writing patents of my own. Another role involves interfacing with the external academic research community (specifically, in the field of computer architecture). This includes many of the standard “academic” service roles (e.g., serving on conference program committees), but also university visits to give talks, recruit, and foster collaborations. I have enjoyed these activities, as they allow me to showcase some of the great work that goes on in AMD Research and at AMD as a whole.

Why is die-stacking architecture effective and how do you see it developing in the near future?

With the slowing of Moore’s Law and the end of Dennard scaling, the industry as a whole is seeking additional technologies to keep pushing the capabilities of future computing systems. While the industry will continue to extract as much performance as possible (and drive power down) from traditional silicon technologies, die stacking has the potential to help alleviate some of these challenges.

Fundamentally, the Moore’s Law story has been one of integration. With each technology generation, our industry was able to pack in more transistors per chip, which in turn allowed us to integrate more functionality into our processors. Die stacking provides a way to continue integrating more transistors, but it has the potential to be more than that. For example, even if traditional Moore’s Law scaling were to continue, certain types of circuits such as DRAM cannot be easily integrated with the traditional CMOS devices that are used to build processors. However, with die stacking, individual CMOS and DRAM chips can be separately manufactured and then integrated together. The result can be a system with much higher memory bandwidth, more power-efficient memory interfaces (less energy consumed per bit of data transferred), and smaller form factors. At the same time, die stacking is not necessarily a silver bullet. Using the same example, while there may be power and performance advantages to die-stacked memory integration, placing memory so close to a processor has the potential to introduce new thermal challenges.
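To make the bandwidth point concrete, here is a rough comparison of a die-stacked memory interface (HBM-style: very wide but modest per-pin speed) against a conventional off-package DDR4 channel (narrow but fast per pin). The figures are rounded numbers from public specifications, used purely for illustration, not AMD product data.

```python
# Back-of-the-envelope peak-bandwidth comparison: die-stacked (HBM-style)
# memory versus a conventional DDR4 channel. Illustrative figures only.

def bandwidth_gb_s(bus_width_bits: int, gbit_s_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbit_s_per_pin / 8  # bits -> bytes

# First-generation HBM stack: 1024-bit bus at ~1 Gbit/s per pin.
hbm = bandwidth_gb_s(bus_width_bits=1024, gbit_s_per_pin=1.0)   # 128.0 GB/s

# DDR4-3200 channel: 64-bit bus at 3.2 Gbit/s per pin.
ddr4 = bandwidth_gb_s(bus_width_bits=64, gbit_s_per_pin=3.2)    # 25.6 GB/s

print(f"HBM stack:    {hbm:6.1f} GB/s")
print(f"DDR4 channel: {ddr4:6.1f} GB/s")
```

The wide-but-slow interface is practical only because die stacking permits thousands of short connections within the package; those short wires are also what reduce the energy consumed per bit transferred, as described above.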

As a general trend, industry will tend to go after the items that provide the biggest returns for reasonable investments and risks. Die stacking involves much more than putting two or more pieces of silicon together. Some of the challenges include designing appropriate architectures, handling the assembly procedures, developing new testing methodologies, enhancing packaging technology, improving thermals and cooling, managing yields and cost, and more. I do not believe that there are any fundamental “show stoppers” here, but there is still plenty of work to be done. That work costs money, and so the cadence of development and deployment of die stacking will depend on how die-stacked designs translate into product value. The more value a company can generate from a die-stacked product, the easier it is to justify the necessary investments.

That said, for those readers who may be involved in academic research in these topics, I would highly encourage them not to worry too much about the economic and other practical issues of rolling out new technologies. I believe that the research community should be extremely aggressive about how die stacking could be deployed. If radical new ideas can deliver significant advances in performance, reductions in power, or the introduction of new capabilities that change the balance of cost versus value, that could motivate the industry to pursue more aggressive die-stacking solutions on a faster timeline. Don’t worry so much about what industry can build today; tell us what we should build tomorrow!

In addition to die-stacking architecture, what is an emerging area of research in microprocessor architecture that will grow in prominence in the coming years?

It’s not so much an emerging area as a growing trend. I believe that we will see more and more microprocessor architecture research with a very thoughtful and focused application-driven approach. This is not fundamentally new. Graphics processing units (GPUs) were originally an application-specific architecture for rendering visual content. Today, the big deal is machine intelligence, where we are witnessing a range of innovations in application-driven processor architectures to address this rapidly growing area.

As we continue to push harder against the physical limits of silicon scaling, more areas of computing will need to look toward greater degrees of specialization and application-driven optimizations to continue to scale performance and to introduce new capabilities. There are very interesting research questions to be explored regarding the right mix and balance of general-purpose capabilities versus application-specific processing. It is probably too extreme to include hundreds or thousands of specialized processors (e.g., social media processing unit, email spam filtering unit, calendar management unit), but where exactly is the sweet spot and what specific processing capabilities should be supported there?

Related to this, processor architectures that support modular design will be a growing area of research as well. To quickly combine a variety of processors, accelerators, memory devices, and other components into a system targeting specific application needs, new architectures and methodologies will likely be needed. At AMD Research, we have been exploring a variety of techniques to support this type of modular architecture approach, especially by leveraging advanced die-stacking technologies. These explorations are still in the early research stages, but we are already seeing a burst of related research activity in both academic research projects and government research funding programs. There is a lot of exciting and challenging work to be done, but that is part of the joy of working on cutting-edge research projects.
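As a loose sketch of what this kind of modular composition could look like (entirely my illustration; the component names and figures below are invented and do not describe any AMD design), one can think of system assembly as picking chiplets from a menu and checking the aggregate against an application’s needs:

```python
# Toy model of modular ("chiplet-style") system composition.
# All component names and numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str
    peak_tflops: float  # double-precision compute contributed
    mem_gb_s: float     # memory bandwidth contributed
    watts: float        # power budget consumed

def compose(parts: list[Chiplet]) -> dict:
    """Aggregate properties of a package assembled from the given chiplets."""
    return {
        "tflops": sum(p.peak_tflops for p in parts),
        "mem_gb_s": sum(p.mem_gb_s for p in parts),
        "watts": sum(p.watts for p in parts),
    }

cpu = Chiplet("cpu-die", peak_tflops=1.0, mem_gb_s=0.0, watts=90.0)
gpu = Chiplet("gpu-die", peak_tflops=7.0, mem_gb_s=0.0, watts=200.0)
hbm = Chiplet("hbm-stack", peak_tflops=0.0, mem_gb_s=128.0, watts=15.0)

# A compute-heavy design point assembled from the same menu of parts.
print(compose([cpu, gpu, gpu, hbm, hbm, hbm, hbm]))
# {'tflops': 15.0, 'mem_gb_s': 512.0, 'watts': 550.0}
```

The appeal is that different mixes of the same pre-built parts can target different applications without redesigning a monolithic chip for each market.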

You’ve been part of the race to build the world’s first exascale computer, which some observers think could be completed by 2021. At peak performance, an exascale computer could process a billion billion calculations per second. What has been the most significant challenge in developing it?

One of the key challenges to exascale computing is that “exascale” is not a singular concept. Traditionally, supercomputers have been measured by the number of double-precision floating point operations that they can execute per second. However, such measurements are usually made with benchmarks that simply stress the computational throughput of the machine and don’t necessarily reflect what real scientific applications do. At the end of the day, many of these supercomputers are used by scientists to do, well, science!
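For scale, “exascale” means on the order of 10^18 double-precision operations per second (the “billion billion” mentioned in the question). A quick, purely illustrative calculation shows why this is a system-scale problem; the per-GPU throughput and node size below are assumed round numbers, not any specific product:

```python
# How many accelerators might an exaflop take? Illustrative numbers only.
EXAFLOP = 1e18        # double-precision operations per second
GPU_FLOPS = 10e12     # assume 10 TFLOP/s per GPU (a round number, not a product spec)
GPUS_PER_NODE = 4     # assumed node design

gpus = EXAFLOP / GPU_FLOPS      # 100,000 GPUs
nodes = gpus / GPUS_PER_NODE    # 25,000 nodes
print(f"{gpus:,.0f} GPUs across {nodes:,.0f} nodes")
```

At that scale, memory, interconnect, power, and reliability all become first-order design constraints, which is why a single throughput number says little about the machine as a whole.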

This includes a wide range of work, from modeling the Big Bang to studying how proteins fold, or from simulating the properties of new materials to predicting the impacts of climate change. These applications vary widely in their computational, memory bandwidth, memory capacity, and other characteristics, and they all demand superior performance on each new generation of supercomputer. Architecting a successful exascale supercomputer requires developing the right combination of new technologies (both hardware and software) and synthesizing them together in the right balance so that the final machine can satisfy the incredibly diverse computational needs of all of these scientific explorations.
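One standard lens for reasoning about that balance (a well-known analysis tool I am adding for illustration; it is not mentioned in the interview) is the roofline model: a kernel’s attainable performance is bounded by min(peak compute, memory bandwidth × arithmetic intensity). The machine figures below are assumed round numbers:

```python
# Roofline model: attainable performance is capped by either peak compute
# or by memory bandwidth times arithmetic intensity (flops per byte moved).
# Machine figures are assumed round numbers for illustration.

def roofline_gflops(peak_gflops: float, bw_gb_s: float, flops_per_byte: float) -> float:
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bw_gb_s * flops_per_byte)

PEAK_GFLOPS = 10_000.0  # 10 TFLOP/s peak compute
BW_GB_S = 1_000.0       # 1 TB/s memory bandwidth

# A bandwidth-bound stencil (~0.5 flops/byte) vs. a compute-bound dense solver (~20).
print(roofline_gflops(PEAK_GFLOPS, BW_GB_S, 0.5))   # 500.0   -> memory-limited
print(roofline_gflops(PEAK_GFLOPS, BW_GB_S, 20.0))  # 10000.0 -> compute-limited
```

Two applications can differ by 20x on the same hardware purely because of their balance of computation to data movement, which is why no single benchmark captures whether a machine serves all of its scientific users well.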

At the same time, this challenge has been immensely satisfying from the perspective of personal growth and broadening my technical horizons. Prior to my involvement with high performance computing at AMD, my research focus (for example, as a professor at Georgia Tech) had been on more commercially oriented use cases. Working on exascale computing has provided me the opportunity to learn all kinds of new things about high performance computing and scientific applications, general-purpose GPU computing, building reliable systems, hardware-software co-design, and much more. And these are not one-off endeavors; after we hit the exascale milestone, the industry will continue to drive toward systems with even more capabilities (“deca-exascale”?), and the opportunities to research hard problems and continue to innovate will grow commensurately.

Gabriel Loh is a Fellow Research Engineer at Advanced Micro Devices (AMD), the multinational semiconductor company. His research interests include computer architecture, processor microarchitecture, emerging technologies and 3D die stacking. At AMD he is a key technical leader on teams that developed multiple research projects for the US Department of Energy (DOE), including co-leading the DOE PathForward exascale program.

Loh was Co-General Chair of the International Symposium on Computer Architecture (ISCA 2016), and continues to serve on the technical and organizing committees of numerous other ACM conferences. For his contributions to die-stacking technologies in computer architecture, Loh received the ACM SIGARCH Maurice Wilkes Award in 2018, and was named an ACM Fellow in 2017. He is also an ACM Distinguished Speaker.