EECS Seminar: Compiler Support for Structured Data

McDonnell Douglas Engineering Auditorium
Saman Amarasinghe, Ph.D.

Department of Electrical Engineering and Computer Science 
Massachusetts Institute of Technology

Abstract: FORTRAN, introduced over half a century ago as one of the first high-level programming languages, ushered in the era of multidimensional dense arrays, commonly referred to as dense tensors. Since then, the programming world has introduced a plethora of data structures, ranging from lists and sets to trees and graphs. Yet when it comes to handling immense data sets, dense tensors remain a practical mainstay. But here's the twist: most modern tensor data isn't dense. Whether it originates from sensors, computational processes or human input, a significant portion of real-world data has inherent structure such as sparsity, runs of repeated values or symmetry. These characteristics appear in fields as diverse as scientific computing, data analytics, graph processing and machine learning.

In this talk, I'll dive deep into how programming languages and compilers are adapting to embrace structured data. I'll introduce the TACO and Finch compilers. TACO pioneered the automatic generation of kernels for any sparse tensor algebra operation across prevalent formats. Finch, in turn, seamlessly integrates the management of structured data, capturing nuances such as sparsity, repeated values and symmetry. I will demonstrate how to compile compound tensor expressions on structured data into efficient loops in a systematic way, and how our compiler's output rivals the performance of top-tier hand-crafted code for matrix and tensor functions. I hope to convince you that we can finally put structured array programming on the same compiler transformation and code generation footing as dense array code.

Bio: Saman Amarasinghe is a professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of its Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. Under Amarasinghe's guidance, the Commit group has developed many pioneering programming languages and compilers, including the StreamIt, StreamJIT, PetaBricks, Halide, Simit, MILK, Cimple, TACO, GraphIt, BioStream, CoLa and Seq programming languages and compilers; the DynamoRIO, Helium, Tiramisu, Codon and BuildIt compiler/runtime frameworks; Superword Level Parallelism (SLP), goSLP and VeGen for vectorization; the Ithemal machine-learning-based performance predictor; Program Shepherding to protect programs against external attacks; the OpenTuner extensible autotuner; and the Kendo deterministic execution system. Amarasinghe was the co-leader of the Raw architecture project. Beyond academia, he co-founded Determina, Lanka Internet Services Ltd., Venti Technologies, DataCebo and Exaloop. Amarasinghe received his B.S. in electrical engineering and computer science from Cornell University in 1988, and his master's degree and doctorate from Stanford University in 1990 and 1997, respectively. He is an ACM Fellow.