For almost two decades, the IMPACT research group at the University of Illinois has been working towards architectural models and compiler techniques for ultra-efficient computing platforms. The first phase of this effort combined IMPACT compile-time parallelization, ROAR dynamic optimization, and Flea-Flicker multipass pipelining to exploit instruction-level parallelism and tolerate cache-miss latency, with the goal of high-performance, low-power execution of applications. This phase has resulted in numerous publications and contributed to the Intel IPF compilers and processors. I will begin the talk by summarizing some important lessons we learned in the process, many derived from experiments using real hardware.
More recently, due to stringent power constraints, semiconductor computing platforms are converging to a model that consists of multiple processor cores and domain-specific hardware accelerators. From the hardware perspective, these platforms provide great opportunities to further advance performance and power efficiency. The primary obstacle to the success of this model is the difficulty of programming and compiling for these platforms.
Although high-performance FORTRAN compilers and the OpenMP API provide a strong technology base for the task, major roadblocks remain in compiling for popular implementation languages such as C/C++. I argue that the most exciting applications, those that will drive the continued performance scaling of semiconductor computing platforms, are the ones that reflect some aspect of the physical world: sound, images, chemical reactions, and physical forces. These applications often have abundant parallelism in their algorithms. That parallelism is, however, obscured by the implementation process.
In the second part of the talk, I will review some key lessons from our study of MPEG-4 (video), JPEG (image), and LAME (sound) using the second-generation IMPACT compiler technology. These applications exhibit several characteristics that distinguish them from previously studied scientific FORTRAN applications: extensive use of pointers in memory accesses, command-line options that impose diverse constraints on code transformations, dynamic memory allocation that obscures the targets of memory accesses, interprocedural value flow that makes loop analysis more difficult, and multiple layers of function calls with complex control flow inside the desired units of parallel execution. I will outline research efforts in the GSRC Soft Systems thrust to develop scalable, deep program analysis and deep code transformation techniques, as well as domain-specific programming models, to overcome these obstacles. I will also describe related GSRC work on Linux enhancements that enable seamless integration of hardware accelerators into the Linux software stack for use by driving applications.
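As a small illustration of the first characteristic above (pointers in memory accesses), the following C sketch shows why a compiler cannot freely parallelize a pointer-based loop without knowing whether its arguments overlap. It is not taken from the IMPACT work or from the applications studied; the function names (shift_scale_seq, shift_scale_par_model, schedules_agree) and the data are hypothetical, and the "parallel" version is only a model in which every load happens before any store.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 16  /* illustrative upper bound for the scratch buffer */

/* Sequential reference loop: iteration i reads src[i] and writes dst[i + 1].
   dst must have room for n + 1 elements. */
static void shift_scale_seq(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i + 1] = 2 * src[i];
}

/* Model of an independent-iteration schedule: all loads complete before any
   store. This matches the sequential loop only if dst and src do not overlap. */
static void shift_scale_par_model(int *dst, const int *src, size_t n) {
    int tmp[MAX_N];
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; i++)
        tmp[i] = 2 * src[i];
    for (size_t i = 0; i < n; i++)
        dst[i + 1] = tmp[i];
}

/* Returns 1 if both schedules produce the same result, 0 otherwise.
   When aliased is nonzero, the same buffer is passed as dst and src. */
int schedules_agree(int aliased) {
    int src[4] = {1, 2, 3, 4};
    int x[5] = {1, 0, 0, 0, 0};
    int y[5] = {1, 0, 0, 0, 0};

    shift_scale_seq(x, aliased ? x : src, 4);
    shift_scale_par_model(y, aliased ? y : src, 4);

    for (size_t i = 0; i < 5; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

With distinct buffers the two schedules agree, but when dst and src are the same buffer each iteration reads the value written by the previous one, so the results diverge. Unless analysis can rule out such overlap, the compiler has to assume it and keep the loop sequential.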