The emerging multicore era has brought many opportunities and challenges to systems research. Two of the challenges I have been focusing on are (i) how to provide detailed analysis of parallel programs and (ii) how to map the computations in a parallel program to the underlying hardware to achieve the best performance.
For (i), we have developed the Pin dynamic instrumentation system, which has become very popular for writing architectural and program analysis tools. By inserting instrumentation code on the fly, Pin can perform fine-grained monitoring of the architectural state of a program. As an example, I will discuss a parallel programming tool called Thread Checker, which we built with Pin to detect common parallel programming bugs such as data races and deadlocks. I will also discuss the dynamic compilation techniques behind Pin. In addition, I will present an extension of Pin called PinOS, which uses virtualization techniques to perform whole-system instrumentation (i.e., of both the OS and applications).
For (ii), I have developed the Qilin parallel programming system, which exploits the hardware parallelism available on machines with a multicore CPU and a GPU. Qilin provides a C++ API for writing data-parallel operations, so that the compiler is relieved of the difficult job of extracting parallelism from serial code. At runtime, the Qilin compiler automatically partitions these API calls into tasks and maps these tasks to the underlying hardware using an adaptive algorithm. Preliminary results show that our parallel system can achieve significant speedups (above 10x) over the serial case for some important computation kernels.
Finally, I will outline my future work in parallel programming, compilation, and virtualization.