In recent years, microprocessor manufacturers have shifted their focus from single-core to multicore processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction. Within the scientific computing domain, automatic parallelization techniques have been successful, but in the general-purpose computing domain few, if any, techniques have achieved comparable success.
Despite this, recent progress hints at mechanisms for unlocking parallelism in general-purpose applications. In particular, two promising proposals exist in the literature. The first, a group of techniques loosely classified as thread-level speculation (TLS), attempts to adapt techniques successful in the scientific domain, such as DOALL and DOACROSS parallelization, to the general-purpose domain by using speculation to overcome complex control flow and data-access patterns not easily analyzed statically. The second, a non-speculative technique called Decoupled Software Pipelining (DSWP), partitions loops into long-running, fine-grained threads organized into a pipeline, a style of execution known as pipelined multithreading (PMT). DSWP effectively extends the reach of conventional software pipelining to codes with complex control flow and variable-latency operations.
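To make the pipelined organization concrete, the following is a minimal sketch (not compiler output, and with invented names) of a DSWP-style partition of a linked-list traversal: one thread runs the pointer-chasing recurrence, a second thread runs the per-node work, and a queue decouples the two stages:

```python
import threading
import queue

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def traverse(head, q):
    # Stage 1: walk the loop-carried recurrence (node = node.next)
    # and forward each node to the next pipeline stage.
    node = head
    while node is not None:
        q.put(node)
        node = node.next
    q.put(None)  # end-of-stream marker

def work(q, results):
    # Stage 2: consume nodes and perform the (here, trivial) per-node work.
    while True:
        node = q.get()
        if node is None:
            break
        results.append(node.value * 2)

# Build a small list 1 -> 2 -> 3 and run the two-stage pipeline.
head = Node(1, Node(2, Node(3)))
q = queue.Queue()
results = []
t1 = threading.Thread(target=traverse, args=(head, q))
t2 = threading.Thread(target=work, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # -> [2, 4, 6]
```

Because stage 1 never waits on stage 2, inter-thread communication latency stays off the critical path; this decoupling is what distinguishes PMT from DOACROSS-style parallelization.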
Unfortunately, both techniques suffer key limitations. TLS techniques either suffer from over-speculation, in an attempt to speculatively transform a loop into a DOALL loop, or realize little parallelism in practice because DOACROSS parallelization places core-to-core communication latency on the critical path. DSWP avoids these pitfalls through its pipeline organization and decoupled execution using inter-core communication queues. However, its non-speculative nature, together with the restrictions needed to ensure a pipeline organization, prevents DSWP from achieving balanced parallelism on many key application loops.
In this talk, I present two key contributions that advance the state of automatic parallelization of general-purpose applications. First, I propose extending pipelined multithreaded execution with intelligent speculation. Rather than speculating all loop-carried dependences to transform loops into DOALL loops, I propose speculating only those key, predictable dependences that inhibit balanced, pipelined execution. I will present results from our automatic compiler transformation, Speculative DSWP, demonstrating the efficacy of this technique. Second, to support decoupled speculative execution, I will describe an extension to a multicore architecture's memory subsystem that allows it to support memory versioning. The proposed memory system resembles those present in TLS architectures, but provides efficient execution in the presence of large transactions, many simultaneously outstanding transactions, and eager data forwarding between uncommitted transactions. In addition to supporting the usage patterns exhibited by speculative pipelined multithreading, the proposed memory system facilitates existing and future speculative threading techniques.
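The memory-versioning behavior described above can be sketched in software, purely for intuition; this is not the proposed hardware design, and the class and method names are invented for illustration. Each in-flight speculative unit writes into its own version; a read searches its own version, then earlier uncommitted versions (eager forwarding), then committed state:

```python
# Hypothetical sketch of versioned memory with eager data forwarding
# between uncommitted transactions. Cascaded squash of later versions
# on misspeculation is omitted for brevity.
class VersionedMemory:
    def __init__(self):
        self.committed = {}   # architectural (committed) state
        self.versions = []    # per-transaction write buffers, in program order

    def begin(self):
        # Start a new speculative version; returns its id.
        self.versions.append({})
        return len(self.versions) - 1

    def write(self, vid, addr, value):
        # Buffer the write in this transaction's private version.
        self.versions[vid][addr] = value

    def read(self, vid, addr):
        # Search this version, then each earlier uncommitted version
        # from youngest to oldest (eager forwarding), then committed state.
        for v in range(vid, -1, -1):
            if addr in self.versions[v]:
                return self.versions[v][addr]
        return self.committed.get(addr)

    def commit(self, vid):
        # Commit in program order: merge buffered writes into committed state.
        self.committed.update(self.versions[vid])

    def squash(self, vid):
        # On misspeculation, discard this version's buffered writes.
        self.versions[vid] = {}

mem = VersionedMemory()
v0 = mem.begin()
mem.write(v0, "x", 1)       # earlier transaction writes x, still uncommitted
v1 = mem.begin()
print(mem.read(v1, "x"))    # later transaction sees the forwarded value: 1
mem.commit(v0)
mem.commit(v1)
```

The key property for speculative PMT is the forwarding path: a later, still-uncommitted transaction observes an earlier uncommitted write without waiting for commit, which is what keeps the pipeline stages decoupled.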