The multi-core era is here to stay. The transition from a single core to a few cores has been relatively smooth. However, the unending demand for higher performance will soon bring processors with hundreds or even thousands of cores to market. What are the implications of this for engineering and the software industry in general, and for computer science in particular? How is industry embracing this change? Are we ready?
One of the challenges that we have been working on is the absence of memory virtualization in many-core architectures. Caches were the most important pillar of computer architecture in the single-core era. They provided the illusion of a single large unified memory and kept programming simple and unchanged. However, caches do not scale well with the number of cores, and they also consume a lot of power. Therefore, to improve power efficiency and enable a large number of cores in a processor, computer architects are searching for alternative memory hierarchies.
The Limited Local Memory (LLM) multi-core architecture is a scalable memory design in which each core has access only to its own small local memory, and explicit DMA instructions have to be inserted in the program to transfer data between memories. The IBM Cell processor, which is in the Sony PlayStation 3, is a popular example of this architecture. The Roadrunner supercomputer, which broke the petascale computation record, is one of the most power-efficient supercomputers and is built with IBM Cell processors. Such high power efficiency comes partly at the cost of programming simplicity. Programming an LLM architecture is not simple, because it requires changes to the application. Application developers have to be cognizant of the small size of the local memory, and they have to insert instructions to transfer data between the memories. My talk will summarize our efforts at automating this memory management.
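
To make the programming burden concrete, here is a minimal sketch of the manual data staging that an LLM architecture asks of the developer. It is a plain Python simulation, not Cell SDK code: the names LocalStore, dma_get, dma_put and scale_in_chunks, and the capacity of 64 elements, are hypothetical and chosen only for illustration.

```python
# Hypothetical simulation: names and sizes are illustrative, not a real SDK API.
LOCAL_STORE_WORDS = 64  # assumed (tiny) local memory capacity, in array elements


class LocalStore:
    """A core's private memory; the core can compute only on data held here."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def dma_get(self, global_mem, start, count):
        """Explicit transfer: copy `count` elements from global to local memory."""
        if count > self.capacity:
            raise MemoryError("transfer exceeds local memory capacity")
        self.data = global_mem[start:start + count]

    def dma_put(self, global_mem, start):
        """Explicit transfer: copy the local buffer back to global memory."""
        global_mem[start:start + len(self.data)] = self.data


def scale_in_chunks(global_mem, factor, store):
    """Scale every element, staging one local-memory-sized chunk at a time."""
    for start in range(0, len(global_mem), store.capacity):
        count = min(store.capacity, len(global_mem) - start)
        store.dma_get(global_mem, start, count)        # global -> local
        store.data = [x * factor for x in store.data]  # compute on local data only
        store.dma_put(global_mem, start)               # local -> global


values = list(range(200))
scale_in_chunks(values, 2, LocalStore(LOCAL_STORE_WORDS))
```

On a cache-based machine the same computation is a single loop over the array. Here the developer must pick a chunk size that fits the local memory and bracket every computation with explicit transfers, which is the kind of bookkeeping that automated memory management aims to take over.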