Die-stacked 3D DRAM technology can provide a low-energy, high-bandwidth memory module by vertically integrating several DRAM dies within the same package. However, the capacity of such 3D memory modules is unlikely to be sufficient for typical systems, so future memory systems are likely to use 3D DRAM together with traditional off-chip DRAM. In this talk, I will discuss how such memory systems can efficiently architect 3D DRAM either as a cache or as part of main memory.
First, I will show that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and updates to replacement state) are detrimental to the performance of DRAM caches, as they exacerbate hit latency. I will present Alloy Cache, a simple, latency-optimized DRAM cache architecture that can outperform even an impractical SRAM tag-store design, which would incur an unacceptable overhead of several tens of megabytes.
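The key idea can be illustrated with a small sketch (my own names and parameters, not code from the talk): a direct-mapped DRAM cache that stores the tag alongside its data line as a single tag-and-data unit, so one DRAM access returns both, a hit needs no separate tag lookup, and there is no associativity or replacement state to update.

```python
# Hypothetical sketch of a latency-optimized, direct-mapped DRAM cache.
# Each set holds one (tag, data) pair "alloyed" together, so a single
# burst returns tag and data; constants are illustrative only.

LINE_SIZE = 64        # bytes per cache line
NUM_SETS = 1 << 20    # direct-mapped: one tag-and-data unit per set

class AlloyCacheSketch:
    def __init__(self):
        self.sets = [None] * NUM_SETS   # each entry: (tag, data) or None
        self.hits = 0

    def access(self, addr):
        line = addr // LINE_SIZE
        idx = line % NUM_SETS
        tag = line // NUM_SETS
        tad = self.sets[idx]            # one access fetches tag + data
        if tad is not None and tad[0] == tag:
            self.hits += 1
            return tad[1]               # hit: data is already in hand
        data = self.fetch_from_offchip(addr)   # miss: go to off-chip DRAM
        self.sets[idx] = (tag, data)    # direct-mapped fill, no LRU update
        return data

    def fetch_from_offchip(self, addr):
        # stand-in for an off-chip DRAM read of the containing line
        return f"line@{(addr // LINE_SIZE) * LINE_SIZE}"
```

A conventional design would read the tag, compare it, and only then issue the data access; collapsing the two into one read is what trims the hit latency, at the cost of direct-mapped conflict misses.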
Finally, I will present a memory organization that makes 3D DRAM part of the OS-visible memory address space, yet relieves the OS of data-migration duties. The proposed CAMEO (CAche-like MEmory Organization) design migrates data between off-chip memory and 3D DRAM at cache-line granularity, in a manner transparent to the OS. CAMEO outperforms using 3D DRAM only as a cache or only as an OS-managed two-level memory.
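The migration mechanism can be sketched as follows (an illustrative model under my own naming, not the paper's code): lines that map to the same fast-memory location form a group with one 3D DRAM slot and several off-chip slots; when an off-chip line in the group is accessed, it is swapped into the fast slot and a small remap table records where each line now lives, so every line remains addressable and total capacity is preserved.

```python
# Hypothetical sketch of CAMEO-style line swapping. Slot 0 of each
# group is the fast (3D DRAM) location; the remaining slots are
# off-chip. All slots stay OS-visible; only placement changes.

GROUP_WAYS = 4   # 1 fast slot + 3 off-chip slots per group (illustrative)

class CameoSketch:
    def __init__(self, num_groups):
        # remap[g][slot] = which logical line of group g sits in that slot
        self.remap = [list(range(GROUP_WAYS)) for _ in range(num_groups)]

    def access(self, group, logical_line):
        slots = self.remap[group]
        slot = slots.index(logical_line)     # find the line's current slot
        if slot != 0:
            # hardware-managed swap into the fast slot, at line
            # granularity and invisible to the OS
            slots[0], slots[slot] = slots[slot], slots[0]
        return 0                             # line is now in the fast slot
```

Because a swap merely exchanges two lines' locations, no capacity is lost to caching: the 3D DRAM adds to, rather than shadows, the address space.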