The increasing number of cores and the wide adoption of in-memory applications demand high-capacity main memory. Technology scaling of DRAM cells has enabled higher-capacity memory for the last few decades. Unfortunately, DRAM cells become more vulnerable to failure as they scale down to smaller sizes. Enabling higher-capacity memory systems without sacrificing reliability is a major research challenge. My work focuses on designing a scalable memory system by rethinking the traditional assumptions about abstraction and separation of responsibilities at two interfaces: (i) between circuits and architecture, and (ii) between architecture and operating systems.
Traditionally, circuit-level optimizations have ensured reliable DRAM operation. In my work, I enable DRAM scaling without the need for strict reliability guarantees from manufacturers. I envision manufacturers shipping DRAM chips without fully ensuring correct operation, with the system responsible for detecting and mitigating DRAM failures while operating in the field. However, designing such a system is difficult because many DRAM failures are intermittent. In this talk, I will present the challenges of building such a system, show the effectiveness of system-level detection and mitigation techniques, and describe the design of an intelligent system capable of providing reliability guarantees even in the presence of intermittent failures.
Leveraging new, highly scalable non-volatile memory technologies is an alternative way to enable high-capacity memory. At the end of the talk, I will briefly present my recent work on rethinking the conventional memory and storage hierarchy to take advantage of the non-volatility of these technologies. I will present my vision of redefining the interface between hardware and the operating system to unify the memory and storage systems under a single address space, and I will discuss the opportunities and challenges of such a system.