There is a significant gap between how programmers think about code changes and how changes are represented in most change-centric software engineering tools, such as diff, CVS, and Unix patch. To bridge this gap, I developed a new program differencing approach that automatically extracts a high-level change description from two program versions. The core of this approach has two parts: a novel rule-based change representation, which explicitly and concisely captures systematic changes to a program's structure, and a rule learning algorithm, which automatically infers such rules.
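To make the idea of a rule-based change description concrete, here is a toy sketch in Python. It is not the algorithm presented in the talk. It assumes a program version can be reduced to a list of method names, and that the only systematic change of interest is a prefix rename. The names `RenameRule`, `evaluate_rule`, `learn_rule`, and `describe` are hypothetical and were chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RenameRule:
    """Hypothetical rule: every method named old_prefix* became new_prefix*."""
    old_prefix: str
    new_prefix: str
    matches: Tuple[str, ...]     # old names that follow the rule
    exceptions: Tuple[str, ...]  # old names in scope that do not follow it


def evaluate_rule(old_names: Iterable[str], new_names: Iterable[str],
                  old_prefix: str, new_prefix: str) -> RenameRule:
    """Check one candidate rule against two versions (illustrative only)."""
    new_set = set(new_names)
    matches, exceptions = [], []
    for name in sorted(n for n in old_names if n.startswith(old_prefix)):
        renamed = new_prefix + name[len(old_prefix):]
        # The name follows the rule if it disappeared and its renamed form appeared.
        if renamed in new_set and name not in new_set:
            matches.append(name)
        else:
            exceptions.append(name)
    return RenameRule(old_prefix, new_prefix, tuple(matches), tuple(exceptions))


def learn_rule(old_names: Iterable[str],
               new_names: Iterable[str]) -> Optional[RenameRule]:
    """Naive search: propose a prefix rename from each (removed, added) pair
    that shares a suffix, and keep the candidate with the most matches."""
    old_list, new_list = list(old_names), list(new_names)
    removed = sorted(set(old_list) - set(new_list))
    added = sorted(set(new_list) - set(old_list))
    best: Optional[RenameRule] = None
    for old in removed:
        for new in added:
            # Length of the common suffix of the two names.
            k = 0
            while k < min(len(old), len(new)) and old[-1 - k] == new[-1 - k]:
                k += 1
            old_prefix, new_prefix = old[:len(old) - k], new[:len(new) - k]
            if k == 0 or not old_prefix:
                continue
            rule = evaluate_rule(old_list, new_list, old_prefix, new_prefix)
            if best is None or len(rule.matches) > len(best.matches):
                best = rule
    return best


def describe(rule: RenameRule) -> str:
    """Render the rule as a one-line, human-readable change description."""
    text = f"for all methods named {rule.old_prefix}*: rename to {rule.new_prefix}*"
    if rule.exceptions:
        text += " (except " + ", ".join(rule.exceptions) + ")"
    return text


if __name__ == "__main__":
    v1 = ["getHost", "getPort", "getUser", "getCache", "run"]
    v2 = ["fetchHost", "fetchPort", "fetchUser", "getCache", "run"]
    learned = learn_rule(v1, v2)
    if learned is not None:
        print(describe(learned))
```

On the two sample versions, the sketch summarizes three separate renames as a single rule with one explicit exception (`getCache`). A textual diff would instead report each changed line on its own.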
In this talk, I will also present my empirical studies on duplicated code, which partially motivated my program differencing approach. It has long been believed that code clones (syntactically similar code fragments) are bad smells that indicate poor software design, and that refactoring them improves software quality. By analyzing how code clones actually change over time, I found that code clones are not inherently bad and that immediate, aggressive refactoring may not be the best way to manage them.