Data-driven Lemma Discovery

Zhe Zhou, Rob Dickerson, Benjamin Delaware, and Suresh Jagannathan


Programs often leverage data structure libraries to provide useful and reusable abstractions. Modular verification of these programs naturally rely on specifications associated with these libraries that capture important properties about how these data structures are expected to be accessed and manipulated. However, because these specifications are written without awareness of how they will actually be used in a client-side verification task, they are often incomplete. In particular, specifications rarely expose relevant relational properties among a library's methods and the predicates that comprise their specifications. These properties, while not necessarily relevant to understanding the behavior of the library, are often critical to enabling verification of the client. We refer to these missing specifications as lemmas and, in this paper, propose an automated data-driven procedure for discovering them. Our approach enables modular verification of client programs by driving a learning process using samples and counter-examples from an SMT solver to discover relations among the predicates that define library method specifications. Our learning framework seeks to find lemmas sufficient to discharge verification conditions associated with the client program. We have applied our technique on a collection of challenging data structure programs written in OCaml. Our results provide strong evidence on the feasibility and effectiveness of data-driven lemma discovery.