Plucking Buggy Inputs Out of Thin Errors: Synthesizing Test Generators via Perturbation Learning

Zhe Zhou, Rob Dickerson, Benjamin Delaware, and Suresh Jagannathan


Given a single witness to a fault in a program (in the form of a buggy input), we often wish to discover related inputs that also trigger the same fault. This kind of error generalization helps document API misuse, better localize faults, provide crucial detail in bug reports, and facilitate data-driven program analyses, verification, and inference techniques that require meaningful positive and negative inputs to a program. Error generalization is particularly challenging, however, when the identified fault occurs in blackbox components whose source code is either unavailable or too complex to understand or analyze effectively. To facilitate error generalization in such contexts, we present a generative learning-based mechanism that synthesizes error-producing test generators for a program under test, given one or more known buggy inputs. Our learned test generators are input perturbations: functions, implemented as sequential compositions of datatype operations, that transform one erroneous input into another. These perturbations can thus be used to generate additional error-producing inputs from an initial set of buggy inputs. Our results demonstrate that perturbation learning can effectively and systematically generalize from a small set of known errors in the presence of blackbox components, providing significant benefits to data-driven analysis and verification tools.
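To make the notion of an input perturbation concrete, the following is a minimal Python sketch, not the authors' implementation or learning procedure. It assumes a hypothetical blackbox oracle `triggers_fault` (here, the fault fires whenever an integer list contains a duplicate) and two illustrative list operations, `cons_fresh` and `rotate`; all of these names are assumptions introduced for illustration. The perturbation is written by hand rather than learned, and only shows how a sequential composition of datatype operations can map one buggy input to another.

```python
from functools import reduce
from typing import Callable, List

IntList = List[int]
Op = Callable[[IntList], IntList]


def triggers_fault(xs: IntList) -> bool:
    """Hypothetical blackbox oracle: the fault fires if xs has a duplicate."""
    return len(set(xs)) < len(xs)


# Illustrative datatype operations on integer lists (assumed, not from the paper).
def cons_fresh(xs: IntList) -> IntList:
    """Prepend an element strictly larger than every element of xs."""
    return [max(xs, default=0) + 1] + xs


def rotate(xs: IntList) -> IntList:
    """Move the head of the list to the end."""
    return xs[1:] + xs[:1]


def compose(*ops: Op) -> Op:
    """Sequential composition: apply the operations from left to right."""
    return lambda xs: reduce(lambda acc, op: op(acc), ops, xs)


# A hand-written perturbation; neither operation removes elements,
# so any duplicate already in the input survives.
perturb = compose(cons_fresh, rotate)


def generate(seed: IntList, perturbation: Op, n: int) -> List[IntList]:
    """Apply the perturbation n times, collecting each new input."""
    out, cur = [], seed
    for _ in range(n):
        cur = perturbation(cur)
        out.append(cur)
    return out


if __name__ == "__main__":
    for xs in generate([1, 1], perturb, 3):
        print(xs, triggers_fault(xs))
```

Starting from the single buggy seed `[1, 1]`, repeated application yields further inputs that the oracle still flags. A composition that includes an element-removing operation would not necessarily preserve the fault, which is why such perturbations are something to be learned and validated against the blackbox rather than assumed.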