Proof assistants based on dependent types are
powerful tools for building certified
software. In order to verify programs written
in a different language, however, a
representation of those programs in the proof
assistant is required. When that language is
sufficiently similar to that of the proof
assistant, one solution is to use a
shallow embedding to directly encode
source programs as programs in the proof
assistant. One challenge with this approach is
ensuring that any semantic gaps between the
two languages are accounted for. In this
paper, we present GSet, a mixed
embedding that bridges the gap between OCaml
GADTs and inductive datatypes in Coq. This
embedding retains the rich typing information
of GADTs while also allowing pattern matching
with impossible branches to be translated
without additional axioms. We formalize this
with GADTml, a minimal calculus that
captures GADTs in OCaml, and gCIC, an
impredicative variant of the Calculus of
Inductive Constructions. Furthermore, we
present the translation algorithm between
GADTml and gCIC, together with a proof
of the soundness of this translation. We have
integrated this technique into Coq-of-OCaml,
a tool for automatically translating OCaml
programs into Coq. Finally, we demonstrate
the feasibility of our approach by using our
enhanced version of Coq-of-OCaml to translate
a portion of the Tezos code base into Coq.