Key Words:
Binary Analysis, Variable Recovery, Probabilistic Analysis, Reverse Engineering
Abstract:
     Recovering variable and data structure information from stripped binaries is a prominent challenge in binary program analysis. While various state-of-the-art techniques are effective in specific settings, such effectiveness may not generalize. This is mainly because the problem is inherently uncertain due to the information loss in compilation. Most existing techniques are deterministic and lack a systematic way of handling such uncertainty. We propose a novel probabilistic technique for variable and structure recovery. Random variables are introduced to denote the likelihood of an abstract memory location having various types and structural properties, such as being a field of some data structure. These random variables are connected through probabilistic constraints derived by program analysis. Solving these constraints produces the posterior probabilities of the random variables, which essentially constitute the recovery results. Our experiments show that our technique substantially outperforms a number of state-of-the-art systems, including IDA, Ghidra, Angr, and Howard. Our case studies demonstrate that the recovered information improves binary code hardening and binary decompilation.
Following the standard practice in the literature (e.g., Howard), we inspect individual variables on the stack and heap (including structure types). If a variable has a pointer type, we further inspect the pointed-to structure. For example, if a (Socket *) variable is recovered as (void *), we consider it incorrect. We say it is correct only if the variable is recovered as a pointer to a structure homomorphic to Socket. We only consider the functions covered by BDA.
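The homomorphism criterion can be illustrated with a minimal sketch. This is our own illustration, not the paper's actual implementation: we assume each structure is described as a list of `(offset, size, kind)` field triples, and two structures are homomorphic when their field layouts coincide recursively (field names and scalar type names are ignored).

```python
def homomorphic(rec, truth):
    """Illustrative layout-homomorphism check (hypothetical representation).

    rec/truth: lists of (offset, size, kind) field triples, where kind is
    the string "scalar" for a primitive field, or a nested field list for
    an inner structure.
    """
    if len(rec) != len(truth):
        return False
    for (r_off, r_sz, r_kind), (t_off, t_sz, t_kind) in zip(rec, truth):
        # Fields must agree on offset and size.
        if (r_off, r_sz) != (t_off, t_sz):
            return False
        # A nested structure must match a nested structure, recursively.
        if isinstance(r_kind, list) != isinstance(t_kind, list):
            return False
        if isinstance(r_kind, list) and not homomorphic(r_kind, t_kind):
            return False
    return True
```

Under this sketch, a (Socket *) recovered as (void *) fails the check because the recovered pointee has no fields, while a recovered structure with the same field offsets and sizes as Socket passes even if its field names differ.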
The overall recall and precision are shown in Fig.1 and Fig.2, respectively. As we can see, Osprey 🦅   achieves more than 88% recall, and more than 90% precision, outperforming the best of other tools (i.e., Ghidra with around 77% recall and 70% precision).
Fig.3 and Fig.4 present the recall and precision of complex types recovery. Complex types include structures, unions, arrays and pointers to structures and unions. Note that Angr could not recover complex data types, hence we do not list its results on the figures. Observe that the recall of Osprey 🦅   is around 74%, more than 2 times higher than Ghidra and IDA Pro. The precision of Osprey 🦅   also outperforms Ghidra and IDA Pro.
To better quantify our results on complex variables, we construct a syntax tree for each complex type (with fields as child nodes). Nested structures and unions are precisely modeled, and inner structure or union types without outer references are ignored. Cycles are removed by substituting a leaf node with a special type tag. We then compute the tree difference between each recovered tree and its ground-truth tree, defined as the ratio of the tree edit distance (i.e., the minimum number of tree edits that transform one tree into the other) to the size of the whole tree. The smaller the tree difference, the better the recovery result. Fig.5 shows the results. Overall, Osprey 🦅   has the smallest tree difference, 2.50 and 2.18 times smaller than those of Ghidra and IDA Pro, respectively.
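The metric above can be sketched as follows. This is a simplified, illustrative version (not the evaluation's actual code): type syntax trees are nested `(label, children)` tuples, and we use an alignment-style distance (deleting a node removes its whole subtree) as an approximation of full tree edit distance; cycle removal and union handling described above are omitted.

```python
def size(t):
    """Number of nodes in a (label, children) tree."""
    _, children = t
    return 1 + sum(size(c) for c in children)

def forest_dist(xs, ys):
    """Edit distance between two ordered child lists; deleting or
    inserting a subtree costs its size."""
    n, m = len(xs), len(ys)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + size(xs[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + size(ys[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + size(xs[i - 1]),          # delete subtree
                D[i][j - 1] + size(ys[j - 1]),          # insert subtree
                D[i - 1][j - 1] + tree_dist(xs[i - 1], ys[j - 1]),  # match
            )
    return D[n][m]

def tree_dist(a, b):
    (la, ca), (lb, cb) = a, b
    relabel = 0 if la == lb else 1
    return relabel + forest_dist(ca, cb)

def tree_difference(recovered, truth):
    """Tree difference: edit distance normalized by ground-truth size."""
    return tree_dist(recovered, truth) / size(truth)
```

For example, a recovered `struct` that mistypes one `int` field as `char` but matches the rest of the layout incurs a distance of 1, normalized by the ground-truth tree size.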
Detailed data can be found here.
We also perform a side-by-side comparison with Howard. Since Howard is not publicly available (confirmed after communicating with its authors), we use the data reported in its paper, with the same setup, and report results using the same metrics.
| Program | Osprey 🦅 (% of variables) | Howard (% of variables) | Osprey 🦅 (% of bytes) | Howard (% of bytes) | Code Coverage |
| --- | --- | --- | --- | --- | --- |
| wget | 85.32% | 77.03% | 71.58% | 63.12% | 51% |
| lighttpd | 87.67% | 86.15% | 85.30% | 82.10% | 55% |
| grep | 82.10% | 75.07% | 79.13% | 78.04% | 50% |
| gzip | 100.00% | 86.15% | 100.00% | 87.10% | 70% |
| fortune | 100.00% | 83.04% | 100.00% | 82.10% | 71% |
| Avg. | 91.02% | 81.49% | 87.20% | 78.49% | 59% |
Tab.1 presents the accuracy of type discovery for stack variables, measured by the percentage of correctly typed variables (columns 2-3) and by the percentage of correctly typed bytes (columns 4-5), along with Howard's function-level code coverage (column 6). We cannot compare heap results because Howard measures accuracy for individual dynamic heap objects.
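The two accuracy metrics in Tab.1 can be illustrated with a small sketch; the per-variable `results` representation below is a hypothetical assumption of ours, not the actual evaluation harness:

```python
def accuracies(results):
    """Hypothetical metric computation.

    results: list of (size_in_bytes, correct) pairs, one per stack variable.
    Returns (variable-level accuracy, byte-level accuracy): the byte-level
    metric weights each variable by its size, so mistyping a large struct
    hurts more than mistyping a single int.
    """
    var_acc = sum(ok for _, ok in results) / len(results)
    total_bytes = sum(sz for sz, _ in results)
    byte_acc = sum(sz for sz, ok in results if ok) / total_bytes
    return var_acc, byte_acc
```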
We measure the execution time of different tools on the two benchmark sets (details are available here).
In general, Osprey 🦅   is 18.57, 88.04, and 50.79 times slower than Ghidra, IDA Pro, and Angr, respectively. We argue that reverse engineering is often a one-time effort and that Osprey 🦅   offers a different trade-off between cost and accuracy. It is also worth noting that Ghidra is the second slowest due to its register-based data-flow analysis, while IDA Pro is the fastest because its variable recovery relies mainly on hard-coded pattern-matching rules.
To assess the scalability of Osprey 🦅  , we evaluate it on Apache and Nginx, two well-known applications with significantly larger code bases than the benchmarks used above.
The results are shown in Fig.6. On both programs, Osprey 🦅   produces the highest F1 score for both overall and complex variable recovery, and the lowest tree difference. Although Osprey 🦅   takes around an hour and a half on these programs, we argue this is a reasonable cost for binary analysis and reverse engineering.