Ghidriff: Ghidra Binary Diffing Engine
TL;DR As seen in most security blog posts today, binary diffing tools are essential for reverse engineering, vulnerability research, and malware analysis. Patch diffing is a technique widely used to identify changes across versions of binaries as related to security patches. By diffing two binaries, a security researcher can dig deeper into the latest CVEs and patched vulnerabilities to understand their root cause. This post presents ghidriff
, a new open-source Python package that offers a command-line binary diffing capability leveraging the power of the Ghidra Software Reverse Engineering (SRE) Framework with a fresh take on the standard patch diffing workflow.
ghidriff
is a side project that evolved into a powerful patch diffing tool for vulnerability research. I have had the privilege to present it at several conferences this year and the slides are available here. It is about a year in the making and publicly released in October. Coincidentally, BinDiff decided it was also time to open-source (after ~20 years!) a few weeks before. Also, another tool qbindiff
came out just a day later! So, in the now sea of binary diffing tools, ghidriff
throws its hat into the ring.
This post shows off some of its key features and might give you a reason to give it a try. Its primary advantages are the fact that it is command-line (lending towards automation) and its output is markdown (enabling something I’m branding “social diffing”). Additionally, ghidriff
, built for patch diffing, structures the markdown to enable a quick review of changes which can highlight interesting security related changes.
Let’s get started. To skip all the background and head straight to its features straight to its features click here. Otherwise, read on and learn a little more about binary diffing, why it’s useful, and why we have yet another diffing tool.
History
There are many tools that perform binary diffing, and do it quite well. The most well known being BinDiff, followed by others such as Diaphora (paired with IDA) and Version Tracking (built into Ghidra). I have the most experience with Ghidra Version Tracking, but have often used BinDiff as a sanity check while developing ghidriff
. There is a good reason for that BinDiff is the most often used and well known. Its origins date back to 2004 from Halvar Flake and Rolf Rolles and their research on binary comparison. Halver’s epic paper Structural Comparison of Executable Objects, introduced a novel method to find matching functions between two binaries.
This paper presents a novel approach which corrects the above mentioned asymmetry: Given two variants of the same executable A called A’ and A’‘, an one-to-one mapping between all the functions in A’ to all functions in A’’ is created. The mapping does not depend on the specific assembly-level instructions generated by the compiler but is more general in nature: It maps control flow graphs of functions, thus ignoring less-aggressive optimization such as instruction reordering and changes in register allocation. Structural Comparison of Executable Objects
Bindiff introduced a fast and reliable means to way to identify functions across two different versions of a binary that detect logic changes in a function rather simple byte changes or compiler optimizations. Instead of focusing on the actual bytes of a function (which can vary with a small source code change), his paper points us toward using a structural comparison of functions to find matches. The gist being that the structure of a function remains unchanged unless the logic of the function changes.
In 2004, I started a company called zynamics that built highly specialized reverse engineering tools. […] One of our products (BinDiff) became an industry standard and even a verb. In 2011, Google acquired the company, and I spent the next 5 years integrating our technology and team into Google. Halvar Flake
Zynamics (now owned by Google) implemented this comparison method into BinDiff, along with several techniques to make BinDiff a revered binary diffing tool.
Purpose
Each binary diffing tool has the same purpose. Find the added, deleted, and modified data and functions between two binaries or minimally, match functions between two binaries.
With this functionality the diffing tool can be used for:
- Reverse Engineering: Port previous reverse engineering work to new binary (ie. bring over comments, labels, and other metadata considered)
- Vulnerability Research: Determine whether or not a security update actually addressed the root issue, or was a shallow fix.
- Malware analysis - find similar code in one malware set to another, or correlate new versions of malware to old
Complexity
Matching functions across binaries is a difficult. A simple diff program won’t find all the differences. I know this, because I’ve tried. You aren’t comparing two text documents; you are comparing two binaries with 1000s of functions and complicated relationships. Not only is it non-trivial, the problem, as noted in Halvar’s paper, is asymmetric.
It takes relatively little work to change source code and recompile, while the analysis of the object code will have to be completely redone to detect the changes. Structural Comparison of Executable Objects
Several changes can occur from a simple change in the source of a binary:
- Registers used to hold specific variables (using RAX instead of RDX)
- Basic block arrangement and branches (flowgraph and callgraph respectively)
- Compiler might optimize instructions that perform the same operations (
xor eax, eax
ormov eax,0
)
Binary diffing tools attempt to bring symmetry to the asymmetric problem of reverse engineering changes between two binaries.
Binary Diffing Tools - Under the Hood
Stand on the Shoulders of Giants (SRE tooling)
All diffing tools I know of rely on modern SRE toolsets (like Ghidra, IDA, or Binja) to distill a complex binary into a list of symbols, functions, basic blocks and their inter-relationships.
Standing on the Shoulders of Giants - ghidriff
on Ghidra
SRE tools catalog all the functions, data, types, and references to enable the programmatic analysis needed to compare two binaries.
- IDA analyzes a binary and stores its analysis in its proprietary database. Diaphora exports the IDA database to a SQLite format that it can later query for function matching.
- Ghidra analyzes a binary and saves offs the analysis in its proprietary database. Version Tracking in Ghidra runs in Ghidra, and simply queries its database directly.
- BinDiff is a stand-alone tool that offers export plugins for SRE tooling. One for IDA, Ghidra, and BinaryNinja. These plugins extract the analysis information into protocol buffers, which is a language neutral way to serialize the structured binary data and functions. Later, the BinDiff visualizer app can use this data for function matching.
Function Matching
Each tool can then analyze every function from each binary and compare them. The comparison algorithm, or means of matching the two functions from different binaries, has a few different names depending on the tool. BinDiff calls them matching algorithms, Diaphora calls them matching heuristics, and Ghidra calls them correlators. No matter the name, they are all algorithms that take various inputs (basic blocks, data, functions) and create an association between two binaries and score them based on some heuristic. Each heuristic uses one or a combination of the methods for matching a function.
3 General Methods for Matching Functions
I know of 3 different methods of matching two functions. Each method has its pros and cons.
- Syntax - compare representation of actual bytes or sequence of instructions
- Pros - Quick. Easy to compute, just hash the two binaries
- Cons - Not generally realistic. Compiling the same source twice with the same compiler will generate a different hash, as this naive approach doesn’t consider the time based metadata put into a binary by the compiler.
- Semantics - compare the meaning is equivalent or provide the same functionality, or has similar effects
- Pros - Less susceptible to metadata or simple compiler changes.
- Cons - To prove two functions are semantically equivalent basically boils down to the halting problem.
- Structure - a blend of semantic and syntax. Analyzes graph representations of binaries ( control flow, callgraphs, etc. ) and computes similarity on these generated structures
- Pros - More general. A control flow is a type of semantic, and easier to compute.
Source: Ghidra Patch Diffing
Syntax-based function matching heuristics are quick and accurate, but cannot handle minor changes or compiler optimizations.
Syntax Matching Heuristic - ExactBytesFuncotinHasher
from VT
I don’t know of any semantic based heuristics, but perhaps the pseudo-code output of the decompiler could represent “meaning”. Structure-based heuristics look at the edges and nodes of the callgraphs or control flow graphs of various functions.
Structural Comparison CFG from BinDiff
Each diffing tool provides several function matching heuristics:
Tool | Matching Name | Code |
---|---|---|
BinDiff | Algorithms | Code |
Diaphora | Heuristics | Code |
Version Tracking | Correlators | Code |
ghidriff | Correlators | Code |
Finding Added, Deleted, and Modified
All the tools look to provide you with the list added, deleted, and modified functions.
Classification | old binary | new binary |
---|---|---|
Deleted | Unmatched | X |
Added | X | Unmatched |
Modified | Matched but different | Matched but different |
Functions found in the old binary but not in the new are assumed to be deleted. Functions unmatched in the new binary are determined to be new. And functions that match, but have slight differences are identified as modified.
Once you classify all the functions, you can start deducing some interesting changes.
Sometimes new functions are added to address security issues:
Interesting security related functions added to Table of Contents (TOC)
Sometimes functions are removed because they are less secure, and replaced with safer APIs:
Example fixing an unsafe (potential TOCTOU in file path) API call
The point is to filter out the noise (everything that is the same) and focus on the signal (everything that has changed).
Now we know what binary diffing is and how diffing tools can be useful. Why then do we need another one?
YADT (yet another diffing tool)
ghidriff
like a phoenix rising from the ashes
Well, I can’t speak for everyone, but I can at least speak as to why I needed one (and why you might). ghidriff
was born of curiosity of necessity. I’ve been patch diffing since 2020. I learned how to use Version Tracking in Ghidra, and was impressed with the power of patch diffing for understanding the exact changes made to a binary. My adventures with patch diffing with Ghidra are detailed in CVE North Stars, where I teach you how to manually load two binaries in Ghidra and perform a patch diff with Version Tracking GUI. For a one-off project, that workflow is fine, but after doing that several (40?) times, I was tired of loading a binary in Ghidra and clicking through the links. Also, the output for the diff was locked into a Ghidra Match Table, making it difficult to share. I was taking the output and copying-and-pasting into vscode
to use their standard diff view.
I found a repetitive and tedious task, and every part of the developer within me told me it was time to automate. ghidriff
is my attempt to provide some of these features to overcome common patch diffing pain points for myself and the community.
Curiosity
I had read about the function matching algorithms, and saw their effect when using the different correlators in Ghidra’s Version Tracking. Writing my own differ gave me a chance to write my own correlators, apply some lessons learned in research, and see my correlations effectively find function matches. For ghidriff
I implement several correlators from Ghidra, some function matching ideas from BinDiff, and some custom matching algorithms. The process of writing my own tooling helped me appreciate the complexity of the binary diffing problem and sharpen my skills with the Ghidra Program API. You can see the evolution of my learning about diffing functions in the several Implementations section of the ghidriff
README.
Necessity
Command Line
I do a lot of patch diffing. I often approach a new research problem by studying the CVEs related to the new software project. When diffing several binaries, the process of diffing using the available tools is a bit tedious. Most involve some workflow like this.
- Obtain two binaries to diff (patched and vulnerable)
- Create a new SRE project
- Import old and new binary
- Run analysis on binaries
- Export the analysis for each file
- Open up the exported analysis in the corresponding diff tool (or within the SRE tooling)
- Analyze the diff
This is too many steps if you want to perform any more than say, 10 patch diffs. I needed a command-line tool.
I wanted to run a single command and point the tool at 2 binaries.
1
ghidriff old.dll new.dll
ghidriff
changes to the entire workflow to 3 steps.
- Download two binaries to diff
- Run
ghidriff
binary_old.dll binary_new.dll - Analyze the diff
Decompilation Diffs
While looking at the assembly diff can help in understanding the particular vulnerability, doing this across the 20 functions that have changed for your diff gets a bit out of hand. I needed psuedo-code diffs (aka decompilation diffs).
Every time I performed a patch diff in Ghidra’s Version Tracking I noticed two things. Version Tracking could find the differences well. A major issue though it that VT didn’t have a diff for the decompiled output between two functions, and I always found myself copy-and-pasting the decomp from each function in both binaries and comparing them manually in vscode
.
Manually Comparing Psuedo-code in Vscode
That quickly got old. What if I could automate that?
Sharing Results- Markdown All the Things
Most of the existing diffing tools don’t have an easy way to share results. They might be able to export an image, or save off some database, but no easy way to send a link to a friend and say hey check this out. I wanted the diffing results to be easily shareable, or “social” even. I have several projects where the output is markdown and I like the results.
Markdown All The Things from callgraphs-with-ghidra-pyhidra
I wanted the diff output to look good in a blog post or in a gist on Github. Markdown is everywhere and when combined with another ubiquitous library mermaidjs
you can produce some pretty interesting looking outputs.
Hello Ghidriff
Say hello to ghidriff
.
ghidriff
provides a command-line binary diffing capability with a fresh take on diffing workflow and results.
This project, developed over the course of a year, has improved and evolved over time. I started with simple diffing, eventually added to structural matching and several other correlations, and have arrived at a version that works quite well.
It leverages the power of Ghidra’s ProgramAPI and FlatProgramAPI to find the added, deleted, and modified functions of two arbitrary binaries. It is written in Python3 using
pyhidra
to orchestrate Ghidra andjpype
as the Python to Java interface to Ghidra.
Its primary use case is patch diffing. Its ability to perform a patch diff with a single command makes it ideal for automated analysis. The diffing results are stored in JSON and rendered in markdown (optionally side-by-side HTML). The markdown output promotes “social” diffing, as results are easy to publish in a gist or include in your next writeup or blog post. ghidriff README
High Level
flowchart LR
a(old binary - rpcrt4.dll-v1) --> b[GhidraDiffEngine]
c(new binary - rpcrt4.dll-v2) --> b
b --> e(Ghidra Project Files)
b --> diffs_output_dir
subgraph diffs_output_dir
direction LR
i(rpcrt4.dll-v1-v2.diff.md)
h(rpcrt4.dll-v1-v2.diff.json)
j(rpcrt4.dll-v1-v2.diff.side-by-side.html)
end
At a high level, ghidriff
accepts two binaries as input and outputs Ghidra projects files (which can be used in the Ghidra GUI later) and the diff output.
The heavy lifting of the binary analysis is done by Ghidra and the diffing is made possible via Ghidra’s Program API. Ghidra translates the raw binary into objects and relationships that ghidriff
can compare. ghidriff
provides a diffing workflow, function matching, and resulting markdown and HTML diff output. It’s code is relatively simple and consists of ~10 source files within the source package in ghidriff.
ghidriff
Python Source Package
There is much more to explain on Ghidra internals, but that will have to wait for another post. For now, we will skip straight to reviewing its features. To really understand how useful it is, you are going to have to go try it for yourself.
Features
Command Line
ghidriff
allows you to perform the entire patch diffing workflow with a single command.
ghidriff old.bin new.bin
It has quite a few options, but the defaults should work for most cases. It also provides the command line used to generate the diff.
Command-line used to generate a report is included in results and sometimes download links
While ghidriff
will support any platform that can be analyzed by Ghidra, if Windows binaries are being diffed, then it even gives you the direct download links from Microsoft’s MSDL servers.
Summary TOC
The entire markdown diff is created dynamically using the JSON results generated by ghidriff
. If we take a look at a full Windows 11 22H2 kernel diff from last October’s patch Tuesday (KB5031354), the TOC give us a quick picture of how the kernel has changed with the latest security update.
ntoskrnl.exe
Windows 11 22H2 Update - KB5031354
Visual Diff Charts
MermaidJS provides the ability to show some great visualizations to help us understand how the functions have changed across versions.
MermaidJS flowchart rendered from GitHub’s markdown renderer
With pie charts:
Match Types Pie Chart showing which matching algorithms were used
Added, Deleted, Modified Ratios
Metadata Diffs
The Binary Metadata Diff section shows you differences between binary properties like the compiler detected, architecture, and the number of symbols and functions.
Strings diffs help provide insight to newly introduced strings for the patched binary.
Function Diffs
Each discovered modified function will generate some metadata about it. The ratios show how similar they are based on instructions (i_ratio), mnemonics (m_ratio), or basic block mnemonics (b_ratio). The match_types tell you which correlator(s) was used to match the function.
Function Meta Diffs show you the difference between reference count, or called functions, or several other properties of the function.
Finally, the pseudo-code unified diff for VrpBuildKeyPath
shows the decompiled code diff from the modified function:
Side by Side Diffs
Side by side diffs are also possible by adding the command-line flag --sxs
. This will generate a markdown diff with an HTML table showing the side-by-side view.
Issue: GitHub Markdown doesn’t support HTML style
This works well if you publish the markdown on your own blog or report, but the side-by-side HTML will break if you attempt to display it in GitHub. GitHub doesn’t allow you to modify the HTML style and it simply displays it as text
HTML Style Not Rendered in GitHub
As a result I built a workaround. If you are familiar with the simple HTML preview sites that allow you to view HTML stored in Github (or really anywhere), you will recognize immediately how this works.
Diffpreview.github.io
You can go and see for yourself this site works. To use diffpreview.github.io follow this process:
- Create diff with
ghidriff
.ghidriff ntoskrnl.exe.10.0.22621.2361 ntoskrnl.exe.10.0.22621.2428
- Put the resulting markdown in a gist.
https://gist.github.com/clearbluejar/58af23c6b17eefae87608ef2d67d22d7
- Copy the gist ID and paste it into:
Copy/Paste the GistID into diffpreview.github.io
Link to function with side-by-side: [VrpBuildKeyPath](https://diffpreview.github.io/?58af23c6b17eefae87608ef2d67d22d7#d2h-647798:~:text=ed8-,VrpBuildKeyPath,-(undefined8%20*param_1%2C) gives us this view:
VrpBuildKeyPath
side-by-side HTML view in diffpreview.github.io
Social Binary Diffing
If GitHub can provide “social coding”, ghidriff
can provide “social diffing”. Since the diff output is in markdown, you can publish the diff wherever markdown is supported. All the sections within the markdown are deep linked, which is great for sharing and pointing out specific areas of interest. Here is an example of social diffing with CVE-2023-38140 from a recent post on Twitter. As each function is a deep-link, you can highlight (with deep-links) to the functions of interest.
Usage - Diffing Kernels, CVEs, Cross-Architecture
Now that we know some of the features, let’s walk through some use cases for the tool.
Diffing a full Windows Kernel
Each month I have been trying to post the latest diff of the Windows Kernel to highlight how simple it is to diff with ghidriff
. Try these steps.
Download two versions of the kernel (older and patched binary):
1
2
wget https://msdl.microsoft.com/download/symbols/ntoskrnl.exe/F7E31BA91047000/ntoskrnl.exe -O ntoskrnl.exe.10.0.22621.1344
wget https://msdl.microsoft.com/download/symbols/ntoskrnl.exe/17B6B7221047000/ntoskrnl.exe -O ntoskrnl.exe.10.0.22621.1413
Run ghidriff:
1
ghidriff ntoskrnl.exe.10.0.22621.1344 ntoskrnl.exe.10.0.22621.1413
Analyze the Diff
The result of a full Windows kernel diff (which has about 32,000 functions) results in this beautiful concise markdown file: ntoskrnl.exe.10.0.22621.1344-ntoskrnl.exe.10.0.22621.1413.ghidriff.md
Again, to see the the side-by-side, use diffpreview
: https://diffpreview.github.io/?b95ae854a92ee917cd0b5c7055b60282
Diffing CVE-2023-2342
The kernel update had a security flaw. See if you can figure out what function was patched for CVE-2023-2342.
- Details of CVE-2023-2342 can be found here: https://bugs.chromium.org/p/project-zero/issues/detail?id=2392
Answer:
Diffing CVE-2023-21768
Let’s try another example by patch diffing a CVE. We are going to use a perfect example CVE. By perfect I mean that the binary only had a single line change in the entire binary, pointing directly to the root cause! Complete details of the CVE-2023-21768 are available in this blog post).
Lets repeat this patch diff with ghidriff
.
- Download two versions of
AFD.sys
(vulnerable and patched):
1
2
3
wget https://msdl.microsoft.com/download/symbols/afd.sys/0C5C6994a8000/afd.sys -O afd.sys.x64.10.0.22621.608
wget https://msdl.microsoft.com/download/symbols/afd.sys/DE5438E9a8000/afd.sys -O afd.sys.x64.10.0.22621.1105
ghidriff afd.sys.x64.10.0.22621.608 afd.sys.x64.10.0.22621.1105
- Run
ghidriff
:
1
ghidriff afd.sys.x64.10.0.22621.608 afd.sys.x64.10.0.22621.1105
- Review results
In the table of contents we can see the list of modified functions:
CVE-2023-21768 Diff Table of Contents
Which lines up from what we learned from the blog post!
Vulnerable Function Identified in Blog
The vulnerable function AfdNotifyRemoveIoCompletion
was identified here with a single line change. The “Modified (No Codes Changes)” section helps you quickly see that something about the external function ProbeForWrite
changed. In this case it was the number of references.
Modified (No Code Changes) showing ref count change
The diff results are posted in this GitHub gist and immediately available in a side-by-side view or jump to the single line change
Diffing Across Architectures
I recently posted on twitter an example of diffing across architectures. There is nothing stopping you from doing so. Although if you try to do this with ghidriff
it will warn you. You will need to add the flag --force-diff
to the command-line to proceed.
Need to --force-diff
to diff different architectures
The result is not as accurate, but still handy if you only have binaries of different architectures to compare.
ntoskrnl.exe
diff across architectures
These are just some fun examples. I plan to keep posting similar ones on social.
Conclusion
This post covered ghidriff
, a new open-source Python package that offers a command-line binary diffing capability leveraging the power of the Ghidra Software Reverse Engineering (SRE) Framework with a fresh take on the standard patch diffing workflow.
Like other binary diffing solutions, ghidriff
relies on SRE tooling to distill complex binaries into objects and relationships that can be compared. Unlike other tools, ghidriff
offers a command-line experience, simplifying the entire patch diffing workflow to only a single step, significantly reducing analysis time. Additionally, the markdown diff output can be shared on GitHub, GitLab, blogs, or almost anywhere.
This tool has been a game changer for me trying to stay on top of recent updates and researching CVEs. Its public release is me “throwing my hat” into the binary diffing ring and a chance to give back to the community. While no binary diffing tool is a silver bullet, ghidriff
might just give you a head start on your next vulnerability hunting adventure.
Let me know your thoughts on social or discord. I’ve got some features coming down the line which I’m excited to share here soon. Stay tuned. If you find issues using the tool, submit an issue. If you like it, send ghidriff
a ⭐️ on its Github repo.
Cover photo by Garett Mizunaka Unsplash