Patch Tuesday Diffing: CVE-2024-20696 - Windows Libarchive RCE
TL;DR This post will teach you how to patch diff CVE-2024-20696 (and indirectly CVE-2024-20697) from the January 2024 Patch Tuesday. This security patch was interesting because it wasn’t fixing native Microsoft software per se, but rather patching libarchive, an open-source library that Windows uses to support compression and decompression.
Patch diffing is a powerful technique for understanding complex vulnerabilities. It allows you to see clearly: to analyze changes in a binary without relying on external sources, and to progress from knowing about a CVE to understanding its root cause. Patch Tuesday Diffing is a series of posts that aim to showcase interesting or fun aspects of recent Microsoft Patch Tuesday CVEs. We will use patch diffing to examine these vulnerabilities, sometimes at a surface level and other times in-depth. Each time, we will learn something new. There are many benefits to learning this skill. Hopefully, this post or others in the series will inspire you to give it a go.
According to the Register, Microsoft provided native support to extract .rar, .7z, and .gz archive formats last summer.
The Register announcing new compression formats supported by libarchive
Microsoft has signaled it will add native support for tar, 7-zip, rar, gz and “many other” archive file formats to Windows. Redmond’s not cooked up some super-duper decompressor: it’s used the libarchive open source project to pull this off. Source
A quick look at the libarchive exports in Windows 10 1809 (released in 2018) suggests that Microsoft has been shipping libarchive for some time and might have supported the “new” formats much sooner.
Exports from archiveint.dll from Windows 1809 circa 2018:
% wget https://msdl.microsoft.com/download/symbols/archiveint.dll/D36DF6F786000/archiveint.dll -O archiveint.dll.x86.3.3.2
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep _rar
311 0x3ab00 _archive_read_support_format_rar@4
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep gz
275 0x269e0 _archive_read_support_compression_gzip@4
288 0x269f0 _archive_read_support_filter_gzip@4
331 0x4c1e0 _archive_write_add_filter_gzip@4
371 0x4c1c0 _archive_write_set_compression_gzip@4
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep 7z
300 0x28d80 _archive_read_support_format_7zip@4
379 0x4d910 _archive_write_set_format_7zip@4
I guess they added a new rar-related function export in archiveint.dll 10.0.22621.2199, which was available in August 2023. Maybe this was it?
% objdump --all-headers archiveint.dll.x64.3.6.2 | grep _rar
318 0xccc70 archive_read_support_format_rar
+ 319 0xd10d0 archive_read_support_format_rar5
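The export comparison above can be automated. What follows is a minimal Python sketch, assuming export listings in the `ordinal address name` shape shown above. The helper names `parse_exports` and `new_exports` are hypothetical, and the sample lines are copied from the listings in this post. The x86 stdcall decoration (`_name@N`) is stripped so that names compare across architectures.

```python
import re

# Matches lines such as "311 0x3ab00 _archive_read_support_format_rar@4",
# optionally prefixed with a diff marker ("+ 319 0xd10d0 ...").
EXPORT_LINE = re.compile(r"^\+?\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+)$")


def parse_exports(listing):
    """Return the set of export names found in an objdump-style listing."""
    names = set()
    for line in listing.splitlines():
        match = EXPORT_LINE.match(line.strip())
        if not match:
            continue
        name = match.group(3)
        # Undo x86 stdcall decoration: "_name@4" becomes "name".
        decorated = re.fullmatch(r"_(\w+)@\d+", name)
        if decorated:
            name = decorated.group(1)
        names.add(name)
    return names


def new_exports(old_listing, new_listing):
    """Return export names present in the new listing but not the old one."""
    return sorted(parse_exports(new_listing) - parse_exports(old_listing))


OLD = """
311 0x3ab00 _archive_read_support_format_rar@4
"""

NEW = """
318 0xccc70 archive_read_support_format_rar
+ 319 0xd10d0 archive_read_support_format_rar5
"""

print(new_exports(OLD, NEW))  # ['archive_read_support_format_rar5']
```

Feeding it the full `objdump` output for both DLLs would list every export added between the two builds, not just the rar-related ones.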
Regardless of when this support was available, it is here now. Microsoft provides an internally built version of the open-source project. Leveraging open-source is a good move, but as with all Windows software, with a great customer base comes great responsibility.
Microsoft’s internally built libarchive isn’t exactly the same as the Windows binary released by libarchive, but it is close.
Ghidra PE Metadata Showing a Similar Number of Functions
--- archiveint.dll.x64.10.0.19041.3636 Meta
+++ github-3.5.2.archive.dll Meta
@@ -1,44 +1,30 @@
-Program Name: archiveint.dll.x64.10.0.19041.3636
+Program Name: github-3.5.2.archive.dll
Language ID: x86:LE:64:default (3.0)
Compiler ID: windows
Processor: x86
Endian: Little
Address Size: 64
Minimum Address: 180000000
Maximum Address: ff0000184f
-# of Bytes: 694896
-# of Memory Blocks: 10
-# of Instructions: 131686
-# of Defined Data: 5792
-# of Functions: 1778
-# of Symbols: 15300
-# of Data Types: 301
-# of Data Type Categories: 18
+# of Bytes: 904296
+# of Memory Blocks: 9
+# of Instructions: 168608
+# of Defined Data: 10021
+# of Functions: 1751
+# of Symbols: 14925
+# of Data Types: 288
+# of Data Type Categories: 19
Analyzed: true
Compiler: visualstudio:unknown
Created With Ghidra Version: 11.0.1
Executable Format: Portable Executable (PE)
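To put a number on “close”, the counts from the metadata diff above can be compared directly. This is a small Python sketch using only figures copied from that diff; the dictionary names and the `relative_change` helper are made up for illustration.

```python
# Values copied from the Ghidra metadata diff above.
MICROSOFT_BUILD = {
    "bytes": 694896,
    "instructions": 131686,
    "functions": 1778,
    "symbols": 15300,
}
UPSTREAM_BUILD = {
    "bytes": 904296,
    "instructions": 168608,
    "functions": 1751,
    "symbols": 14925,
}


def relative_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


for key in MICROSOFT_BUILD:
    old, new = MICROSOFT_BUILD[key], UPSTREAM_BUILD[key]
    print(f"{key:>12}: {old:>7} -> {new:>7} ({relative_change(old, new):+.1f}%)")
```

The function and symbol counts differ by only a few percent, while the byte and instruction counts differ much more, which is what you might expect from two builds of similar source with different compiler settings (that last part is a guess, not something the metadata proves).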
Let’s figure out the changes for Microsoft’s version of libarchive related to this CVE. Two components make up libarchive in Windows: tar.exe (which aligns with the open-source bsdtar.exe) and archiveint.dll (which is libarchive’s archive.dll). Only archiveint.dll was updated for this CVE, so we are ignoring tar.exe.
Patch Diffing CVE-2024-20696
The first thing I like to do when patch diffing a CVE is to do a quick check to see what other CVEs might relate.
Here is the history of libarchive from MSRC’s list of CVEs:
cve | Title | Initial Release | CVSS | Impact | KB Ver Type |
---|---|---|---|---|---|
CVE-2024-20696 | Windows Libarchive | 2024-01-09 | 7.3 | Remote Code Execution | KB5034121-10.0.22000.2713-Security Update KB5034122-10.0.19044.3930-Security Update KB5034122-10.0.19045.3930-Security Update KB5034123-10.0.22621.3007-Security Update KB5034123-10.0.22631.3007-Security Update KB5034127-10.0.17763.5329-Security Update KB5034129-10.0.20348.2227-Security Update KB5034130-10.0.25398.643-Security Update |
CVE-2024-20697 | Windows Libarchive | 2024-01-09 | 7.3 | Remote Code Execution | |
CVE-2021-36976 | Libarchive | 2022-01-11 | 0 | Remote Code Execution | |
You can do a quick search in the MITRE CVE database to get an idea of common problems as well. It seems the open-source libarchive had a use-after-free, tracked as CVE-2021-36976, fixed back in 2021. For that CVE, Microsoft updated libarchive to a new version containing the patch. Perhaps this is also the case for CVE-2024-20696?
FAQ providing more detail for CVE-2021-36976
Let’s see what CVE-2024-20696 is all about.
Requirements
For every patch diff, you will need two binaries to compare.
Specifically, you need:
- vulnerable binary
- patched binary
- SRE tooling
- symbol information (if available)
Identifying the Binaries
One approach to finding which binary matches a CVE is to use the CVE description to try and work it out. You can also rely on your past experience, a blog post, or some other insights, but for this CVE the description is enough.
MSRC’s Security Update Guide - Windows Libarchive RCE
Assuming you didn’t already know the binary was archiveint.dll, we can use the keyword “libarchive” from the description and find a binary with a similar description. If you search through PE binaries in System32, you will eventually find a matching description, or at least one that mentions libarchive.
archiveint.dll Properties match CVE Description
Another indicator that this binary is the one you are looking for is that it was updated for the January 2024 Patch Tuesday. The version 10.0.22621.3007 from the PE image matches the build for January 9, 2024 (KB5034123).
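The System32 search described above can be approximated with a few lines of Python. This is a rough sketch, not a proper PE parser: it assumes the file description lives in the version resource as UTF-16LE text (with a plain ASCII match as a fallback) and simply looks for the keyword in the raw bytes. The function name `find_binaries_mentioning` is hypothetical.

```python
from pathlib import Path


def find_binaries_mentioning(root, keyword, patterns=("*.dll", "*.exe")):
    """Return names of files under root whose raw bytes contain keyword.

    Version-resource strings are stored as UTF-16LE, so the keyword is
    searched in that encoding as well as in plain ASCII. This is a
    heuristic byte search, not a parse of the PE resources.
    """
    needles = (keyword.encode("utf-16-le"), keyword.encode("ascii"))
    hits = []
    for pattern in patterns:
        for path in Path(root).glob(pattern):
            try:
                data = path.read_bytes()
            except OSError:
                continue  # skip files that cannot be read
            if any(needle in data for needle in needles):
                hits.append(path.name)
    return sorted(hits)


system32 = Path(r"C:\Windows\System32")
if system32.is_dir():  # only meaningful on a Windows machine
    for name in find_binaries_mentioning(system32, "libarchive"):
        print(name)
```

A byte search like this can produce false positives (any binary that merely embeds the string), so treat the hits as candidates to confirm in the file properties dialog.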
Downloading The Vulnerable and Patched Version
For Windows OS files, there is no better resource than Winbindex when you want to download specific binaries. To make things easier, just use these wget commands to get the files.
mkdir -p ghidriffs
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/E9509ED1AD000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3636
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/C9506245ad000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3930
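The download URLs above follow the Microsoft symbol server convention for PE files: the path component after the file name is the PE header’s `TimeDateStamp` (eight uppercase hex digits) followed by `SizeOfImage` in hex. A small Python sketch of that construction is below. The function name `symbol_server_url` is hypothetical, and the two header values are inferred by splitting the key in the patched binary’s URL, not read from the file.

```python
def symbol_server_url(file_name, time_date_stamp, size_of_image,
                      server="https://msdl.microsoft.com/download/symbols"):
    """Build a symbol-server download URL for a PE file.

    The lookup key is TimeDateStamp as 8 uppercase hex digits followed by
    SizeOfImage as lowercase hex with no padding.
    """
    key = f"{time_date_stamp:08X}{size_of_image:x}"
    return f"{server}/{file_name}/{key}/{file_name}"


# Values inferred from the URL of the patched binary above (assumption).
print(symbol_server_url("archiveint.dll", 0xC9506245, 0xAD000))
# https://msdl.microsoft.com/download/symbols/archiveint.dll/C9506245ad000/archiveint.dll
```

This is handy when you already have one copy of a binary and want to check whether the symbol server hosts it, since both values can be read from the PE headers.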
Patch Diffing Using Ghidriff
There are many tools that perform patch diffing. For the “Patch Tuesday Diffing” series, we will try multiple tools to see what we can learn from each. For the first post in the series, we will use ghidriff. ghidriff is a command-line patch diffing tool that leverages Ghidra’s powerful programming API and has some (self-proclaimed) amazing features for automating patch diffing. Full disclosure: I wrote this tool to enhance my vulnerability research and to understand patch diffing at a deeper level. If you want to learn more, check out the repo or read this detailed post explaining its origin and features.
For now, let’s see the output from comparing the two binaries. Run ghidriff on the command line after downloading the vulnerable and patched binaries.
ghidriff ghidriffs/archiveint.dll.x64.10.0.19041.3636 ghidriffs/archiveint.dll.x64.10.0.19041.3930
Ghidriff in a Box (Docker)
Alternatively, for those allergic to Ghidra and Java, you can just try the diff using the docker image.
mkdir -p ghidriffs
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/E9509ED1AD000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3636
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/C9506245ad000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3930
docker run -it --rm -v $(pwd)/ghidriffs:/ghidriffs ghcr.io/clearbluejar/ghidriff ghidriffs/archiveint.dll.x64.10.0.19041.3636 ghidriffs/archiveint.dll.x64.10.0.19041.3930
Check out the console output here.
Whether you use the command line or a Docker container, when ghidriff is complete, you will have a “ghidriffs” directory that looks like this:
% tree -L 2 ghidriffs
ghidriffs
├── archiveint.dll.x64.10.0.19041.3636
├── archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.md
├── archiveint.dll.x64.10.0.19041.3930
├── ghidra_projects
│   └── ghidriff-archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930
├── ghidriff.log
├── json
│   └── archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.json
└── symbols
    ├── 000admin
    ├── archiveint.pdb
    └── pingme.txt
ghidriff creates a new Ghidra project, imports and analyzes the binaries, and then finally outputs the diff and resulting JSON. The resulting diff can be viewed as a GitHub gist: archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.md
Reviewing Results
The markdown output looks like this:
archiveint.dll ghidriff markdown in vscode
When reviewing the results of a diff, you walk through the added, deleted, and modified functions to see if something “pops out”.
Added / Deleted
Here, there are several added functions. Most of them start with wil_*, which is a reference to the Windows Implementation Libraries (WIL). These wil_* functions are differences from the standard open-source libarchive. In the patched version of Microsoft’s internal libarchive, they enable some telemetry from within libarchive (previously this functionality didn’t exist).
Added wil_details_NtQueryWnfStateData:
undefined8
wil_details_NtQueryWnfStateData
          (undefined8 param_1,undefined8 param_2,undefined8 param_3,undefined8 param_4,
          undefined8 param_5,undefined8 param_6)
{
  undefined8 uVar1;

  // Resolve NtQueryWnfStateData lazily on first use and cache the pointer
  if (g_wil_details_pfnNtQueryWnfStateData == (FARPROC)0x0) {
    if (g_wil_details_ntdllModuleHandle == (HMODULE)0x0) {
      g_wil_details_ntdllModuleHandle = GetModuleHandleW(L"ntdll.dll");
    }
    g_wil_details_pfnNtQueryWnfStateData =
         GetProcAddress(g_wil_details_ntdllModuleHandle,"NtQueryWnfStateData");
    if (g_wil_details_pfnNtQueryWnfStateData == (FARPROC)0x0) {
      // 0xc0000139 is STATUS_ENTRYPOINT_NOT_FOUND
      return 0xc0000139;
    }
  }
  uVar1 = (*g_wil_details_pfnNtQueryWnfStateData)(param_1,0,0,param_4,param_5,param_6);
  return uVar1;
}
The wil_details_NtQueryWnfStateData function added in the patched archiveint.dll resolves a function pointer to NtQueryWnfStateData dynamically at runtime, perhaps to avoid adding a compile-time link dependency to libarchive (just guessing). The patched binary configures WNF via wil_details_StagingConfig_Load with the WNF states:
"WNF_WIL_MACHINE_FEATURE_STORE_MODIFIED": 0x418a073aa3bc8075,
"WNF_WIL_MACHINE_FEATURE_STORE": 0x418a073aa3bc7c75,
This aligns with the feature flag found below. For more information about WNF, the Windows Notification Facility, check out these awesome posts: Introducing Windows Notification Facility’s (WNF) Code Integrity and Playing with the Windows Notification Facility (WNF).
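As a sanity check on the two state names listed above, they can be partially decoded. The Python sketch below relies on details from public WNF research rather than Microsoft documentation, so treat them as assumptions: state names are XORed with the constant 0x41C64E6DA3BC0074, the low four bits of the result hold a version, and for well-known names the upper 32 bits hold a four-character owner tag. The function name `decode_wnf_state_name` is hypothetical, and the remaining bit fields are deliberately left undecoded.

```python
# XOR constant reported in public WNF research (assumption, not official docs).
WNF_STATE_KEY = 0x41C64E6DA3BC0074


def decode_wnf_state_name(state_name):
    """Partially decode a well-known WNF state name.

    Returns the version (low 4 bits) and the owner tag (upper 32 bits read
    as little-endian ASCII). Other fields are not decoded here.
    """
    clear = state_name ^ WNF_STATE_KEY
    tag_bytes = (clear >> 32).to_bytes(4, "little").rstrip(b"\x00")
    return {"version": clear & 0xF, "owner_tag": tag_bytes.decode("ascii")}


STATES = {
    "WNF_WIL_MACHINE_FEATURE_STORE_MODIFIED": 0x418A073AA3BC8075,
    "WNF_WIL_MACHINE_FEATURE_STORE": 0x418A073AA3BC7C75,
}

for name, value in STATES.items():
    print(name, decode_wnf_state_name(value))
# Both decode to {'version': 1, 'owner_tag': 'WIL'}
```

The decoded owner tag matching the `WIL` in the state names is a nice hint that these states really do belong to the WIL feature store.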
The diff also reveals that a new feature flag was added. Feature flags are often added to recently patched Windows code so that the change can be enabled and disabled dynamically. Often when Microsoft patches a function related to a CVE, a feature toggle (or flag) comes with it. A feature flag in a function you suspect is related to a CVE might indicate you are looking in the right direction.
Modified
This diff has only one modified function. The best result for a patch diff. :)
The function copy_from_lzss_window is interesting from a security perspective. This CVE is an RCE, which can often mean some type of memory corruption. Taking a look at the function, we have memory allocations, memcpy, and size checks.
Pseudo code from copy_from_lzss_window:
int copy_from_lzss_window(astruct *archive_read,void *buffer,int64_t startpos,int length)
{
  // omitted code

  // allocate some memory
  uVar4 = 0;
  _Size = (size_t)length;
  lVar1 = *archive_read->field2448_0x9a8;
  unp_buffer = *(LPCWSTR *)(lVar1 + 0xd0);
  if (unp_buffer == (LPCWSTR)0x0) {
    unp_buffer = (LPCWSTR)_o_malloc(*(undefined4 *)(lVar1 + 200));
    *(LPCWSTR *)(lVar1 + 0xd0) = unp_buffer;
    if (unp_buffer != (LPCWSTR)0x0) goto LAB_180036445;
    pcVar5 = "Unable to allocate memory for uncompressed data.";

    // memcpy
    if (iVar2 < length) {
      memcpy(_Dst,_Src,(longlong)iVar2);
      _Src = *(void **)(lVar1 + 0x340);
      _Size = (size_t)(length - iVar2);
      _Dst = (void *)((ulonglong)(uint)(*(int *)(lVar1 + 0xc4) + iVar2) +
                     *(longlong *)(lVar1 + 0xd0));
    }
  }
  // omitted code
  return iVar2;
}
The copy_from_lzss_window from the vulnerable version doesn’t line up with the open-source code tagged 3.6.2, so we can’t assume the code exactly aligns.
Root cause
When patch diffing a CVE, we get to see for ourselves what changed in the binary. It is awesome. We can’t always find the root cause just from the diff, but we can always learn more. For patch diffing a CVE, we can assume (or hope) the patch fixed a security issue. Sometimes this isn’t the case and the bug isn’t fixed. The primary difference in the function copy_from_lzss_window for the new version is the introduction of the feature flag and two size checks.
ghidriff side-by-side diff on diffpreview.github.io
Added Feature Flag and Checks
--- copy_from_lzss_window
+++ copy_from_lzss_window
@@ -1,65 +1,66 @@
 int copy_from_lzss_window(astruct *archive_read,void *buffer,int64_t startpos,int length)
 // code omitted
   rar = (astruct_1 *)*archive_read->field2448_0x9a8;
+  uVar1 = Feature_3628230972__private_IsEnabled();
+  pWVar5 = (LPCWSTR)0x0;
+  if ((uVar1 == 0) || ((-1 < length && ((uint)length <= rar->unp_buffer_size)))) {
     pWVar2 = rar->field202_0xd0;
     if (pWVar2 == (LPCWSTR)0x0) {
       pWVar2 = (LPCWSTR)_o_malloc(rar->unp_buffer_size);
       rar->field202_0xd0 = pWVar2;
       if (pWVar2 == (LPCWSTR)0x0) {
         pcVar7 = "Unable to allocate memory for uncompressed data.";
         uVar4 = 0xc;
         goto LAB_18003732a;
       }
     }
 // code omitted
Related function in Github
Speculating the Root Cause
If Feature_3628230972 is toggled off (uVar1 == 0), the checks are skipped and the original unpatched code runs. Otherwise, the length parameter is checked to ensure that it is no larger than rar->unp_buffer_size, the size used for the allocation that is later used in memcpy.
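The added condition can be modelled outside the binary to see which inputs it rejects. The following Python sketch mirrors the decompiled expression `(uVar1 == 0) || ((-1 < length && ((uint)length <= rar->unp_buffer_size)))`. The function names and the sample buffer size are invented for illustration, and `length` is assumed to already be a value in the range of a 32-bit signed `int`.

```python
def to_uint32(value):
    """Mimic a C cast of a 32-bit int to unsigned int."""
    return value & 0xFFFFFFFF


def patched_guard_allows(feature_enabled, length, unp_buffer_size):
    """Model of the guard added to copy_from_lzss_window.

    Returns True when execution would continue into the allocation and
    memcpy logic, False when the new checks reject the call.
    """
    if not feature_enabled:
        # Feature flag off: the checks are skipped entirely.
        return True
    return length > -1 and to_uint32(length) <= unp_buffer_size


BUFFER_SIZE = 0x400000  # hypothetical window size, for illustration only

for length in (16, BUFFER_SIZE, BUFFER_SIZE + 1, -1):
    print(length, patched_guard_allows(True, length, BUFFER_SIZE))
```

With the flag enabled, only lengths between 0 and the buffer size (inclusive) get through. With the flag disabled, every length is accepted, which matches the unpatched behaviour.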
Added Size Check 1
(uint)length <= rar->unp_buffer_size
Size Check 2 ensures that the int length passed into copy_from_lzss_window is non-negative.
Added Size Check 2
(-1 < length)
When you cast a negative int to an unsigned type, the value becomes quite large. The issue is that length is a signed int, but it is cast to an unsigned size_t before being used as the size for the memcpy.
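The effect of that cast is easy to reproduce. The sketch below uses Python’s `ctypes` fixed-width integer types to imitate what a 64-bit C compiler does with `(size_t)length` and `(uint)length`. The helper names and the input value 0xF0000000 are chosen for illustration only.

```python
import ctypes


def as_c_int(raw32):
    """Interpret a 32-bit pattern as a signed C int."""
    return ctypes.c_int32(raw32).value


def as_size_t_64(length):
    """Mimic (size_t)length on a 64-bit target: sign-extend, then reinterpret."""
    return ctypes.c_uint64(length).value


def as_uint32(length):
    """Mimic (uint)length: same 32 bits, read as unsigned."""
    return ctypes.c_uint32(length).value


length = as_c_int(0xF0000000)
print(length)                     # -268435456
print(hex(as_uint32(length)))     # 0xf0000000
print(hex(as_size_t_64(length)))  # 0xfffffffff0000000
```

A negative `length` therefore turns into a copy size close to 2^64 bytes, which is why the non-negative check matters even though the other check compares against the buffer size.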
Alternative: Root Cause with a Quick Google Search
One of the best things to do with a function you don’t know much about is to google it and see what turns up. In this case, we immediately come back with a lead.
Old Github issue #521 invalid read in copy_from_lzss_window
An old CVE in the same function, reported in 2015, caused an out-of-bounds read. This CVE was fixed in libarchive 3.2.1. This isn’t the same issue we are seeing with this patch, but it is useful. The GitHub repo even provides a malformed rar test case. A test case is useful because it can lead us down the path of dynamic analysis.
Dynamic Analysis
Something fun we can do in this case is try the broken rar file to get to the function. I wrote a quick demo program that will allow you to load the vulnerable archiveint.dll based on this test case. Well, I had a bit of help getting started. Only two changes were needed to make the Copilot output compile.
Copilot providing a quick test harness base
I then added the functionality from the broken test case to exercise the vulnerable code path.
If you want to play around with this, go ahead and clone the repo at https://github.com/clearbluejar/CVE-2024-20696. Set a breakpoint on archiveint.dll!copy_from_lzss_window. Set r9 to something really large. See what happens.
Attempting the broken RAR file
Using the test case, we can hit the vulnerable function archiveint.dll!copy_from_lzss_window.
WinDbg breaking on the vulnerable function
Now, using my libarchive-harness-win.exe, we can hit the same path.
Hitting copy_from_lzss_window in Visual Studio
The 2015 CVE test case doesn’t trigger the issue patched in our CVE, but a quick tweak in the debugger can demonstrate what would happen if you could control the length parameter. When you set length to a negative int ($r9 = 0xF0000000), it is sign-extended when cast to a uint64, becoming 0xFFFFFFFFF0000000, a very large integer. That integer is then used as the size in the memcpy.
Failing Size Check 2
// length is sign-extended and cast to an unsigned 64-bit size_t
_Size = (size_t)length;
// What could go wrong?
memcpy(_Dst,_Src,0xFFFFFFFFF0000000);
The resulting crash was actually an exception on the read, not quite RCE, but close? Besides, this CVE’s “exploitability” was rated “less likely”.
In the patched version, setting r9 to a large unsigned int is handled properly, and parsing the rar test case just results in an error. I really want to build a rar to demonstrate the issue without modifying registers in the debugger, but the complexity of creating a working rar file is prohibitive. Perhaps we can fuzz it?
WinAFL hasn’t come up with it yet. Maybe check back in a few days? :)
Kicking off WinAFL with libarchive-win-harness.exe
Wrapping Up
Well, that concludes our patch diff of CVE-2024-20696 and our adventure with libarchive. Microsoft’s internal build of libarchive (archiveint.dll) is now protected against a potential memory corruption from a suspect RAR file with an invalid length parameter.
The diff was interesting because we could compare it against the open-source version, had previous CVE examples, and had only one modified function. We also identified several subtle differences between Microsoft’s libarchive build and the open-source repo.
CVE-2024-20697 is another CVE in a different version of libarchive (3.6.2/Win11 rather than 3.5.2/Windows 10). Try it. See what you come up with. I posted a result here if you want to compare.
One last thought: isn’t it interesting that Microsoft isn’t using the latest version of libarchive? What about the security issues found in later versions? 🤔
This concludes the first post in the Patch Tuesday Diffing series. This time we tried ghidriff for diffing, and we will try more tools as we go. If you like it, don’t forget to send ghidriff a ⭐️ on its GitHub repo.
Going Deeper
I hope this post inspires you to leverage the power of patch diffing to see a bit deeper into next month’s Patch Tuesday. If you want some tips on patch diffing using native Ghidra, check out my free online tutorial.
CVE North Stars - Ghidra Patch Diffing Tutorial
*A CVE provides a compass of sorts that orients and guides a researcher towards a deeper understanding of the patched vulnerability and its vulnerability class. The idea is to treat CVEs as North Stars in vulnerability discovery and comprehension.*
https://cve-north-stars.github.io/
Patch Diffing In The Dark - Blackhat Training USA 2024
Patch Diffing In The Dark Blackhat USA 2024
If you prefer to get more hands-on, consider my upcoming Blackhat training “Patch Diffing in the Dark: Binary Diffing for Vulnerability Researchers and Reverse Engineers”. In this in-person course, I will teach you how to leverage patch diffing to kick-start your vulnerability research on both the Windows and Android platforms. Together we will learn how to reverse engineer CVEs. We will hunt, find, and root cause several modern vulnerabilities.
The 2-day course is available on Saturday/Sunday or Monday/Tuesday. Early bird pricing is still available, but seats are limited – secure your spot today!