Patch Tuesday Diffing: CVE-2024-20696 - Windows Libarchive RCE

TL;DR This post will teach you how to patch diff CVE-2024-20696 (and indirectly CVE-2024-20697) from the January 2024 Patch Tuesday. This security patch was interesting as it wasn’t fixing native Microsoft software per se, rather patching an open-source library libarchive used by Windows to support compression and decompression functionality.


Patch diffing is a powerful technique for understanding complex vulnerabilities. It allows you to see clearly: to analyze changes in a binary without relying on external sources, and to progress from knowing about a CVE to understanding its root cause. Patch Tuesday Diffing is a series of posts that aim to showcase interesting or fun aspects of recent Microsoft Patch Tuesday CVEs. We will use patch diffing to examine these vulnerabilities, sometimes at a surface level and other times in-depth. Each time, we will learn something new. There are many benefits to learning this skill. Hopefully, this post or others in the series will inspire you to give it a go.


According to the Register, Microsoft provided native support to extract .rar, .7z, .gz archive formats last summer.

The Register announcing new compression formats supported by libarchive

Microsoft has signaled it will add native support for tar, 7-zip, rar, gz and “many other” archive file formats to Windows. Redmond’s not cooked up some super-duper decompressor: it’s used the libarchive open source project to pull this off. Source

A quick look at the exports for libarchive from 2018 Windows 10 1809 seems to suggest that they have been using libarchive for some time and might have supported the “new” formats much sooner.

Exports from archiveint.dll from Windows 1809 circa 2018:

% wget https://msdl.microsoft.com/download/symbols/archiveint.dll/D36DF6F786000/archiveint.dll  -O archiveint.dll.x86.3.3.2
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep _rar
     311  0x3ab00  _archive_read_support_format_rar@4
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep gz
     275  0x269e0  _archive_read_support_compression_gzip@4
     288  0x269f0  _archive_read_support_filter_gzip@4
     331  0x4c1e0  _archive_write_add_filter_gzip@4
     371  0x4c1c0  _archive_write_set_compression_gzip@4
% objdump --all-headers archiveint.dll.x86.3.3.2 | grep 7z
     300  0x28d80  _archive_read_support_format_7zip@4
     379  0x4d910  _archive_write_set_format_7zip@4

I guess they added a new rar related function export in archiveint.dll 10.0.22621.2199 which was available August 2023. Maybe this was it?

% objdump --all-headers archiveint.dll.x64.3.6.2 | grep _rar     
	 318  0xccc70  archive_read_support_format_rar
+    319  0xd10d0  archive_read_support_format_rar5

Regardless of when this support was available, it is here now. Microsoft provides an internally built version of the open-source project. Leveraging open-source is a good move, but as with all Windows software, with a great customer base comes great responsibility.

Microsoft’s internally built libarchive isn’t exactly the same as the Windows binary released by libarchive, but it is close.

Ghidra PE Meta Data Showing A Similar Number of Functions

--- archiveint.dll.x64.10.0.19041.3636 Meta
+++ github-3.5.2.archive.dll Meta
@@ -1,44 +1,30 @@
-Program Name: archiveint.dll.x64.10.0.19041.3636
+Program Name: github-3.5.2.archive.dll
 Language ID: x86:LE:64:default (3.0)
 Compiler ID: windows
 Processor: x86
 Endian: Little
 Address Size: 64
 Minimum Address: 180000000
 Maximum Address: ff0000184f
-# of Bytes: 694896
-# of Memory Blocks: 10
-# of Instructions: 131686
-# of Defined Data: 5792
-# of Functions: 1778
-# of Symbols: 15300
-# of Data Types: 301
-# of Data Type Categories: 18
+# of Bytes: 904296
+# of Memory Blocks: 9
+# of Instructions: 168608
+# of Defined Data: 10021
+# of Functions: 1751
+# of Symbols: 14925
+# of Data Types: 288
+# of Data Type Categories: 19
 Analyzed: true
 Compiler: visualstudio:unknown
 Created With Ghidra Version: 11.0.1
 Executable Format: Portable Executable (PE)

Let’s figure out the changes for Microsoft’s version of libarchive related to this CVE. Two components make up libarchive in Windows: tar.exe (which aligns with the open-source bsdtar.exe) and archiveint.dll (which is libarchive’s archive.dll). Only archiveint.dll was updated for this CVE, so we will ignore tar.exe.

Patch Diffing CVE-2024-20696

The first thing I like to do when patch diffing a CVE is to do a quick check to see what other CVEs might relate.

Here is the history of libarchive from MSRC’s list of CVEs:

Columns: CVE | Title | Initial Release | CVSS | Impact, followed by one "KB | Version | Type" row per update.

CVE-2024-20696 | Windows Libarchive | 2024-01-09 | 7.3 | Remote Code Execution
    KB5034121 | 10.0.22000.2713 | Security Update
    KB5034122 | 10.0.19044.3930 | Security Update
    KB5034122 | 10.0.19045.3930 | Security Update
    KB5034123 | 10.0.22621.3007 | Security Update
    KB5034123 | 10.0.22631.3007 | Security Update
    KB5034127 | 10.0.17763.5329 | Security Update
    KB5034129 | 10.0.20348.2227 | Security Update
    KB5034130 | 10.0.25398.643 | Security Update

CVE-2024-20697 | Windows Libarchive | 2024-01-09 | 7.3 | Remote Code Execution
    KB5034123 | 10.0.22621.3007 | Security Update
    KB5034123 | 10.0.22631.3007 | Security Update
    KB5034130 | 10.0.25398.643 | Security Update

CVE-2021-36976 | Libarchive | 2022-01-11 | 0 | Remote Code Execution
    KB5009543 | 10.0.19042.1466 | Security Update
    KB5009543 | 10.0.19043.1466 | Security Update
    KB5009543 | 10.0.19044.1466 | Security Update
    KB5009545 | 10.0.18363.2037 | Security Update
    KB5009555 | 10.0.20348.469 | Security Update
    KB5009557 | 10.0.17763.2452 | Security Update
    KB5009566 | 10.0.22000.434 | Security Update

You can do a quick search in the MITRE CVE database to get an idea of common problems as well. It seems the open-source libarchive had a use-after-free, tracked as CVE-2021-36976, that was fixed back in 2021. For that CVE, Microsoft updated libarchive to a new version containing the patch. Perhaps this is also the case for CVE-2024-20696?

FAQ providing more detail for CVE-2021-36976

Let’s see what CVE-2024-20696 is all about.

Requirements

For every patch diff, you will need two binaries to compare.

Specifically, you need:

  • vulnerable binary
  • patched binary
  • SRE tooling
  • symbol information (if available)

CVE North Stars

Identifying the Binaries

One approach to finding which binary matches a CVE is to use the CVE description to try and work it out. You can also rely on your past experience, a blog post, or some other insights, but for this CVE the description is enough.

MSRC’s Security Update Guide - Windows Libarchive RCE

Assuming you didn’t already know the binary was archiveint.dll, we can use the keyword “libarchive” from the description and find a binary with a similar description. If you search through PE binaries in System32, you will eventually find a matching description, or at least one that mentions libarchive.

archiveint.dll Properties match CVE Description
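If you would rather not click through file properties by hand, the search can be scripted. The following is a minimal sketch, assuming that a raw byte search is good enough: it looks for the keyword in both ASCII and UTF-16LE form (PE version resources store their strings as UTF-16). The function name and its parameters are my own for illustration and are not part of any tool used in this post.

```python
import os


def find_binaries_mentioning(root, keyword="libarchive", suffixes=(".dll", ".exe")):
    """Return sorted paths under root whose raw bytes contain the keyword.

    The match is case-insensitive for ASCII letters and checks both the
    ASCII and UTF-16LE encodings of the keyword.
    """
    needles = (
        keyword.lower().encode("ascii"),
        keyword.lower().encode("utf-16-le"),
    )
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read().lower()
            except OSError:
                continue  # skip files we are not allowed to read
            if any(needle in data for needle in needles):
                matches.append(path)
    return sorted(matches)
```

On a Windows machine you would call it as find_binaries_mentioning(r"C:\Windows\System32"). Expect a few unrelated hits, since any binary that merely embeds the string will match.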

Another indicator that this binary is the one you are looking for is that it was updated for the January 2024 Patch Tuesday. The version 10.0.22621.3007 from the PE image matches the build for the January 9, 2024 update, KB5034123.

Downloading The Vulnerable and Patched Version

For Windows OS files there is no better resource than Winbindex when you want to download specific binaries. To make things easier, just use these wget commands to get the files.

mkdir -p ghidriffs
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/E9509ED1AD000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3636
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/C9506245ad000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3930
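The hex blob in those URLs is not random. As far as I understand the symbol server layout (this is my reading of it, not something the download commands themselves document), it is the PE header's TimeDateStamp as eight hex digits followed by SizeOfImage in hex. A small sketch with hypothetical helper names shows how such a URL could be assembled and taken apart:

```python
def symbol_server_url(filename, timestamp, size_of_image):
    """Build an msdl.microsoft.com download URL for a PE file.

    Assumed layout: TimeDateStamp as 8 uppercase hex digits, immediately
    followed by SizeOfImage in hex without padding.
    """
    file_id = f"{timestamp:08X}{size_of_image:x}"
    return (
        "https://msdl.microsoft.com/download/symbols/"
        f"{filename}/{file_id}/{filename}"
    )


def split_file_id(file_id):
    """Split a file id back into (timestamp, size_of_image)."""
    return int(file_id[:8], 16), int(file_id[8:], 16)


if __name__ == "__main__":
    # Values taken from the patched binary's URL used in this post.
    print(symbol_server_url("archiveint.dll", 0xC9506245, 0xAD000))
    print(split_file_id("E9509ED1AD000"))
```

Both binaries in this post share the same image size (0xAD000) and differ only in the timestamp portion.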

Patch Diffing Using Ghidriff

There are many tools that perform patch diffing. For the “Patch Tuesday Diffing” series, we will try multiple tools to see what we can learn from each. For the first post in the series, we will use ghidriff. ghidriff is a command-line patch diffing tool that leverages Ghidra’s powerful programming API and has some (self-proclaimed) amazing features for automating patch diffing. Full disclosure, I wrote this tool to enhance my vulnerability research and to understand patch diffing at a deeper level. If you want to learn more, check out the repo or read this detailed post explaining its origin and features.

For now, let’s see the output from comparing the two binaries. Run ghidriff on the command line after downloading the vulnerable and patched binaries.

ghidriff ghidriffs/archiveint.dll.x64.10.0.19041.3636 ghidriffs/archiveint.dll.x64.10.0.19041.3930

Ghidriff in a Box (Docker)

Alternatively, for those allergic to Ghidra and Java, you can just try the diff using the docker image.

mkdir -p ghidriffs
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/E9509ED1AD000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3636
wget https://msdl.microsoft.com/download/symbols/archiveint.dll/C9506245ad000/archiveint.dll -O ghidriffs/archiveint.dll.x64.10.0.19041.3930
docker run -it --rm -v $(pwd)/ghidriffs:/ghidriffs ghcr.io/clearbluejar/ghidriff ghidriffs/archiveint.dll.x64.10.0.19041.3636 ghidriffs/archiveint.dll.x64.10.0.19041.3930

Check out the console output here.

Whether you use the command line, or a docker container, when ghidriff is complete, you will have a “ghidriffs” directory that looks like this:

% tree -L 2 ghidriffs
ghidriffs
├── archiveint.dll.x64.10.0.19041.3636
├── archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.md
├── archiveint.dll.x64.10.0.19041.3930
├── ghidra_projects
│   └── ghidriff-archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930
├── ghidriff.log
├── json
│   └── archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.json
└── symbols
    ├── 000admin
    ├── archiveint.pdb
    └── pingme.txt

ghidriff creates a new Ghidra project, imports and analyzes the binaries, and then finally outputs the diff and resulting json. The resulting diff can be viewed as a Github gist archiveint.dll.x64.10.0.19041.3636-archiveint.dll.x64.10.0.19041.3930.ghidriff.md

Reviewing Results

The markdown output looks like this:

archiveint.dll ghidriff markdown in vscode

When reviewing the results for a diff you walk through the added, deleted, and modified functions to see if something “pops out”.
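To make the added / deleted / modified buckets concrete, here is a toy model of the idea. It is only a sketch: it matches functions purely by name and compares their text, which is far simpler than what a real diffing tool does (this is not ghidriff's matching logic), and the example function bodies are made up.

```python
def classify_functions(old_funcs, new_funcs):
    """Bucket functions into added, deleted, and modified.

    old_funcs / new_funcs map a function name to some representation of
    its body (for example, decompiled text).
    """
    old_names, new_names = set(old_funcs), set(new_funcs)
    return {
        "added": sorted(new_names - old_names),
        "deleted": sorted(old_names - new_names),
        "modified": sorted(
            name
            for name in old_names & new_names
            if old_funcs[name] != new_funcs[name]
        ),
    }


if __name__ == "__main__":
    # Hypothetical, heavily shortened function bodies for illustration.
    old = {"copy_from_lzss_window": "memcpy(...)", "read_header": "..."}
    new = {
        "copy_from_lzss_window": "if (check) memcpy(...)",
        "read_header": "...",
        "wil_details_NtQueryWnfStateData": "GetProcAddress(...)",
    }
    print(classify_functions(old, new))
```

With only one entry in the modified bucket, as in this diff, you know exactly where to start reading.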

Added / Deleted

Here, there are several added functions. Most of them start with wil_*, which is a reference to the Windows Implementation Libraries (WIL). These wil_* functions are differences from the standard open-source version of libarchive. The patched version of Microsoft’s internal libarchive enables some telemetry from within libarchive (previously this functionality didn’t exist).

Added NtQueryWnfStateData:

undefined8
wil_details_NtQueryWnfStateData
          (undefined8 param_1,undefined8 param_2,undefined8 param_3,undefined8 param_4,
          undefined8 param_5,undefined8 param_6)

{
  undefined8 uVar1;
  
  if (g_wil_details_pfnNtQueryWnfStateData == (FARPROC)0x0) {
    if (g_wil_details_ntdllModuleHandle == (HMODULE)0x0) {
      g_wil_details_ntdllModuleHandle = GetModuleHandleW(L"ntdll.dll");
    }
    g_wil_details_pfnNtQueryWnfStateData =
         GetProcAddress(g_wil_details_ntdllModuleHandle,"NtQueryWnfStateData");
    if (g_wil_details_pfnNtQueryWnfStateData == (FARPROC)0x0) {
      return 0xc0000139;
    }
  }
  uVar1 = (*g_wil_details_pfnNtQueryWnfStateData)(param_1,0,0,param_4,param_5,param_6);
  return uVar1;
}

The wil_details_NtQueryWnfStateData function added in the patched archiveint.dll resolves a function pointer to NtQueryWnfStateData at runtime, perhaps to avoid adding a compile-time link dependency to libarchive (just guessing). The patched binary configures WNF via wil_details_StagingConfig_Load with the WNF states:

"WNF_WIL_MACHINE_FEATURE_STORE_MODIFIED": 0x418a073aa3bc8075,
"WNF_WIL_MACHINE_FEATURE_STORE": 0x418a073aa3bc7c75,

This aligns with our feature flag found below. For more information about WNF, the Windows Notification Facility, check out these awesome posts: “Introducing Windows Notification Facility’s (WNF) Code Integrity” and “Playing with the Windows Notification Facility (WNF)”.
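The lazy lookup pattern in wil_details_NtQueryWnfStateData is worth a second look, because you will see it all over Windows binaries. Here is a platform-neutral sketch of the same control flow. The class name and the resolver callback are hypothetical stand-ins for GetModuleHandleW / GetProcAddress; the only value taken from the decompilation is the 0xc0000139 return code, which corresponds to STATUS_ENTRYPOINT_NOT_FOUND.

```python
STATUS_ENTRYPOINT_NOT_FOUND = 0xC0000139


class LazyImport:
    """Resolve a function on first call and cache it afterwards."""

    def __init__(self, resolver, symbol):
        self._resolver = resolver  # callable: symbol name -> function or None
        self._symbol = symbol
        self._func = None  # plays the role of the cached g_..._pfn global

    def __call__(self, *args):
        if self._func is None:
            self._func = self._resolver(self._symbol)
            if self._func is None:
                # Mirrors the decompiled early return on lookup failure.
                return STATUS_ENTRYPOINT_NOT_FOUND
        return self._func(*args)


if __name__ == "__main__":
    # A fake export table standing in for ntdll.dll.
    exports = {"NtQueryWnfStateData": lambda *args: 0}
    query = LazyImport(exports.get, "NtQueryWnfStateData")
    print(hex(query(1, 2, 3)))
    missing = LazyImport(exports.get, "NoSuchExport")
    print(hex(missing()))
```

The caller never needs to know whether the export exists at link time; it just gets an error status back when it does not.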

The diff also reveals that a new feature flag was added. Feature flags are often added to recently patched Windows code so that the change can be enabled and disabled dynamically. Often, when Microsoft patches a function related to a CVE, a feature toggle (or flag) comes with it. A feature flag in a function you suspect is related to a CVE might indicate you are looking in the right direction.

Modified

This diff has only one modified function. The best result for a patch diff. :)

ghidriff standard diff TOC

The function copy_from_lzss_window is interesting from a security perspective. This CVE is an RCE, which can often mean some type of memory corruption. Taking a look at the function, we see memory allocations, memcpy, and size checks.

Pseudo code from copy_from_lzss_window:

int copy_from_lzss_window(astruct *archive_read,void *buffer,int64_t startpos,int length)

{

// omitted code

// allocate some memory

  uVar4 = 0;
  _Size = (size_t)length;
  lVar1 = *archive_read->field2448_0x9a8;
  unp_buffer = *(LPCWSTR *)(lVar1 + 0xd0);
  if (unp_buffer == (LPCWSTR)0x0) {
    unp_buffer = (LPCWSTR)_o_malloc(*(undefined4 *)(lVar1 + 200));
    *(LPCWSTR *)(lVar1 + 0xd0) = unp_buffer;
    if (unp_buffer != (LPCWSTR)0x0) goto LAB_180036445;
    pcVar5 = "Unable to allocate memory for uncompressed data.";

// memcpy

   if (iVar2 < length) {
        memcpy(_Dst,_Src,(longlong)iVar2);
        _Src = *(void **)(lVar1 + 0x340);
        _Size = (size_t)(length - iVar2);
        _Dst = (void *)((ulonglong)(uint)(*(int *)(lVar1 + 0xc4) + iVar2) +
                       *(longlong *)(lVar1 + 0xd0));
      }
    }

// omitted code

  return iVar2;
}

The copy_from_lzss_window from the vulnerable version doesn’t line up with the open-source code tagged 3.6.2, so we can’t assume the code exactly aligns.

Root cause

When patch diffing a CVE, we get to see for ourselves what changed in the binary. It is awesome. We can’t always find the root cause just from the diff, but we can always learn more. For patch diffing a CVE, we can assume (or hope) the patch fixed a security issue. Sometimes this isn’t the case and the bug isn’t fixed. The primary difference in the function copy_from_lzss_window for the new version is the introduction of the feature flag and two size checks.

ghidriff side-by-side diff on diffpreview.github.io

Added Feature Flag and Checks

--- copy_from_lzss_window
+++ copy_from_lzss_window
@@ -1,65 +1,66 @@

int copy_from_lzss_window(astruct *archive_read,void *buffer,int64_t startpos,int length)

// code omitted

 rar = (astruct_1 *)*archive_read->field2448_0x9a8;
+  uVar1 = Feature_3628230972__private_IsEnabled();
+  pWVar5 = (LPCWSTR)0x0;
+  if ((uVar1 == 0) || ((-1 < length && ((uint)length <= rar->unp_buffer_size)))) {
    pWVar2 = rar->field202_0xd0;
    if (pWVar2 == (LPCWSTR)0x0) {
      pWVar2 = (LPCWSTR)_o_malloc(rar->unp_buffer_size);
      rar->field202_0xd0 = pWVar2;
      if (pWVar2 == (LPCWSTR)0x0) {
        pcVar7 = "Unable to allocate memory for uncompressed data.";
        uVar4 = 0xc;
        goto LAB_18003732a;
      }
    }

// code omitted

Related function in Github

Speculating the Root Cause

If Feature_3628230972 is toggled off (uVar1 == 0), the checks are skipped and the original unpatched code runs. Otherwise, the length parameter is checked to ensure that it is no larger than rar->unp_buffer_size, the size used to allocate the buffer that is later used in memcpy.

Added Size Check 1

(uint)length <= rar->unp_buffer_size

Size Check 2 ensures that the int length passed into copy_from_lzss_window is non-negative.

Added Size Check 2

(-1 < length)

When you cast a negative int to an unsigned type, the value becomes quite large. The issue is that length is a signed int, but it is cast to an unsigned size_t before being used as the size for the memcpy.
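The combined condition is easier to reason about when you can poke at it. Below is a small model of the patched guard, written as a sketch: the function names are mine, the 32-bit masking stands in for the C cast to uint, and the 0x400000 buffer size in the example is an arbitrary value chosen for illustration, not one recovered from the binary.

```python
def as_uint32(value):
    """Reinterpret a Python int the way a C cast to a 32-bit uint would."""
    return value & 0xFFFFFFFF


def copy_allowed(feature_enabled, length, unp_buffer_size):
    """Model of the patched guard in copy_from_lzss_window.

    Mirrors: (uVar1 == 0) || (-1 < length && (uint)length <= unp_buffer_size)
    """
    if not feature_enabled:
        return True  # flag off: the old, unchecked path runs
    return -1 < length and as_uint32(length) <= unp_buffer_size


if __name__ == "__main__":
    size = 0x400000  # hypothetical buffer size for illustration
    print(copy_allowed(True, 16, size))    # in range
    print(copy_allowed(True, -1, size))    # negative length is rejected
    print(copy_allowed(False, -1, size))   # flag off, nothing is checked
    print(hex(as_uint32(-1)))              # what -1 looks like unsigned
```

Note that with the feature flag disabled, the negative length sails straight through, which is exactly the pre-patch behavior.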

One of the best things to do with a function you don’t know much about is to google it and see what turns up. In this case, we immediately come back with a lead.

Old Github issue #521 invalid read in copy_from_lzss_window

An old CVE in the same function, reported in 2015, caused an out-of-bounds read. It was fixed in libarchive 3.2.1. This isn’t the same issue we are seeing with this patch, but it is useful. The Github repo even provides a malformed rar test case. A test case is useful because it can lead us down the path of dynamic analysis.

Dynamic Analysis

Something fun we can do in this case is try the broken rar file to get to the function. I wrote a quick demo program that will allow you to load the vulnerable archiveint.dll based on this test case. Well, I had a bit of help getting started. Only two changes were needed to make the Copilot output compile.

Copilot providing a quick test harness base

I then added the functionality from the broken test case to exercise the vulnerable code path.

If you want to play around with this go ahead and clone the repo at https://github.com/clearbluejar/CVE-2024-20696. Set a breakpoint on archiveint.dll!copy_from_lzss_window. Set r9 to something really large. See what happens.

Attempting the broken RAR file

Using the test case, we can hit the vulnerable function archiveint.dll!copy_from_lzss_window.

WinDbg breaking on the vulnerable function

Now using my libarchive-harness-win.exe we can hit the same path.

Hitting copy_from_lzss_window in Visual Studio

The 2015 CVE test case doesn’t trigger the issue patched in our CVE, but a quick tweak in the debugger can demonstrate what would happen if you could control the length parameter. When you set length to a negative int ($r9 = 0xF0000000), it is sign extended when cast to a 64-bit size_t, becoming 0xFFFFFFFFF0000000, a very large integer. That integer is then used as the size in the memcpy.

Failing Size Check 2

  // length (a signed int) is cast to an unsigned size_t
  _Size = (size_t)length;
  
  // What could go wrong?
  memcpy(_Dst,_Src,0xFFFFFFFFF0000000);
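If you want to check the sign extension without firing up a debugger, ctypes can mimic the two conversions. This is a sketch that assumes an x64 target, where int is 32 bits and size_t is 64 bits; the helper name and the example register value 0xF0000000 are mine, picked so that the top bit of the low 32 bits is set.

```python
import ctypes


def int_to_size_t(register_value):
    """Model `(size_t)length` for a 32-bit int passed in a 64-bit register.

    Returns (length as a signed 32-bit int, resulting 64-bit size_t).
    """
    # Only the low 32 bits of the register are read as the int argument.
    length = ctypes.c_int32(register_value & 0xFFFFFFFF).value
    # Converting the signed value to an unsigned 64-bit type sign extends.
    size = ctypes.c_uint64(length).value
    return length, size


if __name__ == "__main__":
    length, size = int_to_size_t(0xF0000000)
    print(length)      # -268435456
    print(hex(size))   # 0xfffffffff0000000
```

Any length with the top bit set turns into a copy size just shy of 2^64 bytes, so the memcpy runs off the end of mapped memory long before it finishes.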

The result was actually an exception on the read, not quite RCE, but close? Besides, the exploitability of this CVE was rated “less likely”.

Access violation from memcpy

In the patched version, setting r9 to a large value is handled properly and parsing the rar test case just results in an error. I really want to build a rar to demonstrate the issue without modifying registers in the debugger, but the complexity of creating a working rar file is prohibitive. Perhaps we can fuzz it?

WinAFL hasn’t come up with it yet. Maybe check back in a few days? :)

Kicking off WinAFL with libarchive-win-harness.exe

Wrapping Up

Well, that concludes our patch diff of CVE-2024-20696 and our adventure with libarchive. Microsoft’s internal build of libarchive (archiveint.dll) is now protected against a potential memory corruption from a suspect RAR file with an invalid length parameter.

The diff was interesting because we could compare it against the open-source version, had previous CVE examples to draw on, and had only one modified function. We also identified several subtle differences between Microsoft’s build of libarchive and the open-source repo.

CVE-2024-20697 is another CVE in a different version of libarchive (3.6.2/Win11 rather than 3.5.2/Windows 10). Try it. See what you come up with. I posted a result here if you want to compare.

One last thought: isn’t it interesting that Microsoft isn’t using the latest version of libarchive? What about the security issues found in later versions? 🤔

This concludes the first post in the Patch Tuesday Diffing Series. This time we tried ghidriff for diffing and we will try more as we go. If you like it, don’t forget to send ghidriff a ⭐️ on its Github repo.


Going Deeper

I hope this post inspires you to leverage the power of patch diffing to see a bit deeper into next month’s Patch Tuesday. If you want some free tips on patch diffing using native Ghidra check out my free online tutorial.

CVE North Stars - Ghidra Patch Diffing Tutorial

A CVE provides a compass of sorts that orients and guides a researcher towards a deeper understanding of the patched vulnerability and its vulnerability class. The idea is to treat CVEs as North Stars in vulnerability discovery and comprehension.

https://cve-north-stars.github.io/

Patch Diffing In The Dark - Blackhat Training USA 2024

Patch Diffing In The Dark Blackhat USA 2024

If you prefer to get more hands on, consider my upcoming Blackhat training “Patch Diffing in the Dark: Binary Diffing for Vulnerability Researchers and Reverse Engineers”. In this in-person course, I will teach you how to leverage patch diffing to kick start your vulnerability research on both the Windows and Android platform. Together we will learn how to reverse engineer CVEs. We will hunt, find, and root cause several modern vulnerabilities.

The 2-day course is available on Saturday/Sunday or Monday/Tuesday. Early bird pricing is still available, but seats are limited – secure your spot today!

Saturday/Sunday Class Registration AUGUST 3-4

Monday/Tuesday Class Registration AUGUST 5-6

This post is licensed under CC BY 4.0 by the author.