Desuperpacking Meta Superpacked APKs

TL;DR Superpacking is a method of optimal binary compression developed by Meta to help reduce the size of their Android APKs. This compression for APKs makes sense for reducing network traffic required for distribution, but becomes an issue when trying to recover the original native ARM binaries for analysis. This post walks through the process of “desuperpacking” (decompressing) Meta Superpacked APKs. You will get an overview of Meta’s superpack compression, a quick look at Superpack internals, and eventually learn how to automate desuperpacking native Android ARM libraries using GitHub Actions.

Discovering Superpacked APKs

The Superpack compression technique was introduced in 2021, so it isn’t new; it was just new to me.

Meta’s Superpack Blog Post

Last month I needed to do some patch diffing of recent WhatsApp CVEs. I thought it would be as simple as downloading a copy of the APK, extracting the native libraries, and loading them into Ghidra.

Expecting libwhatsapp.so

I was expecting a simple listing of all the shared objects in the APK, like this:

WhatsApp Android APK for CVE-2019-3568

Finding libsuperpack.so

What I found instead, when I searched for libwhatsapp.so in the native lib folder of the APK, was libsuperpack.so.

WhatsApp Android APK for CVE-2022-36934

Not knowing what this shared object was, I naively loaded libsuperpack.so into Ghidra, thinking they had just statically compiled all the libs into a single shared object. This was not the case.

If you look at the exports from libwhatsapp.so for an older version (like the one for CVE-2019-3568), you expect to see several Java exports like Java_com_whatsapp_*:

WhatsApp libwhatsapp.so exports

In libsuperpack.so, all I saw were functions related to decompression…

WhatsApp libsuperpack.so exports
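The same comparison can be made from the command line without opening Ghidra. Here is a minimal sketch, assuming binutils `nm` is installed and the APK has been unzipped; the helper name `jni_exports` is mine, not something from the APK or from Meta.

```shell
# Filter JNI-style exports (names starting with "Java_") from `nm -D` output.
# Reads the nm listing on stdin so it can be tried without a real library.
jni_exports() {
  # nm prints "<address> <type> <name>"; keep defined text symbols (T) only.
  awk '$2 == "T" && $3 ~ /^Java_/ { print $3 }'
}

# Typical use (the path is an example and depends on where you unzipped the APK):
#   nm -D --defined-only lib/arm64-v8a/libwhatsapp.so | jni_exports
```

An empty result for a library you expected to hold the app's native code is a hint that something like Superpack is in play.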

What was I looking at? What happened to libwhatsapp.so? Time to learn more about Meta’s Superpacking.

Superpacked APKs

There isn’t much out there about the technique beyond their blog post. There is a subtle reference to Superpacking in Facebook’s SoLoader GitHub repo, but it is a bit cryptic. Luckily, their well-written Superpack blog post gives us insight into the process.

Superpacked Android APKs

Purpose

From the article:

Superpack combines compiler and data compression techniques to increase the density of packed data in a way that is especially applicable to code such as Dex bytecode and ARM machine code.

Superpacking is Meta’s compression technique for reducing the size of their APKs, making them easier to distribute because less bandwidth is required. This goal makes sense, as most of their apps have been distributed billions of times.

Compression++

Superpack improves the process of LZ parsing by enabling the discovery of longer repeating sequences while also reducing the number of bits to represent pointers.

My high-level understanding of compression is that it is an algorithm that reduces the amount of data needed to represent the original content. A compression algorithm recognizes repeated data patterns, replaces them with a shorter symbol, and uses that shorter reference each time the pattern is found in the original data. This process allows a smaller (compressed) representation of the original content. Apparently, Superpack goes beyond standard LZ compression and takes advantage of characteristics of compiled code and repeated byte patterns found within modern binaries. I’m not going to pretend to understand this well enough to explain it; check out this part of the blog for more detail.
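To make the "repeated patterns compress well" idea concrete, here is a rough illustration using gzip (DEFLATE, an ordinary LZ-style compressor, not Superpack). It is only a sketch: the helper name `compressed_size` is mine, and the exact byte counts depend on your gzip version.

```shell
# Print the gzip-compressed size (in bytes) of whatever arrives on stdin.
compressed_size() {
  gzip -c | wc -c | tr -d ' '
}

# 4000 identical bytes: one long run that back-references can describe cheaply.
repetitive=$(head -c 4000 /dev/zero | tr '\0' 'A' | compressed_size)

# 4000 random bytes: no repeated patterns for the compressor to exploit.
random=$(head -c 4000 /dev/urandom | compressed_size)

echo "repetitive input: 4000 bytes in, ${repetitive} bytes out"
echo "random input:     4000 bytes in, ${random} bytes out"
```

The repetitive input shrinks to a tiny fraction of its size, while the random input does not shrink at all. Superpack's claim is that it finds more of that repetition in compiled code than a generic compressor can.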

Uses

There are three main payloads targeted by Superpack. The first is Dex bytecode, the format into which Java gets compiled in Android apps. The second is ARM machine code, which is code compiled for ARM processors. The third is Hermes bytecode, which is a specialized high performance bytecode representation of Javascript created at Facebook.

The compression is optimized for Dex bytecode (Java .class files converted to .dex), ARM machine code (compiled native ARM binaries), and Hermes bytecode. In our particular case for WhatsApp, the main native ARM library libwhatsapp.so is now somehow contained within libsuperpack.so. This implies WhatsApp at least implements compression for its ARM machine code. Later in the post, we take a quick look at Messenger, which seems to optimize the Dex bytecode instead.

Internals

The article reveals the reason behind the compression and provides clues on how it might be implemented.

Superpack’s strength lies in compressing code, such as machine code and bytecode, as well as other types of structured data. The approach underlying Superpack is based on an insight in Kolmogorov’s algorithmic measure of complexity, which defines the information content of a piece of data as the length of the shortest program that can generate that data.

Hmm. The shortest program that can generate that data.

With what we have already learned, and the fact that the expected native libraries have been replaced by libsuperpack.so, this new lib must be “a program that can generate that data”. In our case, the data it generates is the missing shared object files.

WhatsApp APK via jadx

We can follow some basic JNI reverse engineering techniques and use jadx and Ghidra to see how these native libraries work under the hood. Using jadx-gui, we can load the APK and search for the string “superpack”…

Jadx “superpack” Strings

From the string search we can immediately see a function that loads libsuperpack.so in WhatsAppLibLoader:

WhatsApp loading libsuperpack.so

The loadSuperpack method had a random name, so I renamed it for clarity. Looking at who calls loadSuperpack, we find decompressLibraries, which calls decompressAsset. Seems like we are on the right track.

jadx decompressAsset and decompressLibraries

The decompressAsset method calls another function, which I renamed callNativeSuperpackDecompress, that receives the parameter COMPRESSED_LIBS_ARCHIVE_NAME. The name is defined in the class AbstractAppShellDelegate as the string:

public static final String COMPRESSED_LIBS_ARCHIVE_NAME = "libs.spk.zst";

Eventually, we will discover that this compressed archive is the one holding our missing shared files.
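With both names in hand (libsuperpack.so and libs.spk.zst), a quick check can tell you whether an APK looks Superpacked before you load anything into a disassembler. This is a minimal sketch: the helper name `superpack_markers` is mine, it only knows the file names seen in this post, and it makes no assumption about which folder the archive lives in.

```shell
# Read a list of APK entry names on stdin (one per line) and print the ones
# that suggest Superpack: the loader libraries and the compressed archive.
# Exit status is non-zero when nothing matches, like grep.
superpack_markers() {
  grep -E '(^|/)(libsuperpack(-jni)?\.so|libs\.spk\.zst)$'
}

# Typical use (unzip -Z1 lists entry names only):
#   unzip -Z1 whatsapp.apk | superpack_markers
```

Other Meta apps may use different names, so an empty result is not proof that an APK is not Superpacked.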

And now for the callNativeSuperpackDecompress function:

jadx decompiled callNativeSuperpackDecompress

What’s important about this function is that it ends up calling the native method decompress on the files.

JNI Native Method Analysis

When a native library is loaded from Java with System.loadLibrary, its JNI_OnLoad function (an export of the native lib) is called. JNI_OnLoad returns the JNI version the library requires and typically registers any native methods for use from Java. Most JNI_OnLoad implementations look similar to this sample native.cpp:

// from platform/development/+/master/samples/SimpleJNI/jni/native.cpp

// ----------------------------------------------------------------------------
/*
 * This is called by the VM when the shared library is first loaded.
 */
 
jint JNI_OnLoad(JavaVM* vm, void* /*reserved*/)
{
    UnionJNIEnvToVoid uenv;
    uenv.venv = NULL;
    jint result = -1;
    JNIEnv* env = NULL;
    
    ALOGI("JNI_OnLoad");
    if (vm->GetEnv(&uenv.venv, JNI_VERSION_1_4) != JNI_OK) {
        ALOGE("ERROR: GetEnv failed");
        goto bail;
    }
    env = uenv.env;
    if (registerNatives(env) != JNI_TRUE) {
        ALOGE("ERROR: registerNatives failed");
        goto bail;
    }
    
    result = JNI_VERSION_1_4;
    
bail:
    return result;
}

For libsuperpack.so we can see similar code in JNI_OnLoad_Weak:

libsuperpack.so JNI_OnLoad_Weak

Its JNI_OnLoad seems to register several native functions through the functions it calls. The one we are interested in is found within init_asset_decompressor.

libsuperpack.so init_asset_decompressor

The init_asset_decompressor function above calls FindClass and RegisterNatives, which follows the typical registerNativeMethods example:

// from platform/development/+/master/samples/SimpleJNI/jni/native.cpp

/*
 * Register several native methods for one class.
 */
static int registerNativeMethods(JNIEnv* env, const char* className,
    JNINativeMethod* gMethods, int numMethods)
{
    jclass clazz;
    clazz = env->FindClass(className);
    if (clazz == NULL) {
        ALOGE("Native registration unable to find class '%s'", className);
        return JNI_FALSE;
    }
    if (env->RegisterNatives(clazz, gMethods, numMethods) < 0) {
        ALOGE("RegisterNatives failed for '%s'", className);
        return JNI_FALSE;
    }
    return JNI_TRUE;
}

The AssetDecompressor class from the package com.facebook.superpack, as decompiled by jadx, clearly declares the native methods. The init_asset_decompressor function above registers the decompress function responsible for decompressing the Superpacked “libs.spk.zst”.

jadx view of AssetDecompressor with several native method declarations

WhatsApp Superpack decompress method

You can actually see the decompress function in Ghidra…

// renamed from FUN_000343c4

jobjectArray
decompress(JNIEnv *param_1,jobject param_2,
	undefined4 param_3,undefined *param_4,jstring param_5, jstring param_6)

{
  int iVar1;
  char *pcVar2;
  int iVar3;
  FILE *__stream;
  char *chars;
  char *pcVar4;
  int **ppiVar5;
  jclass clazz;
  jstring val;
  int *piVar9;
  size_t sVar10;
  size_t sVar11;
  _func_405 *__s;
  char *pcVar12;
  _func_405 *p_Var13;
  jobjectArray array;
  jsize len;
  int local_28;
  
  local_30 = (undefined4 *)0x0;
  p_Var13 = (_func_405 *)param_4;
  iVar1 = AAssetManager_fromJava(param_1,param_3);
  if (iVar1 == 0) {
    pcVar2 = "could not get asset manager";
LAB_0003448c:
    FUN_0004c138((int *)param_1,"com/facebook/superpack/AssetDecompressionException",pcVar2,p_Var13)
    ;
    return (jobjectArray)0;
  }
  p_Var13 = (*param_1)->GetStringUTFChars;
  pcVar2 = (*p_Var13)(param_1,(jstring)param_4,(jboolean *)0x0);
  if (pcVar2 == (char *)0x0) {
    pcVar2 = "could not extract asset path";
    goto LAB_0003448c;
  }
  iVar1 = AAssetManager_open(iVar1,pcVar2,1);
  if (iVar1 == 0) {
    FUN_0004c138((int *)param_1,"com/facebook/superpack/AssetDecompressionException",
                 "could not access asset",p_Var13);
    pcVar4 = (char *)0x0;
    array = (jobjectArray)0x0;
    __stream = (FILE *)0x0;
    goto LAB_0003466a;
  }

 /* 
    Several lines omitted
    .
    .
    .
 */

FUN_0004c138((int *)param_1,"com/facebook/superpack/AssetDecompressionException",pcVar4,p_Var13);
  AAsset_close(iVar1);
  pcVar4 = (char *)0x0;
  array = (jobjectArray)0x0;
LAB_0003466a:
  (*(*param_1)->ReleaseStringUTFChars)(param_1,param_5,pcVar2);
  if (pcVar4 != (char *)0x0) {
    (*(*param_1)->ReleaseStringUTFChars)(param_1,param_5,pcVar4);
  }
  if (__stream != (FILE *)0x0) {
    fclose(__stream);
  }
  return array;
}

From the jadx decompiled Java code in the function we named callNativeSuperpackDecompress, we can actually see the native decompress function called with its parameters.

decompress = AssetDecompressor.decompress(context.getAssets(), obj, substring, file.getAbsolutePath());

OK. So this is how the decompression process works in WhatsApp. If we continue down this path, we might be able to create a new app that loads libsuperpack.so, finds a reference to the compressed data in the AssetManager, and calls decompress for the files we need. Before we head all the way down that path, we need to ask ourselves if we are trying too hard when our goal is simply to extract files. Also, after reviewing the extraction code, might the Superpack decompression implementation in the Instagram, Messenger, or Facebook APKs vary slightly?

Quick Look at Facebook Messenger’s Superpack

Yes… yes it would.

Taking a quick look at Messenger 414.0.0.17.61, it has a different Superpack shared library, libsuperpack-jni.so. It has a similar native registration function called init_superpack_archive, which uses a different class, SuperpackArchive, within com.facebook.superpack.

Messenger’s libsuperpack-jni.so RegisterNatives function init_superpack_archive

The SuperpackArchive class seems a bit more complex, with no direct equivalent of the decompress function immediately apparent.

jadx view of SuperpackFile with several native method declarations

As some of the other shared object libraries sit beside libsuperpack-jni.so in its lib directory, maybe Messenger doesn’t superpack its native libraries, but rather superpacks its Dex bytecode?

Desuperpacking Android Native ARM Libraries

Android Superpack APKs

Great, so there is this Superpack compression algorithm, we know a bit about how it is initialized, and we have identified the native library responsible. We think we have even found the function that decompresses Superpacked files. We might be able to create a utility app to extract WhatsApp Superpacked files, but what about a general desuperpack utility?

Presently, Superpack is available only to our engineers, but we aspire to bring the benefits of Superpack to everyone. To this end, we are exploring ways to improve the compatibility of our compression work with the Android ecosystem.

As I haven’t seen any public tools to this end, and the only reference to Superpack I found on GitHub was in Facebook’s SoLoader library, it doesn’t seem quite worth it to build a custom extractor per app simply to get the native libraries I need.

A Manual Approach

Taking a step back, knowing that Superpack is a “program that generates a program”, and that Android can’t load compressed libraries, we simply need to run the app. In order for libwhatsapp.so to be loaded properly it must be decompressed first. So, if we install the APK in an Android emulator, the files should exist somewhere.

Install APK manually in Android Emulator

This is exactly the case. After you install the APK the files we have long been searching for are available.

Long lost native ARM shared libraries found

Well this post could have been a bit shorter. Just install your Superpacked APK on an emulator and extract the files.

The files for WhatsApp are located at /data/data/com.whatsapp/files/decompressed/libs.spk.zst/. This manual approach works well when you only have a few files to recover, but it quickly becomes tedious if you have more than a few files to extract with several architectures to consider. Seems like we have a repetitive task that we might want to repeat over and over. Time to do what every developer does in this situation. Time to write a script!
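Pulling that directory with adb might look like the sketch below. The path layout is only what I observed for this WhatsApp build, so treat it as an assumption for other versions or apps. The function names are mine, and the pull only works when adbd runs as root (typical for emulator images), because /data/data is not world-readable.

```shell
# Build the on-device directory where the decompressed libraries were observed.
# Usage: decompressed_dir <package> [archive_name]
decompressed_dir() {
  local package="$1" archive="${2:-libs.spk.zst}"
  printf '/data/data/%s/files/decompressed/%s\n' "$package" "$archive"
}

# Copy that directory to the host. Assumes the device is already connected
# and that adbd has root access to /data/data.
# Usage: pull_decompressed_libs <serial> <package> [dest_dir]
pull_decompressed_libs() {
  local serial="$1" package="$2" dest="${3:-./extracted}"
  mkdir -p "$dest"
  adb -s "$serial" pull "$(decompressed_dir "$package")" "$dest"
}

# Example:
#   pull_decompressed_libs emulator-5554 com.whatsapp ./whatsapp-libs
```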

Automating with GitHub Actions

To begin automation, let’s review what the Android emulator is doing for us. When we drop an APK file into the emulator, it installs the APK. Immediately after installing, I found new files in /data/data/com.whatsapp, but the decompressed shared objects weren’t there yet. They appeared only after I ran the app (that is, launched the app’s Main activity). After that, I just needed to copy out the files.

Workflow

The high level tasks then are:

  • Install APK
  • Launch the app (via its Main activity)
  • Extract the files

Installing an APK can be done via scripting by connecting to an Android device:

adb connect localhost:9999
* daemon not running; starting now at tcp:5037
* daemon started successfully
connected to localhost:9999

adb devices -l 
List of devices attached
localhost:9999         device product:redroid_x86_64 model:redroid12_x86_64 

Then running the install command:

adb -s localhost:9999 install --abi arm64-v8a "whatsapp.apk"

To launch an app’s Main activity, you first need to figure out what it is. You can dump the “badging” information using aapt.

aapt dump badging "whatsapp.apk"       
package: name='com.whatsapp' versionCode='221670000' versionName='2.22.16.70' compileSdkVersion='31' compileSdkVersionCodename='12'
sdkVersion:'16'
targetSdkVersion:'31'
uses-permission: name='android.permission.READ_PHONE_STATE'
uses-permission: name='android.permission.READ_PHONE_NUMBERS'
uses-permission: name='android.permission.RECEIVE_SMS'
uses-permission: name='android.permission.VIBRATE'
uses-permission: name='android.permission.WRITE_EXTERNAL_STORAGE'
uses-permission: name='android.permission.WRITE_SYNC_SETTINGS'

# several lines omitted

launchable-activity: name='com.whatsapp.Main'  label='WhatsApp' icon=''
densities: '120' '160' '240' '320' '480' '640' '65534' '65535'
native-code: 'arm64-v8a' 'armeabi-v7a' 'x86' 'x86_64'  

From the APK metadata found within, a start “Main Activity” command can be created.

adb -s localhost:9999 shell am start -n com.whatsapp/com.whatsapp.Main
Starting: Intent { cmp=com.whatsapp/.Main }
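Building that component string by hand gets old quickly, so it can be derived from the badging output. This is a minimal sketch of one way to do it, not the actual adb-run.sh from the repo; the function name `launch_component` is mine.

```shell
# Read `aapt dump badging` output on stdin and print "<package>/<activity>",
# the component string that `am start -n` expects.
# Exits non-zero if either the package or a launchable activity is missing.
launch_component() {
  awk -F"'" '
    /^package: name=/                         { pkg = $2 }
    /^launchable-activity: name=/ && act == "" { act = $2 }
    END {
      if (pkg == "" || act == "") exit 1
      print pkg "/" act
    }'
}

# Typical use:
#   component=$(aapt dump badging whatsapp.apk | launch_component)
#   adb -s localhost:9999 shell am start -n "$component"
```

If an APK declares more than one launchable activity, this keeps the first one, which is a simplification.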

From there we just need a method to extract the files.

Running adb commands assumes you are able to connect to a device or emulator. This can be solved with a docker container called reDroid, designed to run an emulator within continuous integration. This is perfect: nothing to set up or install.

docker run -itd --rm --privileged --pull always -v $(pwd)/data:/data -p 9999:5555 redroid/redroid:12.0.0-latest
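The container takes a little while to boot, and `adb install` tends to fail if it runs too early. One way to handle that is to poll the standard `sys.boot_completed` property until it reads 1. This is a sketch under the assumption that the device is already reachable through `adb connect`; the function name and the default attempt count and delay are mine.

```shell
# Poll the device until Android reports that boot has completed.
# Usage: wait_for_boot <serial> [max_attempts] [delay_seconds]
wait_for_boot() {
  local serial="$1" attempts="${2:-60}" delay="${3:-2}" i=0 state
  while [ "$i" -lt "$attempts" ]; do
    # sys.boot_completed is "1" once the system is up; strip CR/LF from adb output.
    state=$(adb -s "$serial" shell getprop sys.boot_completed 2>/dev/null | tr -d '\r\n')
    if [ "$state" = "1" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
#   adb connect localhost:9999 && wait_for_boot localhost:9999
```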

Using docker I also solve the problem of extracting the files as we can map the /data/data/<packagename> path to our host while the emulator is running.
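Because the shared objects only appear after the app has started, a script needs some way to know when they are ready. With the volume mapping above, /data/data/com.whatsapp in the container should show up under ./data/data/com.whatsapp on the host, so polling the host directory is one option. This is a sketch: the function name is mine, and on some hosts you may need sudo to read files owned by the container.

```shell
# Wait until at least one .so file shows up under a host directory.
# Usage: wait_for_libs <dir> [max_attempts] [delay_seconds]
wait_for_libs() {
  local dir="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # find prints nothing (and errors are hidden) until the directory has libs.
    if [ -n "$(find "$dir" -type f -name '*.so' 2>/dev/null | head -n 1)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
#   wait_for_libs ./data/data/com.whatsapp/files/decompressed
```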

We can now script each step in our workflow. It is time to automate this in GitHub Actions. For this I created a new repo called apk-install-extract.

GitHub Actions apk-install-extract

apk-install-extract GitHub Repo

Within the repo are some simple scripts that automate the extraction of APK metadata and essentially run the above commands on a GitHub runner. The workflow is easiest to see in the yaml file itself.

The first step is to download and unzip a pile of APKs. I set up the workflow to take a URL parameter as input. The URL should point to a zip file full of APKs that you want to extract. Run the action “Install Extract APK”.

Running Install Extract APK
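Once the bundle is unzipped, the workflow needs a list of APK paths to fan out over. One way to produce that as a JSON array in plain shell is sketched below. The function name is mine, the repo may well do this differently, and the sketch assumes file names contain no double quotes, backslashes, or newlines.

```shell
# Emit a JSON array of every .apk under a directory, sorted for stable output.
# Usage: apk_paths_json <dir>
apk_paths_json() {
  find "$1" -type f -name '*.apk' | LC_ALL=C sort |
    awk 'BEGIN { printf "[" }
         { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }
         END { print "]" }'
}

# In a workflow step, the result could be exposed as a job output:
#   echo "apkpaths=$(apk_paths_json apks)" >> "$GITHUB_OUTPUT"
```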

Once the workflow downloads the zip file, it then uses a matrix strategy to spin up a runner for each architecture (as an APK may contain 1 or more) and each apkpath:

download-install-and-extract:
    needs: generate-matrix
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:        
        apkpath: ${{ fromJson(needs.generate-matrix.outputs.apkpaths) }}
        arch: ['arm64-v8a', 'armeabi-v7a', 'x86', 'x86_64']

For a single APK, this causes 4 runners to run the entire script, one per architecture. Each additional APK in the bundle adds another 4.
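The matrix hard-codes all four architectures, but an APK may ship fewer. The native-code line in the badging output lists what is actually inside, so a job could check it and skip ABIs the APK does not contain. This is a sketch of that check with a helper name of my own, not something the repo currently does.

```shell
# Read `aapt dump badging` output on stdin; print one supported ABI per line.
native_abis() {
  # Splitting on single quotes puts each quoted ABI in an even-numbered field.
  awk -F"'" '/^native-code:/ { for (i = 2; i <= NF; i += 2) print $i }'
}

# Example: skip the install when the APK lacks the requested ABI.
#   if aapt dump badging whatsapp.apk | native_abis | grep -qx "arm64-v8a"; then
#     adb -s localhost:9999 install --abi arm64-v8a whatsapp.apk
#   fi
```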

GitHub Action APK Artifacts

For each architecture we run steps from the workflow above:

- name: Run emulator
  run: |
    mkdir data
    docker run -itd --rm --privileged --pull always -v $(pwd)/data:/data -p ${{ env.avd_port }}:5555 redroid/redroid:12.0.0-latest

- name: Download runner artifacts
  uses: actions/download-artifact@master
  with:
    name: ${{ env.all_apk_dir }}
    path: ${{ env.all_apk_dir }}

- name: adb connect install
  run: |
    adb connect ${{ env.avd_name }}
    adb devices -l
    adb -s ${{ env.avd_name }} install --abi ${{ matrix.arch }} "${{ matrix.apkpath }}"

- name: adb start main activity
  run: |
    # adb -s localhost:5555 shell am start -n com.whatsapp/com.whatsapp.Main
    ./adb-run.sh "${{ matrix.apkpath }}"

Which results in this:

GitHub Actions Workflow Output

After this part, the extracted files are bundled and uploaded as artifacts:

- name: aapt dump badging
  run: |
    aapt dump badging "${{ matrix.apkpath }}" > ${{ env.apkpkgver }}/${{ env.apkpkgver }}.badging.txt
    cat ${{ env.apkpkgver }}/${{ env.apkpkgver }}.badging.txt

- name: fix data permissions
  if: always()
  run: |
    # stop docker containers
    docker stop $(docker ps -a -q)
    sudo chown -R runner:runner data

- name: compress archive data
  run: |
    mv data data-${{ env.apkpkgver }}-${{ matrix.arch }}
    tar cvzf ${{ env.apkpkgver }}/${{ env.apkpkgver }}.${{ matrix.arch }}.data.tar.gz data-${{ env.apkpkgver }}-${{ matrix.arch }}/data/${{ env.apkpkg }}

- name: Upload data
  uses: actions/upload-artifact@v3
  if: always()
  with:
    name: all_package_data
    path: ${{ env.apkpkgver }}
    retention-days: 25

GitHub Actions APK Artifacts

Resulting artifacts:

  • all_apks - contains a copy of the original APKs provided
  • all_data - contains the entire /data/data directory from the emulator
  • all_package_data - contains the specific package extracted files and metadata

Boom. We have automated extracting files from Superpacked (and ordinary) APKs.

all_package_data files extracted

Automation Complete

Nice work making it this far. I hope you learned a bit about Android native libraries, Superpack compression, and GitHub Actions. This is my first post focusing on Android, so if I have missed some key components, leave some feedback or send a DM.

If you would like to “desuperpack” some of your own APKs, go ahead and fork apk-install-extract and feed it your own bundle of zipped APKs. The workflow has room for improvement, so feel free to send a PR with any improvements.

Cover Photo from pexels

This post is licensed under CC BY 4.0 by the author.