Patching a Java JAR


I’m working with a company that uses smart IoT devices made by a distant manufacturer. The main troubleshooting tool is a Java utility from that manufacturer, shipped as a ZIP file containing a JAR with the main logic plus some additional libraries. The JAR runs fine on Windows and macOS, but the manufacturer apparently didn’t account for Linux, where the utility crashes. I was asked to make it run on the laptops of the engineers who use Linux.

Since I had a good experience reading already-compiled JARs with IntelliJ IDEA, I figured I could do the following:

  1. Extract the JAR to a directory (as it’s just a ZIP file with a specific file structure inside)
  2. Decompile some class files into their Java source equivalents
  3. Modify those Java files (in my case, to support Linux)
  4. Recompile the Java files, replacing the original class files with the newly compiled ones
  5. Repackage the JAR
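Step 1 works because a JAR is an ordinary ZIP archive with a particular layout (class files in package directories, metadata under META-INF/). As a minimal sketch, Python's standard zipfile module can list what is inside before anything is extracted; list_jar is a hypothetical helper name, not part of the script further down.

```python
import zipfile


def list_jar(jar_path):
    """Split a JAR's entries into class files and other files.

    A JAR is opened like any other ZIP archive. Directory entries
    (names ending in "/") are left out of both lists.
    """
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    classes = [n for n in names if n.endswith(".class")]
    others = [n for n in names if not n.endswith(".class") and not n.endswith("/")]
    return classes, others
```

For example, list_jar("utility.jar") (a made-up filename) would return the class entries and the remaining resources such as META-INF/MANIFEST.MF.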

I did this manually and things went great. As an aside, the actual change looked something like this:

  if (Util.isWindows()) {
    ...
  } else if (Util.isMac()) {
    ...
+ } else {
+   ...
  }
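Once the decompiled sources sit in a git repository with the untouched decompiler output committed (which is what the automated version below sets up), a patch like the one above can be produced from hand edits with git diff. A minimal sketch, where make_patch and its arguments are hypothetical names:

```python
import subprocess


def make_patch(repo_dir, patch_path):
    """Write the uncommitted changes in repo_dir to patch_path and return them.

    Assumes repo_dir is a git repository whose last commit holds the
    unmodified decompiled sources and that the edits are not committed yet.
    """
    diff = subprocess.check_output(["git", "-C", repo_dir, "diff"], text=True)
    with open(patch_path, "w") as f:
        f.write(diff)
    return diff
```

The resulting file is a regular unified diff, so it can later be applied to a fresh decompilation with git apply.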

Having spent 15 minutes doing it manually, I obviously had to spend a couple of hours automating it. The resulting script performs the same steps automatically. It takes:

  1. Source JAR
  2. Destination JAR filename
  3. Patch file to apply
  4. Where IntelliJ IDEA lives (its bundled decompiler does the decompilation)
  5. Optional classpath entries (classes or directories of JARs) needed for compilation

The algorithm works as follows:

  1. Unpack source JAR to a temp directory (directory A)
  2. Decompile the entire directory with IDEA's bundled Java decompiler into another temp directory (directory B)
  3. Create a git repository from directory B
  4. Apply the provided patch with git apply (I used git because the standalone patch tool only worked on a single file)
  5. Use git ls-files -m, the plumbing equivalent of git status, to list the files modified by the patch (easier than parsing the patch file)
  6. Recompile those specific files back into class files
  7. Copy those class files back into directory A
  8. Repackage directory A as a JAR into the destination filename

Interesting facts

  1. Using git seems a bit weird, but not only does it make applying patches easy, it also makes generating them easy (just run git diff in the directory).
  2. The decompilation process is not perfect, and sometimes produces code that won’t compile without some modifications.
    While some of these may be decompiler bugs, part of the blame goes to type erasure: generic type arguments are largely gone from the bytecode, so a List<X> often comes back as a raw List, and you have to convince the compiler (usually with a cast) that its elements are a concrete type rather than Object.
  3. I was surprised to find that lambdas survive decompilation; I thought they would be converted into some autogenerated class. In hindsight it makes sense: javac compiles a lambda into a synthetic method plus an invokedynamic call site, and the implementing class is only generated at runtime.
  4. At first I experimented with the JDK’s own tools to extract and build the JAR, thinking there was some secret to how JARs are created.
    There isn’t. Just re-zipping the same directory structure works.
  5. The Java version you recompile with can matter, depending on which JVM runs the rebuilt JAR: a JVM rejects class files built for a newer release (UnsupportedClassVersionError), so it is safest to target the release the original classes were built for, for example with javac --release.
  6. Python’s subprocess.check_call and check_output are so convenient.
  7. While Python’s zipfile module is fine for extracting an entire archive, it was very inconvenient for the opposite (zipping up a directory). I ended up using the zip CLI tool.
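To pick the right target for item 5, the original classes can be inspected: every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version (big-endian u2 values), and from Java 5 onwards the major version is the release number plus 44. A minimal sketch, with hypothetical helper names, that reports the highest release targeted by a JAR's classes:

```python
import struct
import zipfile


def class_file_java_version(header):
    """Return the Java release a class file targets, given its first 8 bytes.

    Layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    From Java 5 (major 49) on, release = major - 44: 52 -> 8, 55 -> 11, 61 -> 17.
    """
    if len(header) < 8:
        raise ValueError("too short to be a class file")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major - 44


def jar_java_version(jar_path):
    """Return the highest release targeted by a JAR's classes, or None if it has none.

    Entries under META-INF/versions/ (multi-release JARs) are skipped because
    they deliberately target newer releases than the base classes.
    """
    versions = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class") or name.startswith("META-INF/versions/"):
                continue
            with jar.open(name) as entry:
                versions.append(class_file_java_version(entry.read(8)))
    return max(versions, default=None)
```

The result could be fed into the compile step as javac --release N (the option exists since JDK 9, and newer JDKs only accept a limited range of older releases); the script below does not do this.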

The script

#!/usr/bin/env python3

import re
import argparse
import zipfile
import tempfile
import subprocess
import os
import os.path
import glob


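def zipdir(path, zip_file):
    # zip runs inside path so entry names are relative to the JAR root, which
    # means the destination must be absolute or it would end up inside path
    zip_file = os.path.abspath(zip_file)
    # zip -r updates an existing archive in place, so start from scratch
    if os.path.exists(zip_file):
        os.remove(zip_file)
    subprocess.check_call(["zip", "-qr", zip_file, "."], cwd=path)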


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("source", help="Source JAR to read")
    parser.add_argument(
        "-cp", "--classpath", action="append", help="classes needed for compilation"
    )
    parser.add_argument(
        "-jcp",
        "--jar-classpath",
        action="append",
        help="directories with jars needed for compilation",
    )
    parser.add_argument(
        "-p",
        "--patch",
        help="patch file to apply. If omitted, the JAR is only extracted and decompiled, which is handy for inspecting the decompiled sources",
    )
    parser.add_argument(
        "-id",
        "--idea-path",
        default="/snap/intellij-idea-community/current",
        help="path to IntelliJ IDEA. Needed for the decompilation logic",
    )
    parser.add_argument("dest", help="Filename for re-compiled JAR")
    return parser.parse_args()


def compose_classpath(classpath, jar_classpath):
    ret = []
    if classpath:
        ret.extend(classpath)
    if jar_classpath:
        ret.extend([os.path.join(d, "*") for d in jar_classpath])
    return ":".join(ret)


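def tmpdir():
    # mkdtemp creates the directory and keeps it after the script exits, so the
    # extracted and decompiled trees can be inspected (TemporaryDirectory
    # deletes its directory on exit, leaving only a reusable name behind)
    return tempfile.mkdtemp()


def main():
    args = parse_args()
    classpath = compose_classpath(args.classpath, args.jar_classpath)
    # unzip to tmpdir
    a_dir = tmpdir()
    print(f"extracted directory: {a_dir}")
    with zipfile.ZipFile(args.source, "r") as z:
        z.extractall(a_dir)

    # decompile (tmpdir() has already created b_dir)
    b_dir = tmpdir()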
    print(f"decompiled directory: {b_dir}")
    subprocess.check_call(
        [
            f"{args.idea_path}/jbr/bin/java",
            "-jar",
            f"{args.idea_path}/plugins/java-decompiler/lib/java-decompiler.jar",
            "-dhs=true",
            a_dir,
            b_dir,
        ],
        stdout=subprocess.DEVNULL,
    )

    # Create repo
    subprocess.check_call(["git", "-C", b_dir, "init", ".", "-b", "master"])
    subprocess.check_call(["git", "-C", b_dir, "add", ":/"])
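    subprocess.check_call(
        [
            "git", "-C", b_dir,
            # placeholder identity (arbitrary values) so the commit also works
            # on machines without a global git user.name/user.email
            "-c", "user.name=jar-patcher", "-c", "user.email=jar-patcher@localhost",
            "commit", "-m", "initial",
        ],
        stdout=subprocess.DEVNULL,
    )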

    patch_file = args.patch
    if not patch_file:
        print("no patch file, just extracting")
        return

    # apply patch
    subprocess.check_call(["git", "-C", b_dir, "apply", os.path.realpath(patch_file)])

    touched_files = subprocess.check_output(
        ["git", "ls-files", "-m"], text=True, cwd=b_dir
    ).split("\n")
    touched_files = [t for t in touched_files if t]

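    # compile with javac (plain java would try to run a source file, not compile it).
    # Without -d, class files land next to their sources in b_dir.
    # Empty entries are dropped: an empty classpath element means the current
    # directory (b_dir here), which would let javac pick up other decompiled sources.
    compile_cp = ":".join(p for p in [a_dir, classpath] if p)
    subprocess.check_call(["javac", "-cp", compile_cp, *touched_files], cwd=b_dir)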

    # Copy back to A dir
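    for java_file in touched_files:
        base_name = re.sub(r"\.java$", "", java_file)
        # Foo.class plus nested/anonymous classes (Foo$Inner.class, Foo$1.class);
        # the looser Foo*.class pattern would also match an unrelated FooBar.class
        files = glob.glob(f"{base_name}.class", root_dir=b_dir)
        files += glob.glob(f"{base_name}$*.class", root_dir=b_dir)
        for file in files:
            os.rename(f"{b_dir}/{file}", f"{a_dir}/{file}")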

    # rezip a dir
    zipdir(a_dir, args.dest)


if __name__ == "__main__":
    main()
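If the zip CLI is not available, the packaging step could be done with zipfile after all. Below is a minimal sketch of a drop-in alternative to zipdir() (zipdir_py is a hypothetical name). It walks the extracted tree, stores entries with forward-slash paths relative to the JAR root, and writes META-INF/MANIFEST.MF first, because as I understand it streaming readers such as java.util.jar.JarInputStream only recognize the manifest at the start of the archive. Unlike zip -r, it writes no separate directory entries and always overwrites the destination.

```python
import os
import zipfile


def zipdir_py(path, zip_file):
    """Archive the contents of path into zip_file, with names relative to path."""
    files = []
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), path)
            files.append(rel.replace(os.sep, "/"))  # ZIP entry names use "/"
    manifest = "META-INF/MANIFEST.MF"
    if manifest in files:
        files.remove(manifest)
        files.insert(0, manifest)  # put the manifest first
    with zipfile.ZipFile(zip_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname in files:
            zf.write(os.path.join(path, *arcname.split("/")), arcname)
```

In the script above, zipdir_py(a_dir, args.dest) would then replace the zipdir(a_dir, args.dest) call.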