Patching a Java JAR
I’m working with a company that uses smart IoT devices produced far away. The main troubleshooting tool is a Java utility provided by the manufacturer. This utility is provided as a ZIP file containing a JAR with the main logic, and some additional libraries. Executing this JAR works well on Windows/macOS, but it seems the manufacturer didn’t account for Linux, where the utility crashes. I was asked to help make this utility run on the Linux-using engineers’ laptops.
Since I had a good experience with reading already-compiled JARs with IntelliJ IDEA, I figured I could do the following:
- Extract the JAR to a directory (as it’s just a ZIP file with a specific file structure inside)
- Decompile some `class` files into their `java` equivalents
- Modify said `java` files (in my case, to support Linux)
- Recompile the `java` files, placing the new resulting `class` files instead of the original ones
- Repackage the JAR
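The first step relies on a JAR being an ordinary ZIP archive, which is easy to check from Python. The following is a minimal sketch, not part of the final script as written; the function names `list_jar_entries` and `extract_jar` are made up for illustration.

```python
import zipfile


def list_jar_entries(jar_path):
    """Return the names of all entries in a JAR, read as a plain ZIP archive."""
    with zipfile.ZipFile(jar_path, "r") as jar:
        return jar.namelist()


def extract_jar(jar_path, target_dir):
    """Unpack every entry of the JAR into target_dir (created if missing)."""
    with zipfile.ZipFile(jar_path, "r") as jar:
        jar.extractall(target_dir)
```

Listing the entries first is a quick way to find which `class` files are worth decompiling before unpacking anything.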
I did this manually and things went great. As an aside, the actual change looked something like this:
if (Util.isWindows()) {
...
} else if (Util.isMac()) {
...
+ } else {
+ ...
}
Now that I spent 15 minutes doing it manually, I obviously had to spend a couple of hours automating it. The resulting script does pretty much the above, except automatically. It takes:
- Source JAR
- Destination JAR filename
- Patch file to apply
- Where IntelliJ IDEA lives (for the decompilation part)
- Optional class/JAR files needed for compilation
The algorithm works as follows:
- Unpack source JAR to a temp directory (directory A)
- Decompile the entire directory using the IDEA component into another temp directory (directory B)
- Create a git repository from directory B
- Use `git apply` with the provided patch (using git because the original `patch` only works on a single file)
- Use the plumbing equivalent of `git status` to see which files were affected by the patch (easier than parsing the patch file)
- Recompile those specific files back into `class` files
- Copy those class files back into directory A
- Repackage directory A as a JAR into the destination filename
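The copy-back step has one subtlety: a single `java` file can compile into several `class` files, since nested and anonymous classes get names like `Foo$1.class`. Below is a sketch of one way to collect them. It is a stricter variation on the prefix glob the script uses, and the names `class_files_for` and `copy_class_files` are hypothetical. It assumes the compiler wrote its output next to the sources, and it does not find extra top-level classes declared in the same source file.

```python
import shutil
from pathlib import Path


def class_files_for(java_file, build_dir):
    """Find the .class files produced for one source file.

    Matches Foo.class plus nested/anonymous classes such as Foo$1.class,
    but not an unrelated FooBar.class in the same package.
    """
    source = Path(java_file)
    package_dir = Path(build_dir) / source.parent
    found = [package_dir / f"{source.stem}.class"]
    found.extend(package_dir.glob(f"{source.stem}$*.class"))
    return sorted(p for p in found if p.is_file())


def copy_class_files(touched_files, build_dir, jar_dir):
    """Copy recompiled classes into the unpacked JAR tree, keeping package paths."""
    copied = []
    for java_file in touched_files:
        for class_file in class_files_for(java_file, build_dir):
            relative = class_file.relative_to(Path(build_dir))
            destination = Path(jar_dir) / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(class_file, destination)
            copied.append(relative.as_posix())
    return copied
```

Matching on `Foo$*` rather than `Foo*` avoids picking up a stale `FooBar.class` that happens to share a prefix with a patched file.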
Interesting facts
- Using git seems a bit weird, but not only does it make applying patches easy, it also allows easily generating them (using `git diff` on the directory)
- The decompilation process is not perfect, and sometimes produces code that won’t compile without some modifications.
  While some of these may be bugs, a little bit of blame goes to type erasure of generics, where every `List<X>` survives only as `List`, and you have to convince the compiler that what’s inside isn’t `Object` but rather a concrete type
- I was surprised to find that lambdas survive decompilation. I thought they’d get converted to some autogenerated class.
- At first I was experimenting with Java commands to extract/build a JAR, thinking that there’s a secret to how these are created.
  There isn’t. Just re-zipping the same directory structure works.
- There might be significance to which Java version you’re using to recompile, depending on who/what is using the rebuilt JAR.
- Python’s `check_call` and `check_output` are so convenient.
- While Python’s `zipfile` module is OK for extracting an entire directory, it was very inconvenient for doing the opposite. I ended up using the `zip` CLI tool.
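For comparison, zipping a directory with the `zipfile` module alone might look roughly like the sketch below. This is not what the script does (it shells out to `zip`); the function name `zip_directory` is made up, and the sketch assumes the output file lives outside the directory being zipped.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into zip_path, with paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Entry names must be relative to the archive root, not absolute
                arcname = os.path.relpath(full_path, src_dir)
                archive.write(full_path, arcname)
```

The manual walk and the relative-path bookkeeping are the inconvenient part; the `zip -r` one-liner hides both.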
The script
#!/usr/bin/env python3
import re
import argparse
import zipfile
import tempfile
import subprocess
import os
import os.path
import glob
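
def zipdir(path, zip_file):
    # Zip the contents of `path` (not the directory itself) with the zip CLI.
    # The output path is made absolute because zip runs with cwd=path;
    # a relative name would otherwise end up inside the temp directory.
    subprocess.check_call(
        ["zip", "-qr", os.path.abspath(zip_file), "."], cwd=path
    )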

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("source", help="Source JAR to read")
    parser.add_argument(
        "-cp", "--classpath", action="append", help="classes needed for compilation"
    )
    parser.add_argument(
        "-jcp",
        "--jar-classpath",
        action="append",
        help="directories with jars needed for compilation",
    )
    parser.add_argument(
        "-p",
        "--patch",
        help="patch file to apply. If not specified, compilation will be skipped. Avoid specifying to just look at the decompiled files",
    )
    parser.add_argument(
        "-id",
        "--idea-path",
        default="/snap/intellij-idea-community/current",
        help="path to IntelliJ IDEA. Needed for the decompilation logic",
    )
    parser.add_argument("dest", help="Filename for re-compiled JAR")
    return parser.parse_args()

def compose_classpath(classpath, jar_classpath):
    ret = []
    if classpath:
        ret.extend(classpath)
    if jar_classpath:
        # "dir/*" is the classpath wildcard for all JARs in a directory
        ret.extend([os.path.join(d, "*") for d in jar_classpath])
    return ":".join(ret)
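
def tmpdir():
    # Only used to obtain a unique path: the directory itself is removed when
    # the "with" block exits, so callers must create it again before use.
    with tempfile.TemporaryDirectory() as d:
        name = d
    return name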

def main():
    args = parse_args()
    classpath = compose_classpath(args.classpath, args.jar_classpath)

    # unzip to tmpdir (extractall creates the directory)
    a_dir = tmpdir()
    print(f"extracted directory: {a_dir}")
    with zipfile.ZipFile(args.source, "r") as z:
        z.extractall(a_dir)

    # decompile
    b_dir = tmpdir()
    os.mkdir(b_dir)
    print(f"decompiled directory: {b_dir}")
    subprocess.check_call(
        [
            f"{args.idea_path}/jbr/bin/java",
            "-jar",
            f"{args.idea_path}/plugins/java-decompiler/lib/java-decompiler.jar",
            "-dhs=true",
            a_dir,
            b_dir,
        ],
        stdout=subprocess.DEVNULL,
    )

    # Create repo
    subprocess.check_call(["git", "-C", b_dir, "init", ".", "-b", "master"])
    subprocess.check_call(["git", "-C", b_dir, "add", ":/"])
    subprocess.check_call(
        ["git", "-C", b_dir, "commit", "-m", "initial"],
        stdout=subprocess.DEVNULL,
    )

    patch_file = args.patch
    if not patch_file:
        print("no patch file, just extracting")
        return

    # apply patch
    subprocess.check_call(["git", "-C", b_dir, "apply", os.path.realpath(patch_file)])
    touched_files = subprocess.check_output(
        ["git", "ls-files", "-m"], text=True, cwd=b_dir
    ).split("\n")
    touched_files = [t for t in touched_files if t]

    # compile (javac writes each .class next to its .java source)
    subprocess.check_call(
        ["javac", "-cp", f"{a_dir}:{classpath}", *touched_files], cwd=b_dir
    )

    # Copy back to A dir (the glob also picks up nested classes like Foo$1.class)
    for java_file in touched_files:
        base_name = re.sub(r"\.java$", "", java_file)
        files = glob.glob(f"{base_name}*.class", root_dir=b_dir)
        for file in files:
            os.rename(f"{b_dir}/{file}", f"{a_dir}/{file}")

    # rezip a dir
    zipdir(a_dir, args.dest)


if __name__ == "__main__":
    main()
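
On the point about Java versions: one way to see what the original classes were built for is to read the version field in each class file header. The sketch below is an illustration only, with a made-up function name `class_major_versions`. It relies on the class file format starting with the magic number `0xCAFEBABE` followed by a 2-byte minor and a 2-byte major version, where major 52 corresponds to Java 8, 55 to Java 11 and 61 to Java 17.

```python
import struct
import zipfile


def class_major_versions(jar_path):
    """Map each .class entry in a JAR to its class-file major version."""
    versions = {}
    with zipfile.ZipFile(jar_path, "r") as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            with jar.open(name) as entry:
                header = entry.read(8)
            if len(header) < 8:
                continue  # too short to be a class file
            # Big-endian: u4 magic, u2 minor_version, u2 major_version
            magic, _minor, major = struct.unpack(">IHH", header)
            if magic == 0xCAFEBABE:
                versions[name] = major
    return versions
```

If the recompiled classes come out with a higher major version than the rest of the JAR, an older JVM may refuse to load them; compiling with a matching `javac --release` value is one way to avoid that.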