One of my current projects is migrating a big Java project from Java 8 to a supported version. Since we’re doing small, stable steps, we started with Java 11. After getting the server to start on Java 11 with the usual mix of upgrading libraries and tweaking config files, I let it run for a while on our testing environment to see what interesting errors pop up.

One error that caught my eye was “Invalid number: Nov”. Many times when I see error messages, I have an internal voice doubting the computer’s version of events.

  • “How could this object be null?”
  • “In what world does this array have less than 10 items?”
  • “Is there really an email address that has non-ASCII characters?”

However, at this moment, the machine and I had no argument. I wholeheartedly agreed that “Nov” is an invalid number, and that it was wrong of me to ask the code to treat it as a number. Now to find my shame and correct it.

A bit of stacktrace-following later, I reached some code that tries to deserialize a string containing JSON from the database into a Java object using the GSON library. A bit smelly maybe, but not a crime. The specific failure was on a Date field that was stored as a string. It looked a bit like Nov 16, 2023, 10:32:43 AM. Weird format, sure, but still parsable. The “Nov” part makes perfect sense there, and more importantly, this string was created by, and can be perfectly read by, the same code under Java 8. What caused the breakage?

Using the Java decompiler built into IntelliJ IDEA, I did some reading in the GSON library. Inside, I found logic that roughly translates to the following:


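Date deserializeDate(String string) {
    try {
        // First attempt: the JVM's default date-time format
        return this.defaultFormat.parse(string);
    } catch (ParseException e) {
        // Fallback: treat the string as ISO 8601
        return ISO8601Utils.parse(string);
    }
}

String serializeDate(Date date) {
    // Writing always uses the default format
    return this.defaultFormat.format(date);
}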

Can you guess the reason already?
It turns out that when defaultFormat.parse fails with a ParseException, the date (Nov 16, 2023, 10:32:43 AM) is parsed again as ISO 8601 (e.g. 2022-09-27T18:00:00.000Z). That parser tries to read Nov as the year and fails, as Nov is indeed an invalid number.
Why does defaultFormat.parse fail though?

I had my suspicions at this point, and after a bit of searching, they were confirmed.
That “default” format changed between Java 8 and Java 9 (most likely because JDK 9 made CLDR the default source of locale data), so dates written from Java 8 fail to parse on Java 11, and the other way around.
The change, in my case, was from MMM d, yyyy, h:mm:ss a to MMM d, yyyy h:mm:ss a. A comma was removed after the year signifier.
Something had to be done.
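
To make the mismatch concrete, here is a minimal sketch that checks the stored string against both patterns quoted above. The class and method names are made up for illustration, and pinning Locale.US is an assumption so that “Nov” and “AM” are recognized regardless of the machine's locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

class CommaMismatchDemo {
    // The two patterns from the text; they differ only by the comma after the year.
    static final String WITH_COMMA = "MMM d, yyyy, h:mm:ss a";
    static final String WITHOUT_COMMA = "MMM d, yyyy h:mm:ss a";

    /** Returns true if the text parses with the given pattern. */
    static boolean parses(String pattern, String text) {
        // Locale.US is an assumption, chosen so "Nov" and "AM" are understood.
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            // Literal characters in the pattern (like the comma) must match exactly.
            return false;
        }
    }

    public static void main(String[] args) {
        String stored = "Nov 16, 2023, 10:32:43 AM";
        System.out.println(parses(WITH_COMMA, stored));    // true
        System.out.println(parses(WITHOUT_COMMA, stored)); // false
    }
}
```

A single comma is enough to make the parser give up, which is why the fallback parser gets a chance to complain about “Nov” at all.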

Past wisdom

If I could go back to when the project was just written and the DB was empty, I’d enforce a single explicit date format throughout the codebase.
Granted, not using the default format is a bit of a chore. I don’t really care about how the date looks, as long as it’s human-readable and machine-readable. However, one of the habits I have is:

DateTime handling has a lot of sharp edges. Therefore, be as boring as possible when handling it.

This is by no means an original thought. I’m certain many people reached the same conclusion, and in the same way - cutting themselves on something that could have been easily avoided. For the less-scarred, here are some examples of things that can go wrong with dates that you might not see coming:

  1. Storing “naive” datetimes (no timezone) and passing them between machines in different timezones, where each machine interprets them as a different point in time
  2. DST and the recurring hour it brings in the fall, making 01:30 happen twice
  3. In PHP, yy representing “2020” correctly in 2020, but showing “2121” one year later, as y is a two-digit representation of the current year (are you looking for Y?)
  4. American and non-American date parsing (How would you read “03/07/22”? Which one is the month and which is the day?)
  5. Old timezone databases not having up-to-date DST times, causing an offset when doing date arithmetic on legacy machines
  6. The slight difference between nanoseconds and microseconds (1 micro = 1000 nano)

There are surely many more, complete with horror stories about discovering them at 3 am.
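
A few of these are easy to reproduce. The sketch below uses java.time to show items 1, 2 and 4 from the list. The class and method names are invented for this example, and the zones and dates are just ones I picked because they trigger the behavior.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class DatePitfalls {
    // Item 1: the same naive datetime is a different instant on each machine.
    static Duration naiveDisagreement() {
        LocalDateTime naive = LocalDateTime.of(2023, 11, 16, 10, 32, 43);
        Instant inTokyo = naive.atZone(ZoneId.of("Asia/Tokyo")).toInstant();
        Instant inNewYork = naive.atZone(ZoneId.of("America/New_York")).toInstant();
        return Duration.between(inTokyo, inNewYork);
    }

    // Item 2: on the night DST ends, 01:30 on the wall clock happens twice.
    static Duration gapBetweenTheTwo0130s() {
        ZoneId newYork = ZoneId.of("America/New_York");
        LocalDateTime wallClock = LocalDateTime.of(2023, 11, 5, 1, 30);
        ZonedDateTime first = ZonedDateTime.of(wallClock, newYork).withEarlierOffsetAtOverlap();
        ZonedDateTime second = first.withLaterOffsetAtOverlap();
        return Duration.between(first, second);
    }

    // Item 4: the same text is a different date depending on the assumed order.
    static LocalDate parseDate(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(naiveDisagreement());               // PT14H
        System.out.println(gapBetweenTheTwo0130s());           // PT1H
        System.out.println(parseDate("03/07/22", "MM/dd/yy")); // 2022-03-07
        System.out.println(parseDate("03/07/22", "dd/MM/yy")); // 2022-07-03
    }
}
```

None of these calls throws or warns. Each one quietly returns a plausible-looking answer, which is what makes this class of bug so easy to miss.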

I might write more about what I’m doing to make DateTime handling more boring and therefore safer, but for this instance, settling on an explicit date format (no matter which one) would have saved me some trouble.
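
As a rough illustration of what “settling on an explicit format” could look like, here is a sketch that pins everything that gets persisted to one locale-independent format. The StorageDates class is hypothetical, and picking ISO_INSTANT is my assumption for the example; any single explicit format would do. With GSON specifically, GsonBuilder.setDateFormat(String pattern) is the knob I would reach for, but I'm keeping the sketch to the standard library.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class StorageDates {
    // One explicit format for all stored dates; it does not depend on the
    // JVM version or the machine's default locale.
    static final DateTimeFormatter STORAGE_FORMAT = DateTimeFormatter.ISO_INSTANT;

    static String write(Date date) {
        return STORAGE_FORMAT.format(date.toInstant());
    }

    static Date read(String text) {
        return Date.from(STORAGE_FORMAT.parse(text, Instant::from));
    }

    public static void main(String[] args) {
        Date date = new Date(1700130763000L);
        String stored = write(date);
        System.out.println(stored);                    // 2023-11-16T10:32:43Z
        System.out.println(read(stored).equals(date)); // true
    }
}
```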

Back to the story

I currently had a different kind of problem - I already had a DB full of Java8-default-serialized dates.

Iterating over all of them and converting them to the “correct” format while the server is down is inviting a lot of pain, so no go.
The next reasonable path would be to centralize all of our GSON call sites and explicitly set them to the Java 8 datetime format string for de/serializing, so I tried that. Much to my chagrin, I found that this broke other call sites that were apparently running on slightly different servers, where the default was yet again different. This means that there was no single format string that covered all of our use cases. What do?

My final approach was replacing all of the GSON de/serializers in our codebase with a bespoke one I built, which had a custom de/serializer for dates. It went approximately like this:

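import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Objects;

// Note: SimpleDateFormat is not thread-safe; guard shared instances on a server
List<SimpleDateFormat> readDateFormats = Arrays.asList(
    new SimpleDateFormat("MMM d, yyyy, h:mm:ss a"), // The "main" format string I found from Java 8
    new SimpleDateFormat("MMM d, yyyy h:mm:ss a"), // Same for Java 11
    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ") // Some format I like
    // ... all of the other format strings I found in our logfiles
);

SimpleDateFormat writeDateFormat = new SimpleDateFormat("MMM d, yyyy, h:mm:ss a");

// Try each known format in order and return the first successful parse
Date deserializeDate(String string) {
    return readDateFormats.stream().map(df -> {
        try {
            return df.parse(string);
        } catch (ParseException e) {
            return null; // This format didn't match; try the next one
        }
    }).filter(Objects::nonNull)
      .findFirst()
      .orElseThrow(() -> new IllegalArgumentException("Unparseable date: " + string));
}

// Always write with the single explicit format
String serializeDate(Date date) {
    return writeDateFormat.format(date);
}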

This solution has the following niceties:

  1. I can move the code to running under Java 8 or 11, and it won’t change its behavior.
  2. There is an explicit list of DateTime formats we support. If in the future we switch from GSON to another JSON de/serializer, we’ll know what we need to support and test against
  3. If I discover another DateTime format I need to support, I have a single place to add it in
  4. Changing the DateTime format we write is done in a single place and is easily revertible (dates written in the new format can be easily supported even if we decide to not use it after a week)
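
The “know what we need to test against” part of item 2 can be turned into a small regression check. The following is a sketch under a few assumptions: the SupportedDateFormats class and its method names are made up, only the three patterns spelled out above are listed, and Locale.US is pinned so the check behaves the same on every machine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class SupportedDateFormats {
    // The explicit list of patterns we promise to read.
    static final List<String> READ_PATTERNS = Arrays.asList(
        "MMM d, yyyy, h:mm:ss a",
        "MMM d, yyyy h:mm:ss a",
        "yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // SimpleDateFormat is not thread-safe, so build a fresh one per call.
    private static SimpleDateFormat formatFor(String pattern) {
        return new SimpleDateFormat(pattern, Locale.US); // Locale.US is an assumption
    }

    /** Tries every supported pattern in order; fails loudly if none matches. */
    static Date read(String text) {
        for (String pattern : READ_PATTERNS) {
            try {
                return formatFor(pattern).parse(text);
            } catch (ParseException e) {
                // Not this pattern; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("Unsupported date format: " + text);
    }

    /** True if a date written with each supported pattern reads back unchanged. */
    static boolean everyPatternRoundTrips(Date sample) {
        for (String pattern : READ_PATTERNS) {
            String text = formatFor(pattern).format(sample);
            if (!read(text).equals(sample)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Whole seconds only: two of the patterns do not store milliseconds.
        Date sample = new Date(1700130763000L);
        System.out.println(everyPatternRoundTrips(sample)); // true
    }
}
```

If a newly added pattern ever shadows an older one, or a JVM upgrade changes how one of them parses, this check should start failing before the database does.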

We’re now happy campers (as far as JSON date parsing is concerned), future-proof and past-proof date-wise, and no one tries treating Nov as a number.