Once, a long time ago, I had a consulting gig at a big enterprise-y company. It came with a lot of unique challenges: it was disconnected from the internet (for security reasons, I was told) and therefore ran practically everything in-house.
I spent my time with not only the proto-devops-people (this was before devops was cool), but also with the hardcore sysadmins.
They were in charge of the un-sexy infrastructure that kept the organization ticking. I’m talking about E-Mail, Active Directory, DNS, workstation provisioning etc. We used to joke that they were the IT equivalent of sanitation - when everything worked, no one knew they were there, toiling. When things broke, everyone suddenly remembered who was responsible for this thing and where to find them.
Summer was coming, and I asked the person in charge of WSUS (Windows Server Update Services, the Microsoft-blessed way to distribute Windows updates in offline environments) whether they had deployed the latest DST-related patches (the DST schedule had been modified that year). They replied that they hadn’t, and that in fact they’d like some help ensuring that DST transitions were disabled on all workstations and servers, as this is how things worked there.
Short break - how is DST implemented
Unlike your microwave or your alarm clock radio, when you enter Daylight Saving Time, your computer’s internal clock doesn’t actually go forward or back one hour. Instead, your UTC offset (the X in GMT+X or UTC+X) changes, so the time you see as a user is different.
This technique has several advantages over moving the actual clock, mainly that when talking with other computers (that may be in different geographical regions), they still agree on what time it is right now.
This also means that software running on a computer while it’s changing its timezone doesn’t experience a time skip or a repeat, as long as it’s working internally in UTC time.
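To make this concrete, here is a minimal Python sketch of the idea. It is only an illustration: the two fixed offsets stand in for Israel's standard and DST offsets, and the date is an arbitrary, hypothetical one. The internal instant never moves; only the offset used to display it does.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used purely for illustration (assumption: UTC+2 in winter,
# UTC+3 during DST, matching the offsets mentioned in this post).
STANDARD = timezone(timedelta(hours=2))
DAYLIGHT = timezone(timedelta(hours=3))

# The machine's internal clock: a single instant, expressed in UTC.
# The date itself is arbitrary.
instant = datetime(2024, 6, 1, 6, 0, tzinfo=timezone.utc)

# Entering DST only changes the offset used to render that instant.
winter_view = instant.astimezone(STANDARD)
summer_view = instant.astimezone(DAYLIGHT)

print(winter_view.strftime("%H:%M"))  # 08:00
print(summer_view.strftime("%H:%M"))  # 09:00

# Both views still describe the exact same moment in time.
print(winter_view == summer_view)     # True
```

Software that keeps working with `instant` never notices the switch; only code that works with the rendered local time does.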
Back to the story
After being internally in shock for a bit, I said that sure, no problem. We did some research on the relevant registry settings for disabling DST transition, and made a plan on what to actually do come summer time (more on that later).
I casually asked why we weren’t using DST, and was told that it was the devops team’s requirement. Following up with the devops team (and digging a bit myself) helped me understand the situation better. Buckle up.
The in-house development team had built a piece of .NET-based software that handled, among other things, delivery times of equipment into the company’s warehouses. Each delivery window / event had a start/end datetime that had to be shared with the truck drivers, warehouse operators, security etc.
Come summer, the support crew for this software would encounter a persistent, weird problem - every shipping event moved forward by exactly one hour each time it was saved.
You create an event for tomorrow at 08:00, save it, and when you view it again it’s at 09:00. You edit it, save it again without touching anything, and it’s now at 10:00.
Obviously this was unacceptable, but luckily someone found out that with DST disabled on every machine involved, things went back to normal.
We have an expression that loosely translates as “there’s nothing more permanent than a temporary fix”, and so it was. DST had been forbidden in that company’s environment ever since; otherwise chaos would ensue in the shipping schedule.
The problem revealed
I had some time to spare and was dying to know what was going on, so I did some research. Said research involved a bit of code-reading, but mostly talking to the dinosaurs in both devops and coding to understand what the system was doing. My findings were as follows:
- The shipping software persisted its data in a database, which is perfectly reasonable.
- The data was kept in-DB as giant XML strings. While this sounds ridiculous now, keep in mind that this was before NoSQL, and the organization’s developers had no table structure modification access. This was their way of getting a flexible record structure they could modify on their own.
- The XML was de-serialized back into .NET objects by a standard .NET library. However, due to some additional permission checks and validation, the serialization (converting objects into XML strings) was built in-house.
Since this was long ago, I didn’t have good read access to .NET’s serialization logic. What I did have, though, was the ability to write a stupid “Console Application” and see how the library behaved.
I noticed that the .NET library was treating all datetime strings as UTC+0, so when loading such a string into a timezone-naive datetime object, it would use Windows’ timezone logic to convert it into a local one.
The in-house logic written to emulate the .NET library, however, did no such thing. Instead, it subtracted 2 hours (Israel’s standard UTC offset) from the datetime object and stuck the result in the XML. Easy. During non-DST times, this would work great.
Any event at 08:00 would be persisted in the DB as 06:00, and when deserialized would be converted back into 08:00.
However, when DST was in effect, 08:00 would be serialized as 06:00 (-2 hours), but then deserialized back as 09:00 (as the DST timezone of Israel is UTC+3), which caused the mysterious offset people complained about.
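The mismatch is easy to reproduce. The sketch below is not the original .NET code (which I can't share, and mostly never saw); it is a small Python model of the behavior described above. All function names are hypothetical, and the event date is arbitrary.

```python
from datetime import datetime, timedelta

STANDARD_OFFSET = timedelta(hours=2)  # Israel standard time, UTC+2
DST_OFFSET = timedelta(hours=3)       # Israel daylight time, UTC+3


def serialize_fixed(local: datetime) -> datetime:
    """Model of the in-house serializer: always subtract 2 hours."""
    return local - STANDARD_OFFSET


def serialize_current(local: datetime, current_offset: timedelta) -> datetime:
    """One possible remedy: subtract whatever offset is in effect right now."""
    return local - current_offset


def deserialize(stored: datetime, current_offset: timedelta) -> datetime:
    """Model of the library: treat the stored value as UTC, convert to local."""
    return stored + current_offset


def round_trip(local: datetime, current_offset: timedelta, saves: int) -> datetime:
    """Save and reload an event a number of times with the buggy serializer."""
    for _ in range(saves):
        local = deserialize(serialize_fixed(local), current_offset)
    return local


event = datetime(2024, 7, 1, 8, 0)  # hypothetical event at 08:00 local time

# Without DST the two sides agree, no matter how often you save.
print(round_trip(event, STANDARD_OFFSET, 2).strftime("%H:%M"))  # 08:00

# With DST in effect, every save pushes the event one hour forward.
print(round_trip(event, DST_OFFSET, 1).strftime("%H:%M"))       # 09:00
print(round_trip(event, DST_OFFSET, 2).strftime("%H:%M"))       # 10:00

# With a symmetric serializer, the round trip is stable even during DST.
fixed = deserialize(serialize_current(event, DST_OFFSET), DST_OFFSET)
print(fixed.strftime("%H:%M"))                                   # 08:00
```

The point of the model is that the bug lives in the asymmetry: one side hard-codes the offset while the other asks the operating system for it. Disabling DST hides the asymmetry instead of removing it.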
How does an org run without DST
In short - not great.
Although the sysadmins couldn’t use the DST mechanism in Windows, people still wanted the computers’ clocks to match their wall clocks, so we had to actually move the machine’s internal clock.
Worse, the company was operational 24/7, so we had to do the transition quickly, and at nighttime (the clocks shift from 01:00 to 02:00, or from 02:00 to 01:00).
The plan we worked out together was as follows:
- A week before - prepare emergency local credentials for all important servers, in case network authentication fails due to time differences.
- A day before - Increase the tolerance of all systems that limit ticket validity (back then it was Kerberos, a modern example might be JWT) to at least 1 hour.
This allowed two machines to communicate, even if the first clock was moved already and the second clock was not.
- 1 hour before - Stop every task scheduler we cared about, fearing it’d double-execute tasks (or skip executing them).
- At the time of the switch - manually shift the clock of the NTP source of truth.
- Execute a script that iterates over all of our important servers (Active Directory Domain Controllers, SQL servers, application servers) and has them immediately sync their clock from the NTP master.
- Execute the same script over all servers and workstations in the company, skipping ones that are turned off.
- Re-enable any services that were disabled earlier, after verifying their servers have the new time.
- Manually handle stragglers that for some reason didn’t get the new time set. These included not only servers, but also workstations of VIP personnel who had the sysadmins’ manager’s mobile number and were not afraid to use it.
- The morning after - Disable local credentials, return time tolerances of security mechanisms to sane limits.
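To see why the tolerance step mattered, here is a minimal Python sketch of a clock-skew check. It is a simplified model, not how Kerberos is actually implemented: the function name and timestamps are hypothetical, and the 5-minute figure is the commonly used default maximum clock skew.

```python
from datetime import datetime, timedelta


def within_tolerance(ticket_time: datetime, local_time: datetime,
                     tolerance: timedelta) -> bool:
    """Accept a ticket only if both clocks roughly agree (hypothetical helper)."""
    return abs(local_time - ticket_time) <= tolerance


# Mid-transition: one machine already moved its clock, the other did not.
# The date is arbitrary and only serves the example.
already_moved = datetime(2024, 3, 29, 2, 0)
not_yet_moved = datetime(2024, 3, 29, 1, 0)

default_tolerance = timedelta(minutes=5)   # commonly used default
raised_tolerance = timedelta(minutes=65)   # "at least 1 hour", with some slack

print(within_tolerance(already_moved, not_yet_moved, default_tolerance))  # False
print(within_tolerance(already_moved, not_yet_moved, raised_tolerance))   # True
```

With the default tolerance, authentication between the two machines would fail for as long as their clocks disagree, which is exactly the window in which we needed to log in and fix things.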
Overall a very exhausting process, and I’m happy I didn’t get to witness it too many times.
When I left my gig at this company, DST was still disabled.
The reasoning I got was that management preferred paying (whether in overtime or lost productivity) for the sysadmins to do the above ritual twice a year, rather than risk breaking the delicate equilibrium of the application’s DB persistence.
At the time, I heard a bit of resentment from the sysadmins. They felt that it was easier for the company to compromise their workflow and system stability than the software developers’. I did my best to explain that it wasn’t that the devs were enjoying the situation, or even unwilling to remedy it. It was the organization that chose not to invest the time in ironing the process out, and instead preferred to have the devs’ time spent on developing new features, or solving bugs that didn’t have such a well-established workaround.