Reinaldy Rafli

Incident lesson learned: Be conservative with upgrades, understand the implications

Sep 15, 2024 • 457 words

I’m hoping this is just me, but I’m someone who’s keen on running the latest and greatest version of any software. I’ve developed a habit of frequently upgrading the software on my system. That includes the Linux server at my company that’s within my reach. For internal services, upgrading frequently is not a problem (most of the time), but for external services that we built ourselves, it’s… a problem.

This will be a short post that will remind you to check every changelog and upgrade requirements before modifying your system.

If you ever work on an infrastructure or DevOps team, you’ll soon understand why so many people run CentOS or another RHEL-based Linux. The short answer is that these distros offer very long LTS (long-term support) versions of the OS. You can see from the end-of-life dates of RHEL that it offers at least 13 years of support. RHEL 9 was released on 17 May 2022, and its Extended Life Cycle Support ends on 31 May 2035. So for 13 years, you won’t need to upgrade your distro version. The only upgrades you’ll need are security patches for CVEs: when one affects you, you’ll be alerted by the vendor or by something else, and a patch for that CVE will be available without upgrading the distro version. The second most popular distro that people use on their servers is Ubuntu, which offers at least 10 years of support. Ubuntu 24.04 was released on 25 April 2024, and its Extended Security Maintenance support ends on 25 April 2036.
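To make the arithmetic concrete, here is a minimal Python sketch that turns a release date and an end-of-support date into a support window in years. The function name `support_years` and the `LIFECYCLES` table are made up for this post, and the RHEL 9 end date of 31 May 2035 is the value I’m assuming here, so check the vendor’s lifecycle page before relying on any of these dates.

```python
from datetime import date


def support_years(released: date, support_ends: date) -> float:
    """Return the length of a support window in years (365.25-day years)."""
    return (support_ends - released).days / 365.25


# Illustrative dates only; verify them against the vendor's lifecycle page.
LIFECYCLES = {
    "RHEL 9": (date(2022, 5, 17), date(2035, 5, 31)),
    "Ubuntu 24.04": (date(2024, 4, 25), date(2036, 4, 25)),
}

for name, (released, ends) in LIFECYCLES.items():
    print(f"{name}: about {support_years(released, ends):.1f} years of support")
```

Running the same function with today’s date as the first argument tells you how much of the window is left, which is probably the number you actually care about when planning a migration.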

But what if we stay on the same distro version for the full 10 years? Well, it would be obsolete, and at the end of that 10-year period we’ll eventually need to upgrade the distro version. Since upgrading after years and years of usage will obviously introduce a lot of breaking changes, we really need to remind ourselves that the system will potentially break.

This is why we should always read the changelogs and identify the most impactful changes. The safest way is to create (or provision) a new machine with the newer distro version, do a data migration, and then switch every networking detail over to the new system. I’ve seen a lot of people (including myself) not do that: all they did was upgrade the distro version directly on the same machine (or server). When a problem arises, you can’t simply roll back to figure out what’s causing the issue. You’ll waste hours of debugging time trying to get the system back to its previous state.
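Reading a long changelog by hand is tedious, so a first pass can be automated. The sketch below is only a rough idea, not a replacement for actually reading the notes: it scans changelog text for words that often signal an impactful change. The keyword list, the function name `flag_risky_entries`, and the sample changelog are all invented for illustration.

```python
import re

# Hypothetical keyword list; tune it to how your vendor words its changelogs.
RISK_PATTERN = re.compile(
    r"\b(?:breaking|removed|deprecated|incompatible|no longer)\b",
    re.IGNORECASE,
)


def flag_risky_entries(changelog: str) -> list[str]:
    """Return the changelog lines that mention a potentially breaking change."""
    return [
        line.strip()
        for line in changelog.splitlines()
        if RISK_PATTERN.search(line)
    ]


# Invented sample text, not taken from any real release notes.
SAMPLE_CHANGELOG = """\
- Improved startup time
- BREAKING: default config path has moved
- Legacy auth module removed
- Fixed a typo in the docs
"""

for entry in flag_risky_entries(SAMPLE_CHANGELOG):
    print(entry)
```

Anything this flags deserves a careful read before you touch the production machine, and an empty result does not mean the upgrade is safe.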

So, again: every time you upgrade your system, make sure you read the changelogs and upgrade requirements.