PerlMonks
How not to implement updaters
by afoken (Chancellor)
on Sep 30, 2022 at 21:25 UTC (id://11147189)
Every two weeks, I switch from embedded developer to network and server administrator to keep our network and servers at work up and running. Today, updating our issue/requirement/test tracking software was on the plan. We have four virtual machines, each running one instance of the software. I won't state its name, and I will neither confirm nor deny any guess. But let's say the manufacturer has recently demonstrated that their idea of forcing their clients to use the cloud variant of their software instead of local servers might not be the best idea. Users don't like having years of work deleted from the cloud servers without a way to undo that quickly and completely.

Experience from previous updates has taught me to make a full backup of the entire VM before updating. So the day started with shutting down all four VMs and creating copies of their hard disk image files. Just to be sure. The VMs are relatively small, just a bare-bones installation of Debian plus a database plus the bugtracker software, so four extra copies of the HDD images don't matter much. I planned the entire day for the update, expecting some trouble with the first VM, learning about the new issues during that first update, and then being able to update the three other VMs much faster, knowing what issues to expect. So I was absolutely not surprised that the first update went bonkers.

Act One

The update installer created some zip files of the existing installation (don't count on being able to recover from a broken update using those zip files), then removed the entire old version of the bugtracker software and unpacked the new version. "Do you want me to overwrite some.freaking.dll in the program directory?" Sure, why not? If the installer wants to overwrite what it unpacked seconds ago, let it do so. I have a good HDD image. A few moments later, it started the web server and pointed me to http://localhost:someport/.
No, that web interface does not work in lynx or links; we are running a server, not a point-and-shoot adventure game. But the web server is really listening on all interfaces, so I can connect using Firefox on my PC. After several minutes of the old "don't blink, you might miss the progress bar moving another pixel" game, the browser shows the well-known "oops, something went wrong" page. "We can't talk to the database." Well, the old version could. The old version had a database config file stating that we use a really exotic database. You probably never heard of it. It is called MySQL. Right out of the Debian package (so it is actually MariaDB). After some clicking on the error page, you end up at a wiki page of the manufacturer, which tells you to download a MySQL driver from a third-party page.

Yes, I know that issue well, and I should have thought about it, because it happened with every single update so far. It must be incredibly hard for the updater to parse the database config file and tell the admin, right from the updater, to download and install that driver BEFORE playing the waiting game. And it must be absolutely impossible just to bundle the driver like the tons of other crap that comes with the software.

So, copy the driver file (it really is just a single file!) to /opt/crap/crap/crap/lib/, restart the server, play the waiting game again. "Oops, something went wrong." Yes, sure. "We can't detect the database version." I could not care less. "We just discarded your old startup configuration, here is a link to our wiki explaining how to fix that." Oh well, it's just fine-tuning of how much memory the bugtracker wastes. Defaults are fine for now. "There is an expired license installed, you are only allowed to update to versions that were released before that license expired." What? "Click here to buy a new license, click here to enter the new license code." There is no way to bypass that.

I share the administration of the bugtracker with a coworker.
She does the high-level stuff (workflow, add-ons and so on); I take care of the OS, database, network, backup, and basic installation. She told me that we don't actually use that license. The license is not for the bugtracker itself; it is for a component that wasn't even installed in the old version. The expired license is just garbage data: we don't use that component, and we don't need that component. It once was installed, but nobody bothered to delete the license code. To make matters worse, there is no way to delete the expired license, or to just tell the installer that we are fine with not being able to use the unlicensed component. At this point, you can either pay a lot of money to renew a license for a component that you don't want and don't need, just to get past that error screen, or shut down the VM and copy the backup of the HDD image over the actual HDD image. I did the latter.

Act Two

Restart the VM, remove that left-over license code, redo the update installation, this time copying the database driver before starting the web server. "You were updated." No, I wasn't updated; the updater managed to do its job of updating the bugtracker. "Oh, and by the way, we are just rebuilding our search index. Because, you know, we can't search in the database." Actually, the last sentence was not displayed. But you have to wait until the index rebuild job has finished before you can continue. Well, the updater did not manage to do its job. "There are these 20+ apps that won't work for whatever reason." Good, let's see if the bugtracker works at all. The personalized overview page displays fine, but where is the navigation bar? It's gone. You can't log out. You can't gain admin privileges. You can't navigate anywhere. Let's open an existing issue. "500 Internal Server Error - click here to see a long, useless stack trace and a random number that will identify this problem". Some other attempts at navigating elsewhere also ended on that 500 page. Well, that did not go well.
Half a day has passed, and we just managed to kill the first bugtracker VM twice. Or, to be precise, watch it commit suicide. Guess what? Shut down, copy the backup once more over the actual HDD image, and retry a third time.

Act Three

My coworker thought that one of the many add-ons that she had installed might be responsible for the trouble. (I don't know why we need 20+ add-ons; we use the core functions, plus an add-on for requirements, plus one add-on for tests, plus one add-on for making the search function work properly.) So she decided to clean up the mess and uninstall everything not needed, including that expired license. It turns out that not everything uninstalled cleanly. "Something went wrong that you don't need to know. But if you really want to know, here is a link to an assistant that will tell you that we wrote some stack traces to one of the many log files." A 16 MByte log file, 390 kByte of which were created during the hour or so she spent trying to get rid of some garbage. Well, shut down the VM and make a second copy of the HDD image, just to have a slightly cleaner state to work from. Redo the update, again copying the database driver. After the waiting game, I'm greeted by the same "You were updated" screen, and only three add-ons are inoperable now. A few clicks later, I once again get the overview page. Almost any click gets me either to a much uglier 500 page than before, or to the pretty 500 page. "Click here to download an archive with all relevant data you can mail to our support." Click - "500 Internal Server Error". Yes, you can't even download the crash report archive. VM suicide number three.

After a short discussion, we decided to roll back to the very first backup I created in the morning. Copy the backup once again over the actual HDD image, start the VM again. Nearly eight hours have passed. We did not even try to update the three other VMs; we just started them in their old state.

Epilogue

We wasted an entire day trying to update the software.
It should be so simple. Run the update installer, add the database driver that the manufacturer does not bundle, watch the system update itself, run the new version. Or, if something is critical and might cause trouble, get a good error message from the update installer BEFORE f-ing up the entire system. It is possible. I know it, because my main job is software development. It takes testing, and during testing and development, you (as the developer) expect things to go horribly wrong. That's why VMs are so great. One click and you are back to a known state that you can fail to update again, and again, and again, until the updater just works or stops before damaging the system. With embedded systems, reverting to a known state is not always that easy, but even there, it is possible to make updates just work or abort before things go wrong.

I don't really want to know why the updater managed to kill our system three times; I just want it to do its job. Luckily, I'm on vacation now, and my coworker will contact the manufacturer of this crappy software. After my vacation, we will see how far she got.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)