PerlMonks  

How not to implement updaters

by afoken (Chancellor)
on Sep 30, 2022 at 21:25 UTC ( [id://11147189]=perlmeditation )

Every two weeks, I switch from embedded developer to network and server administrator to keep our network and servers at work up and running. Today, updating our issue/requirement/test tracking software was on the plan. We have four virtual machines, each running one instance of the software. I won't state its name, and I will neither confirm nor deny any guess. But let's say the manufacturer has recently demonstrated that their idea of forcing their clients to use the cloud variant of their software instead of local servers might not be the best idea. Users don't like having years of work deleted from the cloud servers, without a way to undo that quickly and completely.

Experience from previous updates has taught me to make a full backup of the entire VM before updating. So the day started with shutting down all four VMs and creating copies of their harddisk image files. Just to be sure. The VMs are relatively small, just a bare-bones installation of Debian plus a database plus the bugtracker software, so that four extra copies of the HDD images don't matter much.
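
The backup step is simple enough to script. What follows is only a minimal sketch in Python, not what was actually run that day: the function names, the `.bak` suffix and the checksum comparison are assumptions made for illustration. It expects the VM to be shut down already, as described above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_image(image: Path, backup_dir: Path) -> Path:
    """Copy one disk image into backup_dir and verify the copy.

    Assumption: the VM using the image is already shut down. Copying
    the image of a running VM may give an inconsistent backup.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (image.name + ".bak")  # hypothetical naming scheme
    shutil.copy2(image, target)
    if sha256_of(image) != sha256_of(target):
        raise RuntimeError(f"backup of {image} does not match the original")
    return target
```

Restoring is the same copy in the other direction, which is exactly what happened three times that day.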

I planned the entire day for the update, expecting some trouble with the first VM to learn about the new issues during the update, and then be able to update the three other VMs much faster, knowing what issues to expect. So I was absolutely not surprised that the first update went bonkers.

Act One

The update installer created some zip files of the existing installation (don't hope to be able to recover from a broken update using those zip files), then removed the entire old version of the bugtracker software and unpacked the new version. "Do you want me to overwrite some.freaking.dll in the program directory?" Sure, why not? If the installer wants to overwrite what was unpacked seconds ago, let it do so. I have a good HDD image. A few moments later, it started the web server and pointed me to http://localhost:someport/. No, that web interface does not work in lynx or links, we are running a server, not a point-and-shoot adventure game. But the web server is really listening on all interfaces, so I can connect using Firefox on my PC. After several minutes of the old "don't blink, you might miss the progress bar moving another pixel" game, the browser shows the well-known "oops, something went wrong" page.

"We can't talk to the database." Well, the old version could. The old version had a database config file stating that we use a really exotic database. You probably never heard of it. It is called MySQL. Right out of the Debian package (so it is actually MariaDB). After some clicking on the error page, you end up at a wiki page of the manufacturer, which tells you to download a MySQL driver from a third party page. Yes, I really know that issue, and I should have thought about it, because it happened with every single update so far. It must be incredibly hard to parse the database config file from the updater and instruct the admin right from the updater to download and install that driver BEFORE playing the waiting game. And it must be absolutely impossible just to bundle the driver like the tons of other crap that come with the software.

So, copy the driver file (it really is just a single file!) to /opt/crap/crap/crap/lib/, restart the server, play the waiting game again. "Oops, something went wrong." Yes, sure. "We can't detect the database version." I could not care less. "We just discarded your old startup configuration, here is a link to our wiki how to fix that." Oh well, it's just fine-tuning of how much memory the bugtracker wastes. Defaults are fine for now. "There is an expired license installed, you are only allowed to update to versions that were released before that license expired." What? "Click here to buy a new license, click here to enter the new license code." There is no way to bypass that.
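
To show how little it would take to catch the missing driver before anything is unpacked, here is a rough sketch in Python. Everything specific in it is an assumption for illustration, not the real product's layout: the config file is taken to be a plain key=value file with a `db.type` entry, and the driver is taken to be a single file matching `mysql-connector*` in the library directory.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from database type to the filename pattern of its driver.
REQUIRED_DRIVERS = {"mysql": "mysql-connector*"}


def read_db_type(config_file: Path) -> str:
    """Read the database type from a simple key=value config file.

    Assumption: the type is stored under the (made-up) key 'db.type'.
    """
    for line in config_file.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "db.type":
            return value.strip().lower()
    raise ValueError(f"no db.type entry in {config_file}")


def missing_driver_message(config_file: Path, lib_dir: Path) -> Optional[str]:
    """Return an instruction for the admin, or None if nothing is missing."""
    db_type = read_db_type(config_file)
    pattern = REQUIRED_DRIVERS.get(db_type)
    if pattern is None:
        return None  # assume the driver for this database is bundled
    if any(lib_dir.glob(pattern)):
        return None  # a matching driver file is already in place
    return (f"The configured database is '{db_type}', but no file matching "
            f"'{pattern}' exists in {lib_dir}. Install the driver, then "
            f"rerun the updater. Nothing has been changed yet.")
```

An updater that called something like this as its very first step could print the message and exit, instead of sending the admin through the waiting game first.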

I share the administration of the bugtracker with a coworker. She does the high-level stuff (workflow, addons and so on), I care about OS, database, network, backup, and basic installation. She told me that we don't actually use that license. The license is not for the bugtracker itself, it is for a component that wasn't even installed in the old version. The expired license is just garbage data, we don't use that component, we don't need that component. It once was installed, but nobody bothered to delete the license code.

To make matters worse, there is no way to delete the expired license, or just tell the installer that we are willing not to be able to use the unlicensed component. At this point, you can either pay a lot of money to renew a license for a component that you don't want and don't need, just to get past that error screen, or shut down the VM and copy the backup copy of the HDD image over the actual HDD image. I did the latter.

Act Two

Restart the VM, remove that left-over license code, redo the update installation, this time copying the database driver before starting the webserver. "You were updated". No, the updater managed to do its job of updating the bugtracker. "Oh, and by the way, we are just rebuilding our search index. Because, you know, we can't search in the database." Actually, the last sentence was not displayed. But you have to wait until the index rebuild job has finished before you can continue.

Well, the updater did not manage to do its job. "There are these 20+ apps that won't work for whatever reason." Good, let's see if the bugtracker does work at all. The personalized overview page displays fine, but where is the navigation bar? It's gone. You can't log out. You can't gain admin privileges. You can't navigate anywhere. Let's open an existing issue. "500 Internal Server Error - click here to see a long, useless stack trace and a random number that will identify this problem". Some other attempts at navigating elsewhere also ended in that 500 page. Well, that did not go well.

Half a day has passed, and we just managed to kill the first bugtracker VM twice. Or, to be precise, watch it commit suicide. Guess what? Shutdown, copy the backup once more over the actual HDD image, and retry a third time.

Act Three

My coworker thought that one of the many add-ons that she installed might be responsible for the trouble. (I don't know why we need 20+ add-ons, we use the core functions, plus an add-on for requirements, plus one add-on for tests, plus one add-on for making the search function work properly.) So she decided to clean up the mess, uninstall everything not needed, including that expired license.

It turns out that not everything uninstalled cleanly. "Something went wrong that you don't need to know. But if you really want to know, here is a link to an assistant that will tell you that we wrote some stack traces to one of the many log files." A 16 MByte log file. 390 kByte of which were created during the hour or so she tried to get rid of some garbage.

Well, shut down the VM, make a second copy of the HDD image just to have a slightly cleaner state to work from. Redo the update, again copying the database driver. After the waiting game, I'm greeted by the same "You were updated" screen, and only three add-ons are inoperable now. A few clicks later, I once again get the overview page. Almost any click gets me either to a much uglier 500 page than before, or to the pretty 500 page. "Click here to download an archive with all relevant data you can mail to our support." Click - "500 Internal Server Error". Yes, you can't even download the crash report archive.

VM suicide number three. After a short discussion, we decided to roll back to the very first backup I created in the morning. Copy the backup once again over the actual HDD image, start the VM again. Nearly eight hours have passed. We did not even try to update the three other VMs, we just started them in their old state.

Epilog

We wasted an entire day trying to update the software. It should be so simple. Run the update installer, add the database driver that the manufacturer does not bundle, watch the system update itself, run the new version. Or, if something is critical and might cause trouble, get a good error message from the update installer BEFORE f-ing up the entire system.

It is possible. I know it, because my main job is software development. It takes testing, and during testing and development, you (as the developer) expect things to go horribly wrong. That's why VMs are so great. One click and you are back to a known state that you can fail to update again, and again, and again, until the updater just works or stops before damaging the system. With embedded systems, reverting to a known state is not always that easy, but even there, it is possible to make updates just work or abort before things go wrong.
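
The "abort before things go wrong" part is mostly a matter of ordering. Here is a minimal sketch in Python of one way to structure it, with made-up names (`run_update`, `PreflightError`) and no claim that any real updater works like this. Every check runs before the first destructive step, and all problems are reported together.

```python
from typing import Callable, List, Optional

# A check returns None when it passes, or a message describing the problem.
Check = Callable[[], Optional[str]]
# A step changes the system (remove old version, unpack new one, ...).
Step = Callable[[], None]


class PreflightError(Exception):
    """Raised when the update is refused before any change was made."""


def run_update(checks: List[Check], steps: List[Step]) -> None:
    """Run every check first; apply the steps only if all checks pass."""
    problems = [msg for msg in (check() for check in checks) if msg]
    if problems:
        # No step has run yet, so the system is still in its old state.
        raise PreflightError("update refused, nothing was changed:\n  "
                             + "\n  ".join(problems))
    for step in steps:
        step()
```

With checks for the database driver, the expired license and the incompatible add-ons in that list, all three acts of this story would have ended at the first screen, with the old version still running.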

I don't really want to know why the updater managed to kill our system three times, I just want it to do its job. Luckily, I'm on vacation now, and my coworker will contact the manufacturer of this crappy software. After my vacation, we will see how far she got.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re: How not to implement updaters
by choroba (Cardinal) on Sep 30, 2022 at 21:40 UTC
    The manufacturer announced the transition from on-premise to the cloud more than a year ago. Why is there no startup producing a non-cloud replacement? They'd make a fortune.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      The manufacturer announced the transition from on-premise to the cloud more than a year ago. Why is there no startup producing a non-cloud replacement? They'd make a fortune.

      It's a closed source product, so you can't simply fork it and keep maintaining a local-server version. (That would be great.)

      So that startup would need to re-invent the wheel AND get a sufficient number of users out of the manufacturer's cloud lock-in. That would at least require a very smooth migration procedure and very attractive pricing. That startup would need a lot of very competent and very efficient developers plus a lot of money to accomplish that.

      Speaking of pricing, the manufacturer used tactics that you would expect from a drug dealer, and they worked much too well: "Hey, want a bugtracker? It's almost free, just a few bucks for the first few users. And a few bucks for the requirement tracker for the first few users. And the same for the test tracker." One more licensed user, and the price explodes. To be fair, the next level after "a few users" is/was "a big company".

      My boss did some math a few years ago, and found that the "few users" license should be sufficient. It just barely fits now. And we are probably not the only ones who really wanted a license level between "a few users" and "big company".


      The cloud. For us, it does not look like a pretty fluffy cloud (like on the WinXP default desktop), it's more like a thunderstorm coming closer.

      We did a lot of paperwork for our quality management (which we must do to legally develop our products) to use exactly this tracking software, and all of our active projects are tracked in that software. Porting that to any other software does not only require a smooth migration, but we also would need to redo the paperwork.

      We will need to upgrade, to almost the latest version. Requirement and test tracking aren't supported in the newest versions, newer requirement tracking versions are only available in the cloud, test tracking seems to be dead.

      So we will need to migrate our data. Issues, Requirements, and Tests, either to the not-so-well managed cloud, or to some other software. It does not have to be done right now, but probably within the next two years.

      I'm sick and tired of the software, and luckily I'm not the only one. It is a resource hog, it is closed source, updates are a nightmare, the web interface can't handle big screens very well, it absolutely can't handle using more than one browser window, and there are many other annoyances. We want to get away from it, probably to some open source (and ideally free-as-in-beer) software, or at least a software that has reasonable, non-drug-dealer license steps and a working updater.

      Also, while the software can link issues, requirements, and tests very tightly, we actually do not use that feature very much. Ideally, we would, but in reality, there is the issues world of a project, and the requirements and tests world of a project. Both are pretty much isolated, and our actual workflow works fine this way. So, we already discussed using two different systems for the two worlds, one pure bugtracker (e.g. redmine, bugzilla, ...), and one that tracks requirements and tests. Of course, switching to another software requires a lot of paperwork (plus migration tools).

      We (the developers and our boss) will have some very interesting discussions in the future.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: How not to implement updaters
by afoken (Chancellor) on Nov 11, 2022 at 20:24 UTC

    It's admin day again.

    I tried to update the bugtracker one more time, with a new idea from who-knows-where. "Just delete that plugin in the filesystem and re-install it later via the user interface" said the knowledge base article. That should cure some problem we did not have last time I tried. Guess what: After some wasted hours, the bugtracker is once again f-ed up beyond repair, restore the backup.

    Completely unrelated, systemd running as process 1 (a.k.a. init) on our fileserver decided to crash, complain loudly on the console how evil the world is, disconnected from /dev/initctl and dbus, and made the entire server refuse to reboot. Yeah, stuffing everything and the kitchen sink into init sounds completely natural and sensible to about one human on earth. What could possibly go wrong? A sane init looks different: Re^14: CPAN failed install.

    No, I won't change the installation to get rid of systemd. I don't want to be the only one who knows how to work with our servers. Running a stock Debian guarantees that at least one coworker can work with the servers, and others may find solutions through Google.

    The trick to get the server to reboot even when systemd has wetted its pants is to sync, hope for the best, and run systemctl --force --force reboot. Almost as good as pressing the reset button while the HDD LED still blinks. Well, the fileserver seems to have survived. ext4 is pretty robust. And if something broke unnoticed, I have a good backup.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Completely unrelated, systemd running as process 1 (a.k.a. init) on our fileserver decided to crash, complain loudly on the console how evil the world is, disconnected from /dev/initctl and dbus, and made the entire server refuse to reboot.

      I feel your pain. There have been some terrible, terrible IT decisions made over the last quarter century but IMHO systemd takes the absolute biscuit. On systems where I have any say in the matter we do not and never will run systemd because I, apparently unlike many others, prefer my systems to boot reliably. It should be no surprise to anyone that Poettering now works for M$ (and still tries to infect superior systems with his crud).


      🦛
