This is a followup on some recent posts of mine .. in case anyone new to Perl wonders if it is ever used for real work.

My web application handles document processing (conversion from PDF to XML), and it's currently running on a dozen servers, the busiest of which is used by two teams, one here in Toronto and the other in Mumbai, India. The work day starts at about midnight local time (9:30am in Mumbai) and goes to about 8 or 9pm, and lately we've been running 7 days a week to keep up with demand. If things go south, it lands in my lap. I hate pages at 4am, so I do what I can to avoid that situation.

Anyway, I recently wrote about a problem I'd been having with a sleepy CGI, and thanks to some respondents I rediscovered the excellent tool strace. But that led me to my next problem: why the CGI would be freezing at

select(5, [4], [], [4], NULL
as strace was showing me. After some more research, I discovered this was the connection to the database. Could it be as simple as the CGI waiting for the database that was causing the problem?

Oh boy. The answer is yes, as it turns out. And the CGI was not 'going to sleep' at all -- it was still running, blocked in select() and patiently waiting for the database to respond.
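
For anyone wanting to check what a descriptor like the 4 in that select() call actually is, here is a minimal sketch. It assumes a Linux box, where /proc/<pid>/fd/<n> is a symlink to the underlying object; the function name fd_target and the PID in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print what file descriptor $2 of process $1 points to.
# Assumes Linux with /proc mounted; you need permission to inspect the process.
fd_target() {
    pid=$1
    fd=$2
    readlink "/proc/$pid/fd/$fd"
}

# Usage (12345 is a placeholder PID for the stuck CGI):
#   fd_target 12345 4
# A network or Unix-domain connection prints as socket:[inode];
# a regular file prints as its path.
```

If the result is a socket, something like `lsof -a -p 12345 -d 4` should show the two endpoints, which is usually enough to tell whether the other end is the database.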

And the solution to the database slowdown (PostgreSQL, in my case) was the simple application of

ANALYZE VERBOSE DOCUMENTS;
and the response time for the main query went from 20 seconds to about 550ms. Yay!

Since I've seen this performance problem before, I'm now going to keep an eagle eye on the system today and find out how long it takes before the performance starts to drop, then set up a cron job to ANALYZE the suspect table again, most likely every four hours or so.
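
A rough sketch of what that scheduled ANALYZE could look like. The database name appdb, the function name analyze_table, the script path and the log path are all placeholders, and it assumes psql can connect as the cron user without prompting for a password.

```shell
#!/bin/sh
# Hypothetical wrapper: refresh planner statistics for one table.
#   $1 = database name, $2 = table name
# The table name is interpolated into the SQL as-is, so only pass trusted names.
analyze_table() {
    psql -d "$1" -c "ANALYZE $2;"
}

# In the script that cron runs:
#   analyze_table appdb documents
#
# Crontab entry, every four hours on the hour (paths are placeholders):
#   0 */4 * * * /usr/local/bin/analyze_documents.sh >> /var/log/analyze_documents.log 2>&1
```

The four-hour interval is only a starting guess; the right number is whatever the observed performance drop-off suggests.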

This is all a result of watching my system closely, something I think is extremely valuable in any engineering job -- staying on top of your system's performance is especially important when, like me, you are developer, supporter and maintainer all at once.

The moral of the story is, follow the data (I know, it sounds like CSI). Processes very rarely 'go to sleep' unless they're told to. Where's the CPU time (or bandwidth, or your other resources) going? Follow that lead, and you'll find the answer to your problem.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Make sure you're solving the right problem
by jasonk (Parson) on Oct 13, 2006 at 16:26 UTC

    This is why big database applications need DBAs. And why you should be using autovacuum if you are running PostgreSQL versions prior to 8.1 (when the autovacuum features were added to the server, instead of being an external process).


    We're not surrounded, we're in a target-rich environment!
        This is why big database applications need DBAs.

      Granted. However, in small companies, one person gets to do many things: I've assumed the System Architect job, I'm the only Software Developer and Software Support person, and I also pretend to be the DBA.

      Right now I'm doing a daily VACUUM on the application database, and a weekly VACUUM on the whole database. Due to the success of ANALYZE, I will likely be doing that daily as well.

      Upgrading the database version is on my list -- the development system I have uses the fairly recent 8.0.3, but the Production system is .. ahem .. older than that.

      Alex / talexb / Toronto

Re: Make sure you're solving the right problem
by exussum0 (Vicar) on Oct 13, 2006 at 17:53 UTC
    I had a similar issue once, where a program would mysteriously fail, mostly when I wasn't looking. What I did, for my case, was to bring up the perl debugger, and watch each line run against a copy of the funky system.

    Turned out some chucklehead used signals for detecting timeouts without ever resetting the signal properly... so if the process was fast enough, the new signal handler would go away when the process terminated. Which was quite the case on a dev system.

    Using the perl debugger and watching the lines run one by one, I'd see a signal handler go BOING!

      Absolutely. The Perl debugger is a bit cryptic (OK, it's *very* cryptic), but it shines an intense bright light on what the code's doing. I'm a fan.

      Alex / talexb / Toronto

Re: Make sure you're solving the right problem
by monarch (Priest) on Oct 13, 2006 at 19:55 UTC
    External databases are great things, but they are not a panacea. It is nice, at university, to think of a database as that separate black box that data goes into and out of and is all nicely managed for you.

    Having seen various databases used at different companies (Sybase, Oracle, PostgreSQL, MySQL, and Informix among them) and having used a few myself, although I'm no DBA, I've come to realise that each DB behaves differently and every single one needs a lot of hand holding.

    It pays to keep a close eye on a new system. The catch is that your use of a database can be just as much a source of trouble as the application you just wrote, even though you (rightly so) assume your own code is a more likely culprit than a mature, off-the-shelf product.

    I liked your post, have upvoted it, as I think it's a great learning experience any time you realise a fault is somewhere you didn't think before. Well done!

Re: Make sure you're solving the right problem
by Mutant (Priest) on Oct 14, 2006 at 11:55 UTC
    Nice post.

    I agree about sometimes feeling like a CSI when trying to track down performance issues or particularly nasty bugs. For some reason, I actually enjoy going through log files step by step trying to figure out what bizarre condition has triggered all hell to break loose.

    Of course, that just emphasises the importance of proper logging, especially in large complex systems. It's not easy to isolate the problem if you don't have the 'evidence' to find out what might be causing it.

      Absolutely agree about log files. SysAdmins complain about the space they take up, but there's extremely valuable forensic information in there.

      Having said that, half a dozen gigabytes of logs going back several months is probably excessive, unless you're trolling for something exceedingly rare.

      Alex / talexb / Toronto