mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks
I am writing a program that creates normalized "keys" for database values, which are better for searching and sorting (not that relevant to my question). It is a fairly complex program, part of a large infrastructure of Perl modules. This specific development uses about 5 modules and runs on the server.

My program (I think) is pretty fast: I can generate normalized "keys" for about 40 values per second. This means I can normalize 1000 values in 25-30 seconds (depending on whether I need to update existing keys or add new ones).
The problem is that the table I need to normalize is very large (more than a million entries), and therefore doing all the normalizations from scratch takes about 8 hours.
My question, and this is more of a general question about Perl than about my specific development, is: where am I losing time?
What parts of Perl are known to be a bit slower or less efficient than others? And what parts of Perl are super fast and should be used more?
I am using Perl 5.10, and am calling the DB about 5 times per run (avg DB call is about 0.0005 seconds).
Any ideas or advice will be most welcome.
Thanks
Guy Naamati

UPDATE: After using NYTProf I found out that I am dynamically creating a new instance of one of the modules I use each time I normalize. Hopefully, by creating the instance at the start of the program, I can make my program run maybe 10% faster. Thanks for the advice, and any other (ideas|tips) will be welcome (this has turned into a bit of an interesting discussion).
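
A minimal sketch of that fix (Normalizer stands in for the real module, which I can't post):

    # before: paying the constructor cost once per value
    for my $value (@values) {
        my $normalizer = Normalizer->new();
        push @keys, $normalizer->normalize($value);
    }

    # after: one instance, created up front and reused for every value
    my $normalizer = Normalizer->new();
    for my $value (@values) {
        push @keys, $normalizer->normalize($value);
    }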

UPDATE 2: After analyzing the profiler output, it seems that the actual act of normalization is the main "hotspot". When I use hard-coded values instead of normalized ones, the program runs about 6 times faster. Since the normalization is done by another program which probably can't change, there isn't much I can do except create the normalization module instance just once rather than many times (as stated above).
Thanks for everybody's help!

I want to see people using Perl to glue things together creatively, not just technically but also socially
----Larry Wall

Replies are listed 'Best First'.
Re: Making my Perl program a bit faster
by moritz (Cardinal) on Jul 08, 2009 at 12:37 UTC
    The obvious advice is "profile your program, find hotspots, optimize those". Devel::NYTProf is a great profiler; I can really recommend it.
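
    A typical session looks like this (the script name is a placeholder):

        perl -d:NYTProf yourscript.pl   # writes profile data to ./nytprof.out
        nytprofhtml                     # renders it as an HTML report under ./nytprof/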

    My second piece of advice is to try to let the database do as much of the work for you as possible.
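
    For example, if part of the normalization is plain string cleanup, one set-based statement can replace a million per-row round trips (entries, value and norm_key are invented names here):

        # executed once, updates every row inside the database
        $dbh->do('UPDATE entries SET norm_key = LOWER(TRIM(value))');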

    If your program processes one database entry at a time, you could also try to parallelize it.
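
    If you go that route, a minimal sketch with Parallel::ForkManager (the chunking scheme and process_chunk are placeholders you'd have to fill in):

        use Parallel::ForkManager;

        my $pm = Parallel::ForkManager->new(4);   # up to four worker processes
        for my $chunk (@chunks) {                 # e.g. ranges of primary keys
            $pm->start and next;                  # parent continues the loop
            process_chunk($chunk);                # child normalizes its chunk
            $pm->finish;                          # child exits here
        }
        $pm->wait_all_children;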

    Update: There's also a document about performance in the newest (not yet released) perl version: pod/perlperf.pod in perl.git.

      Thanks for the advice.
Re: Making my Perl program a bit faster
by marto (Cardinal) on Jul 08, 2009 at 12:32 UTC
      Not yet, but I will look into it.
Re: Making my Perl program a bit faster
by jethro (Monsignor) on Jul 08, 2009 at 12:49 UTC

    Your question is not only general, it is too general. For example, a regex can be fast, but if you introduce lots of backtracking into it, it can be very slow.
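
    A contrived illustration of the backtracking trap:

        my $str = ('a' x 28) . 'b';
        $str =~ /^(a+)+$/;   # nested quantifiers: each extra 'a' roughly
                             # doubles how long this failing match takes
        $str =~ /^a+$/;      # fails too, but instantly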

    There is a chapter on efficiency in the official perl book ("Programming Perl" by Larry Wall...), page 537 in my edition, which might have some useful tips for you.

    I would suggest profiling your program. The "recommended by perlmonks" module of the moment seems to be Devel::NYTProf.

Re: Making my Perl program a bit faster
by graff (Chancellor) on Jul 08, 2009 at 16:31 UTC
    You said:
    I ... am calling the DB about 5 times per run...

    Does "one run" mean one command-line execution of the script? How many keys do you generate in one run?

    If you are doing lots of runs, part of the slowness will be at the level of the shell, having to build up and tear down a process for each run. Ideally, one run (lasting up to 8 hours or whatever) should minimize this sort of processing overhead.

    Apart from that, it's probably more a question of algorithm, and you haven't given us any clues on this. How complicated is the procedure to come up with "normalized keys"? How big does the script really need to be to do this?

    And how often do you have to come up with "normalized keys" for a million entries? (Does this need to be done repeatedly? If not, just run it for 8 hours and be done with it -- why worry about timing?)

      Does "one run" mean one command-line execution of the script?

      Each time I create a new "key" for a value counts as one run (perhaps I should have used a different word). Therefore I am doing over a million runs.

      Apart from that, it's probably more a question of algorithm

      Like I said, it's a fairly complex program that uses several different modules. Trying to explain the algorithm is pretty complicated. My intent was to get some clues and tips on how to run the program more efficiently, which I did, and will hopefully get some more.

      And how often do you have to come up with "normalized keys" for a million entries?

      Once per customer, since we're doing a version upgrade, and the "normalized keys" are a new feature. We have quite a few customers, so it's fairly important.
        Each time I create a new "key" for a value is a run (perhaps I should have used a different word). Therefore I am doing over a million runs.

        It's not clear whether you answered my question. How many times does the shell load and execute your multi-module perl script? Once, or over a million times?

        If the latter, then I would strongly suggest that you refactor things so that you can generate a large quantity of keys in a single shell-command-line run -- running millions of processes in sequence (each one a presumably bulky script with a database connection and 5 queries) is bound to be costing you a lot of time. By doing a large quantity of keys in a single process, you might save a lot on DB connection, statement preparation, etc., in addition to OS process-management overhead.
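
        Schematically, a single long-running process could look like this (the connection details, table and column names, and normalize() are all invented):

            use DBI;
            my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                                   { RaiseError => 1 });
            my $select = $dbh->prepare('SELECT id, value FROM entries');
            my $update = $dbh->prepare('UPDATE entries SET norm_key = ? WHERE id = ?');
            $select->execute();
            while (my ($id, $value) = $select->fetchrow_array) {
                $update->execute(normalize($value), $id);  # prepared once, run per row
            }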

        And/or maybe you can speed things up a bit by running multiple instances in parallel? (But then you have to make sure they don't interfere with each other, or overlap, causing redundant runs.)
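
        One simple way to keep workers from overlapping, assuming a numeric primary key: worker N of K only touches rows where id modulo K equals N, e.g.

            my $sth = $dbh->prepare(
                'SELECT id, value FROM entries WHERE MOD(id, ?) = ?');
            $sth->execute($worker_count, $worker_id);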

        Once per customer, since we're doing a version upgrade, and the "normalized keys" is a new feature. We have quite a few customers...

        And I suppose each customer has their own particular data requiring their own distinct set of normalized keys? Obviously, any amount of commonality across customers should be exploited (keep results from one to use on another, if at all possible).

        Does that mean you start a Perl program for each key?

        If yes: don't. Do them all in one program; that way you avoid the cost of starting up the interpreter a few million times.

Re: Making my Perl program a bit faster
by salva (Canon) on Jul 08, 2009 at 23:20 UTC
    Generic questions just get generic answers! If you want to get useful help, post your code or at least a description of the process you follow to normalize the keys.

    Without context it is difficult to say, but 40 operations/second doesn't look very impressive unless you are performing quite complex operations.

    Profilers (such as Devel::NYTProf) can be very helpful for finding hot spots in your program, but they tend to make you focus on small scopes that will only give you relatively small speed increases (typically < 30%).

    If you want to improve the speed of your program by orders of magnitude your first action should be to examine the algorithms you are using and to try to replace them with better ones.

Re: Making my Perl program a bit faster
by JavaFan (Canon) on Jul 08, 2009 at 12:55 UTC
    What parts of Perl are known to be a bit slower or less efficient than others? And what parts of Perl are super fast and should be used more?
    What a silly question to ask. One doesn't use parts of Perl because they are fast; one uses the parts of Perl that solve one's problem. It's like taking the train: the most important factor in deciding which train to take isn't the speed of the train, but whether it brings me to where I want to be.

    BTW, Perl doesn't have things that are "not fast" for the sake of being "not fast". They may be "not fast" because they do a lot of stuff. It would be silly to avoid them if the stuff they do is what you want to be done.

      ....um...er.... not quite or, at least, not always:

      s/(the most important factor in deciding which train to take isn't the speed of the train, but whether it brings me to where I want to be)/\1 when I want to get there./

      Sometimes, the [bicycle|car|plane] is the right choice (cf. moritz's advice re letting the database do some of the work).

      Update:   AnomalousMonk (thanks!) notes: s/(...)/$1 when I want to get there./ vice s/(...)/\1 when I want to get there./ : the capture variable is preferable to the backreference in string interpolation (the backreference generates a warning).
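
      A two-line demonstration of the point:

          use warnings;
          my $s = 'train';
          (my $x = $s) =~ s/(train)/fast \1/;   # warns: \1 better written as $1
          (my $y = $s) =~ s/(train)/fast $1/;   # identical result, no warning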

      Hi JavaFan,
      First of all, if one's workplace uses Perl, then one would use Perl all the time, and indeed try to program it as efficiently as possible.
      Also, I am not a huge expert in this (the reason why I am asking this question), but like any programming language, there are things Perl is very good at (e.g. regexes and parsing files) and things it is not so good at (hardcore mathematical computing, if I'm not mistaken).
      The purpose of my question was to get a bit more info about this issue, and to gather a few tips (e.g. use a profiler).

      I think the issue of Perl and efficiency is an interesting one (in fact it might be a good idea for my next Perl Mongers lecture after I understand it a bit more), and not so silly as you might think.

      Cheers mrguy123
        Yes, but even if being "very good" at regexes means regular expression matching in Perl is fast, it still doesn't make sense to avoid the "hardcore mathematical computing" Perl isn't good at in favor of regexes when you have to do arithmetic. It's hard to do any non-trivial arithmetic with regular expressions, and it would be unlikely to be more efficient than regular "hardcore mathematical computing".

        Even if your hammers are above average, and your screwdrivers aren't, it still doesn't make sense to hammer in screws.