in reply to Refactoring a large script

refactoring/improving/tightening/increasing the speed and efficiency of the program

I think you first need to ask yourself which goal you are pursuing. "Refactoring" is usually associated with removing duplicate code and generally improving the maintainability of your code. Efficiency is a different thing entirely (and sometimes at odds with refactoring, since factoring code out into subroutines adds call overhead).

For efficiency, it's critical to find the real bottlenecks first, and profiling tools will help there. See Devel::DProf for an example. Once you've identified particular bottlenecks, work on those or post examples to PerlMonks if you're stuck. Don't overlook all the good Tutorials, either.
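
A typical Devel::DProf session looks something like this (yourscript.pl is just a placeholder):

    perl -d:DProf yourscript.pl   # run under the profiler; writes tmon.out
    dprofpp tmon.out              # summarize where the time went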

There are also some great reference books on these topics. The book that's almost written for your case is "Perl Medic: Transforming Legacy Code" by Peter J. Scott.

For refactoring, for the most part, just look for things that you do over and over again (cut-and-paste stuff) and try to isolate them into separate subroutines, as in the sketch below. If you're up to it, you could consider pushing some of it into separate modules. (See How a script becomes a module).
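
A trivial, made-up sketch of what that extraction looks like:

    # Before: the same normalization pasted in several places
    #   $line =~ s/^\s+//;  $line =~ s/\s+$//;  $line = lc $line;

    # After: one subroutine, called from everywhere that needs it
    sub normalize_line {
        my ($line) = @_;
        $line =~ s/^\s+//;    # strip leading whitespace
        $line =~ s/\s+$//;    # strip trailing whitespace
        return lc $line;      # lowercase for comparison
    }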

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^2: Refactoring a large script
by mdunnbass (Monk) on Jan 18, 2007 at 17:12 UTC
    Thanks xdg.

    I guess what I'm looking for is more efficiency than refactoring. In that sense, I've already pulled much of the redundant code into subs. So, it's more the overhead and speed that I want to tackle. And thanks for the book ref. I'll give it a look-see.

    Thanks,
    Matt

        Thanks for the links!

        I've finished debugging the vast majority of the script, and I finally have it at the point where it will run to completion without getting stuck in a loop or hitting an unrecoverable error.

        So, I just ran -d:DProf, and I am pleased with the output:

        Total Elapsed Time = 1135.701 Seconds
          User+System Time = 770.4718 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c Name
         63.8   491.8 491.86      1  491.86  491.86 main::WEED
         32.5   251.1 251.10  22012  0.0114  0.0114 main::SEARCHFASTA
         2.10   16.20 16.209    164  0.0988  0.0988 main::GET_TEXT
         0.71   5.460  5.460      1  5.4600  5.4600 main::INDEX_FASTA
         0.40   3.089  3.089    164  0.0188  0.0188 main::ADD_SPAN
         0.15   1.140  1.140      1  1.1400  1.1400 main::WEEDNUM
         0.08   0.599 770.48      2  0.2994  385.24 main::MAIN
         0.05   0.380  0.380      1  0.3800  0.3800 main::OVERLAP
         0.05   0.350  0.350      1  0.3500  0.3500 main::CLUSTER
         0.02   0.130  0.130      2  0.0650  0.0650 main::GET_ENDS
         0.01   0.048  3.176      1  0.0482  3.1764 main::HTML_FORMAT
         0.01   0.040  0.040      1  0.0400  0.0400 main::TABLE_IT
         0.01   0.040  0.420      1  0.0400  0.4200 main::SORT_HITS
         0.00   0.020  0.020      2  0.0100  0.0100 main::WEED_HEADERS
         0.00   0.010  0.010      1  0.0100  0.0100 warnings::BEGIN
        Obviously, a 20-minute run time is far longer than I would have liked. But 96% of that run time was spent in 2 subs. The SEARCHFASTA sub searches (in this run) a 0.5 GB file of DNA sequences for every occurrence of multiple different strings, saving the matches to a large hash. So its cost will depend on the size of the file being searched, and will be large no matter what I do.
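
        One idea I may try is combining all of the search strings into a single alternation regex, so the file only has to be scanned once instead of once per string. A rough sketch (the file name and query strings are invented, not my actual code):

            use strict;
            use warnings;

            my @queries = qw(GATTACA TTAGGG CCCTAA);             # placeholder search strings
            my $alt     = join '|', map { quotemeta } @queries;
            my $re      = qr/($alt)/;                            # one combined pattern

            my %hits;
            open my $fh, '<', 'sequences.fasta' or die "Can't open: $!";
            while ( my $line = <$fh> ) {
                while ( $line =~ /$re/g ) {                      # every match on this line
                    push @{ $hits{$1} }, $.;                     # record the line number
                }
            }
            close $fh;
            # caveat: this won't catch a match that spans a line break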

        The WEED sub, which is taking about 64% of the time, is the one I have to work on, then. Its purpose is to organize the results of the SEARCHFASTA sub based on certain criteria, and that is taking forever. So, I'll have to see what I can do to tighten it up; one possibility is sketched below. I may post it to Seekers of Perl Wisdom in a bit.
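
        If it turns out WEED is recomputing an expensive sort key inside the comparison, precomputing each key once (a Schwartzian transform) might help. A generic sketch, with a placeholder scoring sub rather than my real criteria:

            sub score {                               # stand-in for the real criterion
                my ($hit) = @_;
                return length $hit;
            }

            my @hits   = qw(GATTACA TTAGGG CCCTAA);   # placeholder data
            my @sorted = map  { $_->[1] }             # 3. unwrap the original items
                         sort { $a->[0] <=> $b->[0] } # 2. sort on the precomputed key
                         map  { [ score($_), $_ ] }   # 1. compute each key once
                         @hits;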

        Thanks to everyone for their help and advice. I am taking much of it to heart, and will be keeping a lot of your tips in mind from now on.

        Thanks
        Matt