iKnowNothing has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have written a simple script to read a tab-delimited file and write out a few selected columns to another tab-delimited file. Although my script is working OK, it seems to take more time than it should. It took 44 seconds to do its thing on a 4.22 MB file (2796 lines). Any insight would be greatly appreciated. Here's the code:
while (<INFILE>) {
    # get the current line and split it into its columns
    @Line = split /\s+/, $_;
    # print the selected columns to the output
    print OUTFILE join("\t", @Line[@ColumnNumbers]), "\n";
}
The @ColumnNumbers array has been defined previously, and would look something like: (1,10,32,69,200,291)

UPDATE: The problem turned out not to be related to the code above. Thanks for the input though. Turns out I was running another algorithm every time through the loop that I thought I was only running the first time.
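For anyone hitting something similar: a quick way to confirm where the time actually goes is the core Benchmark module. A rough sketch (the test line below is synthetic, not my real data):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic stand-ins for one input line and the selected columns
my $line          = join "\t", map { "col$_" } 0 .. 300;
my @ColumnNumbers = (1, 10, 32, 69, 200, 291);

# Time just the split-and-slice step in isolation
timethese(10_000, {
    split_slice => sub {
        my @Line = split /\t/, $line;
        my $out  = join "\t", @Line[@ColumnNumbers];
    },
});
```

If this step is fast on its own, the slowdown is elsewhere in the loop.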

Retitled by davido from 'Why is this so slow?'.

Replies are listed 'Best First'.
Re: Optimizing slow restructuring of delimited files
by davido (Cardinal) on Jan 25, 2005 at 18:01 UTC

    You don't have any seriously slow code in that snippet. One comment, though: the snippet isn't doing what you stated your objective to be. Instead of splitting a tab-delimited file on tabs, it's splitting on any run of any kind of whitespace. I'm not sure that's what you intended.

    For example, if your input string contained:

    Hello world.\tThis is Dave

    Your snippet would create elements in @Line like this:

    @Line = ( 'Hello', 'world.', 'This', 'is', 'Dave' );

    You stated that the objective was to load @Line with:

    @Line = ( 'Hello world.', 'This is Dave' );
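    To illustrate the difference (a quick sketch; the sample string is made up):

```perl
use strict;
use warnings;

my $line = "Hello world.\tThis is Dave";

my @by_whitespace = split /\s+/, $line;  # splits inside the fields too
my @by_tab        = split /\t/,  $line;  # keeps each tab-delimited field whole

print scalar @by_whitespace, "\n";  # 5
print scalar @by_tab,        "\n";  # 2
```

    So if your fields can contain spaces, you want split /\t/ here.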

    Dave

Re: Optimizing slow restructuring of delimited files
by periapt (Hermit) on Jan 25, 2005 at 18:10 UTC
    You didn't give much to go on but my first hypothesis would be disk I/O. If the file size is small, you could try assigning the entire file to an array and parsing that. Something like
    @filetoread = <INFILE>;       # read in file all at once
    my $linestooutput = '';       # place to save output until the end
    foreach (@filetoread) {
        @Line = split /\s+/;      # split defaults to $_
        $linestooutput .= join("\t", @Line[@ColumnNumbers]) . "\n";
    }
    print OUTFILE $linestooutput; # write output

    # or even shorter
    @filetoread = <INFILE>;
    $linestooutput .= join("\t", (split /\s+/)[@ColumnNumbers]) . "\n"
        foreach @filetoread;
    print OUTFILE $linestooutput;
    I'm not sure about the speed impact of interpolated slices. I don't imagine that is the issue, but you could try something like this:
    while (<INFILE>) {
        # get the current line and split it into its columns
        @Line = split /\s+/, $_;
        # build the selected columns into the output line
        my $outline = '';
        $outline .= $Line[$_] . "\t" foreach @ColumnNumbers;
        print OUTFILE $outline, "\n";
    }


    PJ
    use strict; use warnings; use diagnostics;
Re: Optimizing slow restructuring of delimited files
by BrowserUk (Patriarch) on Jan 25, 2005 at 18:43 UTC

    Your program would probably run a little more quickly if you used lexical (my) variables--assuming you are not already doing so.

    You may also gain a little performance from avoiding the join: set $, = "\t"; and just print the slice.
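    Something like this untested sketch (the data and column numbers are purely illustrative):

```perl
use strict;
use warnings;

$, = "\t";    # output field separator: printed between list elements
$\ = "\n";    # output record separator: appended after each print

my @ColumnNumbers = (1, 3);                 # illustrative
my @Line = split /\t/, "a\tb\tc\td\te";     # illustrative input line

# No join needed: print itself separates the slice elements with $,
print @Line[@ColumnNumbers];
```

    That prints "b", a tab, "d", and a newline without building an intermediate string.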


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Optimizing slow restructuring of delimited files
by Aristotle (Chancellor) on Jan 25, 2005 at 23:22 UTC

    In such restricted cases awk might or might not be a better bet.

    awk -F'[[:space:]]+' 'BEGIN { OFS="\t" } { print $3, $6, $7 }' infile > outfile

    Makeshifts last the longest.

Re: Optimizing slow restructuring of delimited files
by holli (Abbot) on Jan 25, 2005 at 19:34 UTC
    Here is a one-liner for you:
    shell> perl -na -F\s+ -e "BEGIN{@S=(2,5)} print join(qq:\t:, @F[@S]), qq:\n:" infile > outfile
    @S contains the columns to select. But this one should be faster
    shell> perl -na -F\s+ -e "print join(qq:\t:, @F[1,2]), qq:\n:" infile > outfile

    holli, regexed monk
Re: Optimizing slow restructuring of delimited files
by NateTut (Deacon) on Jan 25, 2005 at 19:30 UTC
    You don't mention which platform you're on or how you are executing the script, but I have noticed a significant startup delay with ActiveState perlapp executables.