Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

Thanks for the response to my earlier node about parsing large text files today. I have written the following snippet, and I am wondering if there is any way to make it run faster?
#!/usr/bin/perl
use strict;
use warnings;

# text file, about 70MB
my $datasource = "C:\\somelargetextfile.txt";
my $i = 1;

open DATASOURCE, "$datasource" or die "Can't open $datasource: $!";
foreach my $line (<DATASOURCE>) {
    print "$i - $line";
    $i++;
}
close (DATASOURCE);
Any help on speed improvement would be greatly appreciated!

Thanks in advance
Jonathan

Re: speeding up script that uses filehandles
by Yendor (Pilgrim) on Dec 02, 2004 at 17:06 UTC

    Try changing your foreach loop to a while loop. The effect of that will be to read the file line-by-line instead of attempting to load the whole thing into memory before processing it.
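
    A minimal sketch of that change, reusing the file path from the original snippet (the lexical filehandle and three-arg open are just my additions, not part of the original code):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $datasource = "C:\\somelargetextfile.txt";
    my $i = 1;

    # Read one line at a time instead of pulling the whole file into a list
    open my $fh, '<', $datasource or die "Can't open $datasource: $!";
    while (my $line = <$fh>) {
        print "$i - $line";
        $i++;
    }
    close $fh;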

Re: speeding up script that uses filehandles
by gellyfish (Monsignor) on Dec 02, 2004 at 17:06 UTC

    Well, yes, for a start you might strongly consider changing

    foreach my $line (<DATASOURCE>){
    to
    while(my $line = <DATASOURCE>) {
    as the former has to build a list (thus requiring a large memory allocation) before it can start iterating. The latter will simply read and process the lines of the file one by one.

    /J\

Re: speeding up script that uses filehandles
by duff (Parson) on Dec 02, 2004 at 17:12 UTC

    Use a while loop rather than a foreach loop. foreach will read the entire file before anything happens because it provides a list context to the diamond operator (<DATASOURCE>). And you can let perl keep track of the line number for you because it's in the special variable $.:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $datasource = "C:\\somelargetextfile.txt";

    open DATASOURCE, "$datasource" or die "Can't open $datasource: $!";
    while (my $line = <DATASOURCE>) {
        print "$. - $line";
    }
    close (DATASOURCE);

    If this is all that your code is doing, then I assume you're running on Windows, because if you were on a Unix system you could just use the cat(1) command to do something similar:

    cat -n my_large_text_file.txt

    Update: Ah, I see from your other post that you're inserting info into a database. Other than not reading the entire file into memory, there isn't much else that you're going to be able to do to speed up the loop.
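
    For the database part, a hedged sketch of what that line-by-line loop might feed into, assuming DBI with a made-up DSN, table name, and columns (none of which appear in the original posts):

    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details and schema -- substitute your own
    my $dbh = DBI->connect('dbi:ODBC:mydsn', 'user', 'password',
                           { RaiseError => 1, AutoCommit => 0 });
    my $sth = $dbh->prepare('INSERT INTO lines (line_no, text) VALUES (?, ?)');

    my $datasource = "C:\\somelargetextfile.txt";
    open my $fh, '<', $datasource or die "Can't open $datasource: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $sth->execute($., $line);   # $. is the current input line number
    }
    close $fh;

    $dbh->commit;
    $dbh->disconnect;

    Reusing a single prepared statement and committing once at the end is the usual way to keep the per-row overhead inside the loop to a minimum.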

    Why do you care how fast this program runs? Will it be run often? Must it finish within some time constraint so that it won't hold up a larger process?

Re: speeding up script that uses filehandles
by hsinclai (Deacon) on Dec 02, 2004 at 17:30 UTC
    Here's a slight variation on the output format, but while and $. are definitely speedier.

    use strict;

    open my $hndl, '<', "bigfile.txt" or die "Open of bigfile failed $!";
    while (<$hndl>) {
        print sprintf("% 7s", $.) . " $_ \n";
    }
    close $hndl;


      Off-topic from the original question, but whenever you do print sprintf, you should think about just using printf directly. I think this is clearer:

      while (<$hndl>) {
          printf "%7d %s\n", $., $_;
      }

slurp first?
by perrin (Chancellor) on Dec 02, 2004 at 18:33 UTC
    This question makes me think the following: 70MB is pretty small. There should be plenty of RAM for that on a decent system. Using File::Slurp and then splitting may be faster. However, I'm not sure. It depends on what perl does in that (<THINGY>) loop. Has anyone done a benchmark like this?
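
    A rough sketch of such a benchmark (assuming File::Slurp is installed and that bigfile.txt stands in for the 70MB file; the iteration count is arbitrary):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use File::Slurp qw(read_file);

    my $file = 'bigfile.txt';   # stand-in for the real 70MB file

    cmpthese( 5, {
        line_by_line => sub {
            open my $fh, '<', $file or die "Can't open $file: $!";
            my $count = 0;
            while (my $line = <$fh>) { $count++ }
            close $fh;
        },
        slurp_then_split => sub {
            my $text  = read_file($file);    # whole file as one scalar
            my @lines = split /^/m, $text;   # split back into lines
            my $count = @lines;
        },
    } );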