http://qs1969.pair.com?node_id=123103

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

I had the idea that re-reading DATA after it has been exhausted would be faster than dumping it into an array for re-use. I'll post my test after the READMORE tag, in case I've made some blunder that stands out, or for your convenience.

But my question is more general: why did re-reading DATA prove to be so much slower than re-iterating over the array? I thought that the "cursor" was a procedure that points to a memory address. Am I thinking of it incorrectly? Is the performance penalty restricted to this special handle, does it apply to all handles, or is it associated with tell() and seek()?
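To make the pattern being asked about concrete, here is a minimal sketch of remembering a handle's position with tell() and rewinding to it with seek() between passes. It is an illustration only: an in-memory filehandle stands in for DATA so that the snippet is self-contained, and the names $text, $fh, and count_passes are hypothetical, not taken from the test.

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for the DATA handle.
my $text = "1 First\n2 Last\n3 Street\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";

# Read the handle to EOF $passes times, rewinding after each pass,
# and return the total number of lines seen.
sub count_passes {
    my ($handle, $passes) = @_;
    my $start = tell $handle;    # offset to return to after each pass
    my $lines = 0;
    for my $pass (1 .. $passes) {
        while (my $line = <$handle>) {
            $lines++;
        }
        # 0 is SEEK_SET: position is measured from the start of the file
        seek($handle, $start, 0) or die "seek failed: $!";
    }
    return $lines;
}

print count_passes($fh, 3), "\n";    # 3 lines read on each of 3 passes
```

Every pass here goes back through the I/O layer line by line, whereas iterating over an array walks values that are already in memory.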

Thanks in advance for the wonderful help this place consistently provides in learning to think better in Perl.
mkmcconn
updated tests

#!/usr/bin/perl -w
use strict;
use Benchmark;

# uncomment to print sample output for each function
# my $ap = 1;
#for my $ret (
#    rewhile_data(),
#    refor_data(),
#    reinfor_data(),
#    read_array()
#    ){
#    for (my $id = 1;$id < 8; $id++){
#        show_out($ret->{"$ap.$id"});
#    }
#    $ap++;
#}

# benchmark tests
timethese(10000,{
    'REWHILE' => \&rewhile_data,
    'REFOR'   => \&refor_data,
    'READAR'  => \&read_array,
    'REINFOR' => \&reinfor_data,
});

# functions
sub rewhile_data {
    my $cursor = tell DATA;
    my %ahash;
    for my $i (1..100){
        while (my $j = <DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $ahash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%ahash;
}

sub refor_data {
    my $cursor = tell DATA ;
    my %bhash;
    for my $i (1..100){
        for my $j (<DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $bhash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%bhash;
}

sub reinfor_data {
    my $cursor ;
    $cursor = tell DATA ;
    my %chash;
    for my $i (1..100){
        for ( ;my $j = <DATA>; ){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $chash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0)
    }
    return \%chash;
}

sub read_array {
    my @data_array = <DATA>;
    my %dhash;
    for my $i (1..100){
        foreach my $j (@data_array){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $dhash{"$i.$num"} = ["$i.$num",$fn,$ln];
        }
    }
    return \%dhash;
}

sub show_out {
    my $ref_ = shift;
    print "$ref_->[0]:\t$ref_->[1]\t$ref_->[2]\n";
}

__DATA__
1 First _____________
2 Last _____________
3 Street _____________
4 Apt _____________
5 City _____________
6 State _____________
7 _______________________________________ _

Results:

Benchmark: timing 10000 iterations of READAR, REFOR, REINFOR, REWHILE...
    READAR:  1 wallclock secs ( 2.80 usr +  0.00 sys =  2.80 CPU) @ 3571.43/s (n=10000)
     REFOR: 43 wallclock secs (42.95 usr +  0.00 sys = 42.95 CPU) @ 232.83/s (n=10000)
   REINFOR: 35 wallclock secs (34.33 usr +  0.00 sys = 34.33 CPU) @ 291.29/s (n=10000)
   REWHILE: 35 wallclock secs (34.77 usr +  0.00 sys = 34.77 CPU) @ 287.60/s (n=10000)
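One thing that may be worth checking when reading these numbers: a handle that has been read to EOF in list context returns an empty list on the next read unless it is rewound first. The sketch below shows that behaviour under stated assumptions: an in-memory handle stands in for DATA, and $text, $fh, and slurp_count are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for the DATA handle.
my $text = "1 First\n2 Last\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";

# Slurp whatever is left on the handle and report how many lines came back.
sub slurp_count {
    my ($handle) = @_;
    my @lines = <$handle>;    # list context: reads through to EOF
    return scalar @lines;
}

my $first  = slurp_count($fh);    # handle starts at the beginning
my $second = slurp_count($fh);    # handle is already at EOF, no rewind

print "first=$first second=$second\n";    # prints "first=2 second=0"
```

read_array() slurps DATA without ever seeking back, so if this applies, every call after the first would loop over an empty @data_array, which could account for some of the gap between READAR and the other three.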