perlquestion
mkmcconn
<p>I had the idea that re-reading DATA after it has been exhausted would be faster than dumping it into an array for re-use. I'll post my test after the READMORE tag, in case I've made some blunder that stands out, or for your convenience.</p>
<p>But my question is more general: why did re-reading DATA prove to be so much slower than re-iterating over the array? I thought that the "cursor" was a procedure that points to a memory address. Am I thinking of it incorrectly? Is the performance penalty restricted to this special handle, does it apply to all handles, or is it associated with [perlfunc:tell|tell()] and [perlfunc:seek|seek()]?</p>
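<p>To make the tell/seek idea concrete, here is a minimal sketch of the rewinding I have in mind. It is only an illustration under stated assumptions: an in-memory filehandle stands in for DATA, and the names <code>$fh</code>, <code>$start</code>, <code>@passes</code> and <code>@lines</code> are made up for the example. The detail that matters is that the offset is recorded before the first read; if tell is called once the handle is already exhausted, the later seek only returns to end-of-file and every following pass reads nothing.</p>

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for the DATA handle.
my $text = "1 First\n2 Last\n\n3 Street\n";
open(my $fh, '<', \$text) or die "open failed: $!";

# Record the starting offset BEFORE anything has been read.
my $start = tell($fh);

my @passes;
for my $pass (1 .. 3) {
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ m/^\s*$/;    # skip blank lines
        $count++;
    }
    push @passes, $count;

    # Rewind to the recorded offset so the next pass can read again.
    seek($fh, $start, 0) or die "seek failed: $!";
}

# The alternative: slurp once and iterate over the array as often as needed.
my @lines = grep { !m/^\s*$/ } <$fh>;

print "non-blank lines per pass: @passes\n";
print "non-blank lines slurped:  ", scalar(@lines), "\n";
```

<p>If every pass reports the same count as the slurped array, both approaches are really visiting the same lines, which seems worth checking before comparing their timings.</p>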
<p>Thanks in advance for the wonderful help this place consistently provides in learning to think better in Perl.<br />
[mkmcconn]<br />
<b>updated</b> tests</p>
<READMORE>
<code>
#!/usr/bin/perl -w
use strict;
use Benchmark;
# uncomment to print sample output for each function
# my $ap = 1;
# for my $ret (
#     rewhile_data(),
#     refor_data(),
#     reinfor_data(),
#     read_array()
# ){
#     for (my $id = 1; $id < 8; $id++){
#         show_out($ret->{"$ap.$id"});
#     }
#     $ap++;
# }
# benchmark tests
timethese(10000, {
    'REWHILE' => \&rewhile_data,
    'REFOR'   => \&refor_data,
    'READAR'  => \&read_array,
    'REINFOR' => \&reinfor_data,
});
# functions
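sub rewhile_data
{
    # remember the current read position of DATA so it can be restored after each pass
    my $cursor = tell DATA;
    my %ahash;
    for my $i (1..100){
        # while loop: read one line per iteration
        while (my $j = <DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $ahash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%ahash;
}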
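sub refor_data
{
    # remember the current read position of DATA so it can be restored after each pass
    my $cursor = tell DATA;
    my %bhash;
    for my $i (1..100){
        # foreach loop: <DATA> in list context reads all remaining lines before iterating
        for my $j (<DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $bhash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%bhash;
}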
sub reinfor_data
{
    # remember the current read position of DATA so it can be restored after each pass
    my $cursor = tell DATA;
    my %chash;
    for my $i (1..100){
        # C-style for loop: read one line per iteration
        for ( ; my $j = <DATA>; ){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $chash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%chash;
}
sub read_array
{
    # slurp everything remaining on DATA into an array (DATA is not rewound afterwards)
    my @data_array = <DATA>;
    my %dhash;
    for my $i (1..100){
        foreach my $j (@data_array){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $dhash{"$i.$num"} = ["$i.$num",$fn,$ln];
        }
    }
    return \%dhash;
}
sub show_out
{
    my $ref_ = shift;
    print "$ref_->[0]:\t$ref_->[1]\t$ref_->[2]\n";
}
__DATA__
1 First _____________
2 Last _____________
3 Street _____________
4 Apt _____________
5 City _____________
6 State _____________
7 _______________________________________ _
</code>
<p>Results:<br />
<code>
Benchmark: timing 10000 iterations of READAR, REFOR, REINFOR, REWHILE...
READAR: 1 wallclock secs ( 2.80 usr + 0.00 sys = 2.80 CPU) @ 3571.43/s (n=10000)
REFOR: 43 wallclock secs (42.95 usr + 0.00 sys = 42.95 CPU) @ 232.83/s (n=10000)
REINFOR: 35 wallclock secs (34.33 usr + 0.00 sys = 34.33 CPU) @ 291.29/s (n=10000)
REWHILE: 35 wallclock secs (34.77 usr + 0.00 sys = 34.77 CPU) @ 287.60/s (n=10000)
</code>
[mkmcconn]</p>