Why re-reading DATA is slow

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

I had the idea that re-reading DATA after it's been exhausted would be faster than dumping it into an array for re-use. I'll post my test after the READMORE tag, in case I've made some blunder that stands out, or simply for your convenience.

But my question is more general: why did re-reading DATA prove to be so much slower than re-iterating over the array? I thought that the "cursor" was essentially a pointer to a memory address; am I thinking of it incorrectly? Is the performance penalty restricted to this special handle, does it apply to all handles, or is it associated with tell() and seek()?
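For reference, here is a minimal sketch of what the "cursor" is in practice: `tell` reports a byte offset into the stream, not a memory address, and `seek` moves that offset back. It assumes an in-memory filehandle as a stand-in for DATA so that it runs on its own; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Stand-in for the __DATA__ section (assumption: an in-memory handle
# is close enough to DATA for demonstrating tell/seek).
my $text = "1   First\n2   Last\n3   Street\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my $start     = tell $fh;    # byte offset 0, not a memory address
my $first     = <$fh>;       # reads "1   First\n", which is 10 bytes
my $after_one = tell $fh;    # offset has advanced by length($first)

# Read to the end, then rewind to the saved offset.
my @rest   = <$fh>;
my $at_eof = eof $fh;
seek $fh, $start, SEEK_SET or die "seek failed: $!";

# After the rewind, readline splits and copies the lines all over again.
my @again = <$fh>;

printf "start=%d after_one=%d len=%d eof=%d lines_again=%d\n",
    $start, $after_one, length $first, $at_eof ? 1 : 0, scalar @again;
close $fh;
```

`seek` only moves the offset; it does not keep the lines anywhere, so after the rewind readline has to split the data into lines and copy each one into a scalar again, work that the array version does only once.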

Thanks in advance for the wonderful help this place consistently provides in learning to think better in Perl.
mkmcconn
Update: revised the tests.

#!/usr/bin/perl -w

use strict;
use Benchmark;

#   uncomment to print sample output for each function

# my $ap      =   1;
#for my $ret (
#            rewhile_data(),
#            refor_data(),
#            reinfor_data(),
#            read_array()
#            ){
#    for (my $id = 1;$id < 8; $id++){
#        show_out($ret->{"$ap.$id"});
#    }
#    $ap++;
#}

#   benchmark tests

timethese(10000,{
    'REWHILE'   =>  \&rewhile_data,
    'REFOR'     =>  \&refor_data,
    'READAR'    =>  \&read_array,
    'REINFOR'   =>  \&reinfor_data,
    });


#   functions

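# re-read DATA on every pass with while (<DATA>), one line at a time,
# rewinding with seek() after each pass
sub rewhile_data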
{
    my $cursor        =   tell DATA;
    my %ahash;

    for my $i (1..100){
        while (my $j = <DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln)   =   $j  =~ m/(\w+)/g;
            $ahash{"$i.$num"}   =   [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%ahash;
} 

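# re-read DATA on every pass with for (<DATA>), which reads all remaining
# lines into a list first, rewinding with seek() after each pass
sub refor_data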
{
    my $cursor  = tell DATA ;
    my %bhash;
    for my $i (1..100){
        for my $j (<DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln)   =   $j  =~ m/(\w+)/g;
            $bhash{"$i.$num"}   =   [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%bhash;
}

# same as rewhile_data, but written as a C-style for (;;) loop
sub reinfor_data
{
    my $cursor  = tell DATA ;
    my %chash;
    for my $i (1..100){
        for ( ;my $j = <DATA>; ){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln)   =   $j  =~ m/(\w+)/g;
            $chash{"$i.$num"}   =   [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%chash;
} 


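# read DATA once into an array, then iterate over the array on every pass
sub read_array
{
    my $cursor      =   tell DATA;      # remember the current position in DATA
    my @data_array  =   <DATA>;         # slurp every remaining line once
    seek (DATA, $cursor, 0);            # rewind; otherwise every later call
                                        # (of every sub) starts at end-of-file
    my %dhash;
    for my $i (1..100){
        foreach my $j (@data_array){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln)   =  $j  =~ m/(\w+)/g;
            $dhash{"$i.$num"}   =  ["$i.$num",$fn,$ln];
        }
    }
    return \%dhash;
}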

sub show_out
{
    my $ref_    = shift;
    print "$ref_->[0]:\t$ref_->[1]\t$ref_->[2]\n";
}

__DATA__
1   First    _____________
2   Last     _____________
3   Street   _____________
4   Apt    _____________
5   City  _____________
6   State  _____________
7   _______________________________________   _

Results:

Benchmark: timing 10000 iterations of READAR, REFOR, REINFOR, REWHILE...
    READAR:  1 wallclock secs ( 2.80 usr +  0.00 sys =  2.80 CPU) @ 3571.43/s (n=10000)
     REFOR: 43 wallclock secs (42.95 usr +  0.00 sys = 42.95 CPU) @ 232.83/s (n=10000)
   REINFOR: 35 wallclock secs (34.33 usr +  0.00 sys = 34.33 CPU) @ 291.29/s (n=10000)
   REWHILE: 35 wallclock secs (34.77 usr +  0.00 sys = 34.77 CPU) @ 287.60/s (n=10000)
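A minimal sketch of a fairer version of this comparison, under stated assumptions: the sample records live in an in-memory string rather than DATA, every variant restores the handle position it started from, both strategies are checked to build the same hash before anything is timed, and a third case times the rewinds (`seek`) alone so their share of the cost can be separated from re-reading. The sub names `reread_handle`, `reuse_array` and `seek_only` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Fcntl qw(SEEK_SET);

# Sample records (assumption: same shape as the __DATA__ section above).
my $records = join '', map { "$_   Field$_    _____________\n" } 1 .. 7;
open my $fh, '<', \$records or die "Cannot open in-memory handle: $!";
my $start  = tell $fh;
my $PASSES = 100;

# Re-read the handle on every pass, rewinding with seek() afterwards.
sub reread_handle {
    my %h;
    for my $i (1 .. $PASSES) {
        while (my $line = <$fh>) {
            next if $line =~ /^\s*$/;
            my ($num, $fn, $ln) = $line =~ /(\w+)/g;
            $h{"$i.$num"} = ["$i.$num", $fn, $ln];
        }
        seek $fh, $start, SEEK_SET or die "seek failed: $!";
    }
    return \%h;
}

# Read the handle once, rewind it for the next caller, then loop over the array.
sub reuse_array {
    my @lines = <$fh>;
    seek $fh, $start, SEEK_SET or die "seek failed: $!";
    my %h;
    for my $i (1 .. $PASSES) {
        for my $line (@lines) {
            next if $line =~ /^\s*$/;
            my ($num, $fn, $ln) = $line =~ /(\w+)/g;
            $h{"$i.$num"} = ["$i.$num", $fn, $ln];
        }
    }
    return \%h;
}

# Only the rewinds, to see how much of the cost seek() alone accounts for.
sub seek_only {
    for (1 .. $PASSES) {
        seek $fh, $start, SEEK_SET or die "seek failed: $!";
    }
    return;
}

# Sanity check: both strategies must build the same hash before timing them.
my ($x, $y) = (reread_handle(), reuse_array());
my $same = keys %$x == keys %$y;
for my $k (keys %$x) {
    $same &&= exists $y->{$k} && "@{ $x->{$k} }" eq "@{ $y->{$k} }";
}
die "results differ\n" unless $same;

# A negative count runs each case for at least that many CPU seconds.
cmpthese(-1, {
    REREAD => \&reread_handle,
    ARRAY  => \&reuse_array,
    SEEK   => \&seek_only,
});
```

Because every variant leaves the handle where it found it, the order in which Benchmark runs them does not affect the others. The in-memory handle avoids real file I/O, so pointing `$fh` back at DATA is worth trying before drawing conclusions about that particular handle.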

