I had the idea that re-reading DATA after it has been exhausted would be faster than dumping it into an array for re-use. I'll post my test after the READMORE tag, in case I've made some blunder that stands out, or for your convenience.

But my question is more general: why did re-reading DATA prove to be so incredibly much slower than re-iterating over the array? I thought that the "cursor" was a procedure that points to a memory address. Am I thinking of it incorrectly? Is the performance penalty restricted to this special handle, does it apply to all handles, or is it associated with tell() and seek()?
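For reference, the rewind pattern being asked about can be sketched as below. This is a minimal illustration under simple assumptions, not the benchmark itself: the names `$start` and `read_all` and the three data lines are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remember where the DATA section starts, before anything is read.
my $start = tell DATA;

# Read every remaining line from the current position to end-of-file.
sub read_all {
    my @lines = <DATA>;
    chomp @lines;
    return @lines;
}

my @first = read_all();    # consumes DATA up to EOF
my @empty = read_all();    # handle is exhausted, so no lines come back

# Move the file position back to the saved offset (0 means SEEK_SET).
seek(DATA, $start, 0) or die "seek failed: $!";
my @second = read_all();   # the same lines as the first pass

# prints: first=3 empty=0 second=3
printf "first=%d empty=%d second=%d\n",
    scalar @first, scalar @empty, scalar @second;

__DATA__
1 First
2 Last
3 Street
```

The point of the sketch is only that `tell` records an offset and `seek` restores it; every pass after the rewind still goes through the ordinary line-reading machinery of the handle.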

Thanks in advance for the wonderful help this place consistently provides in learning to think better in Perl.
mkmcconn
updated tests

#!/usr/bin/perl -w
use strict;
use Benchmark;

# uncomment to print sample output for each function
# my $ap = 1;
#for my $ret (
#    rewhile_data(),
#    refor_data(),
#    reinfor_data(),
#    read_array()
#    ){
#    for (my $id = 1;$id < 8; $id++){
#        show_out($ret->{"$ap.$id"});
#    }
#    $ap++;
#}

# benchmark tests
timethese(10000,{
    'REWHILE' => \&rewhile_data,
    'REFOR'   => \&refor_data,
    'READAR'  => \&read_array,
    'REINFOR' => \&reinfor_data,
});

# functions
sub rewhile_data {
    my $cursor = tell DATA;
    my %ahash;
    for my $i (1..100){
        while (my $j = <DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $ahash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%ahash;
}

sub refor_data {
    my $cursor = tell DATA ;
    my %bhash;
    for my $i (1..100){
        for my $j (<DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $bhash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%bhash;
}

sub reinfor_data {
    my $cursor ;
    $cursor = tell DATA ;
    my %chash;
    for my $i (1..100){
        for ( ;my $j = <DATA>; ){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $chash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0)
    }
    return \%chash;
}

sub read_array {
    my @data_array = <DATA>;
    my %dhash;
    for my $i (1..100){
        foreach my $j (@data_array){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $dhash{"$i.$num"} = ["$i.$num",$fn,$ln];
        }
    }
    return \%dhash;
}

sub show_out {
    my $ref_ = shift;
    print "$ref_->[0]:\t$ref_->[1]\t$ref_->[2]\n";
}

__DATA__
1 First _____________
2 Last _____________
3 Street _____________
4 Apt _____________
5 City _____________
6 State _____________
7 _______________________________________ _

Results:

Benchmark: timing 10000 iterations of READAR, REFOR, REINFOR, REWHILE...
    READAR:  1 wallclock secs ( 2.80 usr +  0.00 sys =  2.80 CPU) @ 3571.43/s (n=10000)
     REFOR: 43 wallclock secs (42.95 usr +  0.00 sys = 42.95 CPU) @ 232.83/s (n=10000)
   REINFOR: 35 wallclock secs (34.33 usr +  0.00 sys = 34.33 CPU) @ 291.29/s (n=10000)
   REWHILE: 35 wallclock secs (34.77 usr +  0.00 sys = 34.77 CPU) @ 287.60/s (n=10000)
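One thing worth checking in a harness like the one above: each sub calls `tell DATA` on entry, and `read_array` slurps `<DATA>` on every call, so what a timed call actually reads depends on where the previous call left the handle. The sketch below is a hypothetical variant, not the test that produced these numbers. It records the start offset once and slurps the array once, outside the timed code, so both subs see the same lines on every call. The names `$start`, `@cached`, `reread` and `from_array` are invented for illustration, and no timings are claimed for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Capture the start of the DATA section once, before any read.
my $start = tell DATA;

# Slurp the lines once, outside the timed code.
my @cached = <DATA>;

# Rewind DATA to the saved offset and count the non-blank lines.
sub reread {
    seek(DATA, $start, 0) or die "seek failed: $!";
    my $n = 0;
    while (my $line = <DATA>) {
        $n++ if $line =~ /\S/;
    }
    return $n;
}

# Count the non-blank lines in the cached copy.
sub from_array {
    my $n = 0;
    for my $line (@cached) {
        $n++ if $line =~ /\S/;
    }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    reread     => \&reread,
    from_array => \&from_array,
});

__DATA__
1 First _____________

2 Last _____________
3 Street _____________
```

Because `reread` always seeks to the same saved offset, it does not matter which sub Benchmark runs first.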
mkmcconn


In reply to Why re-reading DATA is slow by mkmcconn
