PerlMonks  


I had the idea that re-reading DATA after it's been exhausted would be faster than dumping it into an array for re-use. I'll post my test after the READMORE tag, in case I've made some blunder that stands out, or for your convenience.

But my question is more general: why did re-reading DATA prove to be so much slower than re-iterating over the array? I thought that the "cursor" was a procedure that points to a memory address. Am I thinking of it incorrectly? Is the performance penalty restricted to this special handle, does it apply to all handles, or is it associated with tell() and seek()?
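For reference, a minimal sketch of what tell() and seek() work with: tell() reports a byte offset within the file behind the handle, not a memory address, and seek() moves the handle back to such an offset. As far as I know, for DATA that file is the script itself. The sketch assumes an in-memory handle as a stand-in for DATA; the sample text and the helper name read_and_rewind are hypothetical, not part of the test in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for a __DATA__ section.
my $text = "1 First\n2 Last\n3 Street\n";

# Read every remaining line from the handle, then rewind to where we
# started. Returns the starting offset and a reference to the lines read.
sub read_and_rewind {
    my ($fh) = @_;
    my $start = tell $fh;          # byte offset within the file
    my @lines = <$fh>;             # list context: read through end-of-file
    seek($fh, $start, 0)           # 0 = SEEK_SET, an absolute position
        or die "seek failed: $!";
    return ($start, \@lines);
}

open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $first = <$fh>;                 # consume the first line

my ($start, $lines) = read_and_rewind($fh);
print "offset after first line: $start\n";
print "lines read before rewind: ", scalar @$lines, "\n";
print "offset after rewind: ", tell($fh), "\n";
```

Because the offset is only a number, saving it is cheap; each readline and seek call still goes through Perl's I/O layer, which iterating over an array does not.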

Thanks in advance for the wonderful help this place consistently provides in learning to think better in Perl.
mkmcconn
updated tests

#!/usr/bin/perl -w
use strict;
use Benchmark;

# uncomment to print sample output for each function
# my $ap = 1;
#for my $ret (
#             rewhile_data(),
#             refor_data(),
#             reinfor_data(),
#             read_array()
#            ){
#    for (my $id = 1;$id < 8; $id++){
#        show_out($ret->{"$ap.$id"});
#    }
#    $ap++;
#}

# benchmark tests
timethese(10000,{
    'REWHILE' => \&rewhile_data,
    'REFOR'   => \&refor_data,
    'READAR'  => \&read_array,
    'REINFOR' => \&reinfor_data,
});

# functions
sub rewhile_data {
    my $cursor = tell DATA;
    my %ahash;
    for my $i (1..100){
        while (my $j = <DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $ahash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%ahash;
}

sub refor_data {
    my $cursor = tell DATA ;
    my %bhash;
    for my $i (1..100){
        for my $j (<DATA>){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $bhash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0);
    }
    return \%bhash;
}

sub reinfor_data {
    my $cursor ;
    $cursor = tell DATA ;
    my %chash;
    for my $i (1..100){
        for ( ;my $j = <DATA>; ){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $chash{"$i.$num"} = [ "$i.$num",$fn,$ln];
        }
        seek (DATA, $cursor, 0)
    }
    return \%chash;
}

sub read_array {
    my @data_array = <DATA>;
    my %dhash;
    for my $i (1..100){
        foreach my $j (@data_array){
            next if $j =~ m/^\s*$/;
            my ($num,$fn,$ln) = $j =~ m/(\w+)/g;
            $dhash{"$i.$num"} = ["$i.$num",$fn,$ln];
        }
    }
    return \%dhash;
}

sub show_out {
    my $ref_ = shift;
    print "$ref_->[0]:\t$ref_->[1]\t$ref_->[2]\n";
}

__DATA__
1 First _____________
2 Last _____________
3 Street _____________
4 Apt _____________
5 City _____________
6 State _____________
7 _______________________________________ _

Results:

Benchmark: timing 10000 iterations of READAR, REFOR, REINFOR, REWHILE...
    READAR:  1 wallclock secs ( 2.80 usr +  0.00 sys =  2.80 CPU) @ 3571.43/s (n=10000)
     REFOR: 43 wallclock secs (42.95 usr +  0.00 sys = 42.95 CPU) @ 232.83/s (n=10000)
   REINFOR: 35 wallclock secs (34.33 usr +  0.00 sys = 34.33 CPU) @ 291.29/s (n=10000)
   REWHILE: 35 wallclock secs (34.77 usr +  0.00 sys = 34.77 CPU) @ 287.60/s (n=10000)
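One thing that may be worth checking in the test above: read_array reads <DATA> to the end without seeking back, and the other functions call tell DATA on every invocation, so any call made after the handle has reached end-of-file would record end-of-file as its cursor and find nothing to read. The following is only a sketch of one way to rule that out, assuming the intent is for every call to see the full data. It captures the start offset and the lines once, before timing. A temporary file stands in for DATA, and the sample lines and the names store_line, reread_handle, and reread_array are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the __DATA__ section: a real file, so that
# readline and seek go through the normal I/O layers.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "$_\n" for '1 First ____', '2 Last ____', '3 Street ____';
close $fh or die "close failed: $!";
open $fh, '<', $path or die "cannot reopen $path: $!";

# Capture the starting offset and the lines ONCE, before any timing, so
# that no timed call inherits a handle already sitting at end-of-file.
my $start = tell $fh;
my @lines = <$fh>;
seek($fh, $start, 0) or die "seek failed: $!";

# Split one line into three fields and store them under "$pass.$num".
sub store_line {
    my ($hash, $pass, $line) = @_;
    return if $line =~ m/^\s*$/;
    my ($num, $fn, $ln) = $line =~ m/(\w+)/g;
    $hash->{"$pass.$num"} = ["$pass.$num", $fn, $ln];
    return;
}

# Re-read the handle $passes times, rewinding to $start after each pass.
sub reread_handle {
    my ($passes) = @_;
    my %hash;
    for my $pass (1 .. $passes) {
        while (my $line = <$fh>) {
            store_line(\%hash, $pass, $line);
        }
        seek($fh, $start, 0) or die "seek failed: $!";
    }
    return \%hash;
}

# Walk the pre-loaded array $passes times; no I/O happens here.
sub reread_array {
    my ($passes) = @_;
    my %hash;
    for my $pass (1 .. $passes) {
        store_line(\%hash, $pass, $_) for @lines;
    }
    return \%hash;
}

# A small iteration count keeps the sketch quick; raise it for real timing.
timethese(500, {
    HANDLE => sub { reread_handle(100) },
    ARRAY  => sub { reread_array(100) },
});
```

With both functions starting from the same known state, any remaining gap in the timings should reflect the cost of the readline and seek calls themselves rather than a difference in how much data each function actually saw.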
mkmcconn


In reply to Why re-reading DATA is slow by mkmcconn
