If I run the code below on a file with size=35048455 and num_lines=769114 I get the following output:

0.160829 read lines from disk and do RE.
48.796606 read lines from in-memory file and do RE.

So the regular expressions (REs) take about 300 times longer on the lines from the in-memory file. I'm assuming this is because each "line" read from the in-memory file is actually some kind of reference into the scalar, which somehow makes the RE do a lot of extra work, but I'd be interested in an actual explanation. Note that when I just loop through the lines without applying the RE, the disk and in-memory files take about the same amount of time, so the RE itself seems to be what causes the problem.

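#!/usr/bin/env perl
use warnings;
use strict;
use Time::HiRes qw( time );

my $file = shift @ARGV;
my ($fh, $time);

# Pass 1: read lines straight from disk and apply the RE.
open $fh, "<", $file or die "Cannot open $file: $!";
$time = time;
while (<$fh>) { /^ ?Query/; }
printf "%f read lines from disk and do RE.\n", time - $time;

# Slurp the same file into a scalar, line by line.
seek $fh, 0, 0;
my $s = "";
while (<$fh>) { $s .= $_; }

# Pass 2: read lines from an in-memory filehandle and apply the same RE.
open $fh, "<", \$s or die "Cannot open in-memory file: $!";
$time = time;
while (<$fh>) { /^ ?Query/; }
printf "%f read lines from in-memory file and do RE.\n", time - $time;
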
EDIT: If you want to run it on my original file for consistency, you can grab it at in_memory_re_issue.dat
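
One way to test my guess about the "line" being tied to the big scalar might be to copy each line into a fresh scalar before matching and see whether the timing changes. Below is a minimal sketch of that experiment, not an explanation. The helper name count_matches and the synthetic test data are my own inventions for illustration, and I am only assuming that stringifying with "$_" yields an independent copy. The data here is small, so the gap may only become visible on a file the size of the original.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical helper: apply the RE to every line of $fh and return the
# number of matching lines. When $copy_first is true, each line is first
# stringified into a new lexical (assumed to be an independent copy), so
# the RE does not run on the scalar that readline filled.
sub count_matches {
    my ($fh, $copy_first) = @_;
    my $hits = 0;
    local $_;
    while (<$fh>) {
        if ($copy_first) {
            my $line = "$_";
            $hits++ if $line =~ /^ ?Query/;
        }
        else {
            $hits++ if /^ ?Query/;
        }
    }
    return $hits;
}

# Synthetic stand-in for the real file (an assumption, not the original
# data): every tenth line starts with "Query".
my $data = "";
for my $n (1 .. 20_000) {
    $data .= ($n % 10 ? "  other " : "Query ") . "line $n\n";
}

for my $copy_first (0, 1) {
    open my $fh, "<", \$data or die "Cannot open in-memory file: $!";
    my $start = time;
    my $hits  = count_matches($fh, $copy_first);
    printf "%f s, %d matches (%s)\n", time - $start, $hits,
        $copy_first ? "RE on a copy of the line" : "RE directly on \$_";
    close $fh;
}
```

If the copy variant turns out to be much faster on the real file, that would point at the scalar returned by readline rather than at the RE itself. If both are equally slow, my assumption is probably wrong.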

In reply to RE on lines read from in-memory scalar is very slow by Danny
