If I run the code below on a file with size=35048455 and num_lines=769114 I get the following output:
0.160829 read lines from disk and do RE.
48.796606 read lines from in-memory file and do RE.
So the regular expressions (REs) take about 300 times longer on the lines from the in-memory file. I'm assuming this is because the "line" read from the in-memory file is actually some kind of reference into the scalar, and that somehow makes the RE do a lot of extra work, but I'd be interested in an actual explanation. Note that when I just loop through the lines without doing the RE, the disk and in-memory files take about the same amount of time, so it seems to be the RE that is causing the problem.
EDIT: If you want to run my original file for consistency you can grab it at in_memory_re_issue.dat

#!/usr/bin/env perl
use warnings;
use strict;
use Time::HiRes qw( time );

my $file = shift @ARGV;
my ($fh, $time);

open $fh, "<", $file;
$time = time;
while(<$fh>) {
    /^ ?Query/;
}
printf "%f read lines from disk and do RE.\n", time - $time;

seek $fh, 0, 0;
my $s = "";
while(<$fh>) {
    $s .= $_;
}

open $fh, "<", \$s;
$time = time;
while(<$fh>) {
    /^ ?Query/;
}
printf "%f read lines from in-memory file and do RE.\n", time - $time;
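One way to probe my guess that the slowdown comes from the line being tied to the big scalar is to match against a fresh copy of each line instead of $_ itself. The following is only a sketch under assumptions: the data is synthetic (20,000 made-up lines rather than my real file), count_matches is a hypothetical helper name, and whether copying changes the timing at all may depend on the Perl version. It does not explain the cause; it only compares the two cases.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use Time::HiRes qw( time );

# Synthetic stand-in for the real data file (assumption: the content is made up).
my $s = join '', map { "Query line $_\n" } 1 .. 20_000;

# Hypothetical helper: read lines from an in-memory filehandle and count
# RE matches, either on $_ directly or on a fresh copy of each line.
sub count_matches {
    my ($data_ref, $copy_first) = @_;
    open my $fh, '<', $data_ref or die "Cannot open in-memory file: $!";
    my $count = 0;
    local $_;
    while (<$fh>) {
        if ($copy_first) {
            my $copy = "$_";    # stringify into a new scalar before matching
            $count++ if $copy =~ /^ ?Query/;
        }
        else {
            $count++ if /^ ?Query/;
        }
    }
    close $fh;
    return $count;
}

for my $copy_first (0, 1) {
    my $start = time;
    my $n     = count_matches(\$s, $copy_first);
    printf "%f in-memory, %s: %d matches\n",
        time - $start,
        ($copy_first ? 'RE on a copy of the line' : 'RE on $_ directly'),
        $n;
}
```

If the copy variant is dramatically faster, that would point at something about the scalar that readline hands back from the in-memory file rather than at the RE itself.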
In reply to RE on lines read from in-memory scalar is very slow by Danny