Re^5: How to optimize a regex on a large file read line by line ? (timing)

Some rough timings to demonstrate how badly reading line by line hurts performance.

###### Timing grep (not a fair comparison, because the exec takes time too)

  DB<163> $start=time; print `grep 123456\$ txt`; print time-$start
123456
... # shortened
123456

0.207661151885986

###### Reading and parsing a chunk from Perl is not much slower

  DB<164> $start=time; open FH,"<txt"; read FH,$txt,100e6; print $txt =~ /(123456\n)/g; print time-$start
123456
...
123456

0.257488012313843

###### Reading the chunk alone already takes about half of that time (regex only first, then read only)

  DB<165> $start=time; print $txt =~ /(123456\n)/g; print time-$start
123456
...
123456

0.116161108016968

  DB<166> $start=time; open FH,"<txt"; read FH,$txt,100e6; print time-$start

0.124891042709351

###### Size of txt is 70 MB

  DB<167> length $txt
 => 70000080
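
The single `read` above works because the whole 70 MB file fits into the 100e6 bytes requested. For a file too big to hold in memory at once, one possible variation is to read fixed-size chunks and carry the unfinished last line over to the next chunk. This is only a sketch: the sub name `count_chunked` and the `$chunk_size` parameter are my own assumptions, not something used in the session above.

```perl
use strict;
use warnings;

# Count lines ending in 123456 by reading the file in fixed-size chunks.
# count_chunked and $chunk_size are hypothetical names for illustration.
sub count_chunked {
    my ($path, $chunk_size) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    my $count = 0;
    my $tail  = '';    # partial line left over from the previous chunk
    while (read $fh, my $buf, $chunk_size) {
        $buf = $tail . $buf;

        # Cut after the last newline; the rest is an incomplete line
        # that has to wait for the next chunk.
        my $cut = rindex($buf, "\n") + 1;
        $tail = substr $buf, $cut, length($buf) - $cut, '';

        # $buf now holds only complete lines, so no match can be split.
        my @hits = $buf =~ /123456\n/g;
        $count += @hits;
    }
    close $fh;
    return $count;
}
```

A call like `count_chunked('txt', 10e6)` would then scan the file in pieces of about 10 MB. A last line without a trailing newline never matches, same as with the regex in the session.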

###### READING LINE BY LINE IS A BOTTLENECK

  DB<168> $start=time; open FH,"<txt"; while ($txt=<FH>){ print $1 if $txt =~ /(123456\n)/g;} print time-$start
123456
...
123456

16.3332719802856
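
For anyone who wants to repeat the comparison outside a debugger session, here is a rough standalone sketch. It is not the code I ran above: the sample file is generated (assuming one number per line, and far smaller than my 70 MB txt), and the sub names `make_sample`, `count_slurped`, `count_by_line` and `timed` are made up for this example. Your timings will depend on machine and file size.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempfile);

# Write a sample file with one number per line (assumed layout).
# Every 1000th line ends in 123456, all other lines end in 0.
sub make_sample {
    my ($lines) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    for my $i (1 .. $lines) {
        print {$fh} $i % 1000 ? sprintf("%09d0\n", $i) : "999123456\n";
    }
    close $fh or die "close failed: $!";
    return $path;
}

# Read the whole file into one string, then match in list context.
sub count_slurped {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                       # no record separator: slurp mode
    my $txt = <$fh>;
    close $fh;
    my @hits = $txt =~ /(123456\n)/g;
    return scalar @hits;
}

# Read and match one line at a time.
sub count_by_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /123456\n/;
    }
    close $fh;
    return $count;
}

# Run a code ref, print its result and the elapsed wall-clock time.
sub timed {
    my ($label, $code) = @_;
    my $start  = time();
    my $result = $code->();
    printf "%-14s %6d matches in %.3f s\n", $label, $result, time() - $start;
    return $result;
}

my $path = make_sample(200_000);
timed('slurp + regex', sub { count_slurped($path) });
timed('line by line',  sub { count_by_line($path) });
```

Both subs should report the same number of matches, so only the elapsed times differ between the two approaches.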

All done on a netbook running Ubuntu.

Any questions left?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!
