in reply to Re^4: How to optimize a regex on a large file read line by line ?
in thread How to optimize a regex on a large file read line by line ?

Some rough timings to demonstrate how badly reading line by line hurts performance:

###### Timing grep (not a fair comparison, because exec takes time too)
DB<163> $start=time; print `grep 123456\$ txt`; print time-$start
123456
... # shortened
123456
0.207661151885986

###### Reading and parsing a chunk from Perl is not much slower
DB<164> $start=time; open FH,"<txt"; read FH,$txt,100e6; print $txt =~ /(123456\n)/g; print time-$start
123456
...
123456
0.257488012313843

###### Reading the chunk alone already takes half the time
DB<165> $start=time; print $txt =~ /(123456\n)/g; print time-$start
123456
...
123456
0.116161108016968

DB<166> $start=time; open FH,"<txt"; read FH,$txt,100e6; print time-$start
0.124891042709351

###### Size of txt is 70 MB
DB<167> length $txt
 => 70000080

###### READING LINE BY LINE IS A BOTTLENECK
DB<168> $start=time; open FH,"<txt"; while ($txt=<FH>){ print $1 if $txt =~ /(123456\n)/g;} print time-$start
123456
...
123456
16.3332719802856

All done on a netbook running Ubuntu.
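
For anyone who wants to repeat the comparison as a standalone script rather than inside the debugger (where every executed statement carries some extra overhead, so the loop variant may be penalized more than in a plain run), here is a minimal sketch. The helper names match_slurped and match_by_line, the generated sample file and its size are my own assumptions for illustration; they are not part of the session above. No timings are claimed here, run it on your own data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);        # sub-second resolution for time()
use File::Temp  qw(tempfile);

# Hypothetical helper: read the whole file with a single read() call,
# then collect every match with one global regex in list context.
sub match_slurped {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    read $fh, my $txt, -s $path;
    close $fh;
    my @hits = $txt =~ /(123456\n)/g;
    return @hits;
}

# Hypothetical helper: the line-by-line variant, one regex call per line.
sub match_by_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $1 if $line =~ /(123456\n)/;
    }
    close $fh;
    return @hits;
}

# Assumed sample data: mostly filler, with a matching line every 50_000 lines.
my ($out, $path) = tempfile(UNLINK => 1);
for my $i (1 .. 200_000) {
    print {$out} $i % 50_000 ? "filler line\n" : "id 123456\n";
}
close $out or die "Cannot close $path: $!";

# Time both variants on the same file.
for my $case ([ 'one chunk' => \&match_slurped ], [ 'line by line' => \&match_by_line ]) {
    my ($label, $code) = @$case;
    my $start = time();
    my @hits  = $code->($path);
    printf "%-12s %d hits in %.4f s\n", $label, scalar @hits, time() - $start;
}
```

To test against a real file, replace the generated sample with your own path. Both helpers should return the same list of matches, so any difference in run time comes from how the file is read and how often the regex engine is entered.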

Any questions left?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
