Re^4: How to optimize a regex on a large file read line by line ?

Please give me your execution times with the same code

Using my own 200-million-record, 2 GB file, it takes 25 seconds to count the lines only and 50 seconds with the regex included (Windows 10, i5 3.3 GHz, 8 GB, ActiveState Perl v5.16.1).

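#!perl
use strict;
use warnings;

# Create the 200-million-line test file on first run.
my $testfile = '200-million-combos.txt';
unless (-e $testfile){
  open my $out, '>', $testfile or die "Cannot create $testfile: $!";
  my $record = '890123456';
  for (1..200_000_000){
    print {$out} $record."\n";
  }
  close $out or die "Cannot close $testfile: $!";
}

my $counter1 = 0;  # lines read
my $counter2 = 0;  # lines ending in 123456
my $t0 = time;     # built-in time: whole seconds only
open my $fh, '<', $testfile or die "Cannot open $testfile: $!";
while (<$fh>) {
  ++$counter1;
  if (/123456$/){
    ++$counter2;
  }
}
close $fh;
my $dur = time - $t0;
print "$counter1 read, $counter2 matched in $dur secs\n";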
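
To get both figures (count only versus count plus regex) from one run, the two passes can be timed separately. This is only a minimal sketch: the helper name `timed_pass`, the temporary sample file and its 100,000-line size are assumptions made for illustration, and `Time::HiRes` is swapped in for the built-in `time` to get sub-second resolution.

```perl
#!perl
use strict;
use warnings;
use Time::HiRes qw(time);      # replaces built-in time with a fractional-second version
use File::Temp qw(tempfile);

# Hypothetical helper: read $file once; if $re is given, also count matching lines.
# Returns (lines read, lines matched, elapsed seconds).
sub timed_pass {
    my ($file, $re) = @_;
    my ($lines, $hits) = (0, 0);
    my $t0 = time;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        ++$lines;
        ++$hits if defined $re && $line =~ $re;
    }
    close $fh;
    return ($lines, $hits, time - $t0);
}

# Small throwaway sample (assumed size) so the sketch finishes quickly.
my ($out, $sample) = tempfile(UNLINK => 1);
print {$out} "890123456\n" for 1 .. 100_000;
close $out or die "Cannot close $sample: $!";

my ($n1, undef, $t_count) = timed_pass($sample);
my ($n2, $hits, $t_regex) = timed_pass($sample, qr/123456$/);
printf "count only: %d lines in %.3f secs\n", $n1, $t_count;
printf "with regex: %d lines, %d matches in %.3f secs\n", $n2, $hits, $t_regex;
```

Pointing `timed_pass` at the real test file instead of the sample should give numbers comparable to the ones quoted above, although the second pass will usually benefit from the file already being in the OS cache.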

poj


Re^5: How to optimize a regex on a large file read line by line ? by John FENDER (Acolyte) on Apr 16, 2016 at 18:01 UTC
Sounds good to my ears. Which distribution/version are you using?
Re^6: How to optimize a regex on a large file read line by line ? by poj (Abbot) on Apr 16, 2016 at 18:14 UTC
This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

Binary build 1601 [296175] provided by ActiveState http://www.ActiveState.com
Built Aug 30 2012 18:41:50
Re^7: How to optimize a regex on a large file read line by line ? by John FENDER (Acolyte) on Apr 16, 2016 at 18:24 UTC
12.6 min on my side with a newer Perl, same distribution as yours:

perl -v
This is perl 5, version 22, subversion 1 (v5.22.1) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2015, Larry Wall

Binary build 2201 [299574] provided by ActiveState http://www.ActiveState.com
Built Jan 4 2016 12:12:58

Could you give me your time with this code and the same file (http://mab.to/tbT8VsPDm)?

perl demo.pl

open (FH, '<', "../Tests/10-million-combos.txt");
$counter=0;
$counter2=0;
while (<FH>) {
  if (/123456$/) {++$counter2;}
}
print "\n";
print "Num. Line : $. - Occ : $counter2\n";
close FH;

Thanks.
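
Since the pattern is a fixed string anchored at the end of the line, one alternative that might be worth timing against the regex is a plain suffix comparison. This is not something proposed in the thread; the helper name `ends_with` is hypothetical and no speed-up is claimed, it is only a sketch of an equivalent test for lines read with `<FH>`.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line ends with $suffix,
# ignoring one trailing newline (as /...$/ does).
sub ends_with {
    my ($line, $suffix) = @_;
    chomp $line;                                  # works on a copy of the caller's line
    return 0 if length($line) < length($suffix);  # too short to contain the suffix
    return substr($line, -length($suffix)) eq $suffix ? 1 : 0;
}

# Compare both tests on a few sample lines (made-up data).
for my $line ("890123456\n", "890123457\n", "123\n", "123456") {
    my $by_regex  = $line =~ /123456$/ ? 1 : 0;
    my $by_substr = ends_with($line, '123456');
    (my $shown = $line) =~ s/\n/\\n/;
    print "$shown: regex=$by_regex substr=$by_substr\n";
}
```

Whether this beats the regex would need to be measured on the actual file and Perl build.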
Re^8: How to optimize a regex on a large file read line by line ? by poj (Abbot) on Apr 16, 2016 at 20:28 UTC
Re^9: How to optimize a regex on a large file read line by line ? by John FENDER (Acolyte) on Apr 16, 2016 at 22:42 UTC