in reply to Re^4: How to optimize a regex on a large file read line by line ?
in thread How to optimize a regex on a large file read line by line ?

Sounds good to my ears. Which distribution/version are you using?

Re^6: How to optimize a regex on a large file read line by line ?
by poj (Abbot) on Apr 16, 2016 at 18:14 UTC
    This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread
    (with 1 registered patch, see perl -V for more detail)
    Binary build 1601 [296175] provided by ActiveState http://www.ActiveState.com
    Built Aug 30 2012 18:41:50
      12.6 min on my side with a newer perl, same distro as yours:
      perl -v
      This is perl 5, version 22, subversion 1 (v5.22.1) built for MSWin32-x64-multi-thread
      (with 1 registered patch, see perl -V for more detail)
      Copyright 1987-2015, Larry Wall
      Binary build 2201 [299574] provided by ActiveState http://www.ActiveState.com
      Built Jan 4 2016 12:12:58
      Could you give me your timing with this code and the same file (http://mab.to/tbT8VsPDm)? Run it with: perl demo.pl
      open (FH, '<', "../Tests/10-million-combos.txt");
      $counter=0;
      $counter2=0;
      while (<FH>) {
          if (/123456$/) {++$counter2;}
      }
      print "\n";
      print "Num. Line : $. - Occ : $counter2\n";
      close FH;
      Thanks.
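      To make timings from different machines easier to compare, the loop above could be wrapped in a timer. This is only a sketch: the helper name timed_count and its arguments are made up for illustration and are not part of the original demo.pl.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: count lines ending in a literal suffix and
# report the elapsed wall-clock time in seconds.
sub timed_count {
    my ($path, $suffix) = @_;
    my $t0 = time;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($lines, $hits) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $hits++ if $line =~ /\Q$suffix\E$/;   # same test as /123456$/
    }
    close $fh;
    return ($lines, $hits, time - $t0);
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my ($lines, $hits, $secs) = timed_count($ARGV[0], '123456');
    printf "Num. Line : %d - Occ : %d - %.1f secs\n", $lines, $hits, $secs;
}
```

      It would be run as perl demo.pl ../Tests/10-million-combos.txt (the script name is taken from the post above; passing the path as an argument is an assumption of this sketch).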

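        It took 7 mins with your file. It seems to be related to the line endings not being normal for Windows (they are LF only). After I 'processed' your file with this code (a plain copy, which on Windows writes the lines back out with CRLF endings through the default text-mode layer), it took less than 1 minute to scan.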

        #!perl
        use strict;
        my $t0 = time;
        open FH, '<', "dict.txt" or die "$!";
        open OUT,'>','dict1.txt' or die "$!";
        while (<FH>) {
            print OUT $_;
        }
        close FH;
        print time-$t0;
        Original
        Num. Line : 185866729 - Occ : 14900
        421 secs
        
        Converted
        Num. Line : 185866729 - Occ : 14900
        33 sec
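
To check which line endings a file actually has before and after such a conversion, something like the following could be used. It is a minimal sketch, not part of the original posts: the function name count_line_endings is hypothetical, and the file is opened with the :raw layer so that no CRLF translation hides the real bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: count CRLF-terminated and bare-LF-terminated lines.
# Reading with :raw means the bytes are seen exactly as stored on disk.
sub count_line_endings {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($crlf, $lf) = (0, 0);
    while (my $line = <$fh>) {
        if    ($line =~ /\r\n\z/) { $crlf++ }   # Windows-style ending
        elsif ($line =~ /\n\z/)   { $lf++ }     # Unix-style ending
    }
    close $fh;
    return ($crlf, $lf);
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my ($crlf, $lf) = count_line_endings($ARGV[0]);
    print "CRLF lines: $crlf - LF-only lines: $lf\n";
}
```

If the explanation above is right, the original file should report only LF lines and the converted dict1.txt only CRLF lines.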