
By the way, here is the full 2 GB dictionary I'm using for tests:

http://mab.to/tbT8VsPDm

Please give me your execution times with the same code and your platform; it would be interesting to compare.

Re^4: How to optimize a regex on a large file read line by line ?
by poj (Abbot) on Apr 16, 2016 at 16:28 UTC
    Please give me your execution times with the same code

    Using my own 200-million-record, 2 GB file, it takes 25 seconds just to count the lines and 50 seconds with the regex included (Windows 10, i5 3.3 GHz, 8 GB RAM, ActiveState Perl v5.16.1).

    #!perl
    use strict;
    use warnings;

    my $testfile = '200-million-combos.txt';

    # Create the test file (200 million identical records) on the first run only
    unless (-e $testfile) {
        open my $out, '>', $testfile or die "$!";
        my $record = '890123456';
        for (1 .. 200_000_000) {
            print $out $record . "\n";
        }
        close $out;
    }

    my $counter1 = 0;    # lines read
    my $counter2 = 0;    # lines matching the regex
    my $t0 = time;
    open my $fh, '<', $testfile or die "$!";
    while (<$fh>) {
        ++$counter1;
        if (/123456$/) {
            ++$counter2;
        }
    }
    close $fh;
    my $dur = time - $t0;
    print "$counter1 read in $dur secs\n";
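    The figures above compare a count-only pass (25 seconds) with a pass that also runs the regex (50 seconds), while the script itself times only the regex pass using the built-in time, which has one-second resolution. A minimal sketch of timing both passes side by side, assuming the core module Time::HiRes for sub-second timing; the helper name timed_pass and the "opener" closure (so each pass gets a fresh handle) are hypothetical, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second replacement for the built-in time

# Hypothetical helper: time one pass over a freshly opened handle.
# $open_fh is a closure returning a new read handle; $with_regex toggles
# whether each line is also tested against /123456$/.
sub timed_pass {
    my ($open_fh, $with_regex) = @_;
    my $fh      = $open_fh->();
    my $matches = 0;
    my $t0      = time;
    if ($with_regex) {
        while (<$fh>) { ++$matches if /123456$/ }
    }
    else {
        1 while <$fh>;    # read and discard every line
    }
    my $lines = $.;       # copy before close(), which resets $.
    close $fh;
    return ($lines, $matches, time - $t0);
}

# Hypothetical usage: perl timepasses.pl 200-million-combos.txt
if (@ARGV) {
    my $path   = shift @ARGV;
    my $opener = sub {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        return $fh;
    };
    for my $with_regex (0, 1) {
        my ($lines, $matches, $secs) = timed_pass($opener, $with_regex);
        printf "%-10s %d lines, %d matches, %.2f s\n",
            ($with_regex ? 'regex' : 'count only'), $lines, $matches, $secs;
    }
}
```

    Run without arguments, the script only defines the sub; the file is read twice, so OS caching may favour the second pass.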
    poj
      Sounds good to hear. Which distribution/version are you using?
        This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread
        (with 1 registered patch, see perl -V for more detail)

        Binary build 1601 [296175] provided by ActiveState http://www.ActiveState.com
        Built Aug 30 2012 18:41:50
Re^4: How to optimize a regex on a large file read line by line ?
by polettix (Vicar) on Apr 16, 2016 at 16:57 UTC
    $ time ./script.pl dict.txt
    Num. Line : 185866729 - Occ : 14900

    real    0m39.453s
    user    0m38.999s
    sys     0m0.445s
    $ perl -v
    This is perl 5, version 16, subversion 2 (v5.16.2) built for darwin-thread-multi-2level
    (with 3 registered patches, see perl -V for more detail)
    Mac OS X 10.9.5, Intel Core i7 2.4 GHz, 16 GB RAM 1600 MHz DDR3

    You can shave some time off by getting rid of $counter and using $. instead; in a quick test this took about 6 seconds less in my configuration.
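    A minimal sketch of that suggestion, assuming the same /123456$/ pattern as poj's test: Perl's built-in $. already holds the line number of the last filehandle read, so the manual line counter can go. Since closing a handle resets $., the value is copied first. The sub name count_matches is hypothetical, and the output line merely imitates the format of the run shown above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count lines and /123456$/ matches on an open handle.
# $. tracks the line number of the last handle read, so no separate
# ++$counter is needed for the line total.
sub count_matches {
    my ($fh) = @_;
    my $matches = 0;
    while (<$fh>) {
        ++$matches if /123456$/;
    }
    my $lines = $.;    # copy before close(), which resets $.
    close $fh;
    return ($lines, $matches);
}

# Hypothetical usage: perl count.pl dict.txt
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($lines, $matches) = count_matches($fh);
    print "Num. Line : $lines - Occ : $matches\n";
}
```

    The saving comes from dropping one Perl-level increment per line; the regex still runs on every line.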

    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    I understood... but what did you say?
      So maybe it's an issue related to my Windows setup/distribution; I will try to find out why. Thanks.
        You're welcome. I forgot to add that my hard drive is an SSD, although this would NOT account for an 11-minute difference.

        Update: added missing negation, thanks ww.