Re: How to optimize a regex on a large file read line by line ?

Hello John FENDER, and welcome to the Monastery!

Since you don’t print a result until the loop has finished, it appears that you expect the regex to match only once. In that case, you can cut the time substantially¹ by exiting the loop as soon as a match is found:

while (<FH>)
{
    ++$counter;

    if (/123456$/)
    {
        ++$counter2;
        last;
    }
}

See perlsyn#Loop-Control.

¹By half, on average, if the matching line appears at a random location within the file.

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: How to optimize a regex on a large file read line by line ?
by John FENDER (Acolyte) on Apr 16, 2016 at 14:15 UTC

Hello Athanasius! Thanks for your answer. I don't want to leave my loop until I know how many users with the password 123456$ I have in the file. Cheers.

Re^3: How to optimize a regex on a large file read line by line ?

by Athanasius (Archbishop) on Apr 16, 2016 at 14:26 UTC

Ah yes, I see. In that case, you’re going to have to read through the whole file, and I doubt there’s much you can do to speed up the loop.

BTW, when I saw the regex /123456$/, I assumed you wanted to match 123456 at the end of a line — that’s what the $ anchor means in a regex. If you want to match a literal $, you need to escape it: m{123456\$} or:

use strict;
use warnings;
use autodie;

...

my $password = '123456$';

open(FH, '<', "../Tests/10-million-combos.txt");
my $counter  = 0;
my $counter2 = 0;

while (<FH>)
{
    ++$counter;
    ++$counter2 if /\Q$password/;
}

print "Num. Line : $counter - Occ : $counter2\n";
close FH;

See quotemeta.
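
To make the difference between the $ anchor and a literal $ concrete, here is a minimal sketch. The three sample lines are made up for illustration and are not taken from your file; it just counts how many of them each form of the regex matches:

```perl
use strict;
use warnings;

# Hypothetical sample lines (an assumption, not data from the real file).
my @lines = ('alice:123456$', 'bob:123456', 'carol:123456$x');

my $password = '123456$';

# grep in scalar context returns the number of matching elements.
my $anchored = grep { /123456$/ }       @lines;  # $ anchors 123456 to the end of the line
my $escaped  = grep { /123456\$/ }      @lines;  # \$ matches a literal dollar sign
my $quoted   = grep { /\Q$password\E/ } @lines;  # \Q...\E escapes every metacharacter in $password

print "anchored: $anchored, escaped: $escaped, quoted: $quoted\n";
```

With these sample lines, the anchored form matches only the line that ends in 123456, while the escaped and \Q forms both match the two lines that contain 123456 followed by a dollar sign.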

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,
