in reply to How to optimize a regex on a large file read line by line ?

Hello John FENDER, and welcome to the Monastery!

Since you don’t print a result until the loop has finished, it appears that you expect the regex to match only once. In that case, you can cut the time substantially[1] by exiting the loop as soon as a match is found:

    while (<FH>) {
        ++$counter;
        if (/123456$/) {
            ++$counter2;
            last;    # leave the loop at the first match
        }
    }

See perlsyn#Loop-Control.

[1] By half, on average, if the matching line appears at a random location within the file.
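To make the effect of `last` concrete, here is a minimal, self-contained sketch. It reads from an in-memory string instead of the real file, and the sample lines and the lexical filehandle `$fh` are assumptions made only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the large file.
my $data = "alice:hunter2\nbob:123456\ncarol:letmein\ndave:qwerty\n";

# Open the string as an in-memory file.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my ($counter, $counter2) = (0, 0);
while (<$fh>) {
    ++$counter;
    if (/123456$/) {
        ++$counter2;
        last;    # stop reading as soon as the first match is found
    }
}
close $fh;

print "Lines read: $counter - Matches: $counter2\n";
```

With this sample data the loop stops after the second line, so the last two lines are never read.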

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: How to optimize a regex on a large file read line by line ?
by John FENDER (Acolyte) on Apr 16, 2016 at 14:15 UTC
    Hello Athanasius! Thanks for your answer. I don't want to leave my loop until I know how many users with the password 123456$ I have in the file. Cheers.

      Ah yes, I see. In that case, you’re going to have to read through the whole file, and I doubt there’s much you can do to speed up the loop.

      BTW, when I saw the regex /123456$/, I assumed you wanted to match 123456 at the end of a line — that’s what the $ anchor means in a regex. If you want to match a literal $, you need to escape it: m{123456\$} or:

      use strict;
      use warnings;
      use autodie;
      ...
      my $password = '123456$';
      open(FH, '<', "../Tests/10-million-combos.txt");
      $counter  = 0;
      $counter2 = 0;
      while (<FH>) {
          ++$counter;
          ++$counter2 if /\Q$password\E/;    # \Q...\E quotes regex metacharacters
      }
      print "Num. Line : $counter - Occ : $counter2\n";
      close FH;

      See quotemeta.
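The difference between the three forms can be checked on a few sample lines. This is only a sketch: the user names and passwords in `@lines` are made up for illustration and are not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the real file's contents.
my @lines = (
    'alice:123456$',       # password ends with a literal dollar sign
    'bob:123456',          # password is 123456, at the end of the line
    'carol:123456$xyz',    # literal dollar sign in the middle of the line
);

my $password = '123456$';

# In scalar context, grep returns the number of matching lines.
my $anchored = grep { /123456$/ }       @lines;    # $ is an end-of-line anchor
my $escaped  = grep { /123456\$/ }      @lines;    # \$ is a literal dollar sign
my $quoted   = grep { /\Q$password\E/ } @lines;    # \Q...\E escapes the $ for us

print "anchored: $anchored, escaped: $escaped, quoted: $quoted\n";
```

Here the anchored pattern matches only bob's line, while the escaped and quoted patterns both match alice's and carol's lines.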

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,