"The predefined global variable $. does that for you"

I wasn't aware of this trick, thanks!
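For anyone else who missed it, here is a minimal sketch of how I understand `$.` to work: it holds the current line number of the filehandle that was read last, so no separate counter is needed for the total. The sub name `count_matches` and the three sample lines are made up for the illustration; they are not from my real file.

```perl
use strict;
use warnings;

# Count total lines and matching lines in a single pass.
# $. is Perl's built-in line counter for the last-read filehandle.
sub count_matches {
    my ($fh, $re) = @_;
    my ($lines, $matches) = (0, 0);
    while (my $line = <$fh>) {
        $matches++ if $line =~ $re;
        $lines = $.;    # remember the line number before the handle hits EOF
    }
    return ($lines, $matches);
}

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "alice:123456\nbob:password\ncarol:x123456\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($lines, $matches) = count_matches($fh, qr/123456$/);
close $fh;

print "$lines lines, $matches matches\n";
```

With the real file, the in-memory handle would simply be replaced by a normal `open` on the path.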

"Spoiler alert: your file "10-million-combos.txt" does not contain any lines that match /123456$/."

Ahem, it sounds like I did something wrong while zipping the file. The 19x MB file containing 10 million passwords has now been updated correctly. You will find 10000000 lines in it, 61466 of which match the regex 123456$.

"unzip -p 10-million-combos.txt.zip | perlscript"

Currently I'm working on the txt file only, but it's interesting. I ran your test like this:

echo 1:%time%
unzip -p 10-million-combos.zip | grep 123456$ | wc -l
echo 2:%time%
grep 123456$ 10-million-combos.txt | wc -l
echo 3:%time%
pause

Result:

1:19:16:46,11
61466
2:19:16:48,43
61466
3:19:16:49,00

That is 0,58 s for the plain text file and 2,27 s for the piped zip file.

Now with your command line:

zip piped: 3,89
unzip -p "C:\Users\admin\Desktop\10-million-combos.zip" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"

plain text: 5,16
type "C:\Users\admin\Desktop\10-million-combos.txt" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"

perl direct: 2,29
perl "demo.pl"

The fastest on my side remains direct access to the plain text file, using either grep or perl. It is amazing to see that unzip piped into perl goes faster than the plain text file piped into the same inline command... The shell is strange sometimes...

"I was going to suggest using the gnu/*n*x "grep" command-line utility to get a performance baseline"

I'm using the one you can find in the unix utils; I suppose it's the GNU one ported to Windows. --version gives me: grep (GNU grep) 2.4.2.

Now grep vs perl:

echo %time%& grep 123456$ C:\Users\admin\Desktop\10-million-combos.txt | wc -l& echo %time%
echo %time%& type "C:\Users\admin\Desktop\10-million-combos.txt" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"& echo.&echo %time%
echo %time%& perl demo.pl& echo %time%

This gives me:

19:43:28,91/61466/19:43:29,51 for grep (0,6)
19:45:29,51/61466/19:45:34,71 for perl (5,2)
19:46:13,27/61466/19:46:15,47 for perl (direct) (2,2)
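The %time% stamps above also count process start-up and whatever the pipe costs, so a measurement taken inside Perl might separate the two. The following is only a sketch of that idea, not the content of my demo.pl: the sub name `timed_count` is invented, and when no file is given on the command line it falls back to a tiny made-up sample so that it still runs.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);      # sub-second wall-clock time
use File::Temp  qw(tempfile);

# Open the file directly, count lines matching $re,
# and return the count together with the elapsed seconds.
sub timed_count {
    my ($path, $re) = @_;
    my $start = time();
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ $re;
    }
    close $fh;
    return ($n, time() - $start);
}

my $path = shift @ARGV;
unless (defined $path) {
    # No file given: write a small hypothetical sample to a temp file.
    (my $tmp, $path) = tempfile(UNLINK => 1);
    print {$tmp} "user${_}:", ($_ % 3 ? "secret" : "123456"), "\n" for 1 .. 9;
    close $tmp;
}

my ($n, $elapsed) = timed_count($path, qr/123456$/);
printf "%d matches in %.2f s\n", $n, $elapsed;
```

Usage would be something like `perl timed.pl C:\Users\admin\Desktop\10-million-combos.txt`, where timed.pl is just a placeholder name for wherever the sketch is saved.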

In reply to Re^2: How to optimize a regex on a large file read line by line ? by John FENDER
in thread How to optimize a regex on a large file read line by line ? by John FENDER
