in reply to Re: How to optimize a regex on a large file read line by line ?
in thread How to optimize a regex on a large file read line by line ?
"The predefined global variable $. does that for you"
I wasn't aware of this trick, thanks!
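For anyone else reading along, here is a minimal sketch of what `$.` gives you: it holds the current line number of the last filehandle read, so no manual counter is needed. The sample data and the sub name `matching_line_numbers` are made up for illustration, they are not from my real script.

```perl
use strict;
use warnings;

# Return the line numbers (taken from $.) of the lines ending in 123456.
# $fh is any readable filehandle; the sub name is hypothetical.
sub matching_line_numbers {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $. if $line =~ /123456$/;   # $. = current input line number
    }
    return @hits;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "alice:123456\nbob:qwerty\ncarol:x123456\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @hits = matching_line_numbers($fh);
close $fh;

print "matching lines: @hits\n";   # prints "matching lines: 1 3"
```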
"Spoiler alert: your file "10-million-combos.txt" does not contain any lines that match /123456$/."
Ahem, it sounds like I did something wrong while zipping the file. The 19x MB file containing 10 million passwords has now been updated correctly. You will find 10000000 lines in it, 61466 of which match the regex 123456$.
"unzip -p 10-million-combos.txt.zip | perlscript"
Currently I'm working on the txt file only, but it's interesting. I ran your test like this:
echo 1:%time%
unzip -p 10-million-combos.zip | grep 123456$ | wc -l
echo 2:%time%
grep 123456$ 10-million-combos.txt | wc -l
echo 3:%time%
pause
Result :
1:19:16:46,11
61466
2:19:16:48,43
61466
3:19:16:49,00
Going by the timestamps: 0,57 s on the plain text file, 2,32 s on the piped zip file.
Now with your command lines:
zip piped : 3,89
unzip -p "C:\Users\admin\Desktop\10-million-combos.zip" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"
plain text : 5,16
type "C:\Users\admin\Desktop\10-million-combos.txt" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"
perl direct : 2,29
perl "demo.pl"
The fastest on my side remains direct access to the plain text file, using either grep or perl. Amazing to see that unzip piped into perl goes faster than the plain text file piped into the same inline command... The shell is strange sometimes...
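I did not paste demo.pl here. A minimal sketch of a direct-read counter, assuming the script does nothing more than open the file and count the matching lines, could look like this (the sub name `count_matches` and taking the file name from the command line are assumptions for illustration):

```perl
use strict;
use warnings;

# Count the lines ending in 123456 by reading the file directly (no pipe).
# The sub name is hypothetical.
sub count_matches {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /123456$/;   # $ also matches just before the trailing newline
    }
    close $fh;
    return $n;
}

# The file name comes from the command line; nothing runs without an argument.
if (@ARGV) {
    print count_matches($ARGV[0]), "\n";
}
```

It would be run as `perl demo.pl 10-million-combos.txt`, which skips the `type ... |` pipe used in the inline version.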
"I was going to suggest using the gnu/*n*x "grep" command-line utility to get a performance baseline"
I'm using the one you can find in the unix utils; I suppose it's the GNU one ported to Windows. --version gives me: grep (GNU grep) 2.4.2.
Now grep vs perl:
echo %time%& grep 123456$ C:\Users\admin\Desktop\10-million-combos.txt | wc -l& echo %time%
echo %time%& type "C:\Users\admin\Desktop\10-million-combos.txt" | perl -ne "BEGIN{$n=0} $n++ if /123456$/; END{print $n}"& echo.&echo %time%
echo %time%& perl demo.pl& echo %time%
This gives me:
19:43:28,91/61466/19:43:29,51 for grep (0,6)
19:45:29,51/61466/19:45:34,71 for perl (5,2)
19:46:13,27/61466/19:46:15,47 for perl (direct) (2,2)
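To avoid doing the subtraction by hand, the elapsed times can be computed from the %time% stamps. This is only a sketch: it assumes the H:MM:SS,cc layout shown above and that both stamps fall on the same day, and the sub names `stamp_to_seconds` and `elapsed` are made up.

```perl
use strict;
use warnings;

# Convert a cmd.exe %time% stamp such as "19:43:28,91" to seconds.
# Assumes hours:minutes:seconds followed by a comma (or dot) and a fraction.
sub stamp_to_seconds {
    my ($stamp) = @_;
    my ($h, $m, $s, $frac) = $stamp =~ /^\s*(\d+):(\d+):(\d+)[,.](\d+)\s*$/
        or die "Unrecognized time stamp: $stamp";
    return $h * 3600 + $m * 60 + $s + "0.$frac";
}

# Elapsed seconds between two stamps taken on the same day.
sub elapsed {
    my ($start, $end) = @_;
    return stamp_to_seconds($end) - stamp_to_seconds($start);
}

printf "grep          : %.2f s\n", elapsed('19:43:28,91', '19:43:29,51');
printf "perl (piped)  : %.2f s\n", elapsed('19:45:29,51', '19:45:34,71');
printf "perl (direct) : %.2f s\n", elapsed('19:46:13,27', '19:46:15,47');
```

With the stamps above this prints 0.60, 5.20 and 2.20 seconds, which agrees with the rounded figures in parentheses.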
Re^3: How to optimize a regex on a large file read line by line ?
by graff (Chancellor) on Apr 18, 2016 at 09:02 UTC