It's a good query.
Currently I can't answer you, as I'd need to look into the grep source for that. For now I'm just grepping the file like this:

    grep "12345$" myfile

Same for the count:

    wc -l myfile

I have another piece of Perl code that does what you are perhaps suggesting: it first loads all the lines into memory, then greps them. Unfortunately, the result is worse than the line-by-line attempt (2.47 s vs 8.33 s). Here is the code used for this test (on a reduced set, 200 MB):
    open (FH, '<', "../Tests/10-million-combos.txt") or die "Cannot open file: $!";
    print "Loading the file...\n";
    while (<FH>) {
        push (@_file_to_parse, $_);
    }
    print "Counting the file...\n";
    $NumberOfLine = @_file_to_parse;
    print "Searching 123456\$...\n";   # escape the $ so Perl does not interpolate $.
    @_result = grep { /123456$/ } @_file_to_parse;
    $NumberOfResult = @_result;
    print "$NumberOfResult - $NumberOfLine\n";
    close FH;
In reply to Re^4: How to optimize a regex on a large file read line by line ?
by John FENDER
in thread How to optimize a regex on a large file read line by line ?
by John FENDER