Re^3: Finding Line numbers in a file

You are possibly right. You are just as possibly wrong. There are several things that we don't know, such as:

Average line length. Shorter lines mean more low-level iterations.
Average file length. Longer files will require more memory, but that is about all.
Average hit count. How often is the string found in the file?
Average hit placement. How often does the string end up near the beginning or the end of the file?
Implementation issues. Is the string passed in already in one chunk, or do we have access to a file handle?

There are just too many unknowns to use blanket statements as to which algorithm is best.
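If you want to settle it for your own data, something like the following rough sketch with the core Benchmark module could help. The test data, the search string and the sub names whole_string and line_by_line are all made up for illustration; swap in a sample that matches your real line lengths and hit counts before trusting any numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10_000 short lines, with "one" on every 100th line.
my $str = join "\n", map { $_ % 100 ? "$_ filler" : "$_ one" } 1 .. 10_000;

# Approach A: scan the whole string, counting newlines between matches.
sub whole_string {
    my ($text) = @_;
    my ($last_pos, $line, @found) = (0, 1);
    while ($text =~ /one/g) {
        $line += substr($text, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
        $last_pos = $-[0];
        push @found, $line;
    }
    return \@found;
}

# Approach B: read line by line through an in-memory file handle,
# letting $. keep track of the line number.
sub line_by_line {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @found;
    while (my $row = <$fh>) {
        push @found, $. if $row =~ /one/;
    }
    close $fh;
    return \@found;
}

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    whole_string => sub { whole_string($str) },
    line_by_line => sub { line_by_line($str) },
});
```

Note that approach A reports a line once per match while approach B reports it once per line, so the two only agree when the string appears at most once on a line.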

But one major issue is that the special regex match variables ($&, $` and $') shouldn't be used. They impose too much of a penalty. Instead you can use @- and @+, which have no such penalty, as in the following:

my $str = "1 one
2 two
3 one
4 four
5 one
6 five";
my $last_pos = 0;
my $newlines = 1;
while ($str =~ /one/g) {    # no capture group needed, only $-[0] is used
    $newlines += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
    $last_pos = $-[0];
    print "Found on line $newlines\n";
}
# prints 
# Found on line 1
# Found on line 3
# Found on line 5

Notice the optimization: newlines are only counted from the position of the previous match, not from the start of the string each time.
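For anyone who needs the text that the penalized variables would have given, here is a minimal sketch of how the offsets in @- and @+ can stand in for them using substr. The sample string and the variable names $prematch, $match and $postmatch are only for illustration.

```perl
use strict;
use warnings;

my $str = "1 one\n2 two\n3 one";
my ($prematch, $match, $postmatch);

if ($str =~ /two/) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past where it ends.
    $prematch  = substr($str, 0, $-[0]);              # same text as $`
    $match     = substr($str, $-[0], $+[0] - $-[0]);  # same text as $&
    $postmatch = substr($str, $+[0]);                 # same text as $'
    printf "match '%s' spans offsets %d to %d\n", $match, $-[0], $+[0];
}
```

The offsets are overwritten by the next successful match, so copy out whatever you need before running another regex.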

my @a=qw(random brilliant braindead); print $a[rand(@a)];
