in reply to Re: Finding Line numbers in a file
in thread Finding Line numbers in a file

Nice, but inefficient, and gets worse the bigger the text file is.

Do not do this; use the others. Their running time grows linearly with the size of the text file, and they do not require the entire file to be loaded into memory.
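A minimal sketch of the line-by-line approach, assuming that "the others" refers to solutions that read from a file handle one line at a time. The helper name `matching_line_numbers` and the sample text are hypothetical. The in-memory file handle only stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line numbers on which $pattern matches.
# Only one line is held in memory at a time.
sub matching_line_numbers {
    my ($fh, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {
        # $. is the current line number of the file handle last read from
        push @hits, $. if $line =~ $pattern;
    }
    return @hits;
}

# Usage: an in-memory file handle stands in for a real file here
my $text = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "Found on line $_\n" for matching_line_numbers($fh, qr/one/);
close $fh;
```

Each line is read and tested once, so the work grows in proportion to the file size.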

---
my name's not Keith, and I'm not reasonable.

Re^3: Finding Line numbers in a file
by Rhandom (Curate) on Apr 04, 2007 at 16:09 UTC
    You are possibly right. You are just as possibly wrong. There are several things that we don't know, such as:
    • Average line length. Shorter lines mean more low-level iterations.
    • Average file length. Longer files will require more memory, but that is about all.
    • Average hit count. How often is the string found in the file?
    • Average hit placement. How often does the string end up at the beginning or the end?
    • Implementation issues. Is the string passed in already in one chunk, or do we have access to a file handle?
    There are just too many unknowns to make blanket statements about which algorithm is best.
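    One way to settle such questions for a particular workload is to measure. The following is only a sketch, using the core Benchmark module. The test data, the iteration count, and the names `lines_by_offset` and `lines_by_handle` are all assumptions made for illustration. Real results will depend on the unknowns listed above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 2,000 lines, with "one" on every third line.
my $text = join '', map { $_ % 3 ? "$_ filler\n" : "$_ one\n" } 1 .. 2000;

# Whole-string scan: count newlines between consecutive matches using @-.
sub lines_by_offset {
    my ($str) = @_;
    my ($last_pos, $line, @hits) = (0, 1);
    while ($str =~ /one/g) {
        $line += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
        $last_pos = $-[0];
        push @hits, $line;
    }
    return @hits;
}

# Line-by-line scan over a file handle, using $. as the line counter.
sub lines_by_handle {
    my ($str) = @_;
    open my $fh, '<', \$str or die "Cannot open in-memory file: $!";
    my @hits;
    while (my $l = <$fh>) {
        push @hits, $. if $l =~ /one/;
    }
    close $fh;
    return @hits;
}

# Run each candidate 200 times and print a comparison table.
cmpthese(200, {
    by_offset => sub { my @h = lines_by_offset($text) },
    by_handle => sub { my @h = lines_by_handle($text) },
});
```

    Changing the line length, file size, or hit frequency in the generated data shows how the comparison shifts with each factor.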

    But one major issue is that the special regex capture variables shouldn't be used: they impose too much of a performance penalty. Instead, you can use @- and @+, which have no such penalty, as in the following:

    my $str = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five";
    my $last_pos = 0;
    my $newlines = 1;
    while ($str =~ /(one)/g) {
        # count only the newlines between the previous match and this one
        $newlines += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
        $last_pos = $-[0];
        print "Found on line $newlines\n";
    }
    # prints
    # Found on line 1
    # Found on line 3
    # Found on line 5


    Notice the optimization: newlines are counted only from the previous match onward, rather than from the start of the string each time.
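    For reference, the offsets stored in @- and @+ are enough to recover the text before, inside, and after a match with plain substr calls. A small sketch follows. The helper name `split_around_match` and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (text before match, matched text, text after
# match), or an empty list if $re does not match.
sub split_around_match {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # $-[0] is the offset where the whole match starts, $+[0] where it ends
    return (
        substr($str, 0, $-[0]),               # same text as $`
        substr($str, $-[0], $+[0] - $-[0]),   # same text as $&
        substr($str, $+[0]),                  # same text as $'
    );
}

my ($before, $match, $after) = split_around_match("alpha one omega", qr/one/);
print "[$before][$match][$after]\n";   # prints [alpha ][one][ omega]
```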

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re^3: Finding Line numbers in a file
by sanPerl (Friar) on Apr 04, 2007 at 15:51 UTC
    Dear kyle and reasonablekeith,
    Thanks for the suggestion, and for the warning as well. This is making me think in new directions.