PerlMonks  

Re^2: Finding Line numbers in a file

by reasonablekeith (Deacon)
on Apr 04, 2007 at 15:29 UTC ( [id://608299] )


in reply to Re: Finding Line numbers in a file
in thread Finding Line numbers in a file

Nice, but inefficient, and it gets worse the bigger the text file is.

Don't do this; use the other approaches instead. Their running time grows linearly with the size of the text file, and they don't require the entire file to be loaded into memory.
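The other replies are not reproduced here, so the following is only a minimal sketch of what a linear, line-at-a-time scan might look like. The helper name `matching_line_numbers` and the sample text are assumptions made for illustration; the in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line numbers on which $pattern matches,
# reading one line at a time from an already-open filehandle, so only the
# current line is held in memory.
sub matching_line_numbers {
    my ($fh, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {
        # $. holds the current line number of the filehandle last read from
        push @hits, $. if $line =~ $pattern;
    }
    return @hits;
}

# Usage: an in-memory filehandle stands in for a real file here.
my $text = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "Found on line $_\n" for matching_line_numbers($fh, qr/one/);
close $fh;
```

Each line is examined exactly once, which is where the linear growth comes from.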

---
my name's not Keith, and I'm not reasonable.

Replies are listed 'Best First'.
Re^3: Finding Line numbers in a file
by Rhandom (Curate) on Apr 04, 2007 at 16:09 UTC
    You are possibly right. You are just as possibly wrong. There are several things that we don't know, such as:
    • Average line length. Shorter lines mean more low-level iterations.
    • Average file length. Longer files will require more memory, but that is about all.
    • Average hit count. How often is the string found in the file?
    • Average hit placement. How often does the string end up at the beginning or the end?
    • Implementation issues. Is the string passed in already in one chunk, or do we have access to a file handle?
    There are just too many unknowns to use blanket statements as to which algorithm is best.

    One thing that is a major issue, though, is that the special regex match variables ($&, $` and $') shouldn't be used. They impose too much of a penalty. Instead, you can use @- and @+, which have no penalty, as in the following:

    my $str = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five";
    my $last_pos = 0;
    my $newlines = 1;
    while ($str =~ /(one)/g) {
        # count only the newlines between the previous match and this one
        $newlines += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
        $last_pos = $-[0];
        print "Found on line $newlines\n";
    }
    # prints
    # Found on line 1
    # Found on line 3
    # Found on line 5


    Notice the optimization: newlines are counted only from the previous match onward, never from the start of the string.
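To make the role of @- and @+ more concrete, here is a small sketch of how the text before, inside, and after a match can be recovered from the offsets alone. The helper name `match_parts` and the sample string are hypothetical and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match, postmatch) for the first
# match of $re in $str, using only the offset arrays @- and @+.
sub match_parts {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # $-[0] is the start offset of the whole match, $+[0] is its end offset
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($str, 0, $start),              # text before the match
        substr($str, $start, $end - $start),  # the matched text itself
        substr($str, $end),                   # text after the match
    );
}

my ($pre, $match, $post) = match_parts("1 one\n2 two\n3 one", qr/t\w+/);
print "match=[$match]\n";
```

Because the three pieces are built with substr only when asked for, nothing is copied for matches whose surrounding text is never needed.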

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re^3: Finding Line numbers in a file
by sanPerl (Friar) on Apr 04, 2007 at 15:51 UTC
    Dear kyle and reasonablekeith,
    Thanks for the suggestion and for the warning. This is making me think in new directions.
