Re: Finding Line numbers in a file

According to my Camel, "each time a pattern successfully matches (including the pattern in a substitution), it sets the $`, $&, and $' variables to the text left of the match, the whole match, and the text right of the match."

That sounds useful.

my $text = <<'END_OF_TEXT';
line 1 apple
banana line 2
line cherry 3
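END_OF_TEXT

while ( $text =~ m/(apple|banana|cherry)/ig ) {
    my $word = $1;
    # $` holds everything before the match, so its newline count is
    # the number of complete lines preceding the matched word.
    my $prelines = ( $` =~ tr/\n// );
    printf qq{Word "%s" found on line %d\n}, $word, $prelines + 1;
}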
__END__
Word "apple" found on line 1
Word "banana" found on line 2
Word "cherry" found on line 3

With the English module loaded (use English;), the $` variable is also available as $PREMATCH. See perlvar, which notes that using this variable "imposes a considerable performance penalty on all regular expression matches".
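For what it's worth, Perl 5.10 and later also have the /p modifier, which makes the same text available as ${^PREMATCH} for that one match only, rather than affecting every regex in the program. Here is a rough sketch of the loop above using it. The sub name lines_of_matches is made up for illustration, and note that it still rescans the whole prematch on every hit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns
# [word, line number] pairs for each match in $text.
sub lines_of_matches {
    my ($text) = @_;
    my @found;
    while ( $text =~ m/(apple|banana|cherry)/igp ) {
        my $word = $1;
        # ${^PREMATCH} is the text left of the match, like $`,
        # but it is only populated because of the /p modifier.
        my $pre      = ${^PREMATCH};
        my $prelines = ( $pre =~ tr/\n// );
        push @found, [ $word, $prelines + 1 ];
    }
    return @found;
}

my $text = "line 1 apple\nbanana line 2\nline cherry 3\n";
printf qq{Word "%s" found on line %d\n}, @$_ for lines_of_matches($text);
```

This should report the same three lines as the original version.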


Re^2: Finding Line numbers in a file by reasonablekeith (Deacon) on Apr 04, 2007 at 15:29 UTC
Nice, but inefficient, and it gets worse the bigger the text file is. Don't do this; use the other approaches. Their cost grows linearly with the size of the text file, and they do not require the entire file to be loaded into memory.

--- my name's not Keith, and I'm not reasonable.
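The other replies aren't reproduced here, so the following is only a guess at the kind of line-by-line approach being recommended: read from a filehandle one line at a time and let Perl's $. variable track the line number. The sub name scan_handle is hypothetical, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: scans a filehandle one line at a time and
# returns [word, line number] pairs. Only one line is held in
# memory at any moment.
sub scan_handle {
    my ($fh) = @_;
    my @found;
    while ( my $line = <$fh> ) {
        # $. is the line number of the most recently read filehandle.
        while ( $line =~ m/(apple|banana|cherry)/ig ) {
            push @found, [ $1, $. ];
        }
    }
    return @found;
}

# Assumption for the demo: the text is already in a scalar, so we
# open a read handle on it instead of a file on disk.
my $text = "line 1 apple\nbanana line 2\nline cherry 3\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
printf qq{Word "%s" found on line %d\n}, @$_ for scan_handle($fh);
close $fh;
```

With a real file you would open the path instead of \$text; the loop itself stays the same.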
Re^3: Finding Line numbers in a file by Rhandom (Curate) on Apr 04, 2007 at 16:09 UTC
You are possibly right. You are just as possibly wrong. There are several things that we don't know, such as:

- Average line length. Shorter lines mean more low-level iterations.
- Average file length. Longer files will require more memory, but that is about all.
- Average hit count. How often is the string found in the file?
- Average hit placement. How often does the string end up at the beginning or the end?
- Implementation issues. Is the string passed in already in one chunk, or do we have access to a file handle?

There are just too many unknowns to make blanket statements as to which algorithm is best.

But one thing that is a major issue is that the special regex match variables ($`, $&, and $') shouldn't be used. They impose too much of a penalty. Instead, you can use `@-` and `@+`, which have no penalty, as in the following:

    my $str = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five";
    my $last_pos = 0;
    my $newlines = 1;
    while ($str =~ /(one)/g) {
        # $-[0] is the start offset of the current match
        $newlines += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
        $last_pos = $-[0];
        print "Found on line $newlines\n";
    }
    # prints
    # Found on line 1
    # Found on line 3
    # Found on line 5

Notice the optimization: newlines are only counted from the previous match onward.

my @a=qw(random brilliant braindead); print $a[rand(@a)];
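To tie this back to the original example, the same incremental counting can be wrapped in a sub that takes any pattern and reports the matched text as well. This is just a sketch: the name incremental_lines and its return shape are made up here, and the matched text is recovered from the `@-` and `@+` offsets rather than from $&.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [matched text, line number] pairs,
# counting newlines only between the previous match and this one.
sub incremental_lines {
    my ( $str, $re ) = @_;
    my ( $last_pos, $line ) = ( 0, 1 );
    my @found;
    while ( $str =~ /$re/g ) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my ( $start, $end ) = ( $-[0], $+[0] );
        my $gap = substr( $str, $last_pos, $start - $last_pos );
        $line += ( $gap =~ tr/\n// );
        $last_pos = $start;
        push @found, [ substr( $str, $start, $end - $start ), $line ];
    }
    return @found;
}

my $text = "line 1 apple\nbanana line 2\nline cherry 3\n";
printf qq{Word "%s" found on line %d\n}, @$_
    for incremental_lines( $text, qr/apple|banana|cherry/i );
```

Each stretch of text between matches is scanned once, so the newline counting stays proportional to the length of the string however many hits there are.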
Re^3: Finding Line numbers in a file by sanPerl (Friar) on Apr 04, 2007 at 15:51 UTC
Dear kyle and reasonablekeith, thanks for the suggestion and for the warning as well. This is making me think in new directions.


Just another Perl shrine
	PerlMonks