Highlighting Regex Hits

rlrandallx has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

This is a regex question so forgive me if I should be asking it somewhere else.

Here's my problem: I am searching through a string/document for a word. If I find it, I want to print the line number and the line, with the word highlighted (square brackets are OK, but I would really like ANSI color). The issue is when there is more than one hit on the same line. I've read all about global matching, but nothing works. Below is my code:

```perl
# ...
$line  = 'This line has a hit here and a hit there.';
$word  = 'hit';
$count = 0;
while ($line =~ /\b$word\b/gi)    # I also tried "/gic" & pos()
{
    $line = "$`" . '[' . "$&" . ']' . "$'";
    $count++;
}
print "$lino $line\n";    # $lino: the line number, set in the elided code
# ...
print "$word was found $count times\n";
```

With color, the one line is ``$line = "$`".BLACK ON_YELLOW ."$&".RESET."$'";``. Anyway, if I replace the `while` with an `if`, it highlights the first 'hit'. I just can't get the other(s) to highlight. I know it is because I am modifying `$line`, but this structure has worked before, so I am trying to find an alternate "sure-fire" way.

Thanks for any help -rlrandallx

Re: Highlighting Regex Hits by toolic (Bishop) on May 15, 2010 at 00:56 UTC
This will surround all your hit words with square brackets and show the hit count:

```perl
use strict;
use warnings;

my $line  = 'This line has a hit here and a hit there.';
my $word  = 'hit';
my $count = $line =~ s/\b($word)\b/[$1]/gi;
print "$line\n";
print "$word was found $count times\n";

__END__
This line has a [hit] here and a [hit] there.
hit was found 2 times
```

The substitution operator returns the number of substitutions made (or false, an empty string, if there were none).
Re: Highlighting Regex Hits by ikegami (Patriarch) on May 15, 2010 at 01:14 UTC
It is possible to achieve using m//g.

```perl
my $line  = 'This line has a hit here and a hit there.';
my $word  = 'hit';
my $count = 0;
my $hilit = '';
while ($line =~ /(.*?)(?:\b($word)\b|\z)/sgi) {
    $hilit .= $1;
    if (defined($2)) {    # $2 is undefined when the \z branch matched
        ++$count;
        $hilit .= "[$2]";
    }
}
print "$hilit\n";
print "$count occurrences of $word\n";
```

/c would indeed allow you to simplify the above.

```perl
my $line  = 'This line has a hit here and a hit there.';
my $word  = 'hit';
my $count = 0;
my $hilit = '';
while ($line =~ /(.*?)\b($word)\b/sgci) {
    $hilit .= "$1\[$2]";    # "\[" keeps "$1[...]" from being read as an element of @1
    ++$count;
}
$hilit .= substr($line, pos($line));    # /c leaves pos() at the end of the last hit
print "$hilit\n";
print "$count occurrences of $word\n";
```

But s///g is much simpler.

```perl
my $line  = 'This line has a hit here and a hit there.';
my $word  = 'hit';
my $count = (my $hilit = $line) =~ s/\b($word)\b/[$1]/gi;
print "$hilit\n";
print "$count occurrences of $word\n";
```

Note that if `$word` can contain characters other than those matched by `\w`, `\b` may fail, and the contents may be treated as regex instructions (e.g. `$word = "foo.bar"` would match `foolbar`).

Update: Fixed a bug in first snippet.
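A minimal sketch of guarding against both problems mentioned in the note above, assuming `$word` should be matched literally as a whole "word": `\Q...\E` (quotemeta) escapes regex metacharacters, and the lookarounds `(?<!\w)` and `(?!\w)` stand in for `\b`, so the boundaries still work when `$word` begins or ends with a non-word character. The sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: a search term containing a regex metacharacter.
my $word = 'foo.bar';
my $text = 'foolbar is not foo.bar, but foo.bar is.';

# \Q...\E escapes metacharacters, so '.' only matches a literal dot.
# (?<!\w) and (?!\w) behave like \b at both ends, even when $word
# itself starts or ends with a non-word character.
my $count = (my $hilit = $text) =~ s/(?<!\w)(\Q$word\E)(?!\w)/[$1]/gi;
$count ||= 0;    # s///g returns false (an empty string) when nothing matched

print "$hilit\n";
print "$count occurrences of $word\n";
```

Here `foolbar` is left alone, while both literal `foo.bar` occurrences get bracketed.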
Re^2: Highlighting Regex Hits by rlrandallx (Initiate) on May 15, 2010 at 02:42 UTC
OK now. Can anyone get it to work with ANSI COLORS? -rlrandallx	[reply]
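One way to get ANSI colors, sketched here on the assumption that your terminal understands ANSI escape sequences. The core module Term::ANSIColor provides `colored(STRING, ATTRIBUTES)`, and the `/e` modifier lets the replacement part of `s///` call it for each hit. (The `RED`/`RESET` style used elsewhere in this thread comes from the same module's constants interface, which needs `use Term::ANSIColor qw(:constants);`.)

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colored);

my $line = 'This line has a hit here and a hit there.';
my $word = 'hit';

# /e evaluates the replacement as Perl code, so each hit is wrapped in
# the escape sequence for black text on a yellow background, followed
# by a reset sequence (colored() adds both).
my $count = (my $hilit = $line) =~ s/\b(\Q$word\E)\b/colored($1, 'black on_yellow')/gie;
$count ||= 0;    # s///g returns false (an empty string) when nothing matched

print "$hilit\n";
print "$word was found $count times\n";
```

Because the coloring happens in a single `s///g`, there is no `pos()` bookkeeping to get wrong.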
Re^3: Highlighting Regex Hits by ikegami (Patriarch) on May 15, 2010 at 03:17 UTC
yes	[reply]
Re^4: Highlighting Regex Hits by rlrandallx (Initiate) on May 15, 2010 at 04:43 UTC
Re: Highlighting Regex Hits by JavaFan (Canon) on May 15, 2010 at 15:11 UTC
I don't think anyone has yet explained why the `while (//g) {}` solution isn't working. The problem lies in the assignment to `$line`, which resets `pos()`. So if there is a hit, the while loop will never terminate†: each search starts again from the beginning of the string, finds the same hit (now inside the brackets added by the previous pass) and adds yet another pair of brackets around it. With `if` instead of `while` the match runs only once, which is why the first hit does get highlighted.

†Well, because `$line` grows by two characters each iteration, the program will eventually run out of memory, terminating the program (and hence the loop).
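Following that explanation, a `while (//g)` loop can work if it puts `pos()` back where the next search should start after each modification. A minimal sketch (not code from the thread): it inserts the brackets with 4-argument `substr` at the match offsets held in `@-` and `@+`, then sets `pos($line)` just past the closing bracket.

```perl
use strict;
use warnings;

my $line  = 'This line has a hit here and a hit there.';
my $word  = 'hit';
my $count = 0;

while ($line =~ /\b\Q$word\E\b/gi) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);

    # Insert the closing bracket first so $start is still correct,
    # then insert the opening bracket.
    substr($line, $end,   0, ']');
    substr($line, $start, 0, '[');

    # Modifying $line may reset pos(), so set it explicitly: the text
    # after the hit has moved two characters to the right.
    pos($line) = $end + 2;
    $count++;
}

print "$line\n";
print "$word was found $count times\n";
```

In practice the single `s///g` shown in other replies is simpler; this version only shows what the original loop was missing.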
Re: Highlighting Regex Hits by Natanael (Acolyte) on May 15, 2010 at 08:55 UTC
An alternative way is to use split, which has an (IMO) not often used feature: it can return both what was matched and what was between the matches:

```perl
my $line  = 'This line 1 has a hit here and a hit there.';
my $word  = 'hit';
my $count = 0;
my $n     = 0;
my @stuff = split m/($word)/, $line;
grep {
    $n++;
    if ($n % 2) {
        print $_;
    }
    else {
        print RED, $_, RESET;
        $count++;
    }
} @stuff;
print "\nFound $count times.\n";
```

I found that this scales better than running m// or s/// through a while loop on big strings. It is also handy if you need to return the modified string (split + join) instead of printing its parts.
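A sketch of the split + join variant mentioned above, which returns the modified string instead of printing its parts. It also adds `\b`, `\Q...\E` and `/i`, which are assumptions about what should count as a hit, and `highlight_with_split` is a hypothetical helper name. Because split keeps an empty leading field when the text starts with a hit, ordinary text always lands at even indices and hits at odd ones.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $text with every whole-word,
# case-insensitive occurrence of $word wrapped in $open/$close, plus
# the number of occurrences found.
sub highlight_with_split {
    my ($text, $word, $open, $close) = @_;

    # The capture group makes split also return the separators (the hits):
    # even indices hold the text between hits, odd indices hold the hits.
    my @parts = split /\b(\Q$word\E)\b/i, $text;

    my $count = 0;
    for my $i (grep { $_ % 2 } 0 .. $#parts) {
        $parts[$i] = $open . $parts[$i] . $close;
        $count++;
    }
    return (join('', @parts), $count);
}

my ($hilit, $count) =
    highlight_with_split('This line has a hit here and a hit there.', 'hit', '[', ']');
print "$hilit\n";
print "Found $count times.\n";
```

Passing color escape codes (for example from Term::ANSIColor's `color()`) as `$open` and `$close` would give colored output instead of brackets.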
Re^2: Highlighting Regex Hits by ww (Archbishop) on May 15, 2010 at 15:05 UTC
A couple of quibbles. Re "not often used": this feature is actually fairly common; it has been cited in at least two nodes in the past couple of days. Re `print RED, ...`: my 5.10.1 under *nix pukes on this (it sees "RED" as a filehandle, illegally followed by a comma). Since ikegami has already referred the OP to the docs on ANSI, please take this merely as an explanation of why I've used square brackets rather than colorizing (we won't mention "lazy" here).

But a more substantive issue (perhaps) lurks in your `split`, where your version will match "hit," "Hitachi," and many others, including the vulgar word below (at Note 1):

```perl
#!/usr/bin/perl
use strict;
use warnings;
# 840126

my @line = ('Not here: line 1',
            'This line 2 has a hit here and a hit there.',
            'hit me, hit me, bust me in line 3!',
            "Don't throw a shitfit over that hit in line 4.",  # *Note 1
            'Line 5: my search-word does not exist here.');

my $word        = 'hit';
my $total_count = 0;

for my $line (@line) {
    my $count = 0;
    my $n     = 0;

    my @stuff = split m/(\b$word\b)/, $line;
#   grep { $n++; if ($n % 2) { print $_; } else { print RED, $_, RESET; $count++; } }
    grep {
        $n++;
        if ($n % 2) { print $_; }
        else        { print "\t[ $_ ]"; $count++; }
    } @stuff;
    print "\nFound $count times in the preceding line.\n";
    $total_count += $count;
}
print "Total count: $total_count\n";

=head execution:
ww@GIG:~/pl_test$ perl 840126.pl
Not here: line 1
Found 0 times in the preceding line.
This line 2 has a [ hit ] here and a [ hit ] there.
Found 2 times in the preceding line.
[ hit ] me, [ hit ] me, bust me in line 3!
Found 2 times in the preceding line.
Don't throw a shitfit over that [ hit ] in line 4.
Found 1 times in the preceding line.
Line 5: my search-word does not exist here.
Found 0 times in the preceding line.
Total count: 5
ww@GIG:~/pl_test$
=cut
```

IOW, I may have overwritten this, but using the word boundary metacharacter to restrict your matches (as in my line 19) is often a good idea.