Re: Highlighting Regex Hits

An alternative is to use split, which has an (IMO) not often used feature: when the pattern contains capturing parentheses, it returns both what was matched and what was between the matches:

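use Term::ANSIColor qw(:constants);   # provides RED and RESET

my $line = 'This line 1 has a hit here and a hit there.';
my $word = 'hit';
my $count = 0;

my $n = 0;
# The capturing parentheses make split return the hits as well as the text between them.
my @stuff = split m/($word)/, $line;
# Odd positions (1st, 3rd, ...) hold the text between hits; even positions hold the hits.
grep { $n++; if ($n % 2) { print $_; } else { print RED, $_, RESET; $count++; } } @stuff;
print "\nFound $count times.\n";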

I found that this scales better than running m// or s/// through a while loop on big strings. It is also handy if you need to return the modified string (split + join) instead of printing its parts.
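The split + join variant mentioned above could look roughly like this. It is only a sketch: the helper name mark_hits and its marker arguments are made up for illustration, and the \Q...\E quoting is an addition that the snippet above does not use.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every occurrence of $word in $line with the
# given markers and return the new string instead of printing it.
sub mark_hits {
    my ($line, $word, $open, $close) = @_;

    # The capturing group makes split return the separators too, so the
    # elements at odd indexes (1, 3, 5, ...) are the hits.
    # \Q...\E (an assumption, not in the original) quotes metacharacters.
    my @parts = split /(\Q$word\E)/, $line;

    my $i = 0;
    return join '', map { $i++ % 2 ? "$open$_$close" : $_ } @parts;
}

# prints: This line 1 has a [hit] here and a [hit] there.
print mark_hits('This line 1 has a hit here and a hit there.', 'hit', '[', ']'), "\n";
```

Passing the ANSI escape strings as $open and $close instead of brackets would give the colored version.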


Replies are listed 'Best First'.

Re^2: Highlighting Regex Hits
by ww (Archbishop) on May 15, 2010 at 15:05 UTC

Couple quibbles:

re "not often used" is actually fairly common; it's been cited in at least two nodes in the past couple days
and re print RED,... my 5.10.1 under *n*x pukes on this (sees "RED" as a filehandle, illegally followed by a comma). Since ikegami has already referred OP to the docs on ANSI, please take this merely as an explanation of why I've used square-brackets rather than colorizing (we won't mention "lazy" here).
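One likely cause of that filehandle complaint, assuming the snippet was run without loading Term::ANSIColor, is that RED and RESET are then plain barewords. A minimal sketch of the import that makes them callable (the helper name red is hypothetical):

```perl
use strict;
use warnings;

# With this import RED and RESET are subs; without it perl reads
# "print RED, ..." as a print to a filehandle named RED.
use Term::ANSIColor qw(:constants);

# Hypothetical helper: return $text wrapped in the ANSI codes for red.
sub red {
    my ($text) = @_;
    return RED() . $text . RESET();
}

print 'a ', red('hit'), " here\n";
```

The explicit parentheses on RED() and RESET() are a cautious choice here so the calls cannot be misparsed next to the concatenation operator.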

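But a more substantive issue (perhaps) lurks in your split, where your version will match "hit" anywhere it occurs, including inside longer words such as "white" or "hitch" (and "Hitachi" too, once you add /i), and many others including the vulgar word below (at Note 1):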

#!/usr/bin/perl
use strict;
use warnings;

# 840126

my @line = ('Not here: line 1',
            'This line 2 has a hit here and a hit there.',
            'hit me, hit me, bust me in line 3!',
            "Don't throw a shitfit over that hit in line 4.",  # *Note 1
            'Line 5: my search-word does not exist here.');

my $word = 'hit';
my $total_count = 0;

for my $line(@line) {
    my $count = 0;
    my $n = 0;
    my @stuff = split m/(\b$word\b)/, $line;
#   grep { $n++; if ($n % 2) { print $_; } else { print RED, $_, RESET; $count++; } }
    grep { $n++; if ($n % 2) { print $_; } else { print "\t[ $_ ]"; $count++; } } @stuff;
    print "\nFound $count times in the preceding line.\n";
    $total_count += $count;
}
print "Total count: $total_count\n";


=head execution:

ww@GIG:~/pl_test$ perl 840126.pl
Not here: line 1
Found 0 times in the preceding line.
This line 2 has a     [ hit ] here and a     [ hit ] there.
Found 2 times in the preceding line.
    [ hit ] me,     [ hit ] me, bust me in line 3!
Found 2 times in the preceding line.
Don't throw a shitfit over that     [ hit ] in line 4.
Found 1 times in the preceding line.
Line 5: my search-word does not exist here.
Found 0 times in the preceding line.
Total count: 5
ww@GIG:~/pl_test$

=cut

IOW, I may have belabored this, but using the word boundary metacharacter \b to restrict your matches (as in my line 19) is often a good idea.
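To see the difference \b makes, here is a small side-by-side sketch. The helper count_hits is hypothetical, and the quotemeta call is an extra precaution not in the script above; it assumes the search word starts and ends with a word character, since \b would behave differently otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: count the pieces that split captured for $pattern.
sub count_hits {
    my ($line, $pattern) = @_;
    my @parts = split /($pattern)/, $line;

    # Captured separators sit at the odd indexes 1, 3, 5, ...
    return scalar grep { $_ % 2 } 0 .. $#parts;
}

my $word = 'hit';
my $line = "Don't throw a shitfit over that hit in line 4.";

my $loose  = count_hits($line, quotemeta($word));                # any substring
my $strict = count_hits($line, '\b' . quotemeta($word) . '\b');  # whole words only

# prints: without \b: 2, with \b: 1
print "without \\b: $loose, with \\b: $strict\n";
```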
