in reply to Special Variables and Multiple Regexp Matching
Guessing a bit at what you mean by "in context", but here's another approach (Update: slightly buggy: see update below):
    >perl -wMstrict -le
    "my $s = 'a X bar XX c XXX d XXXX e XXXXX foo XXXXXX g';
     ;;
     my @tokens = qw(a bar c d e foo g);
     my ($token) = map qr{ \b $_ \b }xms, join q{|}, @tokens;
     my $max = 3;
     ;;
     while ($s =~ m{ ( .{0,$max} $token) (?= (.{0,$max})) }xmsg) {
       print qq{'$1$2'};
       }
    "
    'a X '
    ' X bar XX'
    'XX c XX'
    'XX d XX'
    'XX e XX'
    'XX foo XX'
    'XX g'
Update: This version fixes a bug in the definition of $token that allowed 'de' to be matched as a token (or part of one, anyway), and also makes match sub-pattern extraction clearer for demonstration purposes.
    >perl -wMstrict -le
    "my $s = 'a X bar c XXX d XXXX e XXX de XXX foo XXXXXX g h';
     ;;
     my @tokens = qw(a bar c d e foo g h);
     my ($token) = map qr{ \b (?: $_) \b }xms, join q{|}, @tokens;
     my $pre_max  = 3;
     my $post_max = 3;
     my $pre  = qr{ .{0,$pre_max}? }xms;  ## fixed -- see update below
     my $post = qr{ .{0,$post_max} }xms;
     ;;
     while ($s =~ m{ ($pre) ($token) (?= ($post)) }xmsg) {
       my ($before, $tok, $after) = ($1, $2, $3);
       print qq{'$before$tok$after' :$before:$tok:$after:};
       }
    "
    'a X ' ::a: X :
    ' X bar c ' : X :bar: c :
    ' c XX' : :c: XX:
    'XX d XX' :XX :d: XX:
    'XX e XX' :XX :e: XX:
    'XX foo XX' :XX :foo: XX:
    'XX g h' :XX :g: h:
    ' h' : :h::
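To see why the non-capturing group in $token matters, here is a minimal sketch using only the two tokens 'd' and 'e' (a reduced token list chosen for illustration, not the one from the example above). Without the group, alternation binds more loosely than \b, so each \b attaches to only one end of the alternation.

```perl
use strict;
use warnings;

# Reduced token list, assumed here purely for illustration.
my @tokens = qw(d e);
my $alt    = join q{|}, @tokens;

# Ungrouped: parses as (\b d) | (e \b), so 'de' can match via its leading 'd'.
my $ungrouped = qr{ \b $alt \b }xms;

# Grouped: both word boundaries apply to every alternative.
my $grouped = qr{ \b (?: $alt) \b }xms;

for my $word (qw(d e de)) {
    printf "%-2s ungrouped=%d grouped=%d\n",
        $word,
        ($word =~ $ungrouped ? 1 : 0),
        ($word =~ $grouped   ? 1 : 0);
}
```

Only the grouped pattern rejects 'de'; both patterns accept the standalone tokens.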
Update: Another bug: The regex
my $pre = qr{ .{0,$pre_max} }xms;
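can miss a token when two tokens sit close together: in 'b c' (for tokens 'b' and 'c'), the greedy quantifier swallows one token as context for the other when it lies within the 'span' of context characters. Replace with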
my $pre = qr{ .{0,$pre_max}? }xms;
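The difference between the two quantifiers can be checked with a small sketch on the string 'b c'. The helper matched_tokens and its $lazy flag are hypothetical names introduced here for illustration; the patterns follow the ones above with a context span of 3.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): collect the tokens found
# in $s, using either the greedy or the lazy form of the leading context.
sub matched_tokens {
    my ($s, $lazy) = @_;

    my $token = qr{ \b (?: b | c ) \b }xms;
    my $pre   = $lazy ? qr{ .{0,3}? }xms    # lazy: take as little context as possible
                      : qr{ .{0,3}  }xms;   # greedy: take as much context as possible
    my $post  = qr{ .{0,3} }xms;

    my @found;
    while ($s =~ m{ ($pre) ($token) (?= ($post)) }xmsg) {
        push @found, $2;
    }
    return @found;
}

print "greedy: @{[ matched_tokens('b c', 0) ]}\n";
print "lazy:   @{[ matched_tokens('b c', 1) ]}\n";
```

With the greedy form, one of the two tokens is consumed as leading context and never reported; with the lazy form, both are reported.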