Separate from the question of handling UTF-8 source code, here are some comments on the regexes.
... use "§" as delimiter ... because I use REGEX on written text and I've found, that nearly any character including brackets will be able to be included in the text ...
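But the s/// or m// delimiter cannot clash with any character in the "bound" text variable, nor with anything in an interpolated qr// regex object or plain string, because the delimiter is only recognized in the pattern as it is literally written in the source: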
c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = 'foo/bar/baz/boff zip/zit/zot/zap';
 print qq{'$text'};
 ;;
 my $regex_object = qr{ /bar/baz/ }xms;
 my $plain_string = '/zit/zot/';
 ;;
 $text =~ s/ $regex_object | $plain_string /OTHER/xmsg;
 print qq{'$text'};
 "
'foo/bar/baz/boff zip/zit/zot/zap'
'fooOTHERboff zipOTHERzap'

(However, note that interpolation of plain strings is problematic if they may contain regex metacharacters; for this, see quotemeta and the \Q...\E interpolation modifiers.)
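To illustrate the point about metacharacters in interpolated plain strings, here is a minimal sketch. The sample text and the variable names are made up for the illustration; only quotemeta and \Q...\E come from the note above.

```perl
use strict;
use warnings;

# Hypothetical sample data: the plain string contains the metacharacter '.'
my $text         = 'abc a.c';
my $plain_string = 'a.c';

# Interpolated as-is, '.' matches any character, so 'abc' is replaced too.
(my $unquoted = $text) =~ s/$plain_string/OTHER/g;

# \Q...\E escapes metacharacters in the interpolated string.
(my $quoted = $text) =~ s/\Q$plain_string\E/OTHER/g;

# quotemeta() does the same escaping explicitly.
my $escaped = quotemeta $plain_string;
(my $via_quotemeta = $text) =~ s/$escaped/OTHER/g;

print "unquoted:      '$unquoted'\n";        # 'OTHER OTHER'
print "\\Q...\\E:       '$quoted'\n";        # 'abc OTHER'
print "quotemeta:     '$via_quotemeta'\n";   # 'abc OTHER'
```

Only the unquoted version replaces 'abc', because there the '.' still acts as a wildcard.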
The use of () {} [] <> as balanced regex delimiters is useful because balanced delimiter characters within the regex pattern are handled properly (within reason; unescaped delimiter characters within the regex pattern must always be strictly balanced, so [{}] would have worked in the example below):

c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = 'foo {bar} baz { whiz } boff';
 print qq{A: '$text'};
 ;;
 $text =~ s{ { \s* \w+ \s* } }{OTHER}xmsg;
 print qq{ '$text'};
 ;;
 $text = 'abc {tuvw} de { xyz } fghi';
 print qq{B: '$text'};
 ;;
 $text =~ s{ [\}\{] \s* \w+ \s* [\}\{] }{OTHER}xmsg;
 print qq{ '$text'};
 "
A: 'foo {bar} baz { whiz } boff'
 'foo OTHER baz OTHER boff'
B: 'abc {tuvw} de { xyz } fghi'
 'abc OTHER de OTHER fghi'
do {
    $foundstring =~ s§(<a |\[)([^<>\"]*)(<span class=\"foundterm\">)~ +~([^~]+)~~(</span>)§$1$2$4§igs;
} while $foundstring =~ m§(<a |\[) +([^<>\"]*)(<span class=\"foundterm\">)~~([^~]+)~~(</span>)§is;
Doing a substitution that is dependent on a separate, identical m// match in this way is redundant because the s/// replacement will only occur if its own match is successful, and the /g modifier will cause all matches to be replaced:
c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = '123 abc 456 de 789 fghi 321';
 print qq{A: '$text'};
 ;;
 do {
     printf 'running s/// -> ';
     $text =~ s{ [a-z]+ }{OTHER}xmsg;
     print qq{'$text'};
     } while $text =~ m{ [a-z]+ }xms;
 print qq{done: '$text'};
 ;;
 $text = '123 rs 456 tuvw 789 xyz 321';
 print qq{B: '$text'};
 ;;
 $text =~ s{ [a-z]+ }{OTHER}xmsg;
 print qq{ '$text'};
 "
A: '123 abc 456 de 789 fghi 321'
running s/// -> '123 OTHER 456 OTHER 789 OTHER 321'
done: '123 OTHER 456 OTHER 789 OTHER 321'
B: '123 rs 456 tuvw 789 xyz 321'
 '123 OTHER 456 OTHER 789 OTHER 321'

In case A, the while-loop and substitution only run once because the /g modifier of the s/// causes anything that could match to be replaced. In case B, the same result is achieved with no separate m// match.
Update: One sometimes sees something like
my $match = qr{ ... }xms;
$string =~ s{ $match }{replace}xms if $string =~ m{ $match }xms;
as a variation on this theme. Again, the substitution will only occur if the $match pattern matches, so the separate m// on the same pattern is redundant.
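If the reason for the extra m// is to find out whether anything was replaced, the return value of s/// already carries that information. A minimal sketch, with made-up sample strings and variable names:

```perl
use strict;
use warnings;

my $match = qr{ [a-z]+ }xms;

# Hypothetical sample strings: one that matches, one that does not.
my $string = '123 abc 456';
my $other  = '123 456';

# s/// returns the number of substitutions made, or a false value if none,
# so no separate m// test is needed to learn whether it did anything.
my $count    = ($string =~ s{ $match }{OTHER}xmsg);
my $no_count = ($other  =~ s{ $match }{OTHER}xmsg);

print "'$string' (", ($count    ? 'changed' : 'unchanged'), ")\n";
print "'$other' (",  ($no_count ? 'changed' : 'unchanged'), ")\n";
```

The second string is left untouched, and the false return value says so.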
Give a man a fish: <%-{-{-{-<
In reply to Re^2: Regex delimiter
by AnomalousMonk
in thread Regex delimiter
by Outaspace