
Separate from the question of handling UTF-8 source code, here are some comments on the regexes.

... use "§" as delimiter ... because I use REGEX on written text and I've found, that nearly any character including brackets will be able to be included in the text ...

But the s/// or m// delimiter will not clash with any character in the "bound" text variable, nor with any character in an interpolated qr// regex object or plain string:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = 'foo/bar/baz/boff zip/zit/zot/zap';
 print qq{'$text'};
 ;;
 my $regex_object = qr{ /bar/baz/ }xms;
 my $plain_string =    '/zit/zot/';
 ;;
 $text =~ s/ $regex_object | $plain_string /OTHER/xmsg;
 print qq{'$text'};
"
'foo/bar/baz/boff zip/zit/zot/zap'
'fooOTHERboff zipOTHERzap'

(However, note that interpolation of plain strings is problematic if they may contain regex metacharacters; for this, see quotemeta and the \Q...\E interpolation modifiers.)
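To illustrate the caveat about metacharacters, here is a minimal sketch. The helper name replace_literal and the sample strings are made up for this example and are not from the original question; the only point is the difference between plain interpolation and \Q...\E interpolation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace every literal
# occurrence of $literal in $text with $replacement.
sub replace_literal {
    my ($text, $literal, $replacement) = @_;
    # \Q...\E applies quotemeta to the interpolated string, so regex
    # metacharacters in $literal are matched as ordinary characters.
    $text =~ s/\Q$literal\E/$replacement/g;
    return $text;
}

my $plain_string = 'a.b';   # '.' is a regex metacharacter

# Unquoted interpolation: '.' matches any character, so 'axb' is hit too.
(my $unquoted = 'a.b axb') =~ s/$plain_string/OTHER/g;

# Quoted interpolation: only the literal text 'a.b' is replaced.
my $quoted = replace_literal('a.b axb', $plain_string, 'OTHER');

print "unquoted: '$unquoted'\n";   # unquoted: 'OTHER OTHER'
print "quoted:   '$quoted'\n";     # quoted:   'OTHER axb'
```

Calling quotemeta($plain_string) before interpolating should have the same effect as \Q...\E here.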

The use of () {} [] <> as balanced regex delimiters is useful because balanced delimiter characters within the regex pattern are handled properly (within reason; unescaped delimiter characters within the regex pattern must always be strictly balanced, so [{}] would have worked in the example below):

c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = 'foo {bar} baz { whiz } boff';
 print qq{A: '$text'};
 ;;
 $text =~ s{ { \s* \w+ \s* } }{OTHER}xmsg;
 print qq{   '$text'};
 ;;
 $text = 'abc {tuvw} de { xyz } fghi';
 print qq{B: '$text'};
 ;;
 $text =~ s{ [\}\{] \s* \w+ \s* [\}\{] }{OTHER}xmsg;
 print qq{   '$text'};
"
A: 'foo {bar} baz { whiz } boff'
   'foo OTHER baz OTHER boff'
B: 'abc {tuvw} de { xyz } fghi'
   'abc OTHER de OTHER fghi'

do { $foundstring =~ s§(<a |\[)([^<>\"]*)()~ +~([^~]+)~~()§$1$2$4§igs; } while $foundstring =~ m§(<a |\[) +([^<>\"]*)()~~([^~]+)~~()§is;

Doing a substitution that is dependent on a separate, identical m// match in this way is redundant because the s/// replacement will only occur if its own match is successful, and the /g modifier will cause all matches to be replaced:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $text = '123 abc 456 de 789 fghi 321';
 print qq{A:    '$text'};
 ;;
 do {  printf 'running s/// -> ';
       $text =~ s{ [a-z]+ }{OTHER}xmsg;
       print qq{'$text'};
       }
 while $text =~ m{ [a-z]+ }xms;
 print qq{done: '$text'};
 ;;
 $text = '123 rs 456 tuvw 789 xyz 321';
 print qq{B: '$text'};
 ;;
 $text =~ s{ [a-z]+ }{OTHER}xmsg;
 print qq{   '$text'};
"
A:    '123 abc 456 de 789 fghi 321'
running s/// -> '123 OTHER 456 OTHER 789 OTHER 321'
done: '123 OTHER 456 OTHER 789 OTHER 321'
B: '123 rs 456 tuvw 789 xyz 321'
   '123 OTHER 456 OTHER 789 OTHER 321'

In case A, the while-loop and substitution only run once because the /g modifier of the s/// causes anything that could match to be replaced. In case B, the same result is achieved with no separate m// match.

Update: One sometimes sees something like
my $match = qr{ ... }xms;
$string =~ s{ $match }{replace}xms if $string =~ m{ $match }xms;
as a variation on this theme. Again, the substitution will only occur if the $match pattern matches, so the separate m// on the same pattern is redundant.
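If the point of the separate m// is to learn whether anything was replaced, the return value of s/// already carries that information: it is the number of substitutions made, or a false value if there were none. A minimal sketch, with sample strings and variable names invented for illustration:

```perl
use strict;
use warnings;

my $match = qr{ [a-z]+ }xms;

# s/// returns the number of substitutions made ...
my $string = '123 abc 456 de 789';
my $count  = ($string =~ s{ $match }{OTHER}xmsg);
print "replaced $count: '$string'\n";   # replaced 2: '123 OTHER 456 OTHER 789'

# ... and a false value if nothing matched, so it can be tested directly.
my $digits = '123 456';
if ($digits =~ s{ $match }{OTHER}xmsg) {
    print "changed: '$digits'\n";
}
else {
    print "no match, unchanged: '$digits'\n";   # this branch runs
}
```

So a guard like "if $string =~ m{ $match }xms" adds nothing; testing the s/// itself is enough.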

Give a man a fish: <%-{-{-{-<

In reply to Re^2: Regex delimiter by AnomalousMonk
in thread Regex delimiter by Outaspace
