in reply to Regex delimiter
Hello monks,
I have the same problem. My case is slightly different, but the underlying question is the same: which delimiter can I use for a regex?
My case adds the following twist: I tried to use "§" as the delimiter. I did this because I run regexes on written text, and I have found that nearly any character, including brackets, can occur in the text I want to process. Now, by accident, one of my tools changed the encoding of a script to UTF-8 when it was uploaded to GitHub. The script was originally in Windows-1252, which also worked under Unix/Linux. But now Perl complains about an unrecognized character before the "§":
Unrecognized character \xA7;
The regex code is as follows:
do { $foundstring =~ s§(<a |\[)([^<>\"]*)(<span class=\"foundterm\">)~~([^~]+)~~(</span>)§$1$2$4§igs; }
while $foundstring =~ m§(<a |\[)([^<>\"]*)(<span class=\"foundterm\">)~~([^~]+)~~(</span>)§is;
Does anyone have an idea or a hint as to which character I can use that does not need to be escaped in the text?
Thanks in advance and regards
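For context, here is a minimal sketch of why the encoding change matters. It assumes the script has no `use utf8;`, so Perl reads the source byte by byte. The helper `hex_bytes` and the variable names are made up for this illustration. In Windows-1252 the "§" is a single byte, while in UTF-8 it is two bytes. Presumably Perl then takes the first byte as the delimiter and trips over the second one (\xA7) where it expects the flags.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of a string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $section_sign = "\x{A7}";    # U+00A7 SECTION SIGN, the "§" character

# Encode the same character in both encodings and compare the raw bytes.
my $as_cp1252 = hex_bytes(encode('cp1252', $section_sign));
my $as_utf8   = hex_bytes(encode('UTF-8',  $section_sign));

print "Windows-1252: $as_cp1252\n";    # one byte
print "UTF-8:        $as_utf8\n";      # two bytes
```

Under that assumption, an ASCII delimiter sidesteps the issue because ASCII characters have the same single byte in both encodings.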
Update:
I have implemented a workaround. To be able to use curly brackets as regex delimiters, I replace the curly brackets in the text before the operation and restore them afterwards.
## hide the curly brackets behind placeholders
$foundstring =~ s|\{|#lcb#|igs;
$foundstring =~ s|\}|#rcb#|igs;
do { $foundstring =~ s{(<a |\[)([^<>\"]*)(<span class=\"foundterm\">)~~([^~]+)~~(</span>)}{$1$2$4}igs; }
while $foundstring =~ m{(<a |\[)([^<>\"]*)(<span class=\"foundterm\">)~~([^~]+)~~(</span>)}is;
## bring the curly brackets back
$foundstring =~ s|#lcb#|\{|igs;
$foundstring =~ s|#rcb#|\}|igs;
This means that, in the end, it does not matter if someone accidentally saves the Perl script as UTF-8; it will work nonetheless.
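As far as I understand Perl's quoting rules, the delimiter only has to be kept out of the pattern itself (or escaped there), not out of the text being matched. The following is a minimal sketch under that assumption, not a tested replacement for the workaround: the sample string and the name `$marked` are made up for illustration, and the `1 while` loop stands in for the do/while construct.

```perl
use strict;
use warnings;

# Hypothetical sample input: the text itself contains literal curly brackets.
my $foundstring =
    '[see {note} <span class="foundterm">~~alpha~~</span> and '
  . '<span class="foundterm">~~beta~~</span>]';

# The pattern from the post, compiled once with brace delimiters.
# The pattern source contains no curly brackets, so nothing needs escaping.
my $marked = qr{(<a |\[)([^<>"]*)(<span class="foundterm">)~~([^~]+)~~(</span>)}is;

# Repeat the substitution until no marked term is left inside a link.
1 while $foundstring =~ s{$marked}{$1$2$4}g;

print "$foundstring\n";    # prints: [see {note} alpha and beta]
```

In this sketch the curly brackets in the text pass through untouched, without any placeholder step.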
Re^2: Regex delimiter
by hippo (Archbishop) on Jun 13, 2019 at 08:27 UTC
by toohoo (Beadle) on Jun 13, 2019 at 09:37 UTC

Re^2: Regex delimiter
by AnomalousMonk (Archbishop) on Jun 13, 2019 at 17:03 UTC
by toohoo (Beadle) on Jun 14, 2019 at 06:57 UTC