Re: Replacing left angle bracket with HTML entity when between two backtick characters
by AnomalousMonk (Archbishop) on Sep 20, 2018 at 06:55 UTC
|
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
;;
use warnings;
use strict;
;;
use Test::More 'no_plan';
use Test::NoWarnings;
;;
my @tests = (
  [ 'is `my <string>` that `also <this> one` too',
    'is `my &lt;string>` that `also &lt;this> one` too',
  ],
  [ 'is `not <this> one', 'is `not <this> one', ],
  [ 'is \`my <NO> that `but <this> one` yes',
    'is \`my <NO> that `but &lt;this> one` yes',
  ],
);
;;
VECTOR:
for my $ar_vector (@tests) {
  if (not ref $ar_vector) {
    note $ar_vector;
    next VECTOR;
  }
  ;;
  my ($string, $expected) = @$ar_vector;
  ;;
  (my $got = $string) =~ s{
    (?<! \\) ` [^`]* \K < (?= [^`]* `)
  }
  {&lt;}xmsg;
  ;;
  is $got, $expected, qq{'$string' -> '$expected'};
}
;;
done_testing;
"
ok 1 - 'is `my <string>` that `also <this> one` too' -> 'is `my &lt;string>` that `also &lt;this> one` too'
ok 2 - 'is `not <this> one' -> 'is `not <this> one'
ok 3 - 'is \`my <NO> that `but <this> one` yes' -> 'is \`my <NO> that `but &lt;this> one` yes'
1..3
ok 4 - no warnings
1..4
Add more test cases to refine. (In general, see "How to ask better questions using Test::More and sample data" and "Short, Self-Contained, Correct Example".)
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
|
|
If the backslash is allowed to escape backslashes, your approach of using a lookbehind will fail for \\`. I think you could fix that by looking for an odd number of preceding backslashes, but I prefer to look forward only instead.
[ 'is \\\\`my <this> that `but <this> one` no',
'is \\\\`my &lt;this> that `but <this> one` no',
],
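The "odd number of preceding backslashes" idea can be expressed without a variable-length lookbehind. Require that an even-length run of backslashes (possibly empty) precedes the backtick, and use (?<!\\) so that the run starts right after a non-backslash. The sketch below assumes a backslash escapes any single following character. The sub name escape_in_backticks is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: HTML-escape '<' only between a pair of unescaped backticks.
# A backtick is unescaped when it is preceded by an even number (possibly zero)
# of backslashes; (?<!\\) forces that run to begin right after a non-backslash.
sub escape_in_backticks {
    my ($s) = @_;
    $s =~ s{
        (?<! \\ ) (?: \\\\ )* \K     # even-length run of backslashes, left untouched
        ` ( (?: \\. | [^\\`] )* ) `   # a backtick pair; an escaped backtick does not close it
    }{ do { (my $in = $1) =~ s/</&lt;/g; "`$in`" } }gex;
    return $s;
}

print escape_in_backticks('is \\\\`my <this> that `but <this> one` no'), "\n";
```

A match can only start at the beginning of a backslash run. As a result, \` can never open a pair, while \\` still can.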
| [reply] [d/l] [select] |
|
|
This deals with the specific case you present, but I don't think I'm really dealing in a robust way with escaped backticks and escaped escapes. (Aiieee...) And I don't really like capturing stuff then stuffing it back in a replacement. Oh, well... I gotta go to bed now.
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
;;
use warnings;
use strict;
;;
use Test::More 'no_plan';
use Test::NoWarnings;
;;
my @tests = (
  [ 'is `my <string>` that `also <this> one` too',
    'is `my &lt;string>` that `also &lt;this> one` too',
  ],
  [ 'is `not <this> one', 'is `not <this> one', ],
  [ 'is \`my <NO> that `but <this> one` yes',
    'is \`my <NO> that `but &lt;this> one` yes',
  ],
  [ 'is \\\\`my <this> that `but <this> one` no',
    'is \\\\`my &lt;this> that `but <this> one` no',
  ],
);
;;
VECTOR:
for my $ar_vector (@tests) {
  if (not ref $ar_vector) {
    note $ar_vector;
    next VECTOR;
  }
  ;;
  my ($string, $expected) = @$ar_vector;
  ;;
  (my $got = $string) =~ s{
    (?<! (?<! \\) \\) ` [^`]* \K < ([^`]* `)
  }
  {&lt;$1}xmsg;
  ;;
  is $got, $expected, qq{'$string' -> '$expected'};
}
;;
done_testing;
"
ok 1 - 'is `my <string>` that `also <this> one` too' -> 'is `my &lt;string>` that `also &lt;this> one` too'
ok 2 - 'is `not <this> one' -> 'is `not <this> one'
ok 3 - 'is \`my <NO> that `but <this> one` yes' -> 'is \`my <NO> that `but &lt;this> one` yes'
ok 4 - 'is \\`my <this> that `but <this> one` no' -> 'is \\`my &lt;this> that `but <this> one` no'
1..4
ok 5 - no warnings
1..5
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by Corion (Patriarch) on Sep 20, 2018 at 06:58 UTC
|
Parsing the string is the right approach here, IMO. I would split the parsing into two steps. The first step identifies "plain text" and "text where angle brackets should be escaped", and the second step then escapes the angle brackets where necessary:
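#perl
use strict;
use warnings;
# HTML-escape every '<' in a string
sub escape {
    my( $str ) = @_;
    $str =~ s!<!&lt;!g;
    $str
}
while( <DATA>) {
    # $1 must hold the whole plain prefix, so the repeated group is wrapped
    # in one capture instead of capturing only its last iteration
    s!(?:^|\G)((?:[^`\\]        # plain character, neither escape nor backtick
                 |\\.           # anything, escaped
               )*)
      (`                        # backtick
        (?:[^\\`]|\\.)*         # no backticks, except escaped ones
      `)                        # closing backtick
    !$1.escape($2)!sgex;
    print
};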
__DATA__
This is `my <string>` that I want to modify because the angle bracket is between backticks
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `a <string> I don't want to modify because it's not between backticks
This is \`another <left_angle_bracket>`I don't want to modify because the first backtick is escaped but I do want to modify <the_last_left_angle_bracket>` between the backticks
At least for these test cases, this works, but I'm not sure if there is/should be a way to quote angle brackets too, like \<. This would mean you have to introduce < into the appropriate character classes as well.
Update: Please see also tybalt89's reply, which contains some good optimizations that prevent backtracking and do the "no/escaped backtick" matching in a more modular way.
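On the question of quoting angle brackets as \<, one option is to have the escape step skip backslash pairs instead of rewriting every '<'. The variant of escape() below is an illustration, not Corion's code. It assumes the text it receives still carries its backslash escapes.

```perl
use strict;
use warnings;

# Sketch of an escape() variant: a backslash pair such as \< is copied through
# unchanged; only a bare '<' is turned into &lt;.
sub escape {
    my ($str) = @_;
    $str =~ s{ (\\.) | < }{ defined $1 ? $1 : '&lt;' }gsex;
    return $str;
}

print escape('`a <b> and \<c>`'), "\n";   # prints `a &lt;b> and \<c>`
```

Consuming \\. as a unit means an escaped backslash followed by '<' (\\<) still gets its '<' escaped.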
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by tybalt89 (Monsignor) on Sep 20, 2018 at 09:36 UTC
|
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1222687
use strict;
use warnings;
my $nobacktick = qr/(?:\\.|[^\\`]++)*+/s;
while( <DATA> )
{
    s/ $nobacktick \K ` $nobacktick ` / $& =~ s{<}{&lt;}gr /gex;
    print;
}
__DATA__
This is `my <string>` that I want to modify because the angle bracket is between backticks
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `a <string> I don't want to modify because it's not between backticks
This is \`another <left_angle_bracket>`I don't want to modify because the first backtick is escaped but I do want to modify <the_last_left_angle_bracket>` between the backticks
`whole string with <string> enclosed in backticks`
`whole string with <string> not enclosed in backticks
`whole string with <string> not enclosed in backticks\`
Outputs:
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `a <string> I don't want to modify because it's not between backticks
This is \`another <left_angle_bracket>`I don't want to modify because the first backtick is escaped but I do want to modify &lt;the_last_left_angle_bracket>` between the backticks
`whole string with &lt;string> enclosed in backticks`
`whole string with <string> not enclosed in backticks
`whole string with <string> not enclosed in backticks\`
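One difference between this version and tybalt89's follow-up is the \G anchor. Without it, once the attempt at offset 0 fails, the regex engine retries at later offsets. It can then begin a "pair" at an escaped backtick. The comparison below uses a made-up input line.

```perl
use strict;
use warnings;

# tybalt89's $nobacktick, used with and without a leading \G.
my $nobacktick = qr/(?:\\.|[^\\`]++)*+/s;
my $line = 'x \`b<` c';    # made-up input: an escaped backtick plus one bare backtick

# No \G: after the attempt at offset 0 fails, later offsets are tried, and
# the match can start at the escaped backtick (offset 3).
(my $loose = $line) =~ s/ $nobacktick \K ` $nobacktick ` / my $m = $&; $m =~ s{<}{&lt;}g; $m /gex;

# With \G: each match must begin where the previous one ended (offset 0 at first).
(my $anchored = $line) =~ s/ \G $nobacktick \K ` $nobacktick ` / my $m = $&; $m =~ s{<}{&lt;}g; $m /gex;

print "$loose\n$anchored\n";
```

Tracing the match by hand, $loose should come out as x \`b&lt;` c and $anchored should be left unchanged.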
| [reply] [d/l] [select] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by tybalt89 (Monsignor) on Sep 20, 2018 at 10:26 UTC
|
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1222687
use strict;
use warnings;
#use hairy_regex;
my $nobacktick = qr/(?:(?:\\.)++|[^\\`]++)*+/s;
my $noleftangle = qr/(?:(?:\\.)++|[^\\<]++)*+/s;
while( <DATA> )
{
    s/ \G $nobacktick \K ` $nobacktick ` /
        $& =~ s! \G $noleftangle \K < !&lt;!gxr /gex;
    print;
}
__DATA__
This is `my <string>` that I want to modify because the angle bracket is between backticks
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `a <string> I don't want to modify because it's not between backticks
This is \`another <left_angle_bracket>`I don't want to modify because the first backtick is escaped but I do want to modify <the_last_left_angle_bracket>` between the backticks
`whole string with <string> enclosed in backticks`
`whole string with <string> not enclosed in backticks
`whole string with <string> not enclosed in backticks\`
``whole string with <string> not enclosed in backticks`
`whole string with \<string> enclosed in backticks`
Outputs:
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `my &lt;string>` that I want to modify because the angle bracket is between backticks
This is `a <string> I don't want to modify because it's not between backticks
This is \`another <left_angle_bracket>`I don't want to modify because the first backtick is escaped but I do want to modify &lt;the_last_left_angle_bracket>` between the backticks
`whole string with &lt;string> enclosed in backticks`
`whole string with <string> not enclosed in backticks
`whole string with <string> not enclosed in backticks\`
``whole string with <string> not enclosed in backticks`
`whole string with \<string> enclosed in backticks`
(more test cases probably required)
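In the spirit of adding more test cases, tybalt89's second regex pair can be dropped into a Test::More harness like the one AnomalousMonk used. The sub name escape_lt is hypothetical. The expected strings were worked out by hand, so treat them as assumptions to check.

```perl
use strict;
use warnings;
use Test::More;

# tybalt89's second regex pair, wrapped in a (hypothetical) sub for testing.
my $nobacktick  = qr/(?:(?:\\.)++|[^\\`]++)*+/s;
my $noleftangle = qr/(?:(?:\\.)++|[^\\<]++)*+/s;

sub escape_lt {
    my ($s) = @_;
    $s =~ s/ \G $nobacktick \K ` $nobacktick ` /
        $& =~ s! \G $noleftangle \K < !&lt;!gxr /gex;
    return $s;
}

my @tests = (
    [ 'is `my <string>` that `also <this> one` too',
      'is `my &lt;string>` that `also &lt;this> one` too' ],
    [ 'is `not <this> one', 'is `not <this> one' ],
    [ 'is \`my <NO> that `but <this> one` yes',
      'is \`my <NO> that `but &lt;this> one` yes' ],
    [ 'is \\\\`my <this> that `but <this> one` no',
      'is \\\\`my &lt;this> that `but <this> one` no' ],
    [ '`a <b> and \<c>`', '`a &lt;b> and \<c>`' ],    # escaped '<' is left alone
    [ '``x <y> z`', '``x <y> z`' ],                    # empty pair, then an unclosed one
);

is( escape_lt($_->[0]), $_->[1], $_->[0] ) for @tests;
done_testing;
```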
| [reply] [d/l] [select] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by AnomalousMonk (Archbishop) on Sep 22, 2018 at 03:19 UTC
|
Update: Because of some shortcomings in the code below (use of '&lgt;' instead of '&lt;' as the replacement string; inability to handle multiple '<' characters per backtick group; no handling of escaped '<' characters), please see instead a later and greater version posted here. Because it may still be useful as material for archeological study, I won't try to update or delete this code, and all the caveats still apply.
I'd be inclined to think a parser approach would be best for reasons of readability/maintainability. I'm fascinated by regexes, however, so here's my final try, albeit not totally un-hairy I must admit.
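For comparison with the regex-only answers, here is one rough shape a parser-style solution might take. It is not the code referred to in this post, which is not reproduced here. It walks the string with m/\G.../gc and decides token by token. The sub name escape_backtick_lt is made up, and the sketch assumes a backslash escapes any one following character.

```perl
use strict;
use warnings;

# Tokenizer-style sketch: copy plain text through, HTML-escape '<' only inside a
# backtick pair that is actually closed, and keep backslash escapes verbatim.
sub escape_backtick_lt {
    my ($s) = @_;
    my $out = '';
    pos($s) = 0;
    while (pos($s) < length $s) {
        if ($s =~ /\G((?:\\.|[^\\`])+)/gcs) {         # plain text, including escapes
            $out .= $1;
        }
        elsif ($s =~ /\G`((?:\\.|[^\\`])*)`/gcs) {    # a closed backtick pair
            my $inner = $1;
            $inner =~ s/(\\.)|</defined $1 ? $1 : '&lt;'/gse;   # leave \< alone
            $out .= "`$inner`";
        }
        else {                                        # unclosed backtick or trailing lone backslash
            $s =~ /\G(.)/gcs;
            $out .= $1;
        }
    }
    return $out;
}

print escape_backtick_lt('is \`my <NO> that `but <this> one` yes'), "\n";
```

Each branch handles exactly one kind of token. Adding a rule, such as a different escape convention, means touching one branch rather than one large pattern.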
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
|
|
That, my friend, is an amazing piece of work.
| [reply] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters (updated)
by AnomalousMonk (Archbishop) on Sep 23, 2018 at 02:00 UTC
|
I posted a similar piece of code recently, but because of some shortcomings (inexplicable use of '&lgt;' instead of '&lt;' as the replacement string; inability to handle multiple '<' characters per backtick group; no handling of escaped '<' characters, the latter two features inspired by the postings of Corion and tybalt89), I've decided to post an update. This is still essentially a programming exercise; I wouldn't necessarily recommend my approach for production code, for which see the efforts of the aforementioned monks. However, even though it handles more features, it is IMHO slightly less hairy regex-wise.
File repl_lt_entity_4.pl:
I won't post the output. And more test cases never hurt.
Update: Here's a slightly more svelte version of the regex logic: it gets rid of one level of alternation nesting. I will only post the (?(DEFINE) ... ) version (the qr//-factored version should flow from it in a fairly straightforward way, and there are examples of this translation in the posted code), and I'll only post a drop-in cut/paste of the s/// expression, not a full, working example, so please let me know of any fat-finger errors.
(The $replace at the end is for my current development testing. Set it to the replacement string, or replace it with a string literal, e.g., '&lt;' as before.)
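The (?(DEFINE) ...) code itself is not reproduced here, so the following only illustrates the construct. A subpattern is named once inside (?(DEFINE) ...) and reused wherever (?&NAME) appears. The sketch restates tybalt89's backtick-pair regex with a made-up NOTICK subpattern. It is not AnomalousMonk's regex.

```perl
use strict;
use warnings;

# (?&NOTICK) means "text containing no unescaped backtick"; it is defined once
# in the DEFINE block and used twice in the main pattern.
my $pair_rx = qr{
    \G (?&NOTICK) \K ` (?&NOTICK) `
    (?(DEFINE)
        (?<NOTICK> (?: \\. | [^\\`] )*+ )
    )
}xs;

my $s = 'is `my <string>` that `also <this> one` too';
$s =~ s/$pair_rx/ my $seg = $&; $seg =~ s{(\\.)|<}{defined $1 ? $1 : '&lt;'}ge; $seg /ge;
print "$s\n";
```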
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by nysus (Parson) on Sep 20, 2018 at 03:04 UTC
|
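my $parts = [ split /(?<!\\)`/, $string, -1 ];  # split on unescaped backticks; -1 keeps trailing empty fields
my $count = 0;
foreach my $part (@$parts) {
    $count++;
    next if ($count % 2);          # parts 1, 3, 5, ... lie outside backticks
    next if ($count == @$parts);   # text after an unclosed final backtick is left alone
    $part =~ s/</&lt;/g;
}
my $new_string = join '`', @$parts;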
Still seems lame, but I guess it works. Any other suggestions?
| [reply] [d/l] |
|
|
You can also process the string character by character, keeping the state (inside/outside backticks) in a flag variable. To handle unclosed backticks, only replace the characters when you reach the closing backtick, so keep a list of positions to replace in an array. Replacements happen from the right, so you don't have to recalculate the remaining positions after the length of the string changes.
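sub process {
    my ($string) = @_;
    my $inside;          # true between an opening and a closing backtick
    my $pos = 0;
    my @replace;         # positions of '<' in the current pair, rightmost first
    while ($pos < length $string) {
        my $action = {
            '\\' => sub { ++$pos },                              # skip the escaped character
            '<'  => sub { unshift @replace, $pos if $inside },
            '`'  => sub {
                unless ($inside = ! $inside) {                   # closing backtick
                    substr $string, $_, 1, '&lt;' for @replace;  # right to left
                    $pos += 3 * @replace;    # each '<' -> '&lt;' adds 3 characters
                    @replace = ();
                }
            },
        }->{ substr $string, $pos, 1 };
        $action->() if $action;
        ++$pos
    }
    return $string
}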
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
Re: Replacing left angle bracket with HTML entity when between two backtick characters
by Anonymous Monk on Sep 20, 2018 at 21:01 UTC
|
Why don't you escape_html the entire string?
|
|
I'm not familiar with the function. But I want to leave HTML outside of pairs of backticks intact. Will that function help me do that?
| [reply] |