perlre inverse check for several patterns

averlon has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: perlre inverse check for several patterns by hv (Prior) on Jun 02, 2023 at 15:36 UTC
The easy way would be to set up a hash of "good" words, then scan for each substring to check against the hash:

    my %good = map +($_ => 1), qw{ pre strong };
    my $string = "xxx<pre>xxx<www>xxx<strong>xxx";
    while ($string =~ m{<(\w+)>}g) {
        warn "bad word '$1'" unless $good{$1};
    }

If you don't care about how invalid strings are invalid, then it is easily done in a single match, something like:

    print "ok\n" if $string =~ m{ ^ ( [^<] | < (?: pre | strong ) > )* $ }x;

(Note that this will also reject a string with an unclosed '<', which the first example will not.)

If neither of those is what you want, it would be useful if you could say more about precisely what you want to achieve.
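To make the difference between the two checks above concrete, here is a small sketch that runs both against the same inputs, including one with an unclosed '<'. The helper names `bad_tags` and `is_clean` are made up for illustration; the allowed words `pre` and `strong` come from the example.

```perl
use strict;
use warnings;

my %good = map +($_ => 1), qw{ pre strong };

# Hash-based scan: returns the names of all <word> tokens not in %good.
sub bad_tags {
    my ($string) = @_;
    my @bad;
    while ($string =~ m{<(\w+)>}g) {
        push @bad, $1 unless $good{$1};
    }
    return @bad;
}

# Single-match check: true only if every '<' starts an allowed tag.
sub is_clean {
    my ($string) = @_;
    return $string =~ m{ ^ (?: [^<] | < (?: pre | strong ) > )* $ }x ? 1 : 0;
}

for my $s ('xxx<pre>xxx<strong>xxx', 'xxx<pre>xxx<www>xxx', 'xxx<pre>xxx<oops') {
    my @bad = bad_tags($s);
    printf "%-24s bad=[%s] clean=%d\n", $s, join(',', @bad), is_clean($s);
}
```

For the last input the scan reports no bad words, because `<oops` never closes, while the single match still rejects the string.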
Re: perlre inverse check for several patterns by hippo (Bishop) on Jun 02, 2023 at 14:22 UTC
That looks like HTML. Don't parse HTML with regex, that way lies madness.

OK, with that out of the way, your match fails because the lookahead doesn't reset the pos. Include the right angle bracket and you're good to go.

    use strict;
    use warnings;
    use Test::More tests => 1;

    my $str = 'xxx<pre>xxx<www>xxx<strong>xxx';
    like $str, qr/<(?!strong>)/, "Valid tag found";

But again, don't do this. Use an HTML parser. You'll thank me later. :-)

🦛
Re^2: perlre inverse check for several patterns by averlon (Sexton) on Jun 02, 2023 at 14:47 UTC
Hi hippo!

No, it is not HTML. It is some interface using formatting strings like (!!like!!) HTML. But unfortunately the interface crashes if some "<>" strings are included which do not match the allowed formatting strings.

The strings I process are lines from logfiles. Unfortunately some of these lines include "<xxx>" strings. This brings the interface I use into trouble. So I need to filter them out.

In the meantime I found out that I get a "true" if I use the following code:

    $av_tmp_STRING = "xxx<pre>xxx<www>xxx<strong>xxx";
    if ( $av_tmp_STRING =~ m/<(?!strong>)(?!pre)/ ) {
        # do something with the string which contains wrong patterns
    }

Still testing whether it really works.

But anyhow, I will keep the example in mind for other use!

Thanks

Regards Kallewirsch
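A note on the pattern above: the second lookahead, `(?!pre)`, has no closing `>`, so a token such as `<pree>` would not be flagged. One way to check several allowed words at once is to put an alternation inside a single negative lookahead, with the `>` after the group. This is only a sketch and has not been tried against the real interface; the helper name `has_disallowed` is hypothetical.

```perl
use strict;
use warnings;

# True if the string contains a '<' that does not start <pre> or <strong>.
sub has_disallowed {
    my ($string) = @_;
    return $string =~ m/<(?!(?:pre|strong)>)/ ? 1 : 0;
}

for my $s ('xxx<pre>xxx<strong>xxx', 'xxx<pre>xxx<www>xxx', 'xxx<pree>xxx') {
    print "${s}: ", (has_disallowed($s) ? 'contains wrong patterns' : 'ok'), "\n";
}
```

Adding another allowed word only means extending the alternation, for example `(?:pre|strong|i)`.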
Re^3: perlre inverse check for several patterns by haukex (Archbishop) on Jun 03, 2023 at 07:48 UTC
> no, it is not HTML. It is some interface using some formatting strings like (!!like!!) HTML. But unfortunately the interface crashes if some "<>" strings are included which do not match the allowed formatting strings.

Could you enlighten us as to what exactly this format is and perhaps provide a more representative sample? Also, is it not feasible to fix the crashes in the interface?
Re^4: perlre inverse check for several patterns by averlon (Sexton) on Jun 04, 2023 at 07:14 UTC
Re^5: perlre inverse check for several patterns by averlon (Sexton) on Jun 05, 2023 at 08:37 UTC
Re^3: perlre inverse check for several patterns by LanX (Saint) on Jun 05, 2023 at 12:51 UTC
> it is not HTML. It is some interface using some formatting strings like (!!like!!) HTML. But unfortunately the interface crashes if some `"<>"` strings are included which do not match the allowed formatting strings.

So what's wrong with tybalt89's approach? See Re: perlre inverse check for several patterns

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Re: perlre inverse check for several patterns by tybalt89 (Monsignor) on Jun 02, 2023 at 17:22 UTC
Perhaps you just want to remove any patterns that are not allowed...

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11152608
    use warnings;

    my @lines = split /^/, <<END;
    xxx<pre>xxx<www>xxx<strong>xxx
    xxx<pre>xxxxxx<strong>xxx
    xxx<pre>xxx<pr>xxx<strong>xxx
    xxx<pre>xxx<pree>xxx<strong>xxx
    xxx<pre>xxx<strong>xxx<strong>xxx
    xxx<pre>xxx<pre>xxx<strong>xxx
    END

    my %allowed = map { ( '<'.$_.'>' ) x 2 } qw( pre strong );

    for ( @lines )
      {
      my $clean = s[<\w+>][ $allowed{$&} // '' ]ger;
      print $clean;
      }

Outputs:

    xxx<pre>xxxxxx<strong>xxx
    xxx<pre>xxxxxx<strong>xxx
    xxx<pre>xxxxxx<strong>xxx
    xxx<pre>xxxxxx<strong>xxx
    xxx<pre>xxx<strong>xxx<strong>xxx
    xxx<pre>xxx<pre>xxx<strong>xxx
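The same substitution idea can be extended to closing tokens such as `</strong>`, which show up elsewhere in this thread. This is a sketch under that assumption only; the sub name `strip_disallowed` is invented here, and whether the real interface needs more than this (attributes, entities) is not covered.

```perl
use strict;
use warnings;

my %allowed = map { $_ => 1 } qw( pre strong );

# Remove every <word> or </word> token whose word is not in %allowed.
# The /r modifier (return the result, leave the input alone) needs Perl 5.14+.
sub strip_disallowed {
    my ($line) = @_;
    return $line =~ s{(</?(\w+)>)}{ $allowed{$2} ? $1 : '' }ger;
}

# prints: xxx<pre>xxx</pre>yyy<strong>zzz</strong>
print strip_disallowed("xxx<pre>xxx</pre><www>yyy</www><strong>zzz</strong>\n");
```

Keying the hash on the bare word rather than on the full `<word>` string lets one entry cover both the opening and the closing form.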
Re: perlre inverse check for several patterns by haukex (Archbishop) on Jun 02, 2023 at 14:35 UTC
Do not use regular expressions to parse HTML/XML. Assuming your input is indeed HTML, here's a possible solution using Mojo::DOM, based on my code here.

    use warnings;
    use strict;

    print html_filter(<<'END_HTML', qw/pre strong i/), "\n";
    aaa<pre>bbb</pre>ccc<www><i>ddd</i><strong>eee</strong>fff</www>ggg
    END_HTML

    use Mojo::DOM;

    sub html_filter {
        my $html = shift;
        my %allowed = map {$_=>1} @_;
        my $walk; $walk = sub {
            my ($in, $out) = @_;
            for my $n ( @{ $in->child_nodes } ) {
                if ( $n->type eq 'cdata' || $n->type eq 'text' )
                    { $out->append_content($n->content) }
                elsif ( $n->type eq 'tag' ) {
                    if ($allowed{$n->tag}) {
                        my $t = $out->new_tag( $n->tag, %{$n->attr} )
                            ->child_nodes->first;
                        $walk->($n, $t);
                        $out->append_content($t);
                    }
                    else { $walk->($n, $out) }
                }
                # ignore other node types for now
            }
            return $out;
        };
        return $walk->(Mojo::DOM->new($html), Mojo::DOM->new)->to_string;
    }
    __END__
    aaa<pre>bbb</pre>ccc<i>ddd</i><strong>eee</strong>fffggg
Re: perlre inverse check for several patterns by kcott (Archbishop) on Jun 04, 2023 at 11:41 UTC
G'day Kallewirsch,

It would have been better had you provided all information up-front. From your responses to hippo and haukex, I've determined the following.

You're using WWW::Telegram::BotAPI. This is a front-end to "Telegram Bot API". Its sendMessage method documentation describes "HTML style". You should read that entire section; this extract highlights the main point that applies to you:

"... All `<`, `>` and `&` symbols ... must be replaced with the corresponding HTML entities ..."

From that, and based on what you've revealed so far, you need to modify `$av_tmp_LINE` before combining it with `$av_tmp_STRING`. Here's an example:

    $ perl -e '
        use 5.010;
        use strict;
        use warnings;

        my $av_tmp_LINE = "Jun 3 23:20:05 f42252s5 postfix/pickup[204714]: E1E63A045C: uid=33 from=<www-data>";
        say "BEFORE: $av_tmp_LINE";

        $av_tmp_LINE =~ s/([&<>])/char_to_entity($1)/eg;
        say "AFTER: $av_tmp_LINE";

        my $av_tmp_STRING = "Logfile: " . "<strong>" . q{$av_obj_TMP->{input}} . "</strong>" . " " . $av_tmp_LINE;
        say "\$av_tmp_STRING[$av_tmp_STRING]";

        sub char_to_entity {
            my ($char) = @_;
            state $entity_for = {qw{& &amp; < &lt; > &gt;}};
            return $entity_for->{$char};
        }
    '
    BEFORE: Jun 3 23:20:05 f42252s5 postfix/pickup[204714]: E1E63A045C: uid=33 from=<www-data>
    AFTER: Jun 3 23:20:05 f42252s5 postfix/pickup[204714]: E1E63A045C: uid=33 from=&lt;www-data&gt;
    $av_tmp_STRING[Logfile: <strong>$av_obj_TMP->{input}</strong> Jun 3 23:20:05 f42252s5 postfix/pickup[204714]: E1E63A045C: uid=33 from=&lt;www-data&gt;]

Note: I haven't tried to interpolate `$av_obj_TMP->{input}` as I've no idea what its value is.

-- Ken

