Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi

Can you please clear up my regexp doubt?

$input=' This is sample &lt;This is sample text> This is sample &lt;This is sample text <a>this is sample text</a> This is sample> ';

I need to match every span from &lt; to > and store the matches in an array.

Conditions:

Between &lt; and >, an <a>..</a> tag may occur zero or more times.

No tag other than <a>..</a> may appear.

Don't treat the > in the <a> and </a> tags as the closing >.

Edited by Arunbear: Changed title from 'regexp', as per Monastery guidelines

Re: encode HTML entities with a regexp
by gopalr (Priest) on May 12, 2005 at 07:01 UTC

    Try this

    use strict;

    my $input = ' This is sample &lt;This is sample text> This is sample &lt;This is sample text <a>this is sample text</a> sample text > ';
    my @arr = $input =~ m#&lt;[^<>]+(?:<a>.+?</a>[^<>]*)?>#g;

    local $" = "\n";
    print "\n@arr";

    Output

    &lt;This is sample text>
    &lt;This is sample text <a>this is sample text</a> sample text >
      I believe this might be a little buggy, unless I misunderstand the requirement. But take a look at:
      use strict;

      my $input = ' This is sample &lt;This is sample text> This is sample &lt;Th<a> </a>is is sa<not a>mple text <a>this is sample text</a> sample text >';
      my @arr = $input =~ m#&lt;[^<>]+(?:<a>.+?</a>[^<>]*)?>#g;

      print join "\n", @arr;
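To make the concern concrete, here is a minimal sketch that runs the first reply's pattern on the test string above and reports any match containing a tag other than <a> or </a>. The names $first_reply and @leaky and the grep check are illustrative additions, not part of either reply.

```perl
use strict;
use warnings;

# Pattern copied from the first reply.
my $first_reply = qr{&lt;[^<>]+(?:<a>.+?</a>[^<>]*)?>};

# Test string from the follow-up; the second span contains a disallowed <not a> tag.
my $input = ' This is sample &lt;This is sample text> This is sample '
          . '&lt;Th<a> </a>is is sa<not a>mple text <a>this is sample text</a> sample text >';

# In list context, m//g returns every match.
my @arr = $input =~ /$first_reply/g;

# Keep the matches that contain a "<" not starting <a> or </a>.
my @leaky = grep { m{<(?!/?a>)} } @arr;

print "match with a disallowed tag: $_\n" for @leaky;
```

The lazy .+? is free to run across <not a> until it reaches the last </a>, which is how the disallowed tag ends up inside a match.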
      I'm wondering what may appear inside <a>..</a>, and whether the tags need to be balanced. If not, you could simply use:
      qr {&lt;(?:[^<>]+|<a>|</a>)*>}

      Or perhaps:
      qr {&lt;(?:[^<>]+|<a>|</a>)*[^>]*>}
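As a rough check of the two patterns above, the sketch below applies the first to the original poster's string and then compares both on a string containing a <b> tag. The names $strict and $loose and the $bad sample are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# The two patterns suggested above.
my $strict = qr{&lt;(?:[^<>]+|<a>|</a>)*>};
my $loose  = qr{&lt;(?:[^<>]+|<a>|</a>)*[^>]*>};

# Input from the original question.
my $input = ' This is sample &lt;This is sample text> This is sample '
          . '&lt;This is sample text <a>this is sample text</a> This is sample> ';

my @arr = $input =~ /$strict/g;
print "$_\n" for @arr;

# Hypothetical sample with a tag that the conditions do not allow.
my $bad = '&lt;bad <b>x</b> text>';
my @strict_bad = $bad =~ /$strict/g;
my @loose_bad  = $bad =~ /$loose/g;

printf "strict: %d match(es), loose: %d match(es)\n",
    scalar @strict_bad, scalar @loose_bad;
```

The trailing [^>]* in the second pattern can consume a "<", so it accepts the start of a foreign tag that the first pattern rejects. Neither pattern checks that <a> and </a> are paired.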
      Otherwise, you might try:
      my $re;
      $re = qr{(?:[^<>]+|<a>(??{$re})?</a>)*};
      my @arr = $input =~ m#&lt;$re>#g;
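Here is a small sketch of how the recursive pattern behaves on balanced, nested and unbalanced <a> tags. The strings in @samples are made up for illustration, and the strings are kept short because nested quantifiers such as (?:[^<>]+)* can backtrack heavily on longer failing input.

```perl
use strict;
use warnings;

# $re must be declared before it is assigned so that the
# postponed subexpression (??{$re}) can refer to it at match time.
my $re;
$re = qr{(?:[^<>]+|<a>(??{$re})?</a>)*};

# Hypothetical samples, not taken from the thread.
my @samples = (
    '&lt;x <a>y</a> z>',              # balanced
    '&lt;x <a>y <a>z</a> w</a> v>',   # nested and balanced
    '&lt;x <a>y z>',                  # <a> is never closed
);

my @counts;
for my $s (@samples) {
    my @arr = $s =~ m{&lt;$re>}g;
    push @counts, scalar @arr;
    printf "%-32s => %d match(es)\n", $s, scalar @arr;
}
```

Unlike the simpler alternation, this version only accepts a span when every <a> has a matching </a>.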