wildbill has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to filter a single variable $text that contains multiple HTML lines separated by newlines. I need to convert something like:
<TABLE CELLPADDING="8" BGCOLOR="#000000"
>
<TR><TH COLSPAN="2" BGCOLOR="#0099ff"
>
to something like:
<TABLE CELLPADDING="8" BGCOLOR="#000000">
<TR><TH COLSPAN="2" BGCOLOR="#0099ff">
I have tried $text =~ s/(<.*)\n(.*>)/$1$2/mg; but that isn't working.

Replies are listed 'Best First'.
Re: How do I remove newlines between nested different delimiters on different lines?
by Abigail-II (Bishop) on Aug 01, 2002 at 17:46 UTC
    Ignoring the fact that /<[^>]*>/ is not a good regex to match HTML tags, this will do what you want:
    s((<[^>]*>))(local($_=$1);tr;\n;;d;$_)eg;
    Abigail
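For readers who find the one-liner above dense, here is a spelled-out sketch of the same idea: match each `<...>` span with `<[^>]*>`, delete the newlines inside that span only, and substitute the cleaned copy back. The helper name `join_tag_lines` and the sample `$text` are assumptions made for illustration, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative, not from the thread).
# Removes every newline that falls between a '<' and the next '>'.
# With /e the replacement is evaluated as code; its last value,
# the cleaned copy of the tag, becomes the substituted text.
sub join_tag_lines {
    my ($text) = @_;
    $text =~ s{(<[^>]*>)}{
        my $tag = $1;
        $tag =~ tr/\n//d;
        $tag;
    }eg;
    return $text;
}

# Sample input modelled on the question: the closing '>' sits on its own line.
my $text = qq{<TABLE CELLPADDING="8" BGCOLOR="#000000"\n>\n}
         . qq{<TR><TH COLSPAN="2" BGCOLOR="#0099ff"\n>\n};

print join_tag_lines($text);
```

Newlines between tags are left alone, so the line structure of the document survives. Two caveats apply, as the reply hints: `[^>]*` stops at the first `>`, so an attribute value that itself contains `>` is cut short, and because the newline is deleted rather than replaced by a space, a break between a tag name and an attribute would fuse the two words.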
Re: How do I remove newlines between nested different delimiters on different lines?
by Anonymous Monk on Aug 01, 2002 at 21:48 UTC
    If you want this kind of thing done "properly" you could use HTML::Parser (or some friend). It's quite easy to use and you can format the tags in whatever way you want.

    Here's one way of doing it:
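    use HTML::Parser;

    # Wrap $text in $delim, backslash-escaping any backslash or delimiter inside it.
    sub quote {
        my ($delim, $text) = @_;
        $text =~ s/([\\\Q$delim\E])/\\$1/g;
        return "$delim$text$delim";
    }

    # Re-emit $html with every start tag rebuilt on a single line.
    sub fix_tags {
        my ($html) = @_;
        my $result = '';

        my $p = HTML::Parser::->new(
            api_version => 3,
            # Start tags: rebuild from the tag name and the attributes in their original order.
            start_h => [
                sub {
                    my ($tagname, $attr, $attrseq) = @_;
                    my @pairs = map { "$_=" . quote('"', $attr->{$_}) } @$attrseq;
                    $result .= '<' . join(' ', $tagname, @pairs) . '>';
                },
                'tagname, attr, attrseq'
            ],
            # Everything else (text, end tags, comments, ...) passes through unchanged.
            default_h => [ sub { $result .= shift }, 'text' ],
        );

        $p->parse($html);
        $p->eof;    # flush anything the parser is still buffering
        return $result;
    }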
    Cheers,
    -Anomo