in thread Converting HTML tags into uppercase using Perl

See, this is why you should never try to parse arbitrary HTML with regular expressions. Your regex doesn't handle a number of very common occurrences. The first thing that springs to mind is tags with attributes: the tag name will be upper-cased, but the attribute names will be left untouched. The original poster was unclear as to what should be done in those circumstances.
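To make the attribute problem concrete, here is a minimal sketch. The regex is a stand-in I am assuming for illustration, not the one from the parent post, and `naive_uppercase` is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in regex (an assumption, not the parent post's code):
# upper-case the word that directly follows "<" or "</".
sub naive_uppercase {
    my ($html) = @_;
    $html =~ s{<(/?)(\w+)}{<$1\U$2}g;
    return $html;
}

# The tag names change, but href and title are left in lower case.
print naive_uppercase('<a href="/x" title="home">link</a>'), "\n";
# prints: <A href="/x" title="home">link</A>
```

Whether the attribute names ought to change is exactly the part the original question left open.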

Also, can you be sure that every < character in the document starts a tag? What if it is inside a CDATA section?

All in all, I think it's far better to use an HTML parser. They are there to be used, so why not use them?
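As a rough sketch of the parser route, assuming the CPAN module HTML::Parser is installed: `uppercase_tags` is a hypothetical helper name, and the choice to upper-case attribute names as well is my assumption about what is wanted. Start tags are rebuilt from the parsed attributes, so quoting is normalised to double quotes, a boolean attribute such as `disabled` comes out as `DISABLED="disabled"`, and XHTML-style `<br />` is not treated specially.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns a copy of $html with tag names and
# attribute names upper-cased. Text, comments and script bodies are
# copied through unchanged.
sub uppercase_tags {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr, $attrseq) = @_;
                $out .= '<' . uc($tag);
                for my $name (@$attrseq) {
                    # Attribute values arrive entity-decoded, so re-escape them.
                    (my $value = $attr->{$name}) =~ s/&/&amp;/g;
                    $value =~ s/"/&quot;/g;
                    $out .= ' ' . uc($name) . qq{="$value"};
                }
                $out .= '>';
            },
            'tagname, attr, attrseq',
        ],
        end_h     => [ sub { $out .= '</' . uc($_[0]) . '>' }, 'tagname' ],
        # Everything else (text, comments, declarations) passes through as-is.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print uppercase_tags('<p class="x">a &lt; b</p>'), "\n";
```

Because the parser decides what is and is not a tag, a `<` inside a script body is handed to the text handler and never mistaken for markup.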

--
<http://dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re^3: Converting HTML tags into uppercase using Perl
by inman (Curate) on Nov 29, 2005 at 11:59 UTC
    I figured that this was a homework question anyway, so a reasonable bit of explanation would allow the student to get away without handling the numerous variations that exist in real HTML. The OP wants to uppercase his tags. He does not mention attributes, so I have left those for him to look at.

    A CDATA section is not a tag under the HTML 4 DTD, but a <script> element is, and it could contain conditional statements (e.g. start < end) that are matched by the regex. Tackling these issues is also something for the guy to look at.
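For illustration, a small sketch of the script case. The regex is again an assumed stand-in rather than the one posted earlier in the thread, and `naive_uc_tags` is a hypothetical name. This particular stand-in only trips on the unspaced form `i<n`; whether `start < end` with spaces is hit depends on how the real regex treats whitespace after `<`.

```perl
use strict;
use warnings;

# Stand-in regex (assumed for illustration): upper-case the word
# that directly follows "<" or "</".
sub naive_uc_tags {
    my ($html) = @_;
    $html =~ s{<(/?)(\w+)}{<$1\U$2}g;
    return $html;
}

# The comparison inside the script is mistaken for a tag,
# so the variable n is silently renamed to N.
print naive_uc_tags('<script>if (i<n) { run(); }</script>'), "\n";
# prints: <SCRIPT>if (i<N) { run(); }</SCRIPT>
```

JavaScript is case-sensitive, so the rewritten script no longer refers to the same variable.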