The problem of matching tags is really quite simple. Why not use something more or less simple, like what I have posted below? Right off the bat I will say that the code itself probably needs improving (especially those nested loops halfway through). It is not the best code, but it should work fine. This is something I wrote about a year ago, so have mercy :) The code is set up for command-line use (not CGI), so editing will be necessary.

Quick explanation of my code:

  1. Checks for unallowed tags (ex: <script> and/or </script>)
  2. Checks for missing/too many tags (ex: there is a <strong> tag, but no </strong>)
  3. Checks for misspelled tags (ex: there is a <strong> and a </string> tag)
  4. Combinations of numbers 1-3 (ex: there is a <strong> tag and a </b> tag. First, the <strong> has no matching </strong>; second, </b> is not even allowed)
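
To make case 4 concrete, here is a minimal sketch of how such input gets tallied. It assumes the same tag-matching regex as the script in this post; count_tags is a hypothetical helper name used only for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every <...> token in a message, lowercased.
sub count_tags {
    my ($message) = @_;
    my %count;
    while ($message =~ /<(.*?)>/gs) {
        $count{ lc $1 }++;
    }
    return %count;
}

my %seen = count_tags('the <strong>awesome</b> language');

# "strong" is opened but never closed, and "/b" is not an allowed tag,
# so this one message trips both the mismatch and the not-allowed checks.
print "$_ => $seen{$_}\n" for sort keys %seen;
```

Because the tally only records how many times each tag appears, it can report a missing or extra tag but not where in the message the problem is.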

I did not include the -w switch simply because you will get a lot of warnings about the following line:

if ($tags{$_} != $tags{"/" . $_}) {

since that line ends up testing nonexistent hash keys. Now to the code:

#!path to perl
use strict;

my %tags;
my @errors;

my $var_containing_message = qq#
Hello everyone!<p>
This is a sample test of the things that the <strong>awesome</b> language perl can do!<p>
Anyway, when I say <pre>$|++</pre> I am changing buffering!
#;

#Just a list of all allowed tags (opening and closing)
#I had to get rid of the CODE entries to post this
#Also, this list is incomplete. DO note however,
#that I did not include <i>, <b>, or <u>
#<strong> and <em> are much better :)
my @allowed_tags = qw(
    p /p br ul /ul li ol /ol em /em strong /strong
    small /small sub /sub sup /sup pre /pre
);

#A list of the tags from @allowed_tags list
#that REQUIRE a closing tag
my @match_required = qw(ul ol em strong pre small sub sup);

#Here it is! The code that makes sure all closing
#tags are in the message somewhere

#loop through to find each HTML tag in message
#This counts up all the different HTML tags
while ($var_containing_message =~ /<(.*?)>/gs) {
    my $tmp = lc($1);
    $tags{$tmp}++;
}

#Loop through all the found HTML tags and see if
#any not-permitted/invalid ones are in there
foreach my $found_tag (keys(%tags)) {
    my $allow = 0;
    foreach my $permitted_tag (@allowed_tags) {
        if ($found_tag eq $permitted_tag) {
            $allow = 1;
        }
    }
    if ($allow != 1) {
        push @errors, "Tag Not Allowed: $found_tag";
    }
}

#Loop through all the tags requiring closing tags
#If they do not have the same # of opening/closing tags,
#generate and present the error to the poster
foreach (@match_required) {
    if ($tags{$_} != $tags{"/" . $_}) {
        #Here is where the error is generated and presented to the poster
        #Example:
        push @errors, "Mismatched Tags: <$_> and </$_>";
    }
}

if (@errors == 0) {
    print "hey, it's all good!";
}
else {
    print "Whoops. There is a problem...\n";
    print "$_\n" foreach (@errors);
}

sleep 3;
exit;
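
The warnings mentioned earlier come from comparing counts for tags that never appeared in the message. One possible way around that, sketched here under the assumption that the counts live in a hash shaped like %tags, is to default a missing count to 0 before comparing. The sub name mismatched_tags is made up for this illustration and is not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tags whose opening and closing counts differ.
# $tags is a hash ref of tag => count; $required is an array ref of tag names.
sub mismatched_tags {
    my ($tags, $required) = @_;
    my @bad;
    foreach my $tag (@$required) {
        # "|| 0" turns a missing (undef) count into 0, so warnings stay quiet.
        my $open  = $tags->{$tag}    || 0;
        my $close = $tags->{"/$tag"} || 0;
        push @bad, $tag if $open != $close;
    }
    return @bad;
}

# Sample counts, as the tallying loop might produce them.
my %tags = (strong => 1, '/b' => 1, p => 2);
my @bad  = mismatched_tags(\%tags, [qw(ul ol em strong pre small sub sup)]);
print "Mismatched Tags: <$_> and </$_>\n" for @bad;
```

With the counts defaulted this way, the comparison behaves the same as in the script, but it should be safe to turn on -w or use warnings.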

In reply to Re: bad tagging that breaks the page by mt2k
in thread bad tagging that breaks the page by rinceWind
