in reply to Re: RegEx for incorrectly closed HTML attribute?
in thread RegEx for incorrectly closed HTML attribute?

I know. The first is an example of illegal HTML (at least, illegal as of XHTML 1.0) and the second is an example of nesting, as I mentioned. In the application Cody Pendant is (writing|maintaining), I would personally treat those as acceptable exceptions: neither will screw up more than the poster's own message. As I understood it, the biggest problem with leaving links unclosed or otherwise mangling the HTML was that the rest of the page would be screwed up as well. These two will get rendered as
<A HREF = link>FOO</A>
and
<!-- -->FOO</A>
respectively (assuming Cody Pendant swaps characters for entities).
LAI
:eof
Re: RegEx for incorrectly closed HTML attribute?
by Abigail-II (Bishop) on Nov 30, 2002 at 15:50 UTC
    The point is to detect wrong or illegal HTML, so assuming the given text validates is silly. If it validated, the whole exercise would be futile. Also, the first example is valid HTML, and has always been valid HTML. In the second example, no nesting is going on. There's just one A element.

    Abigail

      As I understood the problem, the goal was not necessarily to detect wrong or illegal HTML, but to make sure the output was valid so that posts further down the page are not screwed up. I never suggested that the input be assumed to be valid; in fact the way I built my suggested solution was to detect valid anchors and to render everything else as text (with entities). I feel that my suggestion, while not complete, at least lends itself to being able to prevent user mistakes or ignorance from affecting other posts.

      Oh, and when I mentioned nesting, I meant that the comment inside the anchor element would be treated by my regex like nesting. I know that what you wrote was in fact an example of a legal comment inside a single A element, but since there is no reason for a user to comment the code in a BBS post I felt the mangling of that was an acceptable loss.
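
A minimal sketch of the "detect valid anchors, render everything else as text" idea, in Python rather than Perl. This is not the regex from the earlier post: the `ANCHOR` pattern and the `render_post` name are hypothetical, and the sketch assumes that only anchors with a double-quoted href and markup-free link text count as valid. It does not vet URL schemes, and an href that already contains entities would be escaped a second time.

```python
import html
import re

# Hypothetical whitelist pattern: <a href="...">text</a> with a double-quoted
# href and no markup inside the link text. Anything else is treated as text.
ANCHOR = re.compile(
    r'<a\s+href\s*=\s*"([^"<>]*)"\s*>([^<>]*)</a\s*>',
    re.IGNORECASE,
)


def render_post(text: str) -> str:
    """Keep recognised anchors as markup; escape everything else."""
    out = []
    pos = 0
    for m in ANCHOR.finditer(text):
        # Text between the previous anchor and this one is shown literally.
        out.append(html.escape(text[pos:m.start()]))
        # Rebuild the anchor from its parts so the output is always closed.
        href = html.escape(m.group(1), quote=True)
        label = html.escape(m.group(2))
        out.append(f'<a href="{href}">{label}</a>')
        pos = m.end()
    # Whatever follows the last anchor is also shown literally.
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(render_post('see <a href="http://example.com/">here</a> & <b>bold</b>'))
    print(render_post('<A HREF = link>FOO</A>'))
```

With this pattern, an unquoted href or a comment inside the anchor does not match, so the whole fragment comes out as entity-escaped text. The poster's own message looks odd, but no tag is left open to affect posts further down the page.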


      LAI
      :eof

      The first is definitely not valid, according to:
      HTML 4.01
      XHTML 1.0

      However, it will still display correctly in browsers. A better breaking example might be: <a href = li'nk>FOO</a>

        The first link says:
        In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.

        Which is exactly what I was referring to. In

        <a href = link>foo</a>

        the attribute value contains only letters, and doesn't need quotes. Remember that HTML is an SGML application, and not an XML application.
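
The rule quoted from the HTML 4.01 text can be expressed as a small check. This is only a sketch in Python: the function name `may_omit_quotes` is made up for illustration, and the character class is taken directly from the quoted passage (letters, digits, hyphens, periods, underscores, colons).

```python
import re

# Characters the quoted passage allows in an unquoted attribute value:
# a-z, A-Z, 0-9, hyphen, period, underscore, colon.
UNQUOTED_OK = re.compile(r'[A-Za-z0-9._:-]+')


def may_omit_quotes(value: str) -> bool:
    """True if the whole value consists only of the allowed characters."""
    return UNQUOTED_OK.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("link", "li'nk", "http://example.com/"):
        print(candidate, may_omit_quotes(candidate))
```

Under this check `link` passes, while `li'nk` fails because of the apostrophe and a typical URL fails because of the slashes, which is why quoting is the safer habit even where it is optional.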

        Abigail