PerlMonks  

Re: RegEx for incorrectly closed HTML attribute?

by dingus (Friar)
on Nov 29, 2002 at 08:50 UTC ( [id://216444] )


in reply to RegEx for incorrectly closed HTML attribute?

The problem is likely to be worse than just a missing quote. Once you fix that, you will still have the people who forget the closing </a>, or who cock up any of about 50 other simple syntax rules. I'd suggest you simply have a preview page, as we have here at PerlMonks, where the user can see what his post really looks like.

If you want to validate just what you mention, then a checker regex is:

my ($openquote, $uri, $closer) = m!<a\s+href\s*=\s*(['"])([^>'"]+)(.)!i;
Then:
  • it's valid if $openquote eq $closer;
  • the trailing quote was omitted if $closer eq '>';
  • otherwise the trailing quote is mismatched.
It's up to you to figure out the replacements and whether to do it all in a single regex, though it's probably better not to try, as it will be ugly. It's probably best to reject the post and make the user fix it; that way they won't make the mistake again.
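
As a minimal sketch of the three-way check above, the checker can be wrapped in a small function. The name classify_href and the wording of the returned strings are assumptions made for illustration; only the regex and the comparisons come from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the quoting of the first <a href=...> found.
sub classify_href {
    my ($html) = @_;
    my ($openquote, $uri, $closer) =
        $html =~ m!<a\s+href\s*=\s*(['"])([^>'"]+)(.)!i
        or return 'no quoted href found';
    return 'valid'                  if $closer eq $openquote;
    return 'trailing quote omitted' if $closer eq '>';
    return 'trailing quote mismatched';
}

for my $tag (
    q{<a href="foo">blah</a>},
    q{<a href="foo>blah</a>},
    q{<a href='foo" >blah</a>},
) {
    printf "%-26s %s\n", classify_href($tag), $tag;
}
```

A caller that rejects bad posts would only need to test whether the result is 'valid'.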

Dingus


Enter any 47-digit prime number to continue.

Re: RegEx for incorrectly closed HTML attribute?
by Abigail-II (Bishop) on Nov 29, 2002 at 09:57 UTC
    Unfortunately, that regex will fail if the href attribute value contains a quote (one other than the delimiting quote), if it contains a >, if the href attribute value isn't quoted at all, or if there are other attributes between the element name and the href attribute.
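
The four cases can be tried directly against the checker from the parent post. This is only a sketch: the sample tags and the helper name check are made up for illustration.

```perl
use strict;
use warnings;

# The checker regex from the parent post, unchanged.
my $checker = qr!<a\s+href\s*=\s*(['"])([^>'"]+)(.)!i;

# Hypothetical helper: report what the checker concludes about one tag.
sub check {
    my ($tag) = @_;
    my ($open, $uri, $closer) = $tag =~ $checker
        or return 'not matched';
    return $closer eq $open ? 'valid' : 'flagged as broken';
}

# All four tags are acceptable HTML, yet none is reported as valid.
for my $tag (
    q{<a href="foo's">bar</a>},          # quote inside the value
    q{<a href="a>b">bar</a>},            # > inside the value
    q{<a href=foo>bar</a>},              # unquoted value
    q{<a name="x" href="foo">bar</a>},   # attribute before href
) {
    printf "%-18s %s\n", check($tag), $tag;
}
```

The first two tags are flagged as broken because the capture stops at the embedded quote or >, and the last two are never matched at all.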

    Abigail

      You might want to try for something more like:

      m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"))*\s*>.*?</a\s*>!i

      Update: at Abigail-II's suggestion, here's a modified version of the above, which accepts tags like <a href= foo_link>link text</a>. Of course, comments and criticism are always welcome.

      m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"|[a-z0-9\-\._:]+))*\s*>.*?</a\s*>!i

      This should be what you're looking for, because (if I got it right) it successfully detects any valid anchor tag. Once you've got that, you can substitute SGML entities for the markup characters wherever you haven't found a valid tag, with s/"/&quot;/g and so forth. That way, any invalid code gets printed verbatim. Instead of:
      A link with no closing tag where there really should be one...
      You'll see
      A <a href="#">link with no closing tag where there really should be one...
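
One way this might be wired up, as a sketch only: split the post on the anchor pattern and escape everything that falls outside a match. The helper names escape_text and escape_invalid are hypothetical, and the inner groups of the pattern are made non-capturing here (an assumption, so that split returns just one captured piece per anchor).

```perl
use strict;
use warnings;

# The updated anchor pattern, with its inner groups made non-capturing.
my $anchor = qr!<a(?:\s+[\w]+\s*\=\s*(?:'[^']*'|"[^"]*"|[a-z0-9\-\._:]+))*\s*>.*?</a\s*>!i;

# Hypothetical helper: turn markup characters into entities.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: keep matched anchors, escape the rest.
sub escape_invalid {
    my ($post) = @_;
    # With one capturing group, split returns plain text at even
    # indices and the matched anchors at odd indices.
    my @pieces = split /($anchor)/, $post;
    for my $i (0 .. $#pieces) {
        $pieces[$i] = escape_text($pieces[$i]) if $i % 2 == 0;
    }
    return join '', @pieces;
}

print escape_invalid(q{A <a href="#">link with no closing tag}), "\n";
print escape_invalid(q{See <a href="foo">this</a> & that}), "\n";
```

Note that the text inside a matched anchor is passed through untouched, so this does nothing about markup nested in the link text.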

      One limitation might pop up if the users start nesting anchors inside one another... This is why my initial response, if it were my own app and server, would be "get smarter users" :o)

      Anyway, as always, there are bound to be faults with what I wrote above. Here's what I used to test it:

      #!/usr/bin/perl
      for (<>) {
          m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"))*\s*>.*?</a\s*>!i
              ? print "match: "
              : print "no match: ";
          print;
      }

      And my dataset:

      <a href="foo">blah</a>
      <a href="foo>blah</a>
      <a href='foo" >blah</a>
      <a href="foo">blah</b>
      <a href="foo's">bar</a>
      <a name="blah" href="foo" >bar</a>

      And my results:

      match: <a href="foo">blah</a>
      no match: <a href="foo>blah</a>
      no match: <a href='foo" >blah</a>
      no match: <a href="foo">blah</b>
      match: <a href="foo's">bar</a>
      match: <a name="blah" href="foo" >bar</a>

      LAI
      :eof
        Two examples that will fail the regex:

        <A HREF = link>FOO</A>
        <A HREF = "link"><!-- </a>-->FOO</A>
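
To see what each version of the pattern does with these two tags, here is a small sketch that reports the text actually matched. The helper name matched_text is made up for illustration; both patterns are copied from the posts above.

```perl
use strict;
use warnings;

# Both versions of the anchor pattern from the parent reply.
my $original = qr!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"))*\s*>.*?</a\s*>!i;
my $updated  = qr!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"|[a-z0-9\-\._:]+))*\s*>.*?</a\s*>!i;

# Hypothetical helper: return the matched substring, or undef on no match.
sub matched_text {
    my ($pattern, $tag) = @_;
    return $tag =~ $pattern ? $& : undef;
}

for my $tag (q{<A HREF = link>FOO</A>}, q{<A HREF = "link"><!-- </a>-->FOO</A>}) {
    for my $pair ([original => $original], [updated => $updated]) {
        my ($name, $pattern) = @$pair;
        my $hit = matched_text($pattern, $tag);
        printf "%-8s %s\n", $name, defined $hit ? "matched: $hit" : "no match";
    }
}
```

The unquoted value is rejected by the original pattern and accepted by the updated one. For the second tag both patterns do match, but the non-greedy .*? stops at the </a> inside the comment, so the match ends before the link text.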

        Abigail
