Re: RegEx for incorrectly closed HTML attribute?

The problem is likely to be worse than just a missing quote. Once you fix that, you will still have the people who forget the closing </a>, or who cock up about 50 other simple syntax rules. I'd suggest you simply have a preview page, as we have here on PerlMonks, where the user can see what his post really looks like.

If you want to validate just what you mention, then a checker regex is:

my ($openquote, $uri, $closer) = m!<a\s+href\s*=\s*(['"])([^>'"]+)(.)!i;

Then:

it's valid if $openquote eq $closer;
the trailing quote was omitted if $closer eq '>';
otherwise the trailing quote is mismatched.
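A minimal sketch of that three-way check, wrapped in a hypothetical `check_href` sub (the sub name and the returned strings are illustrative, not from the post). One quirk of the regex worth knowing: if a quoted value runs to the end of the input, `(.)` backtracks and grabs the last character of the URI, so that case comes out as "mismatched".

```perl
use strict;
use warnings;

# Hypothetical helper: classify the first <a href=...> in a string using the
# checker regex above. Returns 'valid', 'missing trailing quote',
# 'mismatched trailing quote', or 'no match' (e.g. an unquoted href,
# which the regex does not handle at all).
sub check_href {
    my ($html) = @_;
    my ($openquote, $uri, $closer) =
        $html =~ m!<a\s+href\s*=\s*(['"])([^>'"]+)(.)!i
        or return 'no match';
    return 'valid'                  if $closer eq $openquote;
    return 'missing trailing quote' if $closer eq '>';
    return 'mismatched trailing quote';
}

for my $sample ('<a href="foo">x</a>', '<a href="foo>x</a>',
                q{<a href='foo">x</a>}, '<a href=foo>x</a>') {
    printf "%-26s %s\n", check_href($sample), $sample;
}
```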

It's up to you to figure out the replacements and/or whether to do it as a single regex; probably better not to try, as it will be ugly. It's probably best to reject the post and make the user fix it; that way they won't make the mistake again.

Dingus

Enter any 47-digit prime number to continue.


Re: RegEx for incorrectly closed HTML attribute? by Abigail-II (Bishop) on Nov 29, 2002 at 09:57 UTC
Unfortunately, that regex will fail if the href attribute value contains a quote (other than the delimiting one) or a `>`, if the href value isn't quoted at all, or if there are other attributes between the element name and the href attribute.

Abigail
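To illustrate the cases Abigail lists, here is a minimal sketch (not code from the thread) that scans each `<a ...>` start tag attribute by attribute, so `href` need not come first, values may be double-quoted, single-quoted or unquoted, and a quoted value may contain the other quote or a `>`. `find_hrefs` is a hypothetical name; the sketch still ignores comments, entities and everything else a real HTML parser deals with, and a tag with an unterminated quoted value is simply not matched.

```perl
use strict;
use warnings;

# Hypothetical helper: return the href value of every <a ...> start tag that
# is properly closed with '>'. Values may be "double-quoted", 'single-quoted'
# (either may contain the other quote or '>') or unquoted.
sub find_hrefs {
    my ($html) = @_;
    my @hrefs;
    # Outer match: a whole start tag made of name[=value] pairs.
    while ($html =~ m{<a\b((?:\s+[\w:.-]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?)*)\s*>}gi) {
        my $attrs = $1;
        # Inner match: walk the attribute list one pair at a time.
        while ($attrs =~ m{([\w:.-]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?}g) {
            my ($name, $value) = (lc $1, $2 // $3 // $4);
            push @hrefs, $value if $name eq 'href' && defined $value;
        }
    }
    return @hrefs;
}

my $html = q{<a name="top" href="it's>here">x</a> <A HREF = link>y</A> <a href="broken>z</a>};
print "$_\n" for find_hrefs($html);
```

The last tag in the sample has no closing quote, so it is skipped rather than half-parsed; a caller could flag that case for the user to fix.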
Re^2: RegEx for incorrectly closed HTML attribute? by LAI (Hermit) on Nov 29, 2002 at 17:55 UTC
You might want to try for something more like:

`m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"))*\s*>.*?</a\s*>!i`

Update: at Abigail-II's suggestion, here's a modified version of the above which also accepts unquoted attribute values, as in `<a href= foo_link>link text</a>`. Of course, comments and criticism are always welcome.

`m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"|[a-z0-9\-\._:]+))*\s*>.*?</a\s*>!i`

This should be what you're looking for because (if I got it right) it detects any valid anchor tag. Once you've got that, you can replace markup characters with SGML entities wherever you haven't found a valid tag, e.g. `s/"/&quot;/g` and so forth. That way, any invalid code gets printed verbatim. Instead of the text being rendered as a link that never ends (A link with no closing tag where there really should be one...), you'll see the markup itself:

`A <a href="#">link with no closing tag where there really should be one...`

One limitation might pop up if users start nesting anchors inside one another... This is why my initial response, if it were my own app and server, would be "get smarter users" :o)

Anyway, as always, there are bound to be faults in what I wrote above. Here's what I used to test it:

    #!/usr/bin/perl
    for (<>) {
        m!<a(\s+[\w]+\s*\=\s*('[^']*'|"[^"]*"))*\s*>.*?</a\s*>!i
            ? print "match: " : print "no match: ";
        print;
    }

And my dataset:

    <a href="foo">blah</a>
    <a href="foo>blah</a>
    <a href='foo" >blah</a>
    <a href="foo">blah</b>
    <a href="foo's">bar</a>
    <a name="blah" href="foo" >bar</a>

And my results:

    match: <a href="foo">blah</a>
    no match: <a href="foo>blah</a>
    no match: <a href='foo" >blah</a>
    no match: <a href="foo">blah</b>
    match: <a href="foo's">bar</a>
    match: <a name="blah" href="foo" >bar</a>

LAI :eof
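LAI's idea of escaping everything that isn't a recognised anchor can be sketched as follows. It assumes LAI's updated pattern written with the `*` quantifiers it needs to produce the results shown (non-capturing groups here); `escape_html` and `sanitize_post` are hypothetical names. Anything inside a matched anchor, including its link text, is passed through unchanged, so the output is only as safe as the regex.

```perl
use strict;
use warnings;

# LAI's updated anchor pattern (assumed form), compiled once.
my $valid_anchor =
    qr{<a(?:\s+\w+\s*=\s*(?:'[^']*'|"[^"]*"|[a-z0-9\-._:]+))*\s*>.*?</a\s*>}i;

# Replace markup characters with SGML entities.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Keep every valid anchor as-is and escape the text between them,
# so a broken tag shows up as literal markup instead of being rendered.
sub sanitize_post {
    my ($post) = @_;
    my ($out, $last) = ('', 0);
    while ($post =~ /$valid_anchor/g) {
        my ($start, $end) = ($-[0], $+[0]);   # offsets of the current match
        $out .= escape_html(substr($post, $last, $start - $last));
        $out .= substr($post, $start, $end - $start);
        $last = $end;
    }
    return $out . escape_html(substr($post, $last));
}

print sanitize_post('See <a href="x">ok</a> and <a href="y>broken</a>'), "\n";
```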
Re: RegEx for incorrectly closed HTML attribute? by Abigail-II (Bishop) on Nov 29, 2002 at 18:53 UTC
Two examples that will fail the regex:

`<A HREF = link>FOO</A>`

`<A HREF = "link"><!-- </a>-->FOO</A>`

Abigail
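The second example also trips LAI's updated pattern: the lazy `.*?` stops at the first `</a>`, which sits inside the comment. A short demonstration, assuming the updated pattern with a capture added around the link text. The comment-stripping pre-pass is a hypothetical workaround for this one case only, not a general fix, since SGML comment syntax has more corner cases than `<!--.*?-->` covers.

```perl
use strict;
use warnings;

# LAI's updated pattern (assumed form) with a capture around the link text.
my $anchor =
    qr{<a(?:\s+\w+\s*=\s*(?:'[^']*'|"[^"]*"|[a-z0-9\-._:]+))*\s*>(.*?)</a\s*>}i;

my $html = '<A HREF = "link"><!-- </a>-->FOO</A>';

# The lazy .*? stops at the first </a>, the one inside the comment.
my ($text) = $html =~ $anchor;
print "naive:    [$text]\n";

# Hypothetical pre-pass: drop <!-- ... --> comments before matching.
(my $stripped = $html) =~ s/<!--.*?-->//gs;
($text) = $stripped =~ $anchor;
print "stripped: [$text]\n";
```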
Re^4: RegEx for incorrectly closed HTML attribute? by LAI (Hermit) on Nov 29, 2002 at 19:07 UTC
Re: RegEx for incorrectly closed HTML attribute? by Abigail-II (Bishop) on Nov 30, 2002 at 15:50 UTC