Re: Parsing HTML tags with regex
by tphyahoo (Vicar) on Nov 11, 2005 at 10:46 UTC
|
Not so fast, pal. Did you really win the bet? Can your regex process HTML comments with brackets in them, such as
<!-- Html comment with a bracket... > -->
No? Use one of the HTML::? modules and go crawl back to your friend and admit you were wrong.
|
Thanks for the notification
Re: Parsing HTML tags with regex
by gopalr (Priest) on Nov 11, 2005 at 08:50 UTC
|
m#<([^">]+(?:"[^"]+")*[^>]+)>#
Thanks,
Gopal.R
|
Your regex also matches part of this text:
a < b implies b > a
which does not contain an HTML tag. Oh, and it won't match all HTML tags correctly either. Consider for instance:
<tag attr1="one" attr2="two">
<tag attr='"'> <tag attr1='"'>
The first one fails to match because your regex requires that if there are double quoted values inside a tag, they must follow each other. And the second fails because your regex doesn't consider single quoted values.
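Rather than reasoning about the pattern by eye, it may be easier to run it against the sample strings from this post and print what it actually captures. The sketch below is only an illustration: the helper name captures is made up, and the pattern is copied unchanged from the post being replied to. It is worth running because the trailing [^>]+ also accepts quote characters, so the captures can differ from what a quick reading of the pattern suggests.

```perl
use strict;
use warnings;

# The pattern exactly as posted above (only the delimiters differ).
my $tag_re = qr{<([^">]+(?:"[^"]+")*[^>]+)>};

# Hypothetical helper: return every capture the pattern finds in a string.
sub captures {
    my ($text) = @_;
    my @found = $text =~ /$tag_re/g;
    return @found;
}

# Sample strings taken from the post above.
my @samples = (
    'a < b implies b > a',
    '<tag attr1="one" attr2="two">',
    q{<tag attr='"'> <tag attr1='"'>},
);

for my $sample (@samples) {
    my @found = captures($sample);
    my $shown = @found ? join(' | ', map { "[$_]" } @found) : 'no match';
    print "$sample\n    => $shown\n";
}
```

Each capture is printed in square brackets, so a capture that runs across two tags, or one that comes from plain text, is easy to spot.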
|
Thanks, gopal. The above regex was useful.
|
Hi gopal, THANK YOU VERY MUCH. I won the bet. But now I am in a bit of trouble: I do not know how to explain how it works to my friends. So could you PLEASE explain how the regular expression works? Once again, THANK YOU VERY MUCH, GOPAL.
|
m{
    <                 # start with <
    (                 # group start
    [^">]+            # one or more characters other than " and >
    (?:"[^"]+")*      # optional: zero or more double-quoted strings, each matched up to its closing quote
    [^>]+             # one or more characters other than >
    )                 # group end
    >                 # end with >
}x
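To see the commented pattern in action, here is a small sketch that compiles it with the /x flag and prints what the group captures. The helper name tag_body and the three sample tags are made up for illustration. Note that the comments need /x to be ignored, and that # cannot double as the regex delimiter once # starts a comment, hence the braces.

```perl
use strict;
use warnings;

# Same pattern as above, compiled with /x so whitespace and comments are ignored.
my $tag_re = qr{
    <                 # literal opening bracket
    (                 # start of the capture group
      [^">]+          # one or more characters that are neither " nor >
      (?:"[^"]+")*    # zero or more double-quoted strings
      [^>]+           # one or more characters other than >
    )                 # end of the capture group
    >                 # literal closing bracket
}x;

# Hypothetical helper: return the captured tag body, or undef if nothing matches.
sub tag_body {
    my ($text) = @_;
    return $text =~ $tag_re ? $1 : undef;
}

for my $html ('<a href="x.html">', '<br>', '<b>') {
    my $body = tag_body($html);
    print "$html => ", (defined $body ? "captured [$body]" : 'no match'), "\n";
}
```

Because both the first and the last piece use +, the pattern needs at least two characters between the brackets, which is why a one-letter tag such as <b> is not matched.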
|
Re: Parsing HTML tags with regex
by pg (Canon) on Nov 11, 2005 at 08:21 UTC
|
"without using HTML::TokenParser"
Why? This is simply not the right decision. In this case it is more important to do it right, with the right tool, an HTML parser (for example the one murugu mentioned), than to struggle with the "right regexp".
Re: Parsing HTML tags with regex
by Skeeve (Parson) on Nov 11, 2005 at 09:47 UTC
|
Being picky again and, correct me anyone knowing better, but <select name="url>adee" value="wq<ew"> is not legal HTML. It has to be encoded as <select name="url&gt;adee" value="wq&lt;ew">
s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
|
Being picky again and, correct me anyone knowing better, but <select name="url>adee" value="wq<ew"> is not legal HTML.
I know better. You are wrong. It is legal HTML. Don't let the fact some browsers can't parse it fool you.
|
Hi skeeve, the actual thing was <select name="url" style="width:125px" size="1" onchange="if (this.selectedIndex>0) parent.location.href=this.options[this.selectedIndex].value;">. I just replaced it.
|
That's not legal html either.
Oh, sure, people put crap like that on their html pages, but it's not legal html - throw it at any html validator.
The legal version of that is:
<select name="url" style="width:125px" size="1" onchange="if (this.selectedIndex&gt;0) parent.location.href=this.options[this.selectedIndex].value;">
--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/;
map{y/X_/\n /;print}map{pop@$_}@/for@/
Re: Parsing HTML tags with regex
by murugu (Curate) on Nov 11, 2005 at 08:20 UTC
|
Try HTML::Parser.
Regards, Murugesan Kandasamy use perl for(;;);
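For anyone who has not used it, a minimal sketch of the parser approach might look like the following. It assumes HTML::Parser is installed from CPAN (it is not a core module). The helper name start_tags is made up for illustration, and the sample markup reuses the select tag discussed elsewhere in this thread.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect [tagname, attribute hashref] for every start tag.
sub start_tags {
    my ($html) = @_;
    my @tags;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # The argspec asks for the tag name and a hash of its attributes.
        start_h     => [ sub { push @tags, [@_] }, 'tagname, attr' ],
    );
    $parser->parse($html);
    $parser->eof;    # signal end of document so pending events are delivered
    return @tags;
}

my $html = '<select name="url>adee" value="wq<ew"><option value="1">One</option></select>';

for my $tag (start_tags($html)) {
    my ($name, $attr) = @$tag;
    print "$name: ", join(', ', map { "$_=$attr->{$_}" } sort keys %$attr), "\n";
}
```

The handler only records start tags; end tags, text and comments would each need their own handler (end_h, text_h, comment_h) if they matter.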
Re: Parsing HTML tags with regex
by BUU (Prior) on Nov 11, 2005 at 08:32 UTC
|
It's not really possible with a real regex. HTML is an arbitrarily nested grammar, which doesn't work very well with a "regular" expression. However, given that Perl's regexen are of the scary, non-regular kind, you could probably manage to do it. Like so:
/(.*)(?{HTML::TokeParser->new( $1 )})/