in reply to a regex to parse html tags
That's a whole lot harder to parse with a regular expression. Consider: <head> <title>Blah</title> <meta name="DESCRIPTION" value="About </head> tags."> </head>
That said, [.\n] creates a character class matching a period or a newline; inside [], a . is not special. You could use the /s modifier (see perlre) and just use .+ instead, since /s makes . match newlines too. Another way is to use (?:.|\n), which is the same as (.|\n) except that it doesn't capture anything (into the $<digit> variables).
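A minimal sketch of the three forms, using a made-up test string ($text and the other variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = "a.b\nc";

# [.\n] is a character class: a literal period or a newline, nothing else.
my @class_hits = $text =~ /[.\n]/g;

# Without /s, . stops at the newline; with /s it matches across it.
my ($no_s)   = $text =~ /(.+)/;
my ($with_s) = $text =~ /(.+)/s;

# (?:.|\n) groups without capturing, so $1 here is the outer group only.
my ($grouped) = $text =~ /((?:.|\n)+)/;

printf "%d %d %d %d\n",
    scalar(@class_hits), length($no_s), length($with_s), length($grouped);
# prints "2 3 5 5"
```

The character class finds only the period and the newline, the plain .+ stops after "a.b", and both /s and (?:.|\n) take the whole five-character string.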
Also, you need to escape / in a regex if you are using / as the delimiter, like so: \/. You can avoid that ugliness by using an alternate delimiter (like m!regex goes here! or m(regex)). I assume the lack of a / at the end of your regex is an error made in posting your code here.
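For comparison, here is the same kind of match written with each delimiter. The $html string and the <title> pattern are invented for the example, not taken from the original question:

```perl
use strict;
use warnings;

my $html = "<title>Blah</title>";

# With / as the delimiter, a literal / in the pattern must be written \/.
my ($escaped) = $html =~ /<title>(.*?)<\/title>/;

# With ! as the delimiter, the / needs no escaping.
my ($bang) = $html =~ m!<title>(.*?)</title>!;

# Parentheses work as delimiters too.
my $paren_ok = $html =~ m(</title>) ? 1 : 0;

print "$escaped $bang $paren_ok\n";
# prints "Blah Blah 1"
```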
update: To give another example of why not to use a regex: <head something="someattribute">...</head> won't be handled by a simple regex either.
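To see both failures concretely, here is a sketch with a deliberately naive pattern. The pattern is a guess at what a "simple regex" might look like, not the one from the original question:

```perl
use strict;
use warnings;

# Hypothetical naive pattern: everything between <head> and the first </head>.
my $naive = qr!<head>(.*?)</head>!s;

my $tricky = '<head> <title>Blah</title> <meta name="DESCRIPTION" value="About </head> tags."> </head>';
my $attr   = '<head something="someattribute">...</head>';

# Stops at the </head> inside the attribute value, so the capture is cut short.
my ($got) = $tricky =~ $naive;

# Never matches, because the opening tag is not literally "<head>".
my $attr_matched = $attr =~ $naive ? 1 : 0;

print "captured: [$got]\n";
print "attribute case matched: $attr_matched\n";
```

The first capture ends at `value="About ` and the second string does not match at all. Loosening the pattern to <head[^>]*> would fix the second case, but not the first.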
update 2: fixed typo of </head> where </title> was meant and other minor typos.
Re: Re: a regex to parse html tags
by Hofmator (Curate) on Jul 05, 2001 at 12:52 UTC