in reply to How to find XML tags using regular expression
You can try this (ignoring CDATA sections, but including other weird stuff, like the XML declaration):

    @tags = grep defined, $xml =~ /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/sg;

And of course, you'll have some cleaning up to do now, because the output is very coarse.
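To see what the tag-extraction match returns, here is a minimal sketch run against a made-up sample string (the sample document is an assumption for illustration, not from the original question). It includes a comment and an attribute value that both contain a `>`, which the quoted-string alternatives and the comment alternative are meant to handle.

```perl
use strict;
use warnings;

# Hypothetical sample document, for illustration only.
my $xml = q{<?xml version="1.0"?><!-- a > comment --><a href="x>y">text</a><br/>};

# In list context, a /g match returns the capture group for every match.
# When the comment alternative matches, the group is undefined,
# so "grep defined" drops those entries.
my @tags = grep defined,
    $xml =~ /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/sg;

print "$_\n" for @tags;
```

The XML declaration shows up in the result alongside the real tags, which is part of the cleaning up mentioned above.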
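As a first step into parsing this content, you can use the above regexp in split:

    @tokens = split /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/s, $xml;

which will return a list of text and tags. Comments will be thrown away, although each one leaves an undef entry in the list (the capture group takes no part in a comment match), so filter with grep defined if that matters.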
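A small sketch of the split approach on a made-up input (the sample string and the tag/text labelling are assumptions for illustration). Because split returns undef wherever the capture group did not take part in the match (the comments) and empty strings between adjacent tags, the sketch filters both out before looking at the tokens.

```perl
use strict;
use warnings;

# Hypothetical sample input, for illustration only.
my $xml = q{<p>Hello <!-- note --><b>world</b>!</p>};

# split keeps the captured separators (the tags) in the result list.
# Drop the undef entries left by comments and the empty text fields.
my @tokens = grep { defined($_) && length($_) }
    split /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/s, $xml;

# Crude classification: anything starting with "<" is treated as a tag.
for my $tok (@tokens) {
    my $kind = $tok =~ /^</ ? 'tag ' : 'text';
    print "$kind: $tok\n";
}
```

Dropping empty fields is a choice made here for readability; keep them if you need tags and text to alternate strictly in the list.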