in reply to tokenize a string
    use warnings;
    use strict;

    my $string = q(<a style='postion: top; font:roman' href=hi.html href='bold' src=" image" />);

    # In list context every match returns both capture groups:
    # the whole token ($1) and its opening quote ($2, undef if unquoted).
    my @tokens = $string =~ /<?\s*(\w+=([\'\"]).*?\2|[^\s>]+)/g;

    # Get every other token
    my $i = 0;
    @tokens = grep {++$i % 2} @tokens;

    local $" = "\n";
    print "@tokens\n";

This gives:

    a
    style='postion: top; font:roman'
    href=hi.html
    href='bold'
    src=" image"
    /

This will only work with this data format, i.e. with the token=quoted data layout. The regex uses a back-reference (\2) for the closing quote because a quoted value might contain the other kind of quote. The extra capture group puts a second element into the result array for every match, hence the grep.
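If the grep feels awkward, one possible alternative is to walk the matches in scalar context and keep only $1 each time. The following is a minimal sketch, not a tested replacement: the sub name tokenize is made up for illustration, and the regex is the same one as in the post.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration only).
# Returns the whole tokens and discards the quote capture.
sub tokenize {
    my ($string) = @_;
    my @tokens;
    # In scalar context, /g resumes from the end of the previous match.
    while ($string =~ /<?\s*(\w+=([\'\"]).*?\2|[^\s>]+)/g) {
        push @tokens, $1;    # $2 is only needed for the back-reference
    }
    return @tokens;
}

my $string = q(<a style='postion: top; font:roman' href=hi.html href='bold' src=" image" />);
print "$_\n" for tokenize($string);
```

For the sample string this should print the same six tokens as the grep version. The same limitation applies: it only copes with the token=quoted data layout.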