mp has asked for the wisdom of the Perl Monks concerning the following question:
Example input:
<column>Column <b>One</b> Header</column> <column>Column <u>Two</u> Header</column> <column na="1">Etcetera</column>
The code below seems to work, but I want to make sure there are no gotchas with regard to using tags that look like HTML but aren't valid HTML (things in angle brackets with optional attributes and an optional slash indicating a closing tag). I prefer HTML::TokeParser over XML::TokeParser because the text between the 'column' tags will in general not be well-formed XML.
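use strict;
use warnings;
use HTML::TokeParser;

# Returns a reference to a list of hashes, one per <column> element.
# Each hash holds the column's attributes plus a 'label' key.
sub parse_column_list {
    my ($str) = @_;
    my $p = HTML::TokeParser->new(\$str);
    my (@cl, $label, %attr);
    my %attr_default = ( na => 0 );

    while (my $t = $p->get_token) {
        if ($t->[0] eq "S" and $t->[1] eq "column") {
            # Opening <column>: reset the label, merge attributes over the defaults
            $label = '';
            %attr  = (%attr_default, %{ $t->[2] });
        }
        elsif ($t->[0] eq "E" and $t->[1] eq "column") {
            # Closing </column>: store what was collected
            push @cl, { %attr, label => $label };
        }
        elsif ($t->[0] eq "T") {
            # Text token: the text is the second element
            $label .= $t->[1];
        }
        else {
            # Any other token: the original source text is the last element
            $label .= $t->[-1];
        }
    }
    return \@cl;
}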
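To see what the parser actually hands back for made-up tags, here is a minimal sketch that dumps the token stream for a small input. The helper name `dump_tokens` and the sample string are only illustrations, not part of the code above. As far as I can tell from the HTML::TokeParser documentation, the points worth checking are that tag and attribute names come back lowercased, and that text tokens from `get_token` are not entity-decoded.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# dump_tokens is a hypothetical helper for illustration only.
# It returns one short description string per token in $str.
sub dump_tokens {
    my ($str) = @_;
    my $p = HTML::TokeParser->new(\$str) or die "cannot set up parser";
    my @out;
    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S') {
            # Start tag: ["S", $tag, \%attr, \@attrseq, $text]
            my $attr = $t->[2];
            push @out, join ' ', 'S', $t->[1],
                map { "$_=$attr->{$_}" } @{ $t->[3] };
        }
        elsif ($type eq 'E') {
            # End tag: ["E", $tag, $text]
            push @out, "E $t->[1]";
        }
        elsif ($type eq 'T') {
            # Text: ["T", $text, $is_data]
            push @out, "T [$t->[1]]";
        }
        else {
            # Comments, declarations, processing instructions:
            # the original source text is the last element
            push @out, "$type [$t->[-1]]";
        }
    }
    return @out;
}

# Mixed-case tag and attribute names, plus an entity in the text
print "$_\n" for dump_tokens('<Column NA="1">A &amp; <b>B</b></Column>');
```

If the lowercasing holds, comparing against "column" in lowercase also matches `<Column>` or `<COLUMN>` in the input, and labels keep entities such as `&amp;` exactly as written unless they are decoded separately.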
Replies are listed 'Best First'.
Re: Parsing pseudo-HTML with HTML::TokeParser
by Ovid (Cardinal) on Sep 30, 2002 at 21:20 UTC
Re: Parsing pseudo-HTML with HTML::TokeParser
by Helter (Chaplain) on Sep 30, 2002 at 17:54 UTC
Re: Parsing pseudo-HTML with HTML::TokeParser
by mp (Deacon) on Oct 02, 2002 at 15:56 UTC