HTML::TokeParser Select List into Array

awohld has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use HTML::TokeParser to get the values of a select list. Here is my select list:

    <select name=market onChange="setCookie(this.name,this.selectedIndex)">
    <option value=Chicago>Chicago
    <option value=Wisconsin>Wisconsin
    <option value=Indiana>Indiana
    </select>

Using this code:

    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::TokeParser;
    use Data::Dumper;

    my $baseSpPage = 'http://www.example.com/';

    my $mech = WWW::Mechanize->new();
    $mech->get($baseSpPage);
    my $html = $mech->content();

    my $stream = HTML::TokeParser->new(\$html);

    while ( my $token = $stream->get_token ) {
      if ($token->[0] eq 'S') {
        if ($token->[1] eq 'select') {
          print Dumper $token;
        }
      }
    }

Data::Dumper gives me:

    $VAR1 = [
              'S',
              'select',
              {
                'onchange' => 'setCookie(this.name,this.selectedIndex)',
                'name' => 'market'
              },
              [
                'name',
                'onchange'
              ],
              '<select name=market onChange="setCookie(this.name,this.selectedIndex)">'
            ];
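For reference, a minimal sketch that takes the same kind of token apart. It uses a shortened copy of the HTML from the question (trimmed to one option, purely for illustration). A start-tag token from HTML::TokeParser is an array reference of the form [ 'S', $tag, \%attr, \@attrseq, $text ], so the dump above holds only the `<select>` tag itself; the `<option>` tags are not inside it and arrive later as separate tokens:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Shortened sample based on the question's HTML (one option only)
my $html = q{<select name=market onChange="setCookie(this.name,this.selectedIndex)">}
         . q{<option value=Chicago>Chicago</select>};

my $p     = HTML::TokeParser->new(\$html);
my $token = $p->get_token;    # first token: the <select> start tag

# A start-tag token is [ 'S', $tag, \%attr, \@attrseq, $text ]
my ($type, $tag, $attr, $attrseq, $text) = @$token;
print "type=$type tag=$tag name=$attr->{name}\n";
print "attributes in source order: @$attrseq\n";    # names are lowercased
print "raw text: $text\n";

# The <option> tag is a separate token further along the stream
my $next = $p->get_token;
print "next token: $next->[0] $next->[1] value=$next->[2]{value}\n";
```

So getting at the options means continuing to call get_token (or get_tag) after the `<select>` start tag, rather than looking inside that one token.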

How do I get the select list out of here? I'd like to pack it into an array.

Ideally I'd like to have an @market array with a list of the states. I'm reading "Perl & LWP" (I wish it had more examples) but can't figure it out.

Re: HTML::TokeParser Select List into Array by wfsp (Abbot) on Oct 18, 2005 at 07:48 UTC
This uses HTML::TokeParser::Simple, which makes the syntax a bit easier:

    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $select;
    {
        local $/;
        $select = <DATA>;
    }

    my $tp = HTML::TokeParser::Simple->new(\$select)
        or die "Couldn't parse string: $!";

    my ($start, @states);
    while (my $t = $tp->get_token) {
        $start++, next if $t->is_start_tag('select');
        next unless $start;
        last if $t->is_end_tag('/select');
        push @states, $t->get_attr('value') if $t->is_start_tag('option');
    }

    print "$_\n" for @states;

    __DATA__
    <select name=market onChange="setCookie(this.name,this.selectedIndex)">
    <option value=Chicago>Chicago
    <option value=Wisconsin>Wisconsin
    <option value=Indiana>Indiana
    </select>
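For comparison, a sketch of the same extraction with the plain HTML::TokeParser used in the question, relying on its get_tag method, which skips ahead to the next tag matching one of the given names (end tags are named with a leading slash). It assumes the list of interest is the one with name=market and that each option carries a value attribute; options without one are simply skipped:

```perl
use strict;
use warnings;
use HTML::TokeParser;

my $html = <<'HTML';
<select name=market onChange="setCookie(this.name,this.selectedIndex)">
<option value=Chicago>Chicago
<option value=Wisconsin>Wisconsin
<option value=Indiana>Indiana
</select>
HTML

my $p = HTML::TokeParser->new(\$html);
my @market;

# Skip ahead to <select name="market">. For a start tag, get_tag returns
# [ $tag, \%attr, \@attrseq, $text ]; it returns undef at end of input.
my $found;
while (my $s = $p->get_tag('select')) {
    if (defined $s->[1]{name} && $s->[1]{name} eq 'market') {
        $found = 1;
        last;
    }
}

if ($found) {
    # Ask for either an <option> start tag or the closing </select>;
    # an end tag comes back as [ "/$tag", $text ].
    while (my $t = $p->get_tag('option', '/select')) {
        last if $t->[0] eq '/select';
        push @market, $t->[1]{value} if defined $t->[1]{value};
    }
}

print "$_\n" for @market;
```

The state is still there, only implicit: the inner loop runs only after the `<select>` tag has been found, and it stops at the matching end tag.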
Re: HTML::TokeParser Select List into Array by graff (Chancellor) on Oct 18, 2005 at 12:54 UTC
The thing that distinguishes wfsp's code from yours (apart from the fact that he uses the "Simple" version of TokeParser, which is irrelevant) is this: when it detects the start tag of the "select" block in the HTML data, it sets a state variable and keeps parsing. While that state variable is set, subsequent tags and text returned by the parser are treated as part of the "select" block; once the end tag of the "select" block is detected, the loop exits (the capture is done).

In the OP, the code seems to assume (falsely) that once the "select" start tag is seen, this event (and the content returned by the parser) encompasses the whole block. It doesn't. It's just the start tag, and you have to keep reading until you see the corresponding end tag to obtain the whole content of that HTML block.

(If you were trying to extract a tag that was nestable, like "ul" or "table", you'd probably have to maintain a stack to keep track of the nesting level. This probably won't come up for "select". Or you could take a different approach entirely, with something like HTML::TreeBuilder, which I have never used, so I'm not well informed about its suitability here.)
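As a minimal sketch of the nestable case mentioned above, assuming only plain HTML::TokeParser: when a single tag name is being tracked, a depth counter is enough to stand in for the stack. The function name first_block and the sample list are made up for illustration. The function returns the raw source of the first element with the given (lowercase) tag name, nested copies of that tag included:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the raw HTML of the first <$want> element,
# including nested <$want> elements. $want must be a lowercase tag name.
sub first_block {
    my ($html, $want) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my ($depth, $out) = (0, '');
    while (my $t = $p->get_token) {
        my ($type, $tag) = @$t;
        $depth++ if $type eq 'S' && $tag eq $want;    # one level deeper
        next unless $depth;                            # still outside the block
        # Raw source text is the last element of most token types,
        # but element 1 of a text ('T') token
        $out .= $type eq 'T' ? $t->[1] : $t->[-1];
        if ($type eq 'E' && $tag eq $want) {
            $depth--;
            return $out unless $depth;    # matching close tag reached
        }
    }
    return $out;    # no such element, or it was never closed
}

# Made-up sample: a list nested inside another list
my $sample = <<'HTML';
<ul>
  <li>Midwest
    <ul><li>Chicago</li><li>Indiana</li></ul>
  </li>
</ul>
<p>not part of the list</p>
HTML

print first_block($sample, 'ul'), "\n";
```

Stopping at the first `</ul>` instead would cut the outer list off after the inner one; the counter makes the loop wait for the close tag that matches the opening one.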