A question about HTML::TokeParser

theguvnor has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use HTML::TokeParser to parse the input elements in an HTML form. Eventually I want to be able to add my own non-standard attributes to control some other features but that's not important at this time.

The following code looks for input, textarea, and select tags. What I'm having trouble with is the loop that is executed if the tag found is a select, to get the option nodes immediately following. It seems to exit the main while() loop as soon as it finishes the inner loop. It doesn't seem to get the final input form element (named text2)!!

I'd appreciate it if anyone could tell me why it silently exits the main loop.

I hope this is enough to illustrate:

Code:

use strict;
use warnings;
use diagnostics;
use HTML::TokeParser;
use Data::Dumper;

# variables
my %rules;
my $count;
my $DEBUG = 1;

# get filename of HTML form from commandline and open filehandle to it
my $formfile = shift or die "Usage: perl $0 filename\n";
open (my $fh, '<', $formfile) or die "Trouble opening your file!\n$!";

my $parser = HTML::TokeParser->new($fh);
while (my $token = $parser->get_tag(qw(input textarea select))) {
    $count++;
    my $tag = $token->[0];
    my $type = $token->[1]{'type'};# or warn "$count-th Token ($tag tag) has no type!";
    my $name = $token->[1]{'name'} or warn "$count-th Token ($tag tag) has no name!";
    my $value = $token->[1]{'value'};
    my $maxlength = $token->[1]{'maxlength'};
    my $required = $token->[1]{'required'};        # non-w3c attribute

    my $allowed;
    if ($tag =~ m/select/i) {
        while (my $option = $parser->get_tag('option'))
        {
            push @{$allowed}, $option->[1]{'value'};
        }
    }
    else {
        $allowed = [ $token->[1]{'allowed'} ];        # non-w3c attribute
    }
    $DEBUG && print "$count\t$tag\t$name\t$type\n";
#    $rules{ $name } = sub {    print $name, $/; return; };
    $rules{$name} = {
        'name'     => $name,
        'type'     => $type,
        'value'    => $value,
        'allowed'    => $allowed,
        'required'    => $required,
        'maxlength'    => $maxlength
    };
}
close $fh;

if ($DEBUG and %rules) {
    open OUT, '>', "$formfile.rules.txt" or die $!;
    print OUT Dumper(\%rules);
    close OUT;
}
exit;

And here is the small sample html form file I created to illustrate the problem:

<html><head></head><body><form action="/"><P>text 1: <INPUT name=text1></P><P>textarea: <TEXTAREA name=textarea1 cols=30></TEXTAREA></P>
<P><INPUT type=radio value=radio1option1 name=radio1>&nbsp;radio1 option 1 <INPUT type=radio value=radio1option2 name=radio1>&nbsp;radio1 option 2</P>
<P><INPUT type=checkbox value=check1option1 name=check1>check 1 option 1 <INPUT type=checkbox value=check1option2 name=check1>check 1 option 2</P><P>list box:</P><P><SELECT size=3 name=list1> <OPTION value=1>list1 option1</OPTION> <OPTION value=2>list1 option2</OPTION> <OPTION value=3>list1 option3</OPTION></SELECT> </P>
<P>text 2: <INPUT maxLength=30 size=30 name=text2></P></FORM></body></html>

Thanks!

[Jon]


Re: A question about HTML::TokeParser
by Jenda (Abbot) on Oct 12, 2003 at 18:58 UTC

I've never used HTML::TokeParser myself, but I think I know what the problem is. The $parser->get_tag('option') in the inner loop doesn't care about the </select>; it gives you every <option> it can find in the rest of the file. When it finally returns undef, HTML::TokeParser's "cursor" is at the end of the HTML, so there are no more tags for the outer loop to find.

I believe you'll have to do it differently. I think you'll need just one loop looking for any <input>, <textarea>, <select> or <option>, remembering the name of the last seen <select> and appending any <option> found to that <select>.
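The single-loop idea might look something like the sketch below. It is only an illustration: the inline HTML, the %rules layout, and the names $current_select and @order are assumptions, not part of the original script. It also asks get_tag for "/select" so that options are only collected while inside a list.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Inline sample form (an assumption; the original script reads a file).
my $html = <<'HTML';
<form action="/">
<input name="text1">
<select name="list1" size="3">
<option value="1">list1 option1</option>
<option value="2">list1 option2</option>
</select>
<input name="text2" maxlength="30">
</form>
HTML

my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser\n";

my (%rules, @order);
my $current_select;    # name of the <select> we are inside, if any

# One loop for everything; "/select" marks where a list ends.
while (my $token = $parser->get_tag(qw(input textarea select /select option))) {
    my $tag = $token->[0];

    if ($tag eq '/select') {
        undef $current_select;
        next;
    }

    my $attr = $token->[1];    # attribute hash (start tags only)

    if ($tag eq 'option') {
        # Attach the option to the enclosing <select>; ignore strays.
        push @{ $rules{$current_select}{allowed} }, $attr->{value}
            if defined $current_select;
        next;
    }

    my $name = $attr->{name};
    next unless defined $name;
    push @order, $name unless exists $rules{$name};
    $rules{$name}{type} = $attr->{type};
    $current_select = $name if $tag eq 'select';
}

print "$_\n" for @order;
```

Because the parser is never asked to skip ahead to a tag that may not exist, the cursor does not run off the end of the document before text2 is seen.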

HTH, Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
-- Rick Osborne


Re: A question about HTML::TokeParser
by PodMaster (Abbot) on Oct 13, 2003 at 01:49 UTC

HTML::Form
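HTML::Form can pull the fields out of a form without a hand-written token loop. A minimal sketch, assuming HTML::Form is installed; the inline HTML, the base URI, and the @names array are made up for illustration:

```perl
use strict;
use warnings;
use HTML::Form;

# Inline sample form (an assumption; substitute your own document).
my $html = <<'HTML';
<form action="/">
<input name="text1">
<select name="list1" size="3">
<option value="1">list1 option1</option>
<option value="2">list1 option2</option>
</select>
<input name="text2" maxlength="30">
</form>
HTML

# The base URI is a placeholder; it is only used to resolve the action.
my ($form) = HTML::Form->parse($html, 'http://localhost/');
die "No form found\n" unless $form;

my @names;
for my $input ($form->inputs) {
    my $name = $input->name;
    next unless defined $name;
    push @names, $name;

    # possible_values() lists the values a list-like field can take.
    my @allowed = grep { defined } $input->possible_values;
    print join("\t", $name, $input->type, "@allowed"), "\n";
}
```

Non-standard attributes such as required or allowed are not something this sketch reads, so a token-based approach may still be needed for those.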

text2 is not displayed because the inner loop reads all the way to the end of the file looking for option tags (a logic flaw). For example:

use HTML::TokeParser;
my $p = HTML::TokeParser->new(\q[
   <bold>
   <body>
]);
use Data::Dumper;
while(defined(my $t = $p->get_tag('bold'))){
    print Dumper($t);
}
my $t = $p->get_token() ;
print "no more tokens, see " . ( defined $t ? Dumper($t) : "undef" );
__END__
$VAR1 = [
          'bold',
          {},
          [],
          '<bold>'
        ];
no more tokens, see undef



Re: A question about HTML::TokeParser
by Roger (Parson) on Oct 13, 2003 at 02:29 UTC

...
while (my $token = $parser->get_token)
{
  next unless $token->[1] =~ /(?:select|input|textarea)/;
  if ($token->[0] eq 'S') # start tag
  {
    $count++;
    my $tag   = $token->[1];
    my $name  = $token->[2]{name}; # fetch name of input
    my $value = $token->[2]{value};
    my $maxlength = $token->[2]{maxlength};
    my $required  = $token->[2]{required};
    my $allowed;
    if ($tag eq 'select')
    {
      while (my $option = $parser->get_token)
      {
        last if $option->[0] eq 'E' && $option->[1] eq 'select';
        next unless $option->[0] eq 'S' && $option->[1] eq 'option';
        push @{$allowed}, $option->[2]{value};
      }
    } else {
      $allowed = [ $token->[2]{allowed} ];
    }

    $DEBUG && print "$count\t$tag\t$name\t\n";
    if ($tag eq 'select') {
      print Dumper($allowed);
    }
  }
  ...
}

1       input   text1
2       textarea        textarea1
3       input   radio1
4       input   radio1
5       input   check1
6       input   check1
7       select  list1
$VAR1 = [
          '1',
          '2',
          '3'
        ];
8       input   text2


Re: Re: A question about HTML::TokeParser

by theguvnor (Chaplain) on Oct 14, 2003 at 13:09 UTC

First, thanks to everyone who responded. Secondly, apologies for popping up to ask a question and then not responding sooner; I had only intermittent access over the Canadian Thanksgiving weekend. Thirdly, an extra ++ to Roger for providing a working rewrite. After seeing the first couple of responses, I had begun to think (again, with only limited access to actually play on the weekend) about how I could maybe use the get_token method, but wasn't sure. You provided some proof.

Thanks again to everyone!

[Jon]


Re: A question about HTML::TokeParser
by pg (Canon) on Oct 12, 2003 at 18:02 UTC

For what you are doing, a better way might be to use HTTP::Request, and to steal some code from HTTP::Daemon to see how it creates an HTTP::Request object based on what is received.
