I'm trying to use HTML::TokeParser to parse the input elements in an HTML form. Eventually I want to be able to add my own non-standard attributes to control some other features but that's not important at this time.

The following code looks for input, textarea, and select tags. What I'm having trouble with is the inner loop that runs when the tag found is a select, which is meant to collect the option tags immediately following it. The main while() loop seems to exit as soon as the inner loop finishes, so the final input element (named text2) is never reported!

I'd appreciate it if anyone could tell me why it silently exits the main loop.

I hope this is enough to illustrate:

Code:

use strict;
use warnings;
use diagnostics;
use HTML::TokeParser;
use Data::Dumper;

# variables
my %rules;
my $count;
my $DEBUG = 1;

# get filename of HTML form from commandline and open filehandle to it
my $formfile = shift or die "Usage: perl $0 filename\n";
open (my $fh, '<', $formfile) or die "Trouble opening your file!\n$!";

my $parser = HTML::TokeParser->new($fh);

while (my $token = $parser->get_tag(qw(input textarea select))) {
    $count++;
    my $tag       = $token->[0];
    my $type      = $token->[1]{'type'};# or warn "$count-th Token ($tag tag) has no type!";
    my $name      = $token->[1]{'name'} or warn "$count-th Token ($tag tag) has no name!";
    my $value     = $token->[1]{'value'};
    my $maxlength = $token->[1]{'maxlength'};
    my $required  = $token->[1]{'required'};    # non-w3c attribute
    my $allowed;
    if ($tag =~ m/select/i) {
        while (my $option = $parser->get_tag('option')) {
            push @{$allowed}, $option->[1]{'value'};
        }
    }
    else {
        $allowed = [ $token->[1]{'allowed'} ];  # non-w3c attribute
    }
    $DEBUG && print "$count\t$tag\t$name\t$type\n";
    # $rules{ $name } = sub { print $name, $/; return; };
    $rules{$name} = {
        'name'      => $name,
        'type'      => $type,
        'value'     => $value,
        'allowed'   => $allowed,
        'required'  => $required,
        'maxlength' => $maxlength
    };
}

close $fh;

if ($DEBUG and %rules) {
    open OUT, '>', "$formfile.rules.txt" or die $!;
    print OUT Dumper(\%rules);
    close OUT;
}

exit;

And here is the small sample html form file I created to illustrate the problem:

<html><head></head><body><form action="/">
<P>text 1: <INPUT name=text1></P>
<P>textarea: <TEXTAREA name=textarea1 cols=30></TEXTAREA></P>
<P><INPUT type=radio value=radio1option1 name=radio1>&nbsp;radio1 option 1
<INPUT type=radio value=radio1option2 name=radio1>&nbsp;radio1 option 2</P>
<P><INPUT type=checkbox value=check1option1 name=check1>check 1 option 1
<INPUT type=checkbox value=check1option2 name=check1>check 1 option 2</P>
<P>list box:</P>
<P><SELECT size=3 name=list1>
<OPTION value=1>list1 option1</OPTION>
<OPTION value=2>list1 option2</OPTION>
<OPTION value=3>list1 option3</OPTION></SELECT> </P>
<P>text 2: <INPUT maxLength=30 size=30 name=text2></P></FORM></body></html>
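One possible explanation, going by the HTML::TokeParser documentation: when get_tag() is given tag names, it skips every token until it finds one of them, and it returns undef only at the end of the document. If that is what is happening, the inner loop does not stop at </SELECT>. It reads on past text2 to the end of the file, so the outer loop has nothing left to find. Below is a reduced sketch of a loop that also asks for the closing tag and stops there. The inline $html string and the @seen array are made up for this illustration and are not part of my real script.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Reduced, made-up form: one select followed by one more input.
my $html = <<'HTML';
<form><select name=list1>
<option value=1>one</option>
<option value=2>two</option>
</select>
<input name=text2></form>
HTML

# A reference to a scalar is parsed as the literal document.
my $parser = HTML::TokeParser->new(\$html);
my @seen;

while (my $token = $parser->get_tag(qw(input textarea select))) {
    my ($tag, $attr) = @{$token};
    my @allowed;
    if ($tag eq 'select') {
        # Ask for the closing tag as well and stop when it arrives,
        # so the inner loop cannot run on to the end of the document.
        while (my $option = $parser->get_tag('option', '/select')) {
            last if $option->[0] eq '/select';
            push @allowed, $option->[1]{value};
        }
    }
    push @seen, "$tag:$attr->{name}:@allowed";
}

print "$_\n" for @seen;
```

With the closing tag included in the inner get_tag() call, the outer loop should go on to pick up text2 after the select. End tags come back with a leading slash in the first element, which is what the last check relies on.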

Thanks!

[Jon]


In reply to A question about HTML::TokeParser by theguvnor
