Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, the HTML::Parser module is new to me and I am seeking some guidance. Basically, I want to extract the form elements from a web page. Let's assume the page has the following simple form...
<form name=widget method=post action=/cgi-bin/script.pl>
  <input type=text name=whatever>
  <input type=hidden name=another>
  <select name=choices>
    <option value=1>first choice</option>
    <option value=2>second choice</option>
  </select>
  <input type=submit value=submit>
</form>

I want to be able to display the following...
form name=widget
text name=whatever
hidden name=another
select name=choices

Also, is there a way to grab and display the select choices? Any help with this would be most appreciated!

Re: HTML::Parser Assistance Requested
by rob_au (Abbot) on Nov 19, 2001 at 07:11 UTC
    For this, I would look at HTML::TokeParser, for which there is an excellent tutorial on this site by crazyinsomniac. The tutorial also takes you step by step through the process of creating a sample application.

    An example program using HTML::TokeParser to do what you need might look like the following. It is based on some of my own code for generating XML from parsed HTML, and it handles multiple forms in an HTML document, keeping each form's elements separate:

    #!/usr/bin/perl -Tw

    use strict;
    use HTML::TokeParser;
    use LWP::Simple;
    use XML::Simple;

    # Fetch the page to be parsed
    my $html = get('http://www.perlmonks.org/');
    die "LWP::Simple failed to retrieve source HTML" unless defined $html;

    my (%data, $formname, $selectname);
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create HTML::TokeParser object";

    while (my $token = $parser->get_token) {
        my $type = shift @{ $token };
        if ($type eq "S") {
            # Start tag: tag name, attribute hash, attribute order, raw text
            my ($tag, $attr, $attrseq, $text) = @{ $token };
            if ($tag eq "form") {
                $formname = $attr->{'name'} || 'none';
            } elsif ($tag eq "input") {
                # An <input> without a type attribute is a text field
                push (@{ $data{$formname} }, {
                    'type'       => $attr->{'type'} || 'text',
                    'field_name' => $attr->{'name'} || 'none',
                }) if defined $formname;
            } elsif ($tag eq "select") {
                $selectname = $attr->{'name'} || 'none';
            } elsif ($tag eq "option") {
                push (@{ $data{$formname} }, {
                    'type'       => "select",
                    'field_name' => $selectname,
                    'value'      => $attr->{'value'},
                }) if defined $formname && defined $selectname;
            }
        } elsif ($type eq "E") {
            # End tag: forget the enclosing form or select once it closes
            my ($tag) = @{ $token };
            undef $formname   if $tag eq "form";
            undef $selectname if $tag eq "select";
        }
    }

    my $xml = XML::Simple->new();
    print STDOUT $xml->XMLout(\%data);
    exit 0;

    ... and the sample output ...

    <opt>
      <none field_name="node" type="text" />
      <none field_name="go_button" type="image" />
      <none field_name="node_id" type="hidden" />
      <none field_name="vc" type="hidden" />
      <none field_name="op" type="hidden" />
      <none field_name="node_id" type="hidden" />
      <none field_name="op" type="hidden" />
      <none field_name="user" type="text" />
      <none field_name="passwd" type="password" />
      <none field_name="expires" type="checkbox" />
      <none field_name="login" type="submit" />
      <none field_name="node_id" type="hidden" />
      <none field_name="node_id" type="hidden" />
      <none field_name="displaytype" type="hidden" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="vote" type="radio" />
      <none field_name="none" type="submit" />
    </opt>

    It should now be relatively straightforward to see how you can modify this to suit your needs, and it gives you an idea of how easily HTML::TokeParser can extract information from HTML pages.
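    To produce exactly the listing asked for in the question, a minimal sketch might use HTML::TokeParser's get_tag method instead of a get_token loop; get_tag skips every tag except the ones you name. The sub name list_form_fields is hypothetical, the sample HTML is the form from the question, and skipping fields that have no name attribute (such as the submit button) is an assumption chosen to match the desired output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return one "type name=..." line per form element,
# in document order. Assumption: unnamed fields (e.g. the submit button)
# are skipped, matching the output requested in the question.
sub list_form_fields {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
    my @lines;
    # get_tag() ignores all tokens except start tags with these names
    while (my $tag = $p->get_tag(qw(form input select textarea))) {
        my ($tagname, $attr) = @$tag[0, 1];
        next unless defined $attr->{name};
        # For <input>, report its type (HTML defaults a missing type to text)
        my $kind = $tagname eq 'input' ? lc($attr->{type} || 'text') : $tagname;
        push @lines, "$kind name=$attr->{name}";
    }
    return @lines;
}

# The sample form from the question
my $html = <<'HTML';
<form name=widget method=post action=/cgi-bin/script.pl>
<input type=text name=whatever>
<input type=hidden name=another>
<select name=choices>
<option value=1>first choice</option>
<option value=2>second choice</option>
</select>
<input type=submit value=submit>
</form>
HTML

print "$_\n" for list_form_fields($html);
```

    For the form in the question this prints form name=widget, text name=whatever, hidden name=another and select name=choices, one per line.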

    Update: added support for select and option parsing.
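    For the second part of the question (displaying the select choices), one possible approach is to pair each <option> with its visible text using get_trimmed_text. This is a sketch, not the code from the reply above: the sub name select_choices is hypothetical, and falling back to the option text when there is no value attribute is an assumption based on how browsers submit such options. Passing option and /select as extra stop tags keeps one option's text from running into the next when the closing </option> tags are omitted:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect the choices of every <select>, keyed by the
# select's name. Each choice is [ value, visible text ]. Assumption: an
# option without a value attribute uses its text as the value.
sub select_choices {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
    my (%choices, $current);
    while (my $tag = $p->get_tag(qw(select /select option))) {
        my $tagname = $tag->[0];
        if ($tagname eq 'select') {
            $current = defined $tag->[1]{name} ? $tag->[1]{name} : 'none';
            $choices{$current} ||= [];
        }
        elsif ($tagname eq '/select') {
            undef $current;    # options outside a select are ignored
        }
        elsif (defined $current) {
            # Text up to the next </option>, <option> or </select>;
            # the stopping tag is pushed back for the next get_tag call
            my $text  = $p->get_trimmed_text('/option', 'option', '/select');
            my $value = defined $tag->[1]{value} ? $tag->[1]{value} : $text;
            push @{ $choices{$current} }, [ $value, $text ];
        }
    }
    return \%choices;
}

# The sample form from the question
my $html = <<'HTML';
<form name=widget method=post action=/cgi-bin/script.pl>
<select name=choices>
<option value=1>first choice</option>
<option value=2>second choice</option>
</select>
</form>
HTML

my $choices = select_choices($html);
for my $select (sort keys %$choices) {
    print "select name=$select\n";
    printf "  option value=%s: %s\n", @$_ for @{ $choices->{$select} };
}
```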

     

    Ooohhh, Rob no beer function well without!

Re: HTML::Parser Assistance Requested
by merlyn (Sage) on Nov 19, 2001 at 17:49 UTC