I was struck by the feeling that boiling down the form data is something that has probably been done many times over, but a search didn't turn up anything obvious.
I feel a CPAN module coming on (unless it's already been done and I've missed it), but I'm stuck on which namespace to use: HTML::Formdata, HTTP::Formdata or LWP::HTMLForm. Thoughts, please.
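```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the fields of a form (the first form, or the one whose name
# attribute is $formname) as a flat list of name/value pairs.
sub formdata {
    my ($html, $formname) = @_;
    my $tp = HTML::TokeParser->new(\$html) or die "Bad HTML form";

    # Skip forward to the requested form (or stop at the first one if no name is given).
    while (my $form = $tp->get_tag('form')) {
        last if !$formname
             || (defined $form->[1]{name} && $form->[1]{name} eq $formname);
        $tp->get_text('/form');    # skip over the body of a non-matching form
    }

    my @form;
    # Also look for '/form' so we stop at the end of the chosen form
    # instead of running on into any later forms.
    while (my $field = $tp->get_tag('input', 'select', 'textarea', '/form')) {
        my ($tag, $attr) = @$field;
        last if $tag eq '/form';

        if ($tag eq 'textarea') {
            my $text = $tp->get_text('/textarea');
            push @form, $attr->{name}, $text;
            next;
        }

        if ($tag eq 'select') {
            my $selected;
            while (my $tok = $tp->get_token) {
                last if $tok->[0] eq 'E' && $tok->[1] eq 'select';
                my ($typ, $opt, $att) = @$tok;
                next unless $typ eq 'S' && $opt eq 'option';
                $selected = $att->{value} if exists $att->{selected};
            }
            push @form, $attr->{name}, $selected if defined $selected;
            next;
        }

        # An <input> without a type attribute is a text field.
        my $type = lc($attr->{type} || 'text');
        if ($type =~ /^(?:hidden|password|text)$/) {
            push @form, $attr->{name}, defined $attr->{value} ? $attr->{value} : '';
        }
        elsif ($type =~ /^(?:radio|checkbox)$/ && exists $attr->{checked}) {
            # Browsers submit "on" for a checked box with no value attribute.
            push @form, $attr->{name}, defined $attr->{value} ? $attr->{value} : 'on';
        }
    }
    return @form;
}
```

The sub returns a list of key/value pairs. Thinking about it, I realised that if the calling code turns the list into a hash, any duplicate keys would be lost (for example, several checked checkboxes that share a name).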
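One way to keep the duplicates is to group the returned list into a hash of array references rather than a plain hash. This is a minimal sketch in core Perl only; the helper name `pairs_to_multihash` and the sample field names are hypothetical.

```perl
use strict;
use warnings;

# Group a flat list of name/value pairs into a hash of array refs, so that
# repeated names keep every value. pairs_to_multihash is a hypothetical name.
sub pairs_to_multihash {
    my @pairs = @_;
    my %multi;
    # splice returns an empty list (false) once the pairs run out
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @{ $multi{$name} }, $value;
    }
    return \%multi;
}

my @pairs = (colour => 'red', size => 'L', colour => 'blue');
my %flat  = @pairs;                    # the later 'colour' overwrites the earlier one
my $multi = pairs_to_multihash(@pairs);

print "flat:  $flat{colour}\n";        # flat:  blue
print "multi: @{ $multi->{colour} }\n";  # multi: red blue
```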
At this point, the light of recognition came on in my mind. This was a very familiar concept: the CGI object. I could make formdata return a CGI object, or an object of a class inheriting from CGI, giving access to all the input fields (including repeated ones) via $form->param.
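To illustrate the kind of interface meant here, this is a minimal sketch of a `param` method with CGI-like semantics: with no arguments it returns the field names, and with a name it returns all values in list context or the first value in scalar context. It is not CGI.pm itself; the `FormData` package name is an assumption, and a real version might inherit from CGI instead.

```perl
use strict;
use warnings;

# A minimal stand-in for a CGI-style param() interface, built from a
# name/value list such as formdata() returns. FormData is a hypothetical name.
package FormData;

sub new {
    my ($class, @pairs) = @_;
    my (%values, @order);
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @order, $name unless exists $values{$name};   # remember first-seen order
        push @{ $values{$name} }, $value;
    }
    return bless { values => \%values, order => \@order }, $class;
}

# param()      -> list of field names, in first-seen order
# param($name) -> all values in list context, the first value in scalar context
sub param {
    my ($self, $name) = @_;
    return @{ $self->{order} } unless defined $name;
    my $vals = $self->{values}{$name} or return;   # unknown field: empty list / undef
    return wantarray ? @$vals : $vals->[0];
}

package main;

my $form  = FormData->new(user => 'fred', tag => 'perl', tag => 'lwp');
my @names = $form->param;            # ('user', 'tag')
my $user  = $form->param('user');    # 'fred'
my @tags  = $form->param('tag');     # ('perl', 'lwp')
```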
Besides submitting the form via a normal POST with encoding type application/x-www-form-urlencoded, I would also like the code to handle file uploads, which require the multipart/form-data encoding.
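For the urlencoded case, the body is just the name/value pairs escaped and joined with `&`. Here is a minimal sketch in core Perl, assuming the values are byte strings; the helper names `form_escape` and `urlencode_pairs` are hypothetical.

```perl
use strict;
use warnings;

# Escape one name or value the way HTML forms do for
# application/x-www-form-urlencoded: unreserved characters pass through,
# spaces become '+', everything else becomes %XX.
# Assumes byte strings (encode wide characters to UTF-8 first).
sub form_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/([^A-Za-z0-9\-._~ ])/sprintf('%%%02X', ord $1)/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Join escaped name=value pairs with '&', keeping order and duplicates.
sub urlencode_pairs {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @parts, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @parts;
}

my $body = urlencode_pairs(q => 'perl & lwp', lang => 'en', lang => 'fr');
print "$body\n";    # q=perl+%26+lwp&lang=en&lang=fr
```

In practice an LWP user would normally let HTTP::Request::Common build the request: `POST $url, \@pairs` produces a urlencoded body, and `POST $url, Content_Type => 'form-data', Content => [ name => 'value', upload => [ $path ] ]` produces a multipart/form-data request with the file at `$path` attached (`$url`, `$path` and the field names are placeholders).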
Re: LWP Form scraping
by adrianh (Chancellor) on Jan 14, 2003 at 11:03 UTC
by Hofmator (Curate) on Jan 14, 2003 at 11:22 UTC
Re: LWP Form scraping
by Koschei (Monk) on Jan 17, 2003 at 05:38 UTC
Re: LWP Form scraping (consider Webchat)
by grinder (Bishop) on Jan 14, 2003 at 13:49 UTC