I'm trying to archive the wiring harness info for each motorcycle model listed at http://www.datatool.co.uk/bikes1.asp.

My problem is that I can't get my array indexing to cycle through every model of a given manufacturer.

First I store each manufacturer in an array; that part works.

Then, for each manufacturer, I go to that manufacturer's page and store the <option value=X> for each motorcycle, where X identifies a particular model.

Then I go to that model's page and call get_text('/table') on an HTML::TokeParser object to store the table data.

The problem I'm having is that I can't seem to index through each manufacturer/model pair correctly to reach each wiring harness page. What's wrong with my code? I know I'm close.

The end result I want is all the wiring harness info stored in a CSV file.
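For the CSV step, a minimal sketch using only core Perl might look like the following. The helper names csv_field and csv_line and the sample rows are hypothetical, made up purely for illustration; the Text::CSV module on CPAN is the more usual choice if installing it is an option.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field for CSV by wrapping it in
# double quotes and doubling any embedded double quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical helper: turn a list of fields into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Hypothetical sample rows: manufacturer, model value, harness text.
my @rows = (
    [ 'MakerA', '101', 'Red: ignition, Black: earth' ],
    [ 'MakerB', '201', 'text with "quotes" inside' ],
);

# Build the CSV text; in a real script each line would be printed
# to a filehandle opened for writing instead.
my $csv = '';
$csv .= csv_line(@$_) for @rows;
print $csv;
```

Quoting every field keeps commas and quotes inside the harness text from breaking the columns.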

Here's my code:
#!/usr/bin/perl -w
use LWP::UserAgent;
use HTML::TokeParser;
use Data::Dumper;

my $url  = 'http://www.datatool.co.uk/bikes1.asp';
my $url2 = 'http://www.datatool.co.uk/bikes2.asp';

my $browser  = LWP::UserAgent->new();
my $response = $browser->get($url);
die "Error getting $url: ", $response->status_line
    unless $response->is_success;
die "It's not HTML, it's ", $response->content_type
    unless $response->content_type eq 'text/html';

my $html = $response->content;
open(DAT, '>', "c:\\cas.txt") || die("Cannot Open File");

my $stream = HTML::TokeParser->new( \$html )
    || die "Couldn't read HTML string: $!";

my @manufact;
my @models;

while ( my $token = $stream->get_token ) {
    if ($token->[0] eq 'S' and $token->[1] eq 'option' and $token->[2]{'value'} ne '') {
        push(@manufact, $token->[2]{'value'});
    }
}

#print Dumper @manufact;
#sleep 5;

my $i = 0;
foreach (@manufact) {
    $response = $browser->post(
        $url,
        [
            'Manufacturer' => "$manufact[$i]",
            'btnSearch'    => 'Search for matching Models'
        ]
    );
    die "Error getting $url: ", $response->status_line
        unless $response->is_success;
    die "It's not HTML, it's ", $response->content_type
        unless $response->content_type eq 'text/html';

    $html = $response->content;
    # print $html;
    $stream = HTML::TokeParser->new( \$html )
        || die "Couldn't read HTML string: $!";

    while ( $token = $stream->get_token ) {
        if ($token->[0] eq 'S' and $token->[1] eq 'option' and $token->[2]{'value'} ne '' and $token->[2]{'value'} lt 'A') {
            push(@models, $token->[2]{'value'});
        }
    }

    my $x = 0;
    while (@models) {
        $response = $browser->post(
            $url2,
            [
                'Manufacturer' => "$manufact[$i]",
                'Model'        => "$models[$x]",
                'btnSearch'    => 'Search for matching Models'
            ]
        );
        die "Error getting $url2: ", $response->status_line
            unless $response->is_success;
        die "It's not HTML, it's ", $response->content_type
            unless $response->content_type eq 'text/html';
        $x++;

        $html   = $response->content;
        $stream = HTML::TokeParser->new( \$html )
            || die "Couldn't read HTML string: $!";
        my $text = $stream->get_text('/table');
        print $text;
        print DAT $text;
    }
}
close(DAT);
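Three things in the loops above look relevant to the indexing problem. $i is never incremented, so every request uses $manufact[0]. @models is declared outside the outer loop and never emptied, so models pile up across manufacturers. And while (@models) only tests whether the array is non-empty, which nothing in the loop body changes, so it never ends. A minimal sketch of one possible restructuring follows, assuming nested foreach loops with a fresh model list per manufacturer. The subs fetch_models and fetch_harness and the %MODELS_FOR data are hypothetical stand-ins for the two POST requests, so the sketch runs without network access.

```perl
use strict;
use warnings;

# Hypothetical canned data standing in for the site's responses.
my %MODELS_FOR = (
    'MakerA' => [ '101', '102' ],
    'MakerB' => [ '201' ],
);

# Hypothetical stub: the real script would POST to bikes1.asp here
# and collect the <option value=...> entries from the response.
sub fetch_models {
    my ($manufacturer) = @_;
    return @{ $MODELS_FOR{$manufacturer} || [] };
}

# Hypothetical stub: the real script would POST to bikes2.asp here
# and return the text of the harness table.
sub fetch_harness {
    my ($manufacturer, $model) = @_;
    return "harness data for $manufacturer/$model";
}

my @manufact = sort keys %MODELS_FOR;
my @results;

foreach my $manufacturer (@manufact) {
    # A fresh list for each manufacturer, so models collected for an
    # earlier manufacturer are not requested again under this one.
    my @models = fetch_models($manufacturer);

    # foreach visits each element exactly once and then stops, so
    # there is no index variable to keep in step by hand.
    foreach my $model (@models) {
        push @results,
            [ $manufacturer, $model, fetch_harness($manufacturer, $model) ];
    }
}

print join(' | ', @$_), "\n" for @results;
```

Naming the loop variables ($manufacturer, $model) instead of indexing with $i and $x removes the counters altogether, which is where the original loops appear to go wrong.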

In reply to Trying To Archive Web Info With My Buggy Code?? by Anonymous Monk
