Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. Here is my problem. I am trying to automate a ratings list for our local table tennis club. I can manually download the USATT ratings for our state at 'http://www.usatt.org', but I don't know how to automate this. Actually, I am now at the point where I don't really know how to approach it.

I originally thought I could go to the state ratings page and somehow download the frames. However, that web address doesn't work (because it is automatically generated?). If I go directly to the address, I get the error "Microsoft VBScript runtime error '800a0005' Invalid procedure call or argument: 'left' /history/index2.asp, line 17".

I guess my new approach would be to have a script actually do a search by state to get to the page and then download the frames, but I don't know how to do that, or whether it is possible. I am fairly new to Perl, so I don't even know if this is supposed to be easy. I have searched for various modules and found WWW::Mechanize::Frames and LWP::UserAgent::FramesReady, but haven't figured them out. I could probably use wget to grab everything, but that seems like major overkill.

I would appreciate some ideas as to what to do. Thanks, Scott

Re: Frame download
by marto (Cardinal) on Oct 17, 2006 at 17:50 UTC
    Greetings anonymous monk,

    Without wanting to make a bad comment about the design of the site (OK, I will: a nasty mix of ASP and JavaScript :P ), let's take a step back and solve your problem. A quick look at the search functionality shows that you can call the following URL using WWW::Mechanize and store the results:
    http://64.144.108.98/history/Allplayersstate.asp?State=AL
    Substitute your own two-letter state code to get the correct results. Test this in a web browser to make sure that the results are as expected. I found this out by right-clicking the search "AL" link on the "Advanced Search" page and opening it in a new tab. You may want to contact the webmaster of the site to get permission for site scraping. If this is your first attempt at such things, be careful not to hammer the server with requests.
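
    As a rough sketch of the above (substituting the state code and not hammering the server), something along these lines might do. Only the base address comes from this thread; the sub names, the five-second pause and the output file names are made up for illustration, and WWW::Mechanize is assumed to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base address taken from the reply above; everything else here
# (sub names, the delay, the output file names) is hypothetical.
my $BASE = 'http://64.144.108.98/history/Allplayersstate.asp';

# Build the per-state URL, rejecting anything that is not two letters.
sub state_url {
    my ($state) = @_;
    die "bad state code '$state'\n" unless $state =~ /\A[A-Za-z]{2}\z/;
    return $BASE . '?State=' . uc($state);
}

# Fetch each state in turn and save the raw HTML, pausing between requests.
sub fetch_states {
    my ( $delay, @states ) = @_;
    require WWW::Mechanize;    # loaded lazily so state_url() works without it
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
    for my $i ( 0 .. $#states ) {
        my $state = uc $states[$i];
        $mech->get( state_url($state) );
        $mech->save_content("ratings_$state.html");
        sleep $delay if $i < $#states;    # be gentle with the server
    }
    return;
}

# Runs only when called as a script with arguments, e.g.:
#   perl fetch_ratings.pl AL CO
fetch_states( 5, @ARGV ) if !caller && @ARGV;
```

    Each state ends up in its own file (ratings_AL.html and so on), so the parsing can be rerun later without fetching the pages again.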

    If this is your first time here please read the PerlMonks FAQ and How do I post a question effectively? if you have not already done so.

    Hope this helps.

    Martin
      Thanks. For some reason I never saw the State=XX on the end of the address. That takes care of it.
Re: Frame download
by wfsp (Abbot) on Oct 17, 2006 at 18:39 UTC
    Here's my go (based on examples in the docs)...
    #!/usr/bin/perl
    use strict;
    use warnings;

    use WWW::Mechanize;
    use HTML::TableContentParser;

    my $url = q|http://64.144.108.98/history/Allplayersstate.asp?State=AL|;

    # autocheck makes get() die with a useful message on HTTP errors
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);
    my $html = $mech->content()
        or die "content failed: empty response\n";

    # parse every table in the page into nested hashes and arrays
    my $p      = HTML::TableContentParser->new();
    my $tables = $p->parse($html);

    open my $fh, '>', 'tables.txt'
        or die "can't open output: $!\n";

    for my $t (@$tables) {
        for my $r ( @{ $t->{rows} } ) {
            print $fh "Row: ";
            for my $c ( @{ $r->{cells} } ) {
                # empty cells may have no data at all
                my $data = defined $c->{data} ? $c->{data} : '';
                print $fh "[$data] ";
            }
            print $fh "\n";
        }
    }
    close $fh or die "can't close output: $!\n";
    Extract from output:
    Row: [13507] [4/30/1998] [<a href="Phistory.asp?Pid=13200">Acoff, Fred</a>] [1678 (1678)] [AL] [<a href="Tall.asp?Tid=785&SBy=LastName+Asc">3/16/1997</a>]
    Row: [71636] [5/31/2007] [<a href="Phistory.asp?Pid=56179">Akhtar, Jibran(Brian)</a>] [152 (152)] [AL] [<a href="Tall.asp?Tid=3332&SBy=LastName+Asc">7/8/2006</a>]
    Row: [45757] [12/31/9999] [<a href="Phistory.asp?Pid=30169">Alexy, Tom</a>] [1342 (1475)] [AL] [<a href="Tall.asp?Tid=3232&SBy=LastName+Asc">3/25/2006</a>]
    Row: [57249] [3/31/1991] [<a href="Phistory.asp?Pid=37455">Alford, Josh</a>] [594 (594)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
    Row: [57242] [3/31/1991] [<a href="Phistory.asp?Pid=37448">Alford, Tom</a>] [652 (652)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
    Row: [70273] [3/31/2009] [<a href="Phistory.asp?Pid=52193">Alter, Torin</a>] [1289 (1340)] [AL] [<a href="Tall.asp?Tid=3356&SBy=LastName+Asc">9/10/2006</a>]
    Row: [3865] [5/31/1994] [<a href="Phistory.asp?Pid=3673">Alvey, Clyde Henry</a>] [1578 (1578)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
    Row: [55518] [7/31/1990] [<a href="Phistory.asp?Pid=35871">Ammons, Jermey C.</a>] [859 (859)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
    Row: [59871] [4/30/1993] [<a href="Phistory.asp?Pid=39893">Anderson, Derriel</a>] [1173 (1173)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
    Row: [57295] [9/30/1990] [<a href="Phistory.asp?Pid=37497">Ardoin, Jean Louis</a>] [1567 (1567)] [AL] [<a href="Tall.asp?Tid=0&SBy=LastName+Asc"></a>]
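
    The cells in that extract still carry the link markup, and the rating column holds two numbers. For a club list you probably only want the name and the rating, so here is a rough sketch of cleaning one row. The column order is only inferred from the extract, the thread does not say what the number in parentheses means (so it is kept under a neutral name), and the sub name clean_row is made up. The tag-stripping regex is enough for cells this simple but is not a general HTML stripper.

```perl
use strict;
use warnings;

# Turn one row's cell strings (as in the extract above) into a small hash.
# The column order is an assumption based on that extract.
sub clean_row {
    my @cells = @_;
    return unless @cells >= 4;    # skip short or malformed rows
    my ( $name, $rating_cell ) = @cells[ 2, 3 ];

    $name =~ s/<[^>]+>//g;        # drop the <a ...> and </a> tags
    $name =~ s/^\s+|\s+$//g;      # trim stray whitespace

    # "1342 (1475)" gives 1342 and 1475
    my ( $rating, $in_parens ) = $rating_cell =~ /^\s*(\d+)\s*\((\d+)\)/
        or return;
    return { name => $name, rating => $rating, in_parens => $in_parens };
}

my $row = clean_row(
    '45757', '12/31/9999',
    '<a href="Phistory.asp?Pid=30169">Alexy, Tom</a>',
    '1342 (1475)', 'AL',
    '<a href="Tall.asp?Tid=3232&SBy=LastName+Asc">3/25/2006</a>',
);
print "$row->{name}: $row->{rating}\n";    # prints "Alexy, Tom: 1342"
```

    In the script above you would collect the data of each cell in a row into a list and pass that list to clean_row instead of printing the cells one by one.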
Re: Frame download
by grep (Monsignor) on Oct 17, 2006 at 17:55 UTC
    View the 'Frame Info' on that frame and you'll get the real URL: http://64.144.108.98/history/Allplayersstate.asp?State=CO
    'Frame Info' is available in Firefox. Right-click the frame you are interested in, go to the Frame submenu and click Frame Info. I don't use IE, so I have no idea how to do it there.
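
    If you would rather dig the frame addresses out with a script than with the browser, a minimal sketch with HTML::TokeParser (part of the HTML::Parser distribution, assumed installed) could look like this. The frameset below is invented for illustration, since the real page is not shown in this thread, and the sub name frame_sources is made up.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the src attribute of every <frame> and <iframe> in a page.
sub frame_sources {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "can't parse HTML\n";
    my @sources;
    while ( my $tag = $parser->get_tag( 'frame', 'iframe' ) ) {
        my $src = $tag->[1]{src};    # $tag->[1] is the attribute hash
        push @sources, $src if defined $src;
    }
    return @sources;
}

# Made-up frameset for illustration; the real page will differ.
my $sample = <<'HTML';
<frameset cols="20%,80%">
  <frame name="menu" src="menu.asp">
  <frame name="main" src="Allplayersstate.asp?State=CO">
</frameset>
HTML

print "$_\n" for frame_sources($sample);
```

    The src values come back exactly as written in the page, so relative ones still have to be resolved against the address of the frameset page before they can be fetched.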

    I would advise getting permission to do that, or talking them into providing an RSS feed of the rankings to make this easier still.



    grep
    One dead unjugged rabbit fish later