Marza has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Fellow Monks.

I have a script that goes out to our PCs and obtains the level information for the virus checkers (i.e., engine, DAT, and application version).

The manager would like to have the current levels offered by the vendor also listed.

What I would like to do is access the vendor's download page, this one in fact:

McAfee download page

and grab that info.

So which modules should I look at? Searching CPAN and this site, I didn't notice anything except link-checking tools.

Any suggestions, whether a book, module, or tutorial, would be greatly appreciated.

Thank you!


Replies are listed 'Best First'.
Re: How do I get information from a vendors web page.
by gellyfish (Monsignor) on Mar 25, 2002 at 21:22 UTC
Re: How do I get information from a vendors web page.
by shotgunefx (Parson) on Mar 25, 2002 at 21:24 UTC
    Some modules to look at.
    LWP (methods and functions for fetching web pages)
    HTML::TableExtract (Perl extension for extracting the text contained in tables within an HTML document.)
    HTML::TreeBuilder (Parsing of HTML)

    There are many, many more. Try searching for "retrieving documents", "parsing html", etc. There are too many modules to list.
    Hope this helps.

    -Lee

    "To be civilized is to deny one's nature."
Re: How do I get information from a vendors web page.
by RMGir (Prior) on Mar 25, 2002 at 21:24 UTC
    libwww-perl is a great place to start; that will handle fetching the HTML of the page for you.

    After that, you can just search for the fields you're interested in with regexes, or use something like HTML::Parser to break up the HTML and then dig for the items you care about.

    If all you care about is the last updated date, I'd skip HTML::Parser and just search for it, myself.
    --
    Mike
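
A minimal sketch of the fetch-then-regex approach described above. The helper name `find_dat_version`, the "DAT Version" label, and the pattern itself are assumptions for illustration; the vendor page's real markup is not shown in this thread, so the regex would need adjusting against the actual HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a DAT version number out of a chunk of HTML.
# The "DAT Version" label is an assumption about the page, not its real layout.
sub find_dat_version {
    my ($html) = @_;
    # Allow an optional colon and any tags between the label and the digits.
    return $html =~ /DAT\s+Version\s*:?\s*(?:<[^>]+>\s*)*(\d+)/i ? $1 : undef;
}

# Only touch the network when a URL is given on the command line,
# so the helper can be tried offline against saved HTML.
if (@ARGV) {
    require LWP::Simple;
    my $url  = $ARGV[0];
    my $html = LWP::Simple::get($url);
    defined $html or die "Could not fetch $url\n";
    my $version = find_dat_version($html);
    print defined $version ? "DAT version: $version\n" : "No DAT version found\n";
}
```

This works well for a single value on a page that rarely changes. If the layout shifts often, a parser-based approach tends to break less.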

Re: How do I get information from a vendors web page.
by RayRay459 (Pilgrim) on Mar 26, 2002 at 00:05 UTC
    Marza,
    I had a similar need and wrote a script that goes to a few URLs, grabs the HTML and stuffs it into an array, then uses a regex to find the information that I want and prints it out. That may be one way to check the version date on McAfee's website. Here's an excerpt of the code that I wrote:
    use strict;
    use LWP::UserAgent;

    ##### Declaring my local variables, initializing useragent and open log file. #####
    my ($url, @urls);
    my $ua = LWP::UserAgent->new();
    open(OUT, ">results.log") or die "Couldn't open results.log: $!";
    print OUT "\n";

    ##### Stuffing the urls into an array #####
    @urls = ("http://www.blahblah.com", "http://www.blah.com");

    ##### Looping through the urls, grabbing the html and stuffing it into an array. #####
    foreach $url (@urls) {
        my $request = HTTP::Request->new('GET', $url);
        $ua->timeout(10);
        my $response = $ua->request($request);
        my $responsecode = $response->code();
        print "GET failed\n" if $responsecode != 200;
        # Reuse the response we already have instead of requesting the page twice
        my @ARRAY_OF_LINES = split "\n", $response->as_string;
        my $row;

        ##### Parsing the html with a regex to find the updated times #####
        foreach $row (@ARRAY_OF_LINES) {
            chomp($row);
            if ($row =~ /.*?Updated\s*:\s*(\w+\s*-\s*\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+PST)/i) {
                print OUT "Last Updated: $1\n\n";
                last;
            }
            elsif ($row =~ /.*?Updated\s*:\s*(\w+\s*\w+\s*\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+PST\s*\d*)/i) {
                print OUT "Last Updated: $1\n\n";
                last;
            }
        }
    }
    close(OUT);
    Good luck, and I hope this can point you in the right direction.
    Ray
Re: How do I get information from a vendors web page.
by talexb (Chancellor) on Mar 25, 2002 at 21:25 UTC
    gellyfish already mentioned two great modules that you can use. You would use LWP::UserAgent to fill in a form and request a page. If you just need to get static pages, LWP::Simple will do (that's also on CPAN).

    --t. alex

    "Here's the chocolates, and here's the flowers. Now how 'bout it, widder hen, will ya marry me?" --Foghorn Leghorn
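
For the form case mentioned above, here is a rough sketch of submitting form fields with LWP::UserAgent. The helper name `build_form_request` and the field name `product` are hypothetical; the vendor's real form fields are not given in this thread. The request is built separately from sending it, so it can be inspected without touching the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Build (but do not send) a form submission.
# Field names passed in are placeholders, not the vendor's real form fields.
sub build_form_request {
    my ($url, @fields) = @_;
    # POST() URL-encodes the name/value pairs into the request body.
    return POST($url, \@fields);
}

# Send the request only when a URL is given on the command line.
if (@ARGV) {
    my $ua       = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->request(build_form_request($ARGV[0], product => 'virusscan'));
    die "POST failed: ", $response->status_line, "\n" unless $response->is_success;
    print $response->decoded_content;
}
```

For a plain static page none of this is needed, and `get($url)` from LWP::Simple is enough.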

Thank you everybody!!!!
by Marza (Vicar) on Mar 27, 2002 at 02:11 UTC

    Between the examples and the pointers on where to look, this was easier than I thought.

    I ended up using the TableExtract module! That is really slick! Made it real easy to do!

    Thanks again!
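
For anyone finding this thread later, a small sketch of what the HTML::TableExtract approach can look like. The sample markup, column headings, and version numbers are made up for illustration, and `extract_levels` is a hypothetical helper; this is not the actual script or the vendor's real page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample markup standing in for a vendor page; the real page's
# column headings and values are not known from this thread.
my $html = <<'HTML';
<table>
  <tr><th>Product</th><th>Engine</th><th>DAT</th></tr>
  <tr><td>VirusScan</td><td>4.1.60</td><td>4193</td></tr>
</table>
HTML

# Return one hash per data row of any table whose header row matches.
sub extract_levels {
    my ($page) = @_;
    my $te = HTML::TableExtract->new(headers => ['Product', 'Engine', 'DAT']);
    $te->parse($page);
    my @levels;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Columns come back in the order the headers were listed.
            my ($product, $engine, $dat) = @$row;
            push @levels, { product => $product, engine => $engine, dat => $dat };
        }
    }
    return @levels;
}

for my $level (extract_levels($html)) {
    print "$level->{product}: engine $level->{engine}, DAT $level->{dat}\n";
}
```

Selecting the table by its header text means the script does not depend on where the table sits on the page, only on the headings staying the same.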