Brett Wraight has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to save a Web page using OLE navigation. I have the web page up, but I cannot seem to find the right code to implement SaveAs.
I have tried
$help = $Control_data_send->Navigate("http://www.google.com");
$help->SaveAs("f:\\test.html");

and even
$Control_data_send->ExecWB->OLECMDID_SAVEAS("f:\\test.html");



I took this code from Microsoft's help on OLE at http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser/webbrowser.asp
Is there a way of using SaveAs with Explorer?

Brett Wraight

20030530 Edit by Corion: Added code tags, linkified link

Replies are listed 'Best First'.
Re: SaveAs ole Explorer
by Corion (Patriarch) on May 30, 2003 at 09:28 UTC

    If all you're after is retrieving pages from the web, there are many alternatives:

    LWP::Simple

    As long as you need only one page, it's hard to beat LWP::Simple:

    use strict;
    use LWP::Simple;
    my $url = 'http://www.google.com';
    mirror($url,'f:/test.html');

    WWW::Mechanize

    If you need something that feels more like a browser, it's hard to surpass WWW::Mechanize, as it has "open" and "click" methods that take link names or link numbers as arguments:

    use strict;
    use WWW::Mechanize;
    my $url = 'http://www.google.com';
    my $agent = WWW::Mechanize->new();
    $agent->get($url);
    $agent->open('Images');
    print $agent->content();

    If your path through the pages is more complex, my module WWW::Mechanize::Shell can help you with the development of your WWW::Mechanize script.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
      I think you meant follow(); open() is not a valid WWW::Mechanize method, at least not in the publicly available versions.
Re: SaveAs OLE Explorer
by hacker (Priest) on May 30, 2003 at 13:50 UTC
    I think what you might want to look into is LWP::Simple and its getstore() function, which would look something like this, using your example:

    use strict;
    use LWP::Simple qw/get getstore/;
    # getstore() returns the HTTP status code of the request, not the page
    my $status = getstore('http://www.google.com', 'test.html');
    # Or you could store the content in a scalar, with:
    # my $content = get('http://www.google.com');

    But you also mentioned you wanted images, so you'll want to look into using HTML::SimpleLinkExtor as well, which is used thusly:

    use strict;
    use HTML::SimpleLinkExtor;
    my $extract = HTML::SimpleLinkExtor->new();
    $extract->parse_file("test.html");
    # or you could use $extract->parse($content), as shown
    # above, using the raw HTML
    my @images = $extract->img;
    # feed each element of @images back to a getstore() for
    # each image found

    If you want to maintain "browsability" of the site when you're done fetching it locally, you'll want to look into rewriting the links found in the document. I would suggest using URI to get the absolute URL of each link, fetching those, and saving them locally to disk. If you want to keep the same filenames as referenced in the main HTML page you fetched, you can either strip the filename portion off the link with a regex, or use File::Basename to chop it off programmatically. Here's a regex that can help split the file from the URL you get with URI's new_abs() method:

    $file =~ s/.*[\/\\](.*)/$1/;
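
    As a minimal sketch of the File::Basename route, assuming the page was fetched from a known base URL: the helper name plan_download(), the example.com addresses, and the sample links below are all hypothetical, made up for illustration. It resolves each link against the base with URI's new_abs() and takes the filename from the URL's path only, so a query string does not end up in the local name. Nothing is fetched here; the getstore() call is left as a comment.

```perl
use strict;
use warnings;
use URI;
use File::Basename qw(basename);

# Hypothetical helper: turn a (possibly relative) link into an absolute
# URL plus the local filename to save it under.
sub plan_download {
    my ($link, $base) = @_;
    my $abs  = URI->new_abs($link, $base);   # resolve against the page URL
    my $file = basename($abs->path);         # path only, so any query string is left out
    return ($abs->as_string, $file);
}

# Assumed page URL and sample links, for illustration only
my $base  = 'http://www.example.com/dir/page.html';
my @links = ('images/logo.gif', '/img/a.png?v=2', 'http://other.example.org/x/pic.jpg');

for my $link (@links) {
    my ($url, $file) = plan_download($link, $base);
    print "$url -> $file\n";
    # To actually fetch it: use LWP::Simple; getstore($url, $file);
}
```

    Two links that share a filename in different directories would overwrite each other with this scheme, so a real script would need to handle that case.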

    Why would you want to use these modules instead of OLE? Because they're portable, and if your code happens to move from Windows to Linux or BSD or some other non-OLE-supporting variant, it will still continue to work, without you having to rewrite it.

    Good luck in your quest.

Re: SaveAs ole Explorer
by Brett Wraight (Novice) on May 30, 2003 at 12:32 UTC
    Many thanks for this....

    **BUT**

    What I need is a way to save the web page as a whole, with the images and other content that generally come along when you do a Save As.

    Brett Wraight