in reply to LWP::Simple on HTTPS sites

Hi,

Step one to fixing this is to forget the program exists and define your goals.

For example, mirror the title/alt/image of every xkcd comic, so that would be

- get xkcd page
- extract info ( id title alt text image next )
- save files 
- repeat with next

Next up, tweak the goals a bit to be nice to the server

- get page if not already exist
- extract info ( id title alt text image next ) and de-html-textify
- save files with safe filenames 
- repeat with next
- wait and/or quit: quit when done; when a limit is reached, wait or quit until next time; keep track of progress
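The "safe filenames" and "if not already exist" goals could look something like this. It is only a sketch: safe_filename and already_saved are names I made up, and the NNNN.txt layout is an assumption, not anything from your program.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build a filesystem-safe name from a comic id and title.
sub safe_filename {
    my( $id, $title ) = @_;
    ( my $name = $title ) =~ s/[^A-Za-z0-9._-]+/_/g;   # squash anything risky into _
    $name =~ s/^_+|_+$//g;                              # trim leading/trailing _
    $name = 'untitled' unless length $name;
    return sprintf '%04d-%s', $id, $name;
}

# Hypothetical helper: true if this id was saved on an earlier run.
# Assumes the info for each comic is stored as "$outdir/NNNN.txt".
sub already_saved {
    my( $outdir, $id ) = @_;
    return -e File::Spec->catfile( $outdir, sprintf( '%04d.txt', $id ) ) ? 1 : 0;
}
```

Checking already_saved before calling $mech->get is what lets a second run skip pages it already has.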

Next, write (code) the program from the goals

    save_xkcd( 'outdir', 'startingid' );

    sub save_xkcd {
        my( $outdir, $starting_id ) = @_;
        $starting_id ||= id_from_progress();    # resume where the last run stopped
        my @ids = $starting_id;
        while( @ids ) {
            my $cid  = shift @ids;
            my $page = sprintf '...%s', $cid;
            $mech->get( $page );
            save_stuff( $mech, $cid );
            next_page( $mech, \@ids );          # pushes the next id, if there is one
            maybe_sleep();
        }
    }

Now all you do is fill in the blanks
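For instance, two of the blanks (id_from_progress and maybe_sleep) might be filled in roughly like this. The file name, the delays, the batch size, and save_progress are all assumptions of mine, so adjust to taste.

```perl
use strict;
use warnings;

# Assumed settings; none of these come from the original program.
my $progress_file = 'xkcd-progress.txt';
my $short_pause   = 2;     # seconds between ordinary requests
my $long_pause    = 60;    # seconds to wait once a batch is done
my $batch_size    = 50;    # requests per batch
my $fetched       = 0;

# Read the last saved id; start at 1 if there is no usable progress file.
sub id_from_progress {
    open my $fh, '<', $progress_file or return 1;
    my $line = <$fh>;
    close $fh;
    return ( defined $line && $line =~ /^(\d+)/ ) ? $1 : 1;
}

# Hypothetical companion: remember the id to resume from next time.
sub save_progress {
    my( $id ) = @_;
    open my $fh, '>', $progress_file or die "Cannot write $progress_file: $!";
    print {$fh} "$id\n";
    close $fh or die "Cannot close $progress_file: $!";
}

# Pause briefly after each request and longer after every full batch.
# Returns the delay that was used.
sub maybe_sleep {
    $fetched++;
    my $delay = $fetched % $batch_size == 0 ? $long_pause : $short_pause;
    sleep $delay if $delay;
    return $delay;
}
```

Calling save_progress( $cid ) from inside save_stuff (or right after it) would cover the "keep track of progress" goal.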

No need for CGI in this equation; CGI doesn't like near-infinite loops anyway

$mech->title gets you de-HTMLed text like: xkcd: House of Pancakes

HTML::TreeBuilder::XPath gets you the alt/title text with an XPath query of '//img/@title' and the next link with a query of '//a[@rel="next"]'
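A small sketch of those two queries, run against a made-up snippet of HTML instead of the live page (the markup below is an assumption, the real page will differ). I added /@href to the second query to get the URL instead of the element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample markup, only for illustration.
my $html = <<'HTML';
<html><head><title>xkcd: Example</title></head>
<body>
<div id="comic"><img src="/comics/example.png" title="Example hover text" alt="Example"></div>
<a rel="next" href="/2/">Next</a>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content( $html );

# findvalues returns one value per matching node; take the first of each.
my( $hover ) = $tree->findvalues( '//img/@title' );
my( $next )  = $tree->findvalues( '//a[@rel="next"]/@href' );

$tree->delete;    # free the parse tree

print "hover: $hover\n";
print "next:  $next\n";
```

With a fetched page you would pass $mech->content to new_from_content instead of the sample string. If the page has several images with title attributes, a narrower query than '//img/@title' is probably needed.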

Or $mech->find_link( text_regex => qr/next/i );

Yes, you could fix up your program by replacing LWP::Simple with mech ... but that's not exactly fun, now is it :)
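The next_page blank could be built on find_link along these lines. This is a sketch that assumes $mech is a WWW::Mechanize object that has already fetched a comic page, and that the next link's href contains the comic number (something like "/473/"); check both against the real page.

```perl
use strict;
use warnings;

# Push the id of the next comic onto @$ids, if the page links to one.
sub next_page {
    my( $mech, $ids ) = @_;

    # find_link returns a link object, or undef when nothing matches.
    my $link = $mech->find_link( text_regex => qr/next/i ) or return;

    # Assumption: the href carries the numeric id; no digits means no next page.
    if( $link->url =~ m{(\d+)} ) {
        push @$ids, $1;
    }
    return;
}
```

When no id is pushed, @ids runs empty and the while loop in save_xkcd ends on its own.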