I convert data for a living and have not dealt with browsers or the internet directly with Perl. Recently, however, a client asked us to download their data directly from their secure website. This was something new (and exciting!) for me, so I did some research and wrote a program that has worked fairly well. The last time I ran it to download the data, I received a certificate error, which I had never seen before. I researched that and added code, and the program now bypasses the error. The challenge I have not been able to resolve is that the program no longer downloads the entire page of data. It gets maybe 90% - 95% of the page, then stops and moves on to the next page of data. The only difference I can think of is that I upgraded from ActiveState 5.10 to 5.16; I wouldn't expect that to make a difference, but it might. If I use the URL directly in my browser (any page of data), the entire page downloads just fine. I'm not sure what you guys might need in order to help, but I need to be conscious of proprietary information.

Here is the major piece of code doing the work, with names changed to protect the innocent. :)

while ($more) {
  $page++;
  $url = "https://[server name is here]/[path information here]/$element/HAY/?page=$page";
  $filepage = "0" x (3 - length($page)) . $page;
  $response = $browser->get($url,':content_file' => $tempxml,);
  $file = "$output\\$element" . "_" . $filepage . ".xml";
  $response = $browser->get($url,':content_file' => $file,);
  die "Couldn't get $url\n" unless defined $response;
  $more = &check_tmp;
  unlink ("temp.xml");
  print "Completed $element page \($page\) file \($filepage\) \($more\) ...\n";
}
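One thing the loop does not do is look inside $response: get returns a response object even when the request fails, so the defined test never fires. The following is a minimal sketch of a fuller check, not a diagnosis. The helper name download_problems is made up for illustration. The X-Died and Client-Aborted headers are the ones the LWP::UserAgent documentation describes for aborted transfers; whether they appear in this particular case is an assumption. The Content-Length comparison only helps when the server sends that header.

```perl
use strict;
use warnings;

# Return a list of reasons to distrust a download; an empty list means
# nothing suspicious was found.
# $response is expected to behave like an HTTP::Response
# (is_success, status_line, header); $file is the ':content_file' path.
sub download_problems {
    my ($response, $file) = @_;
    my @problems;

    push @problems, 'request failed: ' . $response->status_line
        unless $response->is_success;

    # Headers LWP adds when it stops before the body is complete.
    for my $name ('X-Died', 'Client-Aborted') {
        my $value = $response->header($name);
        push @problems, "$name: $value" if defined $value;
    }

    # Compare the advertised length (if any) with what reached the disk.
    my $expected = $response->header('Content-Length');
    my $actual   = -s $file;
    if (defined $expected && $expected =~ /^\d+$/
        && defined $actual && $actual != $expected) {
        push @problems, "expected $expected bytes, wrote $actual";
    }

    return @problems;
}
```

In the loop, this could be called right after the get, for example: my @problems = download_problems($response, $file); warn "$url: $_\n" for @problems;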

Because there is more than one page of data and I do not know which page is the last, I download each page to a temp.xml file and check whether that file contains data. If it does, I copy it to another location, delete temp.xml, and grab the next page, looping until no more page data is available. To get past the certificate issue I added this code:

$browser = LWP::UserAgent->new(ssl_opts => { verify_hostname => 0,
                                           SSL_verify_mode => SSL_VERIFY_NONE});
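For comparison, here is a hedged sketch of building the same ssl_opts with verification left on when a CA bundle is available. The helper name build_ssl_opts and the bundle path in the usage comment are hypothetical. SSL_ca_file and verify_hostname are documented ssl_opts keys, and 0 is the numeric value of the SSL_VERIFY_NONE constant that IO::Socket::SSL exports. Whether the client's certificate chain validates against any particular bundle is not something this sketch can promise.

```perl
use strict;
use warnings;

# Build the ssl_opts list for LWP::UserAgent->new.
# With a CA bundle path, verification stays on; without one, it is
# switched off, which matches the workaround shown in the post.
sub build_ssl_opts {
    my ($ca_file) = @_;

    if (defined $ca_file) {
        return (
            verify_hostname => 1,
            SSL_ca_file     => $ca_file,   # PEM bundle holding the issuing CA
        );
    }

    return (
        verify_hostname => 0,
        SSL_verify_mode => 0,   # numeric value of SSL_VERIFY_NONE
    );
}

# Usage (needs LWP::UserAgent plus an HTTPS backend; the path is hypothetical):
#   my $browser = LWP::UserAgent->new(
#       ssl_opts => { build_ssl_opts('C:/certs/ca-bundle.pem') },
#   );
```

Using the numeric value avoids depending on the SSL_VERIFY_NONE bareword, which only resolves to the constant when IO::Socket::SSL has been loaded and has exported it.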

I also have browser credentials, etc. that work fine. So, any clue as to why I am no longer getting the entire page of XML data? And thanks for your time, folks!
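The temp-file paging described in the post (download a page, check it for data, keep it, repeat) could be sketched as follows, fetching each page once and moving the temp file instead of requesting the URL a second time. This is an illustration under assumptions, not the program itself: download_pages, page_filename, and the fetch and has_data callbacks are hypothetical names, with has_data standing in for check_tmp.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;

# Zero-pad the page number to three digits, like "0" x (3 - length($page)).
sub page_filename {
    my ($element, $page) = @_;
    return sprintf '%s_%03d.xml', $element, $page;
}

# Fetch pages until one comes back without data.
# fetch->($page, $tmp) must write the page to $tmp;
# has_data->($tmp) decides whether the page is worth keeping.
# Returns the number of pages kept.
sub download_pages {
    my (%arg) = @_;
    my ($page, $kept) = (0, 0);

    while (1) {
        $page++;
        $arg{fetch}->($page, $arg{tmp});
        last unless -e $arg{tmp} && $arg{has_data}->($arg{tmp});

        my $target = File::Spec->catfile(
            $arg{output}, page_filename($arg{element}, $page));
        move($arg{tmp}, $target)
            or die "Cannot move $arg{tmp} to $target: $!\n";
        $kept++;
    }

    unlink $arg{tmp} if -e $arg{tmp};   # drop the final, empty page
    return $kept;
}
```

With LWP, the fetch callback could be something like sub { my ($page, $tmp) = @_; $browser->get("$base?page=$page", ':content_file' => $tmp) }, where $base is a placeholder for the real URL prefix.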

In reply to LWP Browser->Get Challenge by rkellerjr
