Monks:

I need to download some data files from a website, and would welcome any suggestions.

The website has three pull-down menus: firm name, date, and file format. I want to download the data for every combination of firm and date, that is, firm1-June2001, firm1-July2001, …, firm1-December2011, firm2-June2001, firm2-July2001, …, firm2-December2011. For each combination I need to choose “dat” from the format menu and press the download button to save the file to my machine.
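To make the firm-by-date enumeration concrete, here is a minimal sketch of how the list of downloads could be generated. The firm names are placeholders (the real values are whatever the firm menu offers), and `month_range` is a helper invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @MONTH_NAMES = qw(January February March April May June July
                     August September October November December);

# Return labels such as "June2001" for every month in the inclusive range.
sub month_range {
    my ($start_year, $start_month, $end_year, $end_month) = @_;
    my @labels;
    my ($year, $month) = ($start_year, $start_month);
    while ($year < $end_year || ($year == $end_year && $month <= $end_month)) {
        push @labels, $MONTH_NAMES[$month - 1] . $year;
        ($year, $month) = $month == 12 ? ($year + 1, 1) : ($year, $month + 1);
    }
    return @labels;
}

# Hypothetical firm names; replace with the entries from the firm menu.
my @firms = qw(firm1 firm2);
my @dates = month_range(2001, 6, 2011, 12);

# Firm-major order: all dates for firm1, then all dates for firm2, ...
my @jobs;
for my $firm (@firms) {
    push @jobs, map { [ $firm, $_ ] } @dates;
}

printf "%d downloads queued, first %s-%s, last %s-%s\n",
    scalar @jobs, @{ $jobs[0] }, @{ $jobs[-1] };
```

Each element of `@jobs` is a two-element array reference (firm, date), which is convenient to hand to whatever routine performs the actual request.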

I would also like to throttle the downloads so I don’t overload the website’s server, and to keep a log file that records which firm-date files were downloaded and whether errors occurred. In particular, I want to distinguish between a missing file and a download error.
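One possible shape for the throttling and logging is sketched below. It assumes the site answers with HTTP status 404 (or 410) when a firm-date file does not exist, which may not be true of this particular site. The names `classify_status` and `process_jobs` are made up for the example, and the request itself is replaced by a stand-in so the sketch runs without a network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map an HTTP status code to one of three outcomes for the log.
# Assumption: the site returns 404 or 410 when a file does not exist.
sub classify_status {
    my ($code) = @_;
    return 'ok'      if $code >= 200 && $code < 300;
    return 'missing' if $code == 404 || $code == 410;
    return 'error';    # anything else: 5xx, timeouts, refused connections
}

# Run each job in turn, pause between requests, and log one line per job.
# $fetch is a code reference taking (firm, date) and returning a status code.
sub process_jobs {
    my ($jobs, $fetch, $log_fh, $delay_seconds) = @_;
    my %tally = (ok => 0, missing => 0, error => 0);
    my $remaining = @$jobs;
    for my $job (@$jobs) {
        my ($firm, $date) = @$job;
        my $code    = $fetch->($firm, $date);
        my $outcome = classify_status($code);
        $tally{$outcome}++;
        print {$log_fh} join("\t", $firm, $date, $code, $outcome), "\n";
        sleep $delay_seconds if --$remaining;    # throttle, except after the last job
    }
    return \%tally;
}

# Stand-in for the real request, with canned results for two jobs.
my %canned = ('firm1-July2001' => 404, 'firm2-June2001' => 500);
my $fake_fetch = sub { my ($firm, $date) = @_; $canned{"$firm-$date"} // 200 };

my @jobs;
for my $firm (qw(firm1 firm2)) {
    push @jobs, map { [ $firm, $_ ] } qw(June2001 July2001);
}

# Log to an in-memory string here; a real run would open a file instead.
my $log = '';
open my $log_fh, '>', \$log or die "Cannot open in-memory log: $!";
my $tally = process_jobs(\@jobs, $fake_fetch, $log_fh, 0);
close $log_fh;

print $log;
print "ok=$tally->{ok} missing=$tally->{missing} error=$tally->{error}\n";
```

In a real run, `$fetch` could wrap an `LWP::UserAgent` request and return `$res->code`, with a delay of a few seconds instead of 0. LWP reports connection failures as an internally generated 500 response, so those would be logged as `error` rather than `missing`.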

I am running this program on a Windows machine with Chrome.

I found the following code at http://www.perlmonks.org/?node_id=617277 and am looking for any suggestions on how to adapt it. The comments are my additions.
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Request;

# $ marks a scalar variable; LWP::UserAgent acts as a virtual browser
my $ua   = LWP::UserAgent->new;
my $user = 'username';
my $pass = 'password';
my $URL  = 'https://www.tta.thomson.com/msi/public1_5clients.html';

# Create a file name from the last path segment of the URL
my $filename = substr( $URL, rindex( $URL, "/" ) + 1 );
# print writes the name; \n adds a newline
print "$filename\n";
# Open $filename for writing; $out is the file handle
open( my $out, '>', $filename ) or die "Cannot open $filename: $!";
binmode $out;    # keep binary data intact on Windows

print "Fetching $URL\n";
my $expected_length;
my $bytes_received = 0;

# Fetch the file from the website
my $req = HTTP::Request->new( GET => $URL );
$req->authorization_basic( $user, $pass );
my $res = $ua->request(
    $req,
    sub {
        # @_ is the array of arguments passed to this callback
        my ( $chunk, $res ) = @_;
        # += adds to a variable; length() returns the size of the chunk
        $bytes_received += length($chunk);
        unless ( defined $expected_length ) {
            $expected_length = $res->content_length || 0;
        }
        # printf is a formatted print; STDERR is the error stream;
        # %d is a decimal number and %% is a literal percent symbol
        if ($expected_length) {
            printf STDERR "%d%% - ",
                100 * $bytes_received / $expected_length;
        }
        print STDERR "$bytes_received bytes received\n";
        # Write the chunk to the output file
        print {$out} $chunk;
    }
);
print $res->status_line, "\n";
# $out is the handle for the output file; closing it flushes the data
close $out or die "Cannot close $filename: $!";
exit;
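The script above fetches one fixed URL, whereas pull-down menus are normally submitted as form fields. As a rough sketch, the request behind the download button might be built as follows. The endpoint URL and the field names `firm`, `date`, and `format` are invented for illustration, and the form may well use GET rather than POST; the real values have to be read from the page source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical endpoint and field names; read the real ones from the
# page's <form action="..."> and <select name="..."> attributes.
my $FORM_URL = 'http://www.example.com/download';

# Build (but do not send) the request the download button would submit.
sub build_download_request {
    my ($firm, $date) = @_;
    return POST $FORM_URL, [
        firm   => $firm,
        date   => $date,
        format => 'dat',
    ];
}

my $req = build_download_request('firm1', 'June2001');
print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";

# To send it with an LWP::UserAgent object and save the body to a file:
#   my $res = $ua->request($req, 'firm1-June2001.dat');
```

Building the request separately from sending it makes it easy to print and compare against what the browser actually submits before any download is attempted.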

In reply to How To Download DAT Files From Unsecured Website by Marjan
