PerlMonks  

I'm looking for testers for a quick script I threw together and am no longer sure how to improve: a basic downloading tool I wrote over the last two days. It's nothing fancy, but it's useful if you need a quick-and-dirty way to download lots of files without babysitting them.

I built it out of necessity for my job: I needed to download over 2000 files, so I could either write a script to automate it, find an existing tool that handled the task well, or download them all by hand. I couldn't find a tool that worked for me, so I learned more about LWP and WWW::Mechanize, asked you great Monks a couple of questions, and got it working! It's currently downloading those 2000+ files onto one of our computers, so I thought I'd ask for criticism on how to improve this meager tool and release it into the wilderness of the internet.

It's available for viewing/download on my dropbox here; feel free to download it and tell me what you think. Are there glaring issues with it? Are there features you'd want added if you were going to use this tool? Be nice please, but feel free to critique.

Edit: Here's the code, upon advisement that I should post the code directly here for people to view.
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Spec;
use LWP::UserAgent;
use WWW::Mechanize;

# Coded by Brendan Galvin from June 3rd 2013, to June 5th 2013.
# This product is open-source freeware, and credit to the original
# source must be given to Brendan Galvin upon re-distribution of the
# original source or any program, script or application made using the
# original source.
# http://www.linkedin.com/pub/brendan-galvin/26/267/94b

print "\n\nURL for mass-downloading (only download links using the <a href> tag, not images or other embedded elements): ";
chomp(my $url = <STDIN>);
print "\nExtensions to download (separated by commas): ";
chomp(my $extensions = <STDIN>);
$extensions =~ s/[.\s]//g;    # drop dots and stray whitespace
print "\nLocation to store downloaded files: ";
chomp(my $location = <STDIN>);
$location = '.' if $location eq '';    # default to the current directory
print "\nHow many downloads would you like to skip starting from the first (in case you started this download earlier and have already downloaded some of the files)? ";
chomp(my $skips = <STDIN>);
$skips = 0 unless $skips =~ /^\d+$/;   # treat blank or non-numeric input as 0
print "\nAre you going to want to skip any files while the program is running (y/n)? ";
chomp(my $skiporno = <STDIN>);

my $error     = "";
my @extension = split(',', $extensions);
my %extens    = map { $_ => 1 } @extension;

# Send a HEAD request and return the response headers, or undef on failure.
sub GetFileSize {
    my $url = shift;
    my $ua  = LWP::UserAgent->new;
    $ua->agent("Mozilla/5.0");
    my $req = HTTP::Request->new(HEAD => $url);
    $req->header('Accept' => 'text/html');
    my $res = $ua->request($req);
    return $res->headers if $res->is_success;
    $error .= "Error retrieving file information at $url\n";
    return;
}

my $mech = WWW::Mechanize->new();
$mech->get($url);
my @links = $mech->links();

for my $link (@links) {
    my $skip = 'n';

    # Only follow links whose extension was requested.
    my ($ext) = $link->url() =~ m/([^.]+)$/;
    next unless defined $ext && exists $extens{$ext};

    # url_abs() resolves relative links against the page's base URL.
    my $newurl = $link->url_abs();
    my ($filename) = $newurl =~ m{([^/]+)$};
    next unless defined $filename;

    if ($skips > 0) {
        $skips -= 1;
        print "\n\nSkipped $filename at " . $link->url();
        next;
    }

    my $header  = GetFileSize($newurl);
    my $urlmech = WWW::Mechanize->new();
    $urlmech->show_progress(1);
    print "\n\n\n$filename at $newurl\n";

    # Report size and date only when the HEAD request for this file succeeded.
    if ($header) {
        my $size  = $header->content_length;
        my $mtime = $header->last_modified;
        print "File size: $size bytes\n" if defined $size;
        print "Last modified: " . localtime($mtime) . "\n" if defined $mtime;
    }

    if ($skiporno eq 'y') {
        print "Skip file (y/n)?";
        chomp($skip = <STDIN>);
    }

    if ($skip ne 'y') {
        print " downloading...\n";
        # Save into the directory the user chose at the prompt.
        $urlmech->get($newurl,
            ':content_file' => File::Spec->catfile($location, $filename));
    } else {
        print "\nSkipping...\n\n";
    }
}

print "\n\n\nTasks completed.\n";
if ($error ne "") {
    print "\nErrors: $error";
} else {
    print "No errors.\n";
}

In reply to RFC: Code testers/reviewers needed by AI Cowboy
