I found a site with lots of multimedia links, music and movies. But I hate that each time I want to download a file, I have to click the link, click the save button, type a file name, and click OK. I got sick of it, so I wrote this script for my own use. The script is now quite stable, so I thought: why not share it? Maybe it is useful to others, too.

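The script takes a file as input (the name, pages.txt, is hard-coded). The file lists the URLs of the files that you want to download, one entry per line. It accepts two line formats: a plain URL, or a sprintf-style URL template followed by a start index and an end index, separated by single spaces.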

The input file looks like:

http://blah/blah/blah.wmv
http://blah/blah/labh.wmv
http://foo/foo/foo%04i 23 40
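To make the second format concrete, here is a minimal sketch of how one input line expands into URLs. The helper name expand_line is made up for this illustration (the script itself does the same thing inline while reading pages.txt), and I use a shorter index range than the sample above to keep the output small.

```perl
use strict;
use warnings;

# Hypothetical helper: expand one input line into the URLs it stands for.
#   "URL"                      -> that URL
#   "TEMPLATE FIRST LAST"      -> sprintf(TEMPLATE, n) for n in FIRST..LAST
# Anything else is ignored, as in the script.
sub expand_line {
    my ($line) = @_;
    my @fields = split / /, $line;
    if (@fields == 3) {
        my ($template, $first, $last) = @fields;
        return map { sprintf($template, $_) } $first .. $last;
    }
    return ($fields[0]) if @fields == 1;
    return ();
}

print "$_\n" for expand_line('http://foo/foo/foo%04i 23 25');
# prints:
# http://foo/foo/foo0023
# http://foo/foo/foo0024
# http://foo/foo/foo0025
```

The %04i in the template pads the index with zeros to four digits, which is handy for sites that number their files that way.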

My favorite part is the format in which progress is reported: the script prints a number from 0 to 10 each time another tenth of the Content-Length has arrived.
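As a rough illustration of that progress format, the sketch below replays a download in memory. The names tenths and progress_trace and the chunk sizes are made up for the example; only the "print the tenth when it changes" idea comes from the script.

```perl
use strict;
use warnings;

# How many tenths of $total have been received (0 .. 10).
sub tenths {
    my ($total, $rcvd) = @_;
    return 0 unless $total;    # unknown or zero length: no progress to show
    return int(($rcvd / $total) * 10);
}

# Hypothetical helper: return what would be printed while chunks of the
# given sizes arrive, i.e. the tenth value each time it changes.
sub progress_trace {
    my ($total, @chunk_sizes) = @_;
    my ($rcvd, $last, $out) = (0, -1, '');
    for my $size (@chunk_sizes) {
        $rcvd += $size;
        my $now = tenths($total, $rcvd);
        if ($now != $last) {
            $out .= $now;
            $last = $now;
        }
    }
    return $out;
}

print progress_trace(1000, (250) x 4), "\n";    # prints 25710
```

With small chunks and a big file you get the full 012345678910 run on one line; with big chunks some digits are skipped, as in the example.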

If I want to support more input formats, I may change the input to an XML file and use something like XML::Simple. But it is good enough for now, as it fits my needs.

I should probably allow setting the local file name. Currently I simply use the last part of the URL as the local file name, but sometimes that is not a valid Windows file name, or it might overwrite an existing file that has the same name. But if I do this, I will probably just go with XML.
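Short of a full XML input format, the two file name problems could be handled roughly like this. It is only a sketch, not part of the script: local_name_for, the 'download' fallback name, and the _1, _2 suffix scheme are all my own assumptions, and the character list is not an exhaustive set of Windows rules (reserved names such as CON are not handled).

```perl
use strict;
use warnings;

# Hypothetical helper: pick a local file name for a URL.
# $exists is an optional callback telling whether a name is already taken;
# by default it checks the current directory with -e.
sub local_name_for {
    my ($url, $exists) = @_;
    $exists ||= sub { -e $_[0] };

    (my $path = $url) =~ s/[?#].*//s;             # drop query string / fragment
    my $name = (split m{/}, $path)[-1];
    $name = 'download' unless defined $name && length $name;

    $name =~ s{[<>:"/\\|?*\x00-\x1f]}{_}g;        # characters Windows rejects
    $name =~ s/[. ]+$//;                          # no trailing dots or spaces
    $name = 'download' unless length $name;

    # Split off the extension so a counter can go in front of it.
    my ($base, $ext) = $name =~ /^(.+?)(\.[^.]+)?$/;
    $ext = '' unless defined $ext;

    my $candidate = $name;
    my $n = 1;
    $candidate = sprintf('%s_%d%s', $base, $n++, $ext) while $exists->($candidate);
    return $candidate;
}

# Usage with a fake "already on disk" list instead of the real file system:
my %taken = ('blah.wmv' => 1);
print local_name_for('http://blah/blah/blah.wmv', sub { $taken{ $_[0] } }), "\n";
# prints blah_1.wmv
```

Passing the existence check in as a callback is just a convenience so the logic can be tried without touching real files.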

use IO::Socket::INET;
use Data::Dumper;
use strict;
use warnings;

$|++;

# @done is populated, but not really used in this version.
# In the future, it might be used to retry the failed ones.
my (@page, @done);

open(PAGES, "<", "pages.txt") or die "cannot open pages.txt: $!";
while (<PAGES>) {
    chomp;
    my @data = split / /;
    if ($#data == 2) {
        # sprintf template, first index, last index
        for my $index ($data[1] .. $data[2]) {
            push @page, sprintf($data[0], $index);
            push @done, 0;
        }
    }
    elsif ($#data == 0) {
        # plain URL
        push @page, $data[0];
        push @done, 0;
    }
}
close(PAGES);

download(\@page, \@done);

sub download {
    my ($page, $done) = @_;
    for my $index (0 .. $#$page) {
        $done->[$index] = download_one_file($page->[$index]);
    }
}

sub download_one_file {
    my ($page) = @_;
    my $host = get_host($page);
    my $file = get_file($page);
    my $percentage;
    my $header = {};
    my $return_code;

    print "Trying $host... ";
    my $connection = IO::Socket::INET->new(PeerAddr => $host, PeerPort => 80, Timeout => 30);
    if ($connection) {
        print "connected, download <$file>\n";
        binmode($connection);
        my $req = "GET $page HTTP/1.0\r\nHost: $host\r\n\r\n";
        print $connection $req;
        my $res;
        if (open(FILE, ">", $file)) {
            binmode(FILE);
            my $bytes_read = 0;
            while (1) {
                my $chunk;
                sysread($connection, $chunk, 4096);
                # stopped, just fail it
                if (!length($chunk)) {
                    print " Connection dropped";
                    $return_code = 0;
                    last;
                }
                if (!exists($header->{"status_code"})) {
                    # still collecting the response header
                    $res .= $chunk;
                    if (got_header($res)) {
                        $res =~ m/(.*?)\r\n\r\n(.*)/s;
                        my ($head, $body) = ($1, $2);
                        # whatever follows the header is already file data
                        if (length($body)) {
                            syswrite(FILE, $body);
                            $bytes_read += length($body);
                        }
                        parse_header($head, $header);
                        if ($header->{"status_code"} eq '200') {
                            $percentage = get_percentage($header->{"Content-Length"}, $bytes_read);
                            print "Content Length = " . $header->{"Content-Length"} . ", Received = $percentage";
                        }
                        else {
                            print "Status Code = " . $header->{"status_code"};
                            $return_code = 1;
                            last;
                        }
                    }
                }
                else {
                    syswrite(FILE, $chunk);
                    $bytes_read += length($chunk);
                    my $old_percentage = $percentage;
                    $percentage = get_percentage($header->{"Content-Length"}, $bytes_read);
                    # print only when another tenth has arrived
                    if ($percentage != $old_percentage) {
                        print $percentage;
                    }
                }
                if ($header->{"Content-Length"} && ($header->{"Content-Length"} <= $bytes_read)) {
                    print "Read up to content length\n";
                    $return_code = 1;
                    last;
                }
                # $res holds the header plus the first bytes of the body
                if ($res =~ m/\/html/) {
                    print "reached /html tag\n";
                    $return_code = 1;
                    last;
                }
            }
            close(FILE);
            print "\n";
        }
        else {
            $return_code = 0;
            print "failed to open local file\n";
        }
    }
    else {
        $return_code = 0;
        print "failed, skip $file\n";
    }
    return $return_code;
}

sub get_host {
    my ($page) = @_;
    $page =~ m/\/\/(.*?)\//;
    return $1;
}

sub get_file {
    my ($page) = @_;
    return (split /\//, $page)[-1];
}

# progress in tenths (0 .. 10) of the expected total
sub get_percentage {
    my ($total, $rcvd) = @_;
    return 0 unless $total;    # no Content-Length: avoid dividing by zero
    return int(($rcvd / $total) * 10);
}

sub got_header {
    my $res = shift;
    if ($res =~ m/\r\n\r\n/s) {
        return 1;
    }
    else {
        return 0;
    }
}

sub parse_header {
    my ($res, $header) = @_;
    my @lines = split(/\r\n/, $res);
    $header->{"status_code"} = (split(/ /, $lines[0]))[1];
    for my $index (1 .. $#lines) {
        my ($key, $value) = split(/: /, $lines[$index]);
        $header->{$key} = $value;
    }
}

janitored by ybiC: Balanced <readmore> tags around longish codeblock, to reduce scrolling

Re: audio/video download
by b10m (Vicar) on Jul 25, 2004 at 20:46 UTC

    Wasn't your personal itch the reason why people like wget ? ;)

    --
    b10m

    All code is usually tested, but rarely trusted.
      curl is quite nice, too.

      thor

Re: audio/video download
by beppu (Hermit) on Aug 02, 2004 at 20:44 UTC
Re: audio/video download
by jacques (Priest) on Jul 26, 2004 at 00:03 UTC
    Haven't you ever heard of wget?!?