audio/video download

I found a site with lots of multimedia links, music and movies. But I hate that each time I want to download a file, I have to click the link, click the save button, give a file name, and click OK. I got sick of it, so I wrote this script for my own use. The script is now quite stable, so I thought, why not share it? Maybe it is useful to others too.

The script takes a file as input (it reads pages.txt from the current directory). The file lists the URLs of the files that you want to download, one entry per line. Each line can be in one of two formats:

One parameter only. The parameter is the URL.
Three parameters, separated by single spaces. The first is a sprintf pattern for the URL; the second and third are the start and end values of the index. Useful when you download a whole bunch of pages whose URLs are numbered.

The input file looks like:

http://blah/blah/blah.wmv
http://blah/blah/labh.wmv
http://foo/foo/foo%04i 23 40
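
As a rough illustration of how the two line formats above turn into URLs, here is a minimal sketch. The helper name expand_line is made up for this example and is not part of the script; it only mirrors the sprintf expansion the script does while reading the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script): expand one input line
# into the list of URLs it stands for.
sub expand_line {
    my ($line) = @_;
    my @data = split ' ', $line;
    if (@data == 3) {
        # Three fields: sprintf pattern, first index, last index.
        my ($pattern, $first, $last) = @data;
        return map { sprintf($pattern, $_) } $first .. $last;
    }
    # One field: a plain URL, used as-is.
    return ($data[0]) if @data == 1;
    # Anything else is skipped, as the script does.
    return ();
}

print "$_\n" for expand_line('http://foo/foo/foo%04i 23 25');
```

With %04i the index is zero-padded to four digits, so the line above stands for foo0023, foo0024 and foo0025.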

My favorite part is the way progress is reported: a digit is printed each time another tenth of the file has arrived.
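
To show what that looks like, here is a small simulation. The function tenths uses the same arithmetic as get_percentage in the script; progress_string is a hypothetical helper added only for this example, and the sizes are made-up numbers.

```perl
use strict;
use warnings;

# Same arithmetic as get_percentage in the script: completed tenths, 0 to 10.
sub tenths {
    my ($total, $rcvd) = @_;
    return 0 unless $total;    # assumption: an unknown size counts as 0
    return int(($rcvd / $total) * 10);
}

# Hypothetical helper: pretend to receive $total bytes in chunks of
# $chunk_size and collect the digits that would be printed.
sub progress_string {
    my ($total, $chunk_size) = @_;
    my ($out, $last, $rcvd) = ('', -1, 0);
    while ($rcvd < $total) {
        $rcvd += $chunk_size;
        $rcvd = $total if $rcvd > $total;
        my $now = tenths($total, $rcvd);
        # Print only when the number of completed tenths changes.
        if ($now != $last) {
            $out .= $now;
            $last = $now;
        }
    }
    return $out;
}

print progress_string(65536, 4096), "\n";    # prints 012345678910
```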

If I want to support more input formats, I may change the input to an XML file and use something like XML::Simple. But it is good enough for now, as it fits my needs.

I should probably allow setting the local file name. Currently I simply use the last part of the URL as the local file name, but sometimes that is not a valid Windows file name, or it might overwrite an existing file with the same name. But if I do this, I will probably just go XML.
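
Short of a full XML input format, one possible stopgap is to clean up the name derived from the URL. This is only a sketch: safe_local_name is a hypothetical helper that is not in the script, the fallback name "download" and the ".1", ".2" suffix scheme are arbitrary choices, and the optional second argument exists only so the existence check can be replaced when trying it out.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script): derive a local file name from
# a URL, replace characters Windows rejects, and avoid existing names.
sub safe_local_name {
    my ($url, $exists) = @_;
    $exists ||= sub { -e $_[0] };    # default: check the file system

    my $name = (split m{/}, $url)[-1];
    $name = 'download' unless defined $name && length $name;

    # Characters that are not allowed in Windows file names.
    $name =~ tr{<>:"/\\|?*}{_};

    # Append .1, .2, ... until the name is free.
    my $candidate = $name;
    my $n = 0;
    $candidate = "$name." . ++$n while $exists->($candidate);
    return $candidate;
}

print safe_local_name('http://blah/blah/blah.wmv'), "\n";
```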

use IO::Socket::INET;
use Data::Dumper;

use strict;
use warnings;

$| ++;

#@done is populated, but not really used in this version
#in the future, might retry failed ones based on @done.
my (@page, @done);

open(PAGES, "<", "pages.txt") or die "Cannot open pages.txt: $!";
while (<PAGES>) {
    chomp;
    my @data = split / /;
    if ($#data == 2) {
        for my $index ($data[1]..$data[2]) {
            push @page, sprintf($data[0], $index);
            push @done, 0;
        }
    } elsif ($#data == 0) {
        push @page, $data[0];
        push @done, 0;
    }
}
close(PAGES);

download(\@page, \@done);

sub download {
    my ($page, $done) = @_;
    # Work through the array references passed in, not the file-level arrays.
    for my $index (0..$#{$page}) {
        $done->[$index] = download_one_file($page->[$index]);
    }
}

sub download_one_file {
    my ($page) = @_;
    my $host = get_host($page);
    my $file = get_file($page);

    my $percentage;
    my $header = {};
    my $return_code;
    
    print "Trying $host... ";
    my $connection = IO::Socket::INET->new(PeerAddr => $host,
                    PeerPort => 80,
                    Timeout => 30);
    if ($connection) {
        print "connected, download <$file>\n";
        binmode($connection);
        my $req = "GET $page HTTP/1.0\r\nHost: $host\r\n\r\n";
        print $connection $req;

        my $res;
        if (open(FILE, ">", $file)) {
            binmode(FILE);
            my $bytes_read = 0;
            while (1) {
                my $chunk;
                sysread($connection, $chunk, 4096);
                #stopped, just fail it
                if (!length($chunk)) {
                    print " Connection dropped";
                    $return_code = 0;
                    last;
                }
                if (!exists($header->{"status_code"})) {
                    $res .= $chunk;
                    if (got_header($res)) {
                        my ($head, $body) = $res =~ m/(.*?)\r\n\r\n(.*)/s;
                        # Write any body bytes that arrived together with the header.
                        if (length($body)) {
                            syswrite(FILE, $body);
                            $bytes_read += length($body);
                        }
                        parse_header($head, $header);
                        if ($header->{"status_code"} eq '200') {
                            $percentage = get_percentage($header->{"Content-Length"}, $bytes_read);
                            print "Content Length = " . $header->{"Content-Length"} . ", Received = $percentage";
                        } else {
                            print "Status Code = " . $header->{"status_code"};
                            $return_code = 1;
                            last;
                        }
                    }
                } else {
                    syswrite(FILE, $chunk); 
                    $bytes_read += length($chunk);
                    my $old_percentage = $percentage;
                    $percentage = get_percentage($header->{"Content-Length"}, $bytes_read);
                    if ($percentage != $old_percentage) {
                        print $percentage;
                    }
                }
                # Stop once the whole body has arrived.
                if ($header->{"Content-Length"} && ($header->{"Content-Length"} <= $bytes_read)) {
                    print "Read up to content length\n";
                    $return_code = 1;
                    last;
                }
                if ($res =~ m/\/html/) {
                    print "reached /html tag\n";
                    $return_code = 1;
                    last;
                }
            }
            close(FILE);
            print "\n";
        } else {
            $return_code = 0;
            print "failed to open local file\n";
        }
    } else {
        $return_code = 0;
        print "failed, skip $file\n";
    }
    return $return_code;
}

sub get_host {
    my ($page) = @_;
    # Return undef rather than a stale $1 when the URL has no //host/ part.
    return $page =~ m/\/\/(.*?)\// ? $1 : undef;
}

sub get_file {
    my ($page) = @_;
    return (split /\//, $page)[-1];
}

sub get_percentage {
    my ($total, $rcvd) = @_;
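    # Progress in tenths (0 to 10). Without a Content-Length there is
    # nothing to divide by, so report 0.
    return 0 unless $total;
    return int(($rcvd / $total) * 10);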
}

sub got_header {
    my $res = shift;
    if ($res =~ m/\r\n\r\n/s) {
        return 1;
    } else {
        return 0;
    }
}

sub parse_header {
    my ($res, $header) = @_;
    my @lines = split(/\r\n/, $res);
    $header->{"status_code"} = (split(/ /, $lines[0]))[1];
    for my $index (1 .. $#lines) {
        # Limit of 2 keeps values that themselves contain ": " intact.
        my ($key, $value) = split(/: /, $lines[$index], 2);
        $header->{$key} = $value;
    }
}

_{janitored by ybiC: Balanced <readmore> tags around longish codeblock, to reduce scrolling}

Re: audio/video download by b10m (Vicar) on Jul 25, 2004 at 20:46 UTC
Wasn't your personal itch the reason why people like wget? ;) -- b10m All code is usually tested, but rarely trusted.
Re^2: audio/video download by thor (Priest) on Jul 26, 2004 at 03:35 UTC
curl is quite nice, too. thor
Re: audio/video download by beppu (Hermit) on Aug 02, 2004 at 20:44 UTC
You might also be interested in WWW::Mechanize. It's the greatest.
Re: audio/video download by jacques (Priest) on Jul 26, 2004 at 00:03 UTC
Haven't you ever heard of wget?!?