
Real Synthetic Audio downloader

by diotalevi (Canon)
on Oct 24, 2002 at 06:36 UTC ( [id://207629]=sourcecode )
Category: Web Stuff
Author/Contact Info: Joshua b. Jore aka diotalevi josh@lavendergreens.org
Description: Downloads the weekly Real Synthetic Audio radio shows from http://www.synthetic.org

Update 0: I wrote the JS reading regex wrong and have now corrected it. Please try again.

Update 1: Mr. Muskrat pointed out some other fixes at Re: Real Synthetic Audio downloader and I've incorporated them.

Update Next (as in not actually in there): It'd be even nicer if it kept track of the modification dates on the .js and .asx files so it wouldn't need to GET all the time.
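
One possible way to approach the "Update Next" idea is the mirror method of LWP::UserAgent, which sends an If-Modified-Since header based on the local file's modification time and leaves the file untouched when the server answers 304. This is only a sketch and is not part of the script: fetch_if_changed is a hypothetical helper, and it assumes the .js and .asx files are kept as local copies, which the script does not currently do.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not in the original script): fetch $url into $file
# only when the server copy is newer than the local one.
# Returns 'updated', 'unchanged' or 'failed'.
sub fetch_if_changed {
    my ($ua, $url, $file) = @_;

    # mirror() adds If-Modified-Since when $file already exists
    # and returns an HTTP::Response object.
    my $rs = $ua->mirror($url, $file);

    # 304 Not Modified is not a "success" status, so test it first.
    return 'unchanged' if $rs->code == 304;
    return $rs->is_success ? 'updated' : 'failed';
}

# Possible use (the cache path is an assumption):
#   my $ua    = LWP::UserAgent->new;
#   my $state = fetch_if_changed($ua,
#       'http://synthetic.org/jscript/showlist.js',
#       '/home/josh/rsa/showlist.js');
```

With something like this, the script could re-parse the show list only when the helper reports 'updated'.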


=pod

This script retrieves DJ Todd's internet radio show Real
Synthetic Audio from http://www.synthetic.org and stores
local copies of the retrieved audio. You can specify your
own local directory, file extension and URL pattern to match
by modifying $SaveDir, $SaveExt and $SaveType. Any newly
downloaded files are noted on STDOUT, so this should
work in a cron job. DJ Todd seems to put new shows up on
or after Sunday, so just run the job once a week, say
Monday night. Be nice to his server: it's a good show
and I want it to stay available.

=cut

use strict;
use warnings;
require LWP::UserAgent;
$| = 1;

our $SaveDir = '/home/josh/rsa/';
our $SaveExt = '.wma';
our $SaveType = qr{http://.+?\.asx};
use constant DEBUG => 0;

our ($ua, $rq, $rs);

$ua = LWP::UserAgent->new;

my $downloads = get_downloads();
download_files( $SaveDir, $downloads );

sub get_downloads {
    my %downloads;
    my @js_urls = map "http://synthetic.org/jscript/${_}showlist.js",
        ('', 'previous-');
    
    JSURL: for my $js_url (@js_urls) {
        print "JS $js_url\n" if DEBUG;
        $rs = $ua -> get( $js_url );
        next JSURL unless $rs->is_success;

        my @asx_urls = $rs -> content() =~ m|$SaveType|g;
        ASXURL: for my $asx_url (@asx_urls) {
            print "ASX $asx_url\n" if DEBUG;
            $rs = $ua -> get( $asx_url );
            next ASXURL unless $rs->is_success;
            my $wma = $rs -> content;
            $wma =~ s/\s+//g;

            # Skip anything that lacks the expected DATE-SPEED name.
            my ($date, $speed) = $wma =~ /(\d+)-(\w+)/
                or next ASXURL;

            if (not $downloads{$date} or
                $speed eq 'isdn') {
                $downloads{$date} = $wma;
                print "\$downloads{$date} = $wma\n" if DEBUG;
            }
            else {
                print "SKIP $wma $date $speed\n" if DEBUG;
            }
        }
    }

    return \ %downloads;
}

sub download_files {
    my ($directory, $download) = @_;

    for my $base_file (sort keys %$download) {
        my $wma_url = $download -> {$base_file};
        print "$base_file: " if DEBUG;
        my $file = "$directory$base_file$SaveExt";

        if (-e $file) {
            print "SKIP\n" if DEBUG;
            next;
        }

        print "downloading " if DEBUG;

        $rq = HTTP::Request -> new( GET => $wma_url );
        $rs = $ua->request( $rq, $file );

        print $rs -> is_success() ? "OK\n" : "FAIL\n" if DEBUG;
        # Report only files that were actually downloaded.
        print "$file\n" if $rs -> is_success() and not DEBUG;
    }
}
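
The selection rule in get_downloads (keep one stream per show date, preferring the 'isdn' variant) can be tried offline. The following is a minimal sketch of that rule only; pick_streams and the example.org URLs are hypothetical and stand in for whatever the .asx files really contain.

```perl
use strict;
use warnings;

# Hypothetical helper: keep one URL per show date, preferring 'isdn'.
sub pick_streams {
    my @urls = @_;
    my %pick;
    for my $url (@urls) {
        # Expect a DATE-SPEED fragment such as 081902-isdn; skip otherwise.
        my ($date, $speed) = $url =~ /(\d+)-(\w+)/ or next;
        $pick{$date} = $url if !$pick{$date} or $speed eq 'isdn';
    }
    return \%pick;
}

# Made-up URLs, for illustration only.
my $picked = pick_streams(
    'http://example.org/shows/081902-modem.wma',
    'http://example.org/shows/081902-isdn.wma',
    'http://example.org/shows/082602-modem.wma',
);
print "$_ => $picked->{$_}\n" for sort keys %$picked;
```

The first URL seen for a date is kept unless a later one carries the 'isdn' speed, which mirrors the if/else in the script.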
Replies are listed 'Best First'.
Re: Real Synthetic Audio downloader
by Mr. Muskrat (Canon) on Oct 24, 2002 at 16:22 UTC

    diotalevi++ for introducing me to some wonderful sounding music.

    When you run this, it displays nothing unless you change the DEBUG constant to a true value. Even after changing DEBUG, it looks like the program is stuck in an endless loop. In the download_files subroutine, where you check whether $file exists, I have added an else clause to print that it's downloading, like this:

    if (-e $file) { print "SKIP\n" if DEBUG; next; } else { print "downloading...\n" if DEBUG; }

    Also, I found two problems with your code.

    1) You define the $SaveExt variable with this code:
    our $SaveExt = '.wma';
    Then in the download_files subroutine, you use the value like this:
    my $file = "$directory$base_file.$SaveExt";
    To keep the filename from looking like 081902..wma, change this line to:
    my $file = "$directory$base_file$SaveExt";

    2) I looked at the JavaScript files that this script parses and found a problem with your regex. You have a $ at the end that should not be there. Change:
    our $SaveType = qr/http:\/\/.+?\.asx$/;
    to:
    our $SaveType = qr/http:\/\/.+?\.asx/;
    and it appears to work as expected.
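
To illustrate why the trailing $ matters: it only matches at the end of the string (or just before a final newline), while the URLs in a JavaScript file sit in the middle of lines. The snippet below is a sketch using made-up file content; the addShow lines and example.org URLs are assumptions, not the real showlist.js.

```perl
use strict;
use warnings;

# Hypothetical showlist.js content; the real file layout is an assumption.
my $js = <<'JS';
addShow("http://example.org/shows/081902-isdn.asx", "Aug 19");
addShow("http://example.org/shows/081902-modem.asx", "Aug 19");
JS

my $anchored   = qr{http://.+?\.asx$};   # the regex before the fix
my $unanchored = qr{http://.+?\.asx};    # the regex after the fix

# List-context /g returns every match in the string.
my @with_anchor    = $js =~ /$anchored/g;
my @without_anchor = $js =~ /$unanchored/g;

printf "anchored:   %d match(es)\n", scalar @with_anchor;
printf "unanchored: %d match(es)\n", scalar @without_anchor;
print "  $_\n" for @without_anchor;
```

On this sample the anchored pattern finds nothing, because every .asx is followed by a closing quote rather than the end of the string, and the unanchored pattern finds both URLs.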

    Update: You fixed your source as I was typing this up. :)

Re: Real Synthetic Audio downloader
by jdavidboyd (Friar) on Oct 24, 2002 at 13:41 UTC
    Hmm, doesn't do anything for me. Just runs and exits.

      I said so in the intro but you need to provide it with a directory to save to. It's nothing special or anything. Set DEBUG to true if you still have problems.

        Oops, I should have said that I had done that.

        I changed it to point to my /home/dave/rsa directory, then created the rsa directory so the pointer is valid.

        I also turned on DEBUG, and here is what I get when I run.

        $ RealSyntheticAudio.plx
        JS http://synthetic.org/jscript/showlist.js
        JS http://synthetic.org/jscript/previous-showlist.js

        And nothing is placed into the /home/dave/rsa directory.

        I'm never seeing the 'ASX' line, but I don't know why...

        Hope this helps somewhat in debugging it. Anything else I can try to see why it doesn't work?

        Dave
