CDDB track listing search script

Here's a bit of code for grabbing a list of albums from the gracenote/cddb.com site.

The idea is you run the code like: ./cddb_get_tracklist.pl search terms, and the script will find all albums/artists that contain the words 'search terms'. You can then select an individual album and have the script dump the track listing for that album to a file.

It isn't heavily tested, so there are bound to be bugs; let me know (preferably 'cc' me at the email address ;). The main problem with the dumped track lists is that HTML entities aren't translated yet (ie &amp; isn't translated into & in the track listings).
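
The entity problem mentioned above could be handled by decoding each track title before it is written out. What follows is only a minimal sketch and not part of the script: decode_basic_entities is a hypothetical helper that covers a handful of named entities plus numeric references. If the HTML::Entities module (from the HTML-Parser distribution) is installed, its decode_entities function covers the full entity set and would probably be the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): decode a few common
# named entities plus decimal/hex numeric character references.
# Note: &nbsp; is mapped to a plain space here for simplicity.
my %entity = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'", nbsp => ' ');

sub decode_basic_entities {
    my $text = shift;
    # single pass, so '&amp;lt;' becomes '&lt;' rather than '<';
    # unknown entity names are left untouched
    $text =~ s{&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));}{
        defined $1 ? chr($1)
      : defined $2 ? chr(hex $2)
      : exists $entity{$3} ? $entity{$3}
      : "&$3;"
    }ge;
    return $text;
}

# the track title from the sample markup quoted in the script's comments:
print decode_basic_entities("Super Disco Breakin&#39;"), "\n";
```

In the script itself, the call would go around $1 where the track title is printed to the output file.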

code for file cddb_get_tracklist.pl follows:
#!/usr/bin/perl
my $progname = $0;
$progname =~ s,.*/,,;  # use basename only
my $version = "0.1";

use strict;
use LWP::UserAgent;

# these are configurable - they may change from time to time according to
# CDDB.COM website file system structure:
my $base_url="http://www.gracenote.com";
my $search_uri="/php/search-adv.php3?q=";

# default items to show per page:
my $page_count=10;

# current result/album+artist item:
my $page_curr=1;

# for debugging:
my $debug;

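# parse option flags out of the args; everything else is a search term
# (matching flags against the joined query string would also trigger on
# search terms such as 'hip-hop'):
my @terms;
while(defined(my $arg = shift @ARGV)){
    if($arg =~ /^-[hH]$/){ usage(); }
    elsif($arg eq "-d"){ $debug=1; }    # debug mode
    elsif($arg eq "-n"){
        # a number to show per page is given
        # (max per page from cddb.com is 50):
        my $n = shift @ARGV;
        usage() unless defined $n && $n =~ /^\d+$/ && $n >= 1 && $n <= 50;
        $page_count=$n;
    }
    else { push @terms, $arg; }
}

# if no search terms were given, show usage:
usage() unless @terms;

# build query list from the remaining args:
my $query_list=join("+", @terms);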

# build query url:
my $query_url=$base_url.$search_uri.$query_list."&f=all&s=$page_curr&n=$page_count";

print "Query URL: $query_url\n" if $debug;

# start off with first url:
main($query_url);

# this sub is called recursively, once for each 'page' of results
# ($page_curr to ($page_curr + $page_count)):
sub main {
    my $url=shift;

    # run the query on the query url:
    my $result=get_url($url);

    if($result->is_success){
        # we got a result, parse it:
        my @result_lines = split("\n", $result->content);

        # strip out the album/artist pairs from the results page:
        my (@album_url) = get_album_url(@result_lines);

        # display results for user to choose an album:
        &choose_album(@album_url) ;
    } else {
        # report the url actually requested, not just the first page's:
        die("
Error retrieving $url.

Check and compare the base search URL, \$base_url (=$base_url),
and the search URI, \$search_uri (=$search_uri),
in the code against the currently working url/uri at gracenote.com\n\n");
    }
}

sub get_url {
    my $url = shift;

    # create user agent object:
    my $cddb_ua = LWP::UserAgent->new;
    $cddb_ua->agent("$progname/$version ");

    # build the request object:
    my $cddb_req = HTTP::Request->new(GET => $url);

    # make the request and return the response object:
    return $cddb_ua->request($cddb_req);
}


# sub returns a list: the paging info line first, followed by one
# "url##artist / album" string per result:
sub get_album_url {
    my ($list_started, $list_ended, @result_list);

    foreach (@_){
        # does this line tell us what page we're looking at
        # ie: <p><font size=2 face="Arial,Helvetica">
        # Displaying disc 1-10 of 2542 matching CDs</font></p>
        (/<p>.*?(Displaying disc .*? of .*? matching CDs).*?<\/p>/) && (push @result_list, $1);

        # is this start of list?:
        (/<!-- LIST START -->/) && ($list_started = 1) && (next);
        # or the end of it?:
        (/<!-- LIST END -->/) && ($list_ended = 1);

        # save this list item into array:
        if($list_started && !$list_ended){
            # a list item looks like this:
            # <LI type=circle><FONT face="Arial, Helvetica, sans-serif"><B><!-- START ITEM --><!-- REL 100 ENDREL --><A HREF="/xm/pcd/genhiphop/e004abd1bd74777c1ad8ec4088ec67c7.html" >The Beastie Boys / Hello Nasty</A><!-- END ITEM -->         </B></FONT> <br>&nbsp;&nbsp;&nbsp;Just A <b>Test</b><BR>
            # strip out urls / album title/artist, skipping lines with no link
            # (otherwise stale $1/$2 from an earlier match would be reused):
            if(/A HREF="(.*?)" >(.*?)<\/A>/){
                push(@result_list, "$1##$2");    # me being stupid and forgetting how to use hashes ;)
            }
        }
    }
    # make sure the first item in @result_list is the 'Displaying disc
    # x of n matching CDs' line:
    (@result_list && $result_list[0] =~ /^Displaying disc/) || die("Unable to retrieve paging info\n");

    return @result_list;
}


sub choose_album {
    my $page_info=shift;
    my @album_url=@_;

    my $last_page;

    # print paging info:
    print $page_info,"\n";

    # a short page means there are no more results to fetch; check this
    # before printing so we don't print empty entries:
    if(scalar(@album_url) < $page_count ){
        $page_count=@album_url;
        $last_page=1;
    }

    for(my $i=0; $i < $page_count; $i++){
        my (undef, $album) = split "##", $album_url[$i];
        printf("%2s. %s\n", $i+1, $album);
    }

    print "Select album (1, ..., $page_count)\n";
    print "'q' to quit\n";
    print "Any other key for more...\n" unless $last_page;

    while(<STDIN>){
        chomp;
        if(/^(\d+)$/){
            # ignore numbers that are out of range and ask again:
            next if $1 < 1 || $1 > $page_count;
            get_track_listing($album_url[$1-1]);
            exit;
        } elsif(/^[qQ]/) {
            exit;
        } elsif(!$last_page) {
            # increment current item by $page_count and fetch the next page:
            $page_curr=$page_curr+$page_count;
            my $query_url=$base_url.$search_uri.$query_list."&f=all&s=$page_curr&n=$page_count";

            &main($query_url);
        }
    }
}

sub get_track_listing {
    my ($uri, $album_artist)= split "##", shift;
    my $url=$base_url.$uri;
    my ($artist, $album) = split " / ", $album_artist, 2;
    my $outfile = $album." - ".$artist.".txt";

    # fetch the page containing the track list:
    my $result = &get_url($url);

    # bail out before creating an empty output file:
    $result->is_success || die("Unable to retrieve $url\n");

    # open the output file for printing track list to:
    open(my $out, ">", $outfile) || die("Unable to open $outfile for writing: $!\n");

    # we got the html page containing the track list ok,
    # parse out the track listing now.
    # track items look like this:
    # <LI><B><FONT face="Arial, Helvetica, sans-serif" >Super Disco Breakin&#39;</FONT></B><br>
    my @result_lines=split("\n", $result->content);
    foreach(@result_lines){
        if( m#<LI><B><FONT.*? >(.*?)</FONT></B><br>#){
            print $out $1,"\n";
            print $1,"\n" if $debug;
        }
    }

    close($out) || die("Unable to finish writing $outfile: $!\n");
}

sub usage{
    die<<"EOT";
Usage: $progname [-h] [-d] [-n x] keyword1 ... keywordn

Search/query the cddb.com website for CD-ROM listings including
the search keywords keyword1 to keywordn.

Invoked with argument '-h' prints this help.

Invoked with argument '-d' prints debug info.

Invoked with argument '-n x' prints x number of results per page.
Max x == 50 (max number of 'hits' per page allowed by cddb.com).

EOT
}
1;
__END__

=head1 NAME

cddb_get_tracklist.pl - search for CDs matching keywords entered on command-line.

=head1 SYNOPSIS

cddb_get_tracklist.pl david holmes

Fetch a list of all albums listed on cddb.com containing the words
'david holmes'.
Note this searches for occurrences of 'david holmes' in any of album name,
artist or track titles.

=head1 DESCRIPTION

Fetches a list of albums from the CDDB website matching the search string
entered on the command line.

An individual album can then be selected from this list so that the
track listing for that album can be 'dumped' into a file in the current
directory.

With additional arguments, the script will also vary the number of album
titles per page to display.

=head1 README

Author:
Jez Hancock <jez.hancock@munkboxen.mine.nu>

Date:
20020622113210

Modules used:
LWP::UserAgent

Notes:

You may want to change the output file name format, I use
'album_title - artist.txt', which is good for me, but a lot of ppl
don't like spaces in filenames... up to you...

The code isn't that hot, and no doubt there are untold bugs... feel free
to modify the code as you like, please just mail me if you do make any
considerable changes - nice to hear about offspring making it in the world ;)

The code is liable to 'break' at such time that the fine folk at
http://gracenote.com decide to change the search URL/URI format.  This
shouldn't be too hard to fix and should just be a matter of finding out the
new format and editing the strings $base_url and $search_uri near the top
of the script accordingly.

Wish list:
To have the numbering fixed when a user 'pages' from one screen of results
to the next.
Presently, first page will show result items numbered: '1 ... 10', second
page will then show items numbered: '1 ... 10' also.  This works ok, just
an aesthetic thing ;)

This script is totally raw!  I only hacked it up because I couldn't find it
anywhere else (to my surprise).  Hope others find it useful... if you do let
me know!
Jez

=head1 USAGE

C<cddb_get_tracklist.pl [-h] [-d] [-n x] keyword1 ... keywordn>

Search/query the cddb.com website for CD-ROM listings including
the search keywords keyword1 to keywordn.

  Invoked with argument '-h' prints this help.

  Invoked with argument '-d' prints debug info.

  Invoked with argument '-n x' prints x number of results per page.
  max x == 50 (max number of 'hits' per page allowed by cddb.com).

=head1 PREREQUISITES

This script requires the C<LWP> module.

=head1 AUTHOR

Copyright 1998-2000, Jez Hancock <jez.hancock@munkboxen.mine.nu> All rights reserved.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
Address bug reports and comments to: jez.hancock@munkboxen.mine.nu

=head1 BUGS

HTML Character Entity References aren't translated into ascii equivalents
(ie &amp; isn't translated into '&')

Minimal paging, could be tweaked.

=head1 SEE ALSO

C<http://search.cpan.org/doc/DSHULTZ/Net-CDDBScan-2.01/CDDBScan.pm>

Interesting looking PM I found only after authoring this hack.

=head1 OSNAMES

any

=head1 SCRIPT CATEGORIES

Audio/MP3

=cut
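
One way to address the wish-list item about result numbering might be to offset each displayed index by the position of the page's first result. This is a rough sketch only, assuming $page_curr holds the 1-based index of the first result on the page (as it does in the script). format_album_lines is a hypothetical name, the sample entries are made up, and the selection code would need to subtract the same offset again before indexing into the list.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): number the entries of
# one results page continuing from the index of its first result.
sub format_album_lines {
    my ($page_curr, @album_url) = @_;
    my @lines;
    for my $i (0 .. $#album_url) {
        # each entry is "url##artist / album", as built by get_album_url
        my (undef, $album) = split /##/, $album_url[$i], 2;
        push @lines, sprintf("%3d. %s", $page_curr + $i, $album);
    }
    return @lines;
}

# second page of a 10-per-page search, with made-up entries:
print "$_\n" for format_album_lines(11,
    "/xm/example-a.html##Artist A / Album A",
    "/xm/example-b.html##Artist B / Album B");
```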
defined($nick{munk}) ? &eatfood :""; < http://munkboxen.mine.nu >