Here's a bit of code for grabbing a list of albums from the gracenote/cddb.com site.

The idea is that you run the script like this: ./cddb_get_tracklist.pl search terms. The script then finds all albums/artists that contain the words 'search terms'. You can then select an individual album and have the script dump the track listing for that album to a file.

It isn't thoroughly tested, so there are bound to be bugs; let me know (preferably 'cc' me at the email address ;). The main problem with the dumped track lists is that HTML entities aren't translated yet (i.e. &amp; isn't translated into & in the track listings).
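For the entity problem, here is a minimal sketch of a decoder that uses only core Perl. The sub name decode_basic_entities and its small entity table are made up for illustration and are not part of the script; it only handles a handful of named entities plus numeric references such as &#39;. If the HTML::Entities module is installed, its decode_entities function covers the full entity set and is probably the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): decode a few common
# HTML entities found in track titles.
my %named = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",
    nbsp => ' ',
);

sub decode_basic_entities {
    my ($text) = @_;
    # One pass over the string, so "&amp;lt;" becomes "&lt;" and is not
    # decoded a second time. Unknown named entities are left untouched.
    $text =~ s{&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|(\w+));}{
        defined($1)         ? chr($1)
      : defined($2)         ? chr(hex($2))
      : exists($named{$3})  ? $named{$3}
      :                       "&$3;"
    }ge;
    return $text;
}

print decode_basic_entities("Super Disco Breakin&#39;"), "\n";
print decode_basic_entities("Rock &amp; Roll"), "\n";
```

In the script this would be applied to the captured track title just before it is printed to the output file.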

code for file cddb_get_tracklist.pl follows:

#!/usr/bin/perl

use strict;
use LWP::UserAgent;

my $progname = $0;
$progname =~ s,.*/,,;    # use basename only
my $version = "0.1";

# these are configurable - they may change from time to time according to
# CDDB.COM website file system structure:
my $base_url   = "http://www.gracenote.com";
my $search_uri = "/php/search-adv.php3?q=";

# default items to show per page:
my $page_count = 10;

# current result/album+artist item:
my $page_curr = 1;

# for debugging:
my $debug;

# build query list from args:
my $query_list = join("+", @ARGV);

# if -h flag set or no args/search list, show usage:
if ($query_list =~ /-[hH]/ || $query_list eq "") { usage(); }

if ($query_list =~ s/\+?-n\+(\d+)\+?//) {
    # a number to show per page is given
    # if number / page is > 50, show usage (max per page from cddb.com is 50):
    if ($1 > 50) { usage(); }
    $page_count = $1;
}

# debug mode?:
if ($query_list =~ s/\+?-d\+?//) { $debug = 1; }

# build query url:
my $query_url = $base_url . $search_uri . $query_list
              . "&f=all&s=$page_curr&n=$page_count";
print "Query URL: $query_url\n" if $debug;

# start off with first url:
main($query_url);

# this sub is called recursively, once for each 'page' of results
# ($page_curr to ($page_curr + $page_count)):
sub main {
    my $url = shift;
    # run the query on the query url:
    my $result = get_url($url);
    if ($result->is_success) {
        # we got a result, parse it:
        my @result_lines = split("\n", $result->content);
        # strip out the album/artist pairs from the results page:
        my (@album_url) = get_album_url(@result_lines);
        # display results for user to choose an album:
        choose_album(@album_url);
    } else {
        die("\nError retrieving $url.\n"
          . "Check and compare the base search URL, \$base_url (=$base_url), and the\n"
          . "search URI, \$search_uri (=$search_uri), in the code against the\n"
          . "currently working url/uri at gracenote.com\n\n");
    }
}

sub get_url {
    my $url = shift;
    # create user agent object:
    my $cddb_ua = LWP::UserAgent->new;
    $cddb_ua->agent("$progname/$version ");
    # build the request object:
    my $cddb_req = HTTP::Request->new(GET => $url);
    # make the request:
    return $cddb_ua->request($cddb_req);
}

# sub returns a list: the paging info line first, then one
# "url##artist / album" string per result:
sub get_album_url {
    my ($list_started, $list_ended, @result_list);
    foreach (@_) {
        # does this line tell us what page we're looking at
        # ie: <p><font size=2 face="Arial,Helvetica">
        # Displaying disc 1-10 of 2542 matching CDs</font></p>
        (/<p>.*?(Displaying disc .*? of .*? matching CDs).*?<\/p>/)
            && (push @result_list, $1);
        # is this start of list?:
        (/<!-- LIST START -->/) && ($list_started = 1) && (next);
        # or the end of it?:
        (/<!-- LIST END -->/) && ($list_ended = 1);
        # save this list item into array:
        if ($list_started && !$list_ended) {
            # a list item looks like this:
            # <LI type=circle><FONT face="Arial, Helvetica, sans-serif"><B><!-- START ITEM --><!-- REL 100 ENDREL --><A HREF="/xm/pcd/genhiphop/e004abd1bd74777c1ad8ec4088ec67c7.html" >The Beastie Boys / Hello Nasty</A><!-- END ITEM --> </B></FONT> <br>&nbsp;&nbsp;&nbsp;Just A <b>Test</b><BR>
            # strip out urls / album title/artist (skip lines that don't match):
            if (/A HREF="(.*)" >(.*)<\/A>/) {
                # me being stupid and forgetting how to use hashes ;)
                push(@result_list, "$1##$2");
            }
        }
    }
    # make sure the first item in @result_list is the
    # 'Displaying disc x of n matching CDs' line:
    ($result_list[0] =~ /^Displaying disc/)
        || die("Unable to retrieve paging info\n");
    return @result_list;
}

sub choose_album {
    my $page_info = shift;
    my @album_url = @_;
    my $last_page;
    # print paging info:
    print $page_info, "\n";
    # fewer results than a full page means this is the last page:
    if (scalar(@album_url) < $page_count) {
        $page_count = @album_url;
        $last_page  = 1;
    }
    for (my $i = 0; $i < $page_count; $i++) {
        my (undef, $album) = split "##", $album_url[$i];
        printf("%2s. %s\n", $i + 1, $album);
    }
    print "Select album (1, ..., $page_count)\n";
    print "'q' to quit\n";
    print "Any other key for more...\n" unless $last_page;
    while (<STDIN>) {
        chomp;
        if (/^(\d+)$/) {
            get_track_listing($album_url[$1 - 1]);
            exit;
        } elsif (/[qQ]/) {
            exit;
        } else {
            # increment current item by $page_count and fetch the next page:
            $page_curr = $page_curr + $page_count;
            my $query_url = $base_url . $search_uri . $query_list
                          . "&f=all&s=$page_curr&n=$page_count";
            main($query_url);
        }
    }
}

sub get_track_listing {
    my ($uri, $album_artist) = split "##", shift;
    my $url = $base_url . $uri;
    my ($artist, $album) = split " / ", $album_artist;
    my $outfile = $album . " - " . $artist . ".txt";
    # fetch the page containing the track list:
    my $result = get_url($url);
    # open the output file for printing track list to:
    open(my $out, ">", $outfile)
        || die("Unable to open $outfile for writing: $!\n");
    if ($result->is_success) {
        # we got the html page containing the track list ok,
        # parse out the track listing now.
        # track items look like this:
        # <LI><B><FONT face="Arial, Helvetica, sans-serif" >Super Disco Breakin&#39;</FONT></B><br>
        my @result_lines = split("\n", $result->content);
        foreach (@result_lines) {
            if (m#<LI><B><FONT.*? >(.*?)</FONT></B><br>#) {
                print $out $1, "\n";
                print $1, "\n" if $debug;
            }
        }
        close($out);
    } else {
        die("Unable to retrieve $url\n");
    }
}

sub usage {
    die <<"EOT";

Usage: $progname [-h] [-d] [-n x] keyword1 ... keywordn

Search/query the cddb.com website for CD-ROM listings including the
search keywords keyword1 to keywordn.

Invoked with argument '-h' prints this help.
Invoked with argument '-d' prints debug info.
Invoked with argument '-n x' prints x number of results per page.
Max x == 50 (max number of 'hits' per page allowed by cddb.com).

EOT
}

1;
__END__

=head1 NAME

cddb_get_tracklist.pl - search for CD discs matching keywords entered on
command-line.

=head1 SYNOPSIS

  cddb_get_tracklist.pl david holmes

Fetch a list of all albums listed on cddb.com containing the words
'david holmes'.

Note this searches for occurrences of 'david holmes' in any of album name,
artist or track titles.

=head1 DESCRIPTION

Fetches a list of albums from the CDDB website matching the search string
entered on the command line.

An individual album can then be selected from this list so that the
track listing for that album can be 'dumped' into a file in the current
directory.

With additional arguments, the script will also vary the number of album
titles per page to display.

=head1 README

Author: Jez Hancock <jez.hancock@munkboxen.mine.nu>

Date: 20020622113210

Modules used: LWP::UserAgent

Notes:

You may want to change the output file name format, I use
'album_title - artist.txt', which is good for me, but a lot of ppl don't
like spaces in filenames... up to you...

The code isn't that hot, and no doubt there are untold bugs... feel free
to modify the code as you like, please just mail me if you do make any
considerable changes - nice to hear about offspring making it in the
world ;)

The code is liable to 'break' at such time that the fine folk at
http://gracenote.com decide to change the search URL/URI format. This
shouldn't be too hard to fix and should just be a matter of finding out the
new format and editing the strings $base_url and $search_uri accordingly
at the top of the script.

Wish list:

To have the numbering fixed when a user 'pages' from one screen of results
to the next. Presently, first page will show result items numbered:
'1 ... 10', second page will then show items numbered: '1 ... 10' also.
This works ok, just an aesthetic thing ;)

This script is totally raw! I only hacked it up because I couldn't find it
anywhere else (to my surprise). Hope others find it useful... if you do
let me know!

Jez

=head1 USAGE

C<cddb_get_tracklist.pl [-h] [-d] [-n x] keyword1 ... keywordn>

Search/query the cddb.com website for CD-ROM listings including the
search keywords keyword1 to keywordn.

Invoked with argument '-h' prints this help.

Invoked with argument '-d' prints debug info.

Invoked with argument '-n x' prints x number of results per page.
Max x == 50 (max number of 'hits' per page allowed by cddb.com).

=head1 PREREQUISITES

This script requires the C<LWP> module.

=head1 AUTHOR

Copyright 1998-2000, Jez Hancock <jez.hancock@munkboxen.mine.nu>
All rights reserved.

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

Address bug reports and comments to: jez.hancock@munkboxen.mine.nu

=head1 BUGS

HTML Character Entity References aren't translated into ascii equivalents
(ie &amp; isn't translated into '&')

Minimal paging, could be tweaked.

=head1 SEE ALSO

C<http://search.cpan.org/doc/DSHULTZ/Net-CDDBScan-2.01/CDDBScan.pm>

Interesting looking PM I found only after authoring this hack.

=head1 OSNAMES

any

=head1 SCRIPT CATEGORIES

Audio/MP3

=cut
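
On the wish-list item about numbering: a rough sketch of how the displayed numbers could carry on across pages. It assumes, as in the script, that $page_curr holds the 1-based position of the first result on the current page. The helper names display_number and index_for_choice are made up for illustration and do not exist in the script.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original script.
# $page_curr is the 1-based position of the first result on the page,
# $i is the 0-based index into the current page's album list.
sub display_number {
    my ($page_curr, $i) = @_;
    return $page_curr + $i;
}

# Map the number the user typed back to an index into the current page.
# Returns undef when the choice is not on this page.
sub index_for_choice {
    my ($page_curr, $choice, $items_on_page) = @_;
    my $i = $choice - $page_curr;
    return ($i >= 0 && $i < $items_on_page) ? $i : undef;
}

# Second page of ten results: items would be shown as 11 .. 20.
my ($page_curr, $page_count) = (11, 10);
printf "%2d. item\n", display_number($page_curr, $_) for 0 .. 2;

my $i = index_for_choice($page_curr, 13, $page_count);
print "choice 13 is index $i on this page\n";
```

In choose_album this would mean printing display_number($page_curr, $i) instead of $i+1, and looking up the album with index_for_choice rather than $1-1.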

defined($nick{munk}) ? &eatfood :""; < http://munkboxen.mine.nu >

In reply to CDDB track listing search script by munk
