Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

I am taking a break from Perl6 to help solve a problem for a relative. They have a huge library of music CDs and find it difficult to remember what they already have when looking to buy new stuff. I was hoping there was something equivalent to ISBN for books so that I could automate most of the process of building a database. Using the UPC, I was able to come up with the following very rough proof of concept:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::Strip;
use HTML::TableContentParser;

my $upc = $ARGV[0] || die 'UPC required';
my $url= "$upc.html";

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get($url);

my %meta;
my $table = HTML::TableContentParser->new()->parse($mech->content);
my $data = HTML::Strip->new()->parse($table->[7]{rows}[0]{cells}[1]{data});
$data =~ s/\n\s+//;
for my $detail ( split /\n/, $data ) {
    my ($cat, $info) = split /\s*:\s*/, $detail, 2;
    $meta{$cat} = $info;
}

my $song = join '', map { HTML::Strip->new()->parse($_->{data}) } @{$table->[9]{rows}[0]{cells}};
$song =~ s/\n\s*/ /g;
push @{$meta{track}}, $_ for grep $_, split /\s+\d+\.\s+/, $song;

use Data::Dumper;
print Dumper( \%meta );
Before I invest any time into this, I was wondering if anyone else was familiar with a better pre-existing wheel? I spent 2 minutes looking at Net::Amazon::UPC, but I couldn't find any info on the CD in my hand with it. Perhaps I just don't know how to use the module, as it seems to me that this is a problem someone else would have already solved. If not, I would be happy to put some real development time into it and share it publicly. Your thoughts?

Cheers - L~R

Replies are listed 'Best First'.
Re: Music CD Data
by brian_d_foy (Abbot) on May 29, 2005 at 18:48 UTC

    Not all CDs have UPCs, and even Amazon doesn't use those to identify their products (they use an ASIN). The CDDB uses the length of the tracks to guess which CD you have, but sometimes even that comes up with more than one possibility. There are a couple of Perl modules for that too, but you have to come up with the CD info. A combination of those should identify a disk in most cases, however.

    I don't pay much attention to either of those: I put the CDs into the computer, let the music player figure it out (using CDDB), then look at the stuff the music player stores. For some of them, you don't even need to rip the disk: it remembers what it has already looked up. :)

    brian d foy
      Thanks, but I am afraid your approach is too advanced. This particular relative is not IT savvy by any stretch of the imagination. This is more a matter of being able to answer a handful of questions:
      • Do I already own CD X?
      • What CD is song X on?
      • What CDs by artist X do I own?
      • Who sings song X?
      I plan on making an extremely simple web interface, and so far the code above seems to be doing the trick whereas other wheels don't. Personally, I don't have a lot of music to worry about - I listen to web broadcast radio stations.
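The four questions above map onto a few lookup tables. What follows is a minimal sketch, not the poster's implementation: the sample records are invented, and the field names `Artist` and `Title` are assumptions about what the scraped page provides (only the `track` list appears in the posted code). The helper names are hypothetical too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample records, shaped like the %meta hash the scraper prints.
# 'Artist' and 'Title' are assumed field names; 'track' matches the post.
my @cds = (
    { Artist => 'Artist A', Title => 'First Album',  track => [ 'Song One', 'Song Two' ] },
    { Artist => 'Artist A', Title => 'Second Album', track => [ 'Song Three' ] },
    { Artist => 'Artist B', Title => 'Other Album',  track => [ 'Song Two', 'Song Four' ] },
);

# Build case-insensitive lookup tables once, up front.
my ( %by_title, %by_artist, %by_song );
for my $cd (@cds) {
    $by_title{ lc $cd->{Title} } = $cd;
    push @{ $by_artist{ lc $cd->{Artist} } }, $cd;
    push @{ $by_song{ lc $_ } }, $cd for @{ $cd->{track} };
}

# Do I already own CD X?
sub own_cd {
    my ($title) = @_;
    return exists $by_title{ lc $title } ? 1 : 0;
}

# What CD is song X on? (returns every matching title)
sub cds_with_song {
    my ($song) = @_;
    return map { $_->{Title} } @{ $by_song{ lc $song } || [] };
}

# What CDs by artist X do I own?
sub cds_by_artist {
    my ($artist) = @_;
    return map { $_->{Title} } @{ $by_artist{ lc $artist } || [] };
}

# Who sings song X? (duplicates removed, order of first appearance kept)
sub artists_of_song {
    my ($song) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->{Artist} } @{ $by_song{ lc $song } || [] };
}

print "Own 'First Album': ", ( own_cd('First Album') ? 'yes' : 'no' ), "\n";
print "CDs with 'Song Two': ", join( ', ', cds_with_song('Song Two') ), "\n";
print "CDs by Artist A: ", join( ', ', cds_by_artist('Artist A') ), "\n";
print "Who sings 'Song Two': ", join( ', ', artists_of_song('Song Two') ), "\n";
```

Exact, case-insensitive matching keeps the sketch short; a real interface for a non-technical user would probably want substring or fuzzy matching instead.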

      Cheers - L~R


        I remember seeing a Perl module called CD::Info, and there is one on CPAN called Audio::CD, which can grab CDDB info.

        These might help, if you can stick each CD in your drive.

        Walking the road to enlightenment... I found a penguin and a camel on the way.....
        Fancy a Just ask!!!

        Your task is to get the data, and that's what I answered. At some point you have to get the data from the stack of CDs. That could be you typing or scanning a lot of barcodes, or sticking the CD in the drive for a couple of seconds. You can make your own judgement about which you want to do, but then, you asked about populating the database, not the end user interface.

        I've already done this for myself and a few other people. I let the installed music player look at the CDDB for all the CDs then take the data it stored on all of those CDs for the database. I don't have to write a lot of code that way.

        I'm sure there is a freeware or shareware program out there that does what you want already, though :)

        brian d foy
      I agree on using CDDB, as all you have to do is insert the CD into the drive, let the system do the lookup, and you get more info than you even asked for, such as all the track titles.

      The disadvantage, IMO, lies in the limitations of CDDB. There can only be one entry per signature per genre in the central database, and I have already encountered cases where different CDs have the same signature. This is most likely to occur for 2-track CD singles.
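The signature collisions described here follow from how little information goes into the signature. Below is a sketch of the classic CDDB/freedb disc ID calculation as commonly documented (treat the details as an assumption and check them against the protocol documentation before relying on them). The function names `digit_sum` and `cddb_disc_id` are hypothetical, and the track offsets are invented to show two different 2-track discs producing the same ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum of the decimal digits of a non-negative integer.
sub digit_sum {
    my ($n) = @_;
    my $sum = 0;
    $sum += $_ for split //, $n;
    return $sum;
}

# $leadout: start of the lead-out in frames (75 frames per second).
# @offsets: start of each track in frames, including the 2-second lead-in.
# Returns the 8-hex-digit ID built from three parts:
#   (digit-sum checksum mod 255) << 24 | (disc length in seconds) << 8 | track count
sub cddb_disc_id {
    my ( $leadout, @offsets ) = @_;
    my $n = 0;
    $n += digit_sum( int( $_ / 75 ) ) for @offsets;
    my $length = int( $leadout / 75 ) - int( $offsets[0] / 75 );
    return sprintf '%08x',
        ( ( $n % 255 ) << 24 ) | ( $length << 8 ) | scalar @offsets;
}

# Two invented 2-track discs of equal total length. The second track starts
# at 202 s on one and 112 s on the other; both digit sums are 4.
my $single_a = cddb_disc_id( 30150, 150, 15150 );
my $single_b = cddb_disc_id( 30150, 150, 8400 );

print "single A: $single_a\n";    # 06019002
print "single B: $single_b\n";    # 06019002, same ID for a different disc
```

With only two tracks, the checksum byte has very few digits to work with, so two singles of the same total length collide easily.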

      CDDB is in need of a serious overhaul.

Re: Music CD Data
by Camel_thirst (Friar) on May 29, 2005 at 18:46 UTC
    Maybe I misunderstood your indexation task but why not use CDDB index keys? CDDBScan cheers,
      The idea is to get as much information entered into a database with the least amount of input. Unless I am missing something, your suggestion doesn't help in this specific task. You sit with a stack of CDs and enter a short "tag" (in this case UPC) and the code does the rest. Thanks for the suggestion though.
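Since the workflow depends on someone typing a UPC by hand, a check-digit test can catch most typos before any lookup is attempted. This is a minimal sketch assuming 12-digit UPC-A codes (CDs may instead carry a 13-digit EAN, which this does not handle); the function name `upc_a_is_valid` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Validate a 12-digit UPC-A code using its check digit.
# Digits in odd positions (1st, 3rd, ...) are weighted 3, even positions 1;
# the 12th digit must bring the weighted total up to a multiple of 10.
sub upc_a_is_valid {
    my ($upc) = @_;
    return 0 unless defined $upc && $upc =~ /\A\d{12}\z/;

    my @digit = split //, $upc;
    my $sum   = 0;
    for my $i ( 0 .. 10 ) {
        $sum += $digit[$i] * ( $i % 2 == 0 ? 3 : 1 );
    }
    my $check = ( 10 - $sum % 10 ) % 10;
    return $check == $digit[11] ? 1 : 0;
}

for my $code ( '036000291452', '036000291453', '12345' ) {
    printf "%-14s %s\n", $code, upc_a_is_valid($code) ? 'ok' : 'rejected';
}
```

Rejecting bad input up front avoids a wasted page fetch and a confusing scrape of an error page.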

      Cheers - L~R

Re: Music CD Data
by TilRMan (Friar) on May 30, 2005 at 05:08 UTC
    publishes their entire database for download. You could slurp that into a local database and have the user enter a bit of information, like the artist. Then have the form display a list of possible matches.

    For (user) efficiency, you probably want to accept multiple CDs at once. E.g., the user would grab a stack of ten CDs, punch in all ten artists (or fewer with multiple CDs by the same artist), and hit submit. Then give a list of all possible matches, and a checkbox next to each, plus an "it isn't here" button.
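The batch idea above can be sketched as a function that takes several artist names at once and returns candidate CDs for each. This is only an illustration: the `%catalogue` contents are invented stand-ins for a downloaded database, and `candidates_for` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented stand-in for a locally loaded database: artist => album titles.
my %catalogue = (
    'Artist A'         => [ 'First Album', 'Second Album' ],
    'Artist B'         => [ 'Other Album' ],
    'The Artist Band'  => [ 'Live Album' ],
);

# Take a batch of artist names and return a hash ref mapping each name to a
# list of "artist - album" candidates. Matching is a case-insensitive
# substring test; an empty list means "it isn't here".
sub candidates_for {
    my (@artists) = @_;
    my %matches;
    for my $wanted (@artists) {
        my $needle = lc $wanted;
        $needle =~ s/^\s+|\s+$//g;    # trim stray whitespace from the form
        next unless length $needle;
        $matches{$wanted} = [];
        for my $artist ( sort keys %catalogue ) {
            next unless index( lc $artist, $needle ) >= 0;
            push @{ $matches{$wanted} },
                map { "$artist - $_" } @{ $catalogue{$artist} };
        }
    }
    return \%matches;
}

my $found = candidates_for( 'artist b', 'Nobody' );
for my $name ( sort keys %$found ) {
    my @hits = @{ $found->{$name} };
    print "$name: ", ( @hits ? join( '; ', @hits ) : "(it isn't here)" ), "\n";
}
```

Each candidate string would become one checkbox in the form, and the empty list drives the "it isn't here" option.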

    Or you could take the opposite approach and have a big, alphabetized list of all of the CDs. You'd probably need to partition it into several pages. If the collection is already alphabetized, this approach could be a big win. Just check the boxes next to the ones you've got, hit submit, and you're presented with the next page.

Re: Music CD Data
by jhourcle (Prior) on May 29, 2005 at 23:18 UTC

    It's not Perl, or even an open source solution, but there's a product for MacOS called Delicious Library, that I've heard good things about for tracking dvds/music/books/etc. (I've never used it myself).

    They use the UPC for tracking items, but I don't know how they've built up their database. You could always start your own -- remember, CDDB and IMDB were pretty bad at the beginning... it took some time for people to get enough info in there for them to become useful. For music, you might be able to seed your database from FreeDB (the open fork of the CDDB database).

Re: Music CD Data
by aufflick (Deacon) on May 30, 2005 at 02:45 UTC
    CD management is always tricky. CDDB really helps, and you just need a simple automated system (read cddb, eject disk, repeat) + some kids/nephews willing to while away a weekend inserting disks into a cd drive in exchange for an agreed $rate per cd ;)
Re: Music CD Data
by salva (Canon) on May 30, 2005 at 09:28 UTC
Re: Music CD Data
by astroboy (Chaplain) on May 30, 2005 at 07:33 UTC
    A scanner I bought on an auction site arrived today, and I've been playing with scanning my book and CD barcodes and getting the details off Amazon. The following code seems to work on about 50% of the CDs I've tried:
    use Net::Amazon;

    my $ua = Net::Amazon->new( token => '<your amazon id>' );

    # Read one UPC per line from STDIN or from files named on the command line.
    while ( my $upc = <> ) {
        chomp $upc;    # drop the trailing newline before sending the UPC
        my $response = $ua->search( upc => $upc, mode => 'music' );
        if ( $response->is_success() ) {
            print $response->as_string(), "\n";
        }
        else {
            print "Error: ", $response->message(), "\n";
        }
    }
Re: Music CD Data
by DrHyde (Prior) on Jun 01, 2005 at 09:32 UTC
    Sounds to me like you want to spend a little bit of money and get a copy of Readerware and a barcode scanner. The nice Readerware people used to give away free barcode scanners, they might still be doing that. It can print out your catalogue, and there's also a read-only application for Palm Pilots so you can carry your catalogue around with you in electronic form.

    Despite using Mac OS X, I prefer Readerware to Delicious Library, because Readerware is cross-platform. I've tested it on Doze and Linux/x86. Being written in Java, I imagine that it would work on devices like some phobile moans, or on a Zaurus.

Re: Music CD Data
by CountZero (Bishop) on Jun 02, 2005 at 04:08 UTC
    I looked in CPAN and to my surprise none of the CD catalogue-ing modules support Windows!

    So I wrote a script myself that reads the data from the cd in your cd-drive and searches for it at the database. Just put a CD in your drive and run this script. It wouldn't be too difficult to put the data it finds in your own database. SQLite seems a good candidate for such a thing.


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law