http://qs1969.pair.com?node_id=804336

Improving SQL speed

Mad_Mac has asked for the wisdom of the Perl Monks concerning the following question:

Oh wise PerlMonks,

I am working on a script to parse my mp3 files and write the id3 tag info to a database, mainly just for the exercise of doing it. My code is below. It works, but I'm pretty sure the way I am doing things is less efficient than it could be.

I'd love any suggestions on how to make this code run more quickly and efficiently.

BTW - if it matters, I am using ActivePerl on a Vista x64 box.

#!/usr/bin/perl
use strict;
use warnings;
use MP3::Tag;
use File::Find;
use DBI;

#set up sqlite connection
my $db = DBI->connect("dbi:SQLite:music.sqlite","","");

#set up variables
my $count = 0;
my @songinfo = my $mp3 = my @dirs = ("/users/mine/music");

# Get rid of the previous version of the table
$db->do("DROP TABLE IF EXISTS Songs") or DIE("Couldn't drop old table");

# Recreate Table
$db->do("CREATE TABLE songs (id INTEGER PRIMARY KEY AUTOINCREMENT,
    Track INTEGER, Title TEXT, Artist TEXT, Album TEXT, Year INTEGER,
    Genre TEXT, Path TEXT)") or DIE("Couldn't create new table");

#set up SQL statement
my $sth = $db->prepare('INSERT INTO Songs (Track, Title, Artist, Album,
    Year, Genre, Path) VALUES (?, ?, ?, ?, ?, ?, ?)');

# get total number of mp3's
my @songtotal = `dir c:\\users\\mine\\music\\*.mp3 /s`;
my @mp3total = split(/ /, trim($songtotal[(scalar @songtotal)-2]));

# look for files in each directory
find(\&displayMP3Info, @dirs);

# this function is called every time a file is found
sub displayMP3Info {
    # if the file has an MP3 extension
    if (/\.mp3$/i) {
        # increment counter
        $count++;
        # create new MP3-Tag object
        $mp3 = MP3::Tag->new($_);
        # get tag information
        $mp3->get_tags();
        # check to see if an ID3v1 tag exists
        if (exists $mp3->{ID3v1}) {
            # if it's an ID3v1 tag, stick the parts of it into an array called songinfo
            @songinfo = ($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                         $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                         $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                         $File::Find::name);
            # put it in the database
            $sth->execute($songinfo[0], $songinfo[1], $songinfo[2],
                          $songinfo[3], $songinfo[4], $songinfo[5], $songinfo[6]);
            # print "\n$songinfo[0], $songinfo[1], $songinfo[2], $songinfo[3], $songinfo[4], $songinfo[5], $songinfo[6]\n";
            # update the total on the screen
            printf STDERR ("\r%02d of $mp3total[0] files.", $count);
        }
        # clean up
        $mp3->close();
    }
}

print "Done.\n";

# trim function to get rid of leading and trailing spaces
sub trim {
    my $string = shift;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

Re: Improving SQL speed
by MidLifeXis (Monsignor) on Nov 01, 2009 at 13:18 UTC

    Not necessarily speed tips, but some style things.

    • Kudos on using prepare / execute.
    • Instead of ->execute($var[0], $var[1], ...), why not just use ->execute(@var); or ->execute(@var[0..6]);?
    • Unless I am missing something elsewhere, I don't see what DIE is. Did you perhaps mean die?
    • Look into the use of my, scoping your variables as tightly as possible. $mp3, for example, should be declared at the point where the new() call is made (e.g. my $mp3 = MP3::Tag->new(..)).
    • While the if (/\.mp3$/i) { is ok, you can avoid a level of nesting by changing the enclosing if block to return unless /\.mp3$/i;. This doesn't apply if you wish to do something else when the name does not end in .mp3. (Both of these points appear in the sketch after this list.)
    • The section where you declare @songinfo, $mp3, and @dirs is wonky. They are mixed types, and I don't think that it does what you think it does (but I could be wrong ;-)
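
    For illustration, here is a hypothetical rewrite of the callback folding these suggestions together (the names are all taken from the OP's script; untested):

        sub displayMP3Info {
            return unless /\.mp3$/i;        # no extra nesting level
            $count++;
            my $mp3 = MP3::Tag->new($_);    # declared at the point of use
            $mp3->get_tags();
            if (exists $mp3->{ID3v1}) {
                my @songinfo = ($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                                $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                                $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                                $File::Find::name);
                $sth->execute(@songinfo);   # the whole array, not element by element
                printf STDERR ("\r%02d of $mp3total[0] files.", $count);
            }
            $mp3->close();
        }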

    --MidLifeXis

      Thanks. Appreciate the tips.

      OK, so I worked through your suggestions:

    • Thanks
    • Got it. I was trying to be explicit in the debugging phase, but now that it's working I changed it per your suggestion.
    • OK. I had them declared early on because I thought I was getting an error declaring them within the sub-function, but apparently not. I changed them as you recommended and it works fine.
    • Thanks! Can the same trick be used on if (exists $mp3->{ID3v1}) {?
    • Yeah. Part of that is because I kept changing the way I wanted to use those variables as my code evolved, part of it's because I've only been working in Perl again for a few weeks after a 10+ year hiatus, and part of it's because I occasionally get warnings from Perl that array elements written like @songinfo[0] would be better written as $songinfo[0]. Which makes me think I don't really understand Perl variables yet. So I just keep mucking with it (and plagiarizing other people's code) till it does what I want.
    • After making the changes you suggested, the code seems to be ~10% faster, but I'm not sure. I'm going to add a timer and run it the old way and with your changes. We'll see.

      Thanks again for your suggestions.

      Mad Mac

        Any speed improvement is unintentional. My suggestions were purely for maintenance and readability (speaking Perl, in effect). If it is faster, I take cookies ;-)

        --MidLifeXis

Re: Improving SQL speed
by hesco (Deacon) on Nov 01, 2009 at 15:17 UTC
    Your title and question suggested to me a need for appropriate table indexing to improve database performance. But your code shows only a single INSERT on an unindexed table. An index slows down table INSERTs (but is often justifiable because of the needs of future queries on the data set). Keeping an index in place is also usually faster than DROPping and rebuilding it every time you do an INSERT, though for bulk INSERTs that drop-and-rebuild strategy can make sense.
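
    For example, if later queries will search by artist, an index could be created once after the load finishes; a hypothetical one-liner using the OP's $db handle:

        $db->do("CREATE INDEX IF NOT EXISTS idx_songs_artist ON Songs (Artist)");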

    If you are having some performance issues with this code, I doubt it is in the database interactions.

    I'd suggest using Devel::NYTProf to start to get a handle on where the resources are being used. I'd guess that parsing your mp3 library is not a trivial task.
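
    For reference, a typical NYTProf run looks like this (mp3scan.pl is a made-up name for the OP's script):

        perl -d:NYTProf mp3scan.pl
        nytprofhtml          # writes an HTML report under ./nytprof/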

    -- Hugh

    if( $lal && $lol ) { $life++; }
    if( $insurance->rationing() ) { $people->die(); }

      Thanks. I'll look into that module. I'm not having performance issues per se, but the code seems to parse ~15-20 songs per second, which works out to ~20 minutes for the ~19k files in my mp3 library. Not terrible, but I have a feeling it can be faster. I don't notice a CPU or RAM hit while it's running, but I haven't looked closely.

      Mad_Mac
        Just to compare speed: I have an MP3 library on network-attached storage which I access over WiFi, so it is not terribly fast. It contains 5900 MP3 songs in 500 folders and is 35 GB in size. Indexing it with MP3::Find and MP3::Find::DB (using an SQLite database) took 3022 seconds, so that is almost 2 songs per second. I am sure that if the songs were on a local hard disk, the speed would be higher. The processor never maxed out, so the bottleneck is most probably the network and/or hard disk.

        Given these numbers, it therefore seems you do indeed have some room to improve the speed of your script.

        Update: fixed typo.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Improving SQL speed
by gmargo (Hermit) on Nov 01, 2009 at 16:40 UTC

    One alternative is to process all the mp3 files first, and then write all the data in a single call to execute_array, instead of one call to execute per mp3 file. It's more work to set up (you must accumulate the data in parallel arrays and use bind_param_array), but for a large number of rows it will probably be faster.
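
    A minimal sketch of that approach, using the column order from the OP's INSERT (the accumulator arrays here are hypothetical):

        # accumulate one array per column while walking the tree,
        # instead of calling execute() once per file
        my (@track, @title, @artist, @album, @year, @genre, @path);
        # ... push onto these inside the File::Find callback ...

        # bind each array to its placeholder, then run one execute_array
        my @columns = (\@track, \@title, \@artist, \@album, \@year, \@genre, \@path);
        $sth->bind_param_array($_ + 1, $columns[$_]) for 0 .. $#columns;

        my @tuple_status;
        $sth->execute_array({ ArrayTupleStatus => \@tuple_status })
            or die "execute_array failed: " . $sth->errstr;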

      And if ganging up inserts still seems too slow, there's apparently a bulk-load command named ".import" that will load a table from a file in one shot.

      The fact that there's a ".separator" command makes it sound like it's better suited to tsv files than csv, but you'd need to play around with it to see. Most database bulk-loaders seem to work with some version of csv, but frequently you'll need to massage the csv first to get it into the form they expect: csv is a standard without a standard.
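
      In the sqlite3 shell, the bulk load would look something like this (songs.tsv is a hypothetical tab-separated export of the tag data):

          sqlite3 music.sqlite
          sqlite> .separator "\t"
          sqlite> .import songs.tsv Songs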

        Thanks. I'd skimmed some other pages suggesting that for this type of thing it might be faster to write everything out to a CSV and then do a bulk upload, but I hadn't found a good example of the necessary syntax yet. I'll check out those links.
Re: Improving SQL speed
by bart (Canon) on Nov 01, 2009 at 22:19 UTC
    You may look into using transactions, and only commit every N entries, or every M seconds, whichever turns out faster. You may find that the slowness is largely caused by the (implicit or explicit) commit after every insert.
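
    A minimal sketch of the batched-commit idea, reusing the OP's $db, $sth and $count with a made-up batch size of 500:

        $db->{AutoCommit} = 0;    # stop committing after every single INSERT

        # inside the File::Find callback, after the existing $count++:
        $sth->execute(@songinfo);
        $db->commit if $count % 500 == 0;    # commit every 500 rows

        # after find() returns:
        $db->commit;    # flush the final partial batch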
Re: Improving SQL speed
by oha (Friar) on Nov 02, 2009 at 10:22 UTC
    As a general rule, you should try to find out whether your script is CPU bound (in which case there are algorithms to check) or IO bound (waiting for the DB to finish, or waiting for the HDD). Just check the CPU load, or use NYTProf.

    It also seems to me a very scalable task; IOW, you can run a script which finds all the files to be managed, split that list into sublists, then fork once per sublist and process the data. This will hide the latency of the db round-trips and speed things up in the other cases as well.
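
    One way to sketch the forking approach is with Parallel::ForkManager (my choice of module, not something oha named; note that SQLite locks the whole file for writes, so each child needs its own connection and the INSERTs will still serialize):

        use strict;
        use warnings;
        use DBI;
        use MP3::Tag;
        use Parallel::ForkManager;

        my @files   = @ARGV;    # full list of mp3 paths, gathered up front
        my $workers = 4;
        my $pm      = Parallel::ForkManager->new($workers);

        for my $w (0 .. $workers - 1) {
            $pm->start and next;    # parent keeps looping; child falls through

            # DBI handles don't survive fork(), so connect in the child
            my $db  = DBI->connect("dbi:SQLite:music.sqlite", "", "");
            my $sth = $db->prepare('INSERT INTO Songs (Track, Title, Artist,
                Album, Year, Genre, Path) VALUES (?, ?, ?, ?, ?, ?, ?)');

            # simple round-robin split of the file list across workers
            for my $i (grep { $_ % $workers == $w } 0 .. $#files) {
                my $mp3 = MP3::Tag->new($files[$i]) or next;
                $mp3->get_tags();
                if (exists $mp3->{ID3v1}) {
                    $sth->execute($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                                  $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                                  $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                                  $files[$i]);
                }
                $mp3->close();
            }
            $pm->finish;
        }
        $pm->wait_all_children;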

Re: Improving SQL speed
by CountZero (Bishop) on Nov 02, 2009 at 21:51 UTC
    Have a look at MP3::Find and MP3::Find::DB, it might give you some good ideas.
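
    For a taste, the basic MP3::Find interface looks roughly like this (adapted from its synopsis; untested here):

        use MP3::Find;

        # returns the matching filenames; a query hash can filter on tag fields
        my @mp3s = find_mp3s(dir => '/users/mine/music');
        print "$_\n" for @mp3s;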

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Sweet!

      Thanks