http://qs1969.pair.com?node_id=804336

Improving SQL speed

Mad_Mac has asked for the wisdom of the Perl Monks concerning the following question:

Oh wise PerlMonks,

I am working on a script to parse my mp3 files and write the id3 tag info to a database, mainly just for the exercise of doing it. My code is below. It works, but I'm pretty sure the way I am doing things is less efficient than it could be.

I'd love any suggestions on how to make this code run more quickly and efficiently.

BTW - if it matters, I am using ActivePerl on a Vista x64 box.

#!/usr/bin/perl
use strict;
use warnings;
use MP3::Tag;
use File::Find;
use DBI;

#set up sqlite connection
my $db = DBI->connect("dbi:SQLite:music.sqlite","","");

#set up variables
my $count = 0;
my @songinfo = my $mp3 = my @dirs = ("/users/mine/music");

# Get rid of the previous version of the table
$db->do("DROP TABLE IF EXISTS Songs") or DIE("Couldn't drop old table");

# Recreate Table
$db->do("CREATE TABLE songs (id INTEGER PRIMARY KEY AUTOINCREMENT,
    Track INTEGER, Title TEXT, Artist TEXT, Album TEXT, Year INTEGER,
    Genre TEXT, Path TEXT)") or DIE("Couldn't create new table");

#set up SQL statement
my $sth = $db->prepare('INSERT INTO Songs (Track, Title, Artist, Album,
    Year, Genre, Path) VALUES (?, ?, ?, ?, ?, ?, ?)');

# get total number of mp3's
my @songtotal = `dir c:\\users\\mine\\music\\*.mp3 /s`;
my @mp3total = split(/ /, trim($songtotal[(scalar @songtotal)-2]));

# look for files in each directory
find(\&displayMP3Info, @dirs);

# this function is called every time a file is found
sub displayMP3Info {
    # if the file has an MP3 extension
    if (/\.mp3$/i) {
        # increment counter
        $count++;
        # create new MP3-Tag object
        $mp3 = MP3::Tag->new($_);
        # get tag information
        $mp3->get_tags();
        # check to see if an ID3v1 tag exists
        if (exists $mp3->{ID3v1}) {
            # if it's an ID3v1 tag, stick the parts of it into an array called songinfo
            @songinfo = ($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                         $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                         $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                         $File::Find::name);
            # put it in the database
            $sth->execute($songinfo[0], $songinfo[1], $songinfo[2],
                          $songinfo[3], $songinfo[4], $songinfo[5], $songinfo[6]);
            # print "\n$songinfo[0], $songinfo[1], $songinfo[2], $songinfo[3], $songinfo[4], $songinfo[5], $songinfo[6]\n";
            # update the total on the screen
            printf STDERR ("\r%02d of $mp3total[0] files.", $count);
        }
        # clean up
        $mp3->close();
    }
}

print "Done.\n";

# trim function to get rid of leading and trailing spaces
sub trim {
    my $string = shift;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

Re: Improving SQL speed
by MidLifeXis (Monsignor) on Nov 01, 2009 at 13:18 UTC

    Not necessarily speed tips, but some style things.

    • Kudos on using prepare / execute.
    • Instead of ->execute($var[0], $var[1], ...), why not just use ->execute(@var); or ->execute(@var[0..6]);?
    • Unless I am missing something elsewhere, I don't see what DIE is. Did you perhaps mean die?
    • Look into the use of my, scoping your variables as tightly as possible. $mp3, for example, should be declared at the point where the new() call is made (e.g. my $mp3 = MP3::Tag->new(..)).
    • While the if (/\.mp3$/i) { is ok, you can avoid a level of nesting by changing the enclosing if block to return unless /\.mp3$/i;. This doesn't apply if you wish to do something else when the name does not end in .mp3. (Both of these points appear in the sketch after this list.)
    • The section where you declare @songinfo, $mp3, and @dirs is wonky. They are mixed types, and I don't think that it does what you think it does (but I could be wrong ;-)
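
    For illustration, here is a hypothetical rewrite of the callback folding these suggestions together (the names are all taken from the OP's script; untested):

        sub displayMP3Info {
            return unless /\.mp3$/i;        # no extra nesting level
            $count++;
            my $mp3 = MP3::Tag->new($_);    # declared at the point of use
            $mp3->get_tags();
            if (exists $mp3->{ID3v1}) {
                my @songinfo = ($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                                $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                                $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                                $File::Find::name);
                $sth->execute(@songinfo);   # the whole array, not element by element
                printf STDERR ("\r%02d of $mp3total[0] files.", $count);
            }
            $mp3->close();
        }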

    --MidLifeXis

      Thanks. Appreciate the tips.

      OK, so I worked through your suggestions:

    • Thanks
    • Got it. I was trying to be explicit in the debugging phase, but now that it's working I changed it per your suggestion.
    • OK. I had them declared early on because I thought I was getting an error declaring them within the sub-function, but apparently not. I changed them as you recommended and it works fine.
    • Thanks! Can the same trick be used on if (exists $mp3->{ID3v1}) {?
    • Yeah. Part of that is because I kept changing the way I wanted to use those variables as my code evolved, part of it's because I've only been working in Perl again for a few weeks after a 10+ year hiatus, and part of it's because I occasionally get warnings from Perl that array elements written like @songinfo[0] would be better written as $songinfo[0]. Which makes me think I don't really understand Perl variables yet. So I just keep mucking with it (and plagiarizing other people's code) till it does what I want.
    • After making the changes you suggested, the code seems to be ~10% faster, but I'm not sure. I'm going to add a timer and run it the old way and with your changes. We'll see.

      Thanks again for your suggestions.

      Mad Mac

        Any speed improvement is unintentional. My suggestions were purely for maintenance and readability (speaking Perl, in effect). If it is faster, I take cookies ;-)

        --MidLifeXis

Re: Improving SQL speed
by hesco (Deacon) on Nov 01, 2009 at 15:17 UTC
    Your title and question suggested to me a need for appropriate table indexing to improve database performance. But your code shows only a single INSERT on an unindexed table. An index slows down table INSERTs (but is often justifiable because of the needs of future queries on the data set). Keeping an index in place is also usually faster than DROPping and rebuilding it every time you do an INSERT, though for bulk INSERTs that drop-and-rebuild strategy can make sense.
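
    For example, if later queries will search by artist, an index could be created once after the load finishes; a hypothetical one-liner using the OP's $db handle:

        $db->do("CREATE INDEX IF NOT EXISTS idx_songs_artist ON Songs (Artist)");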

    If you are having some performance issues with this code, I doubt it is in the database interactions.

    I'd suggest using Devel::NYTProf to start to get a handle on where the resources are being used. I'd guess that parsing your mp3 library is not a trivial task.
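
    For reference, a typical NYTProf run looks like this (mp3scan.pl is a made-up name for the OP's script):

        perl -d:NYTProf mp3scan.pl
        nytprofhtml          # writes an HTML report under ./nytprof/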

    -- Hugh

    if( $lal && $lol ) { $life++; }
    if( $insurance->rationing() ) { $people->die(); }

      Thanks. I'll look into that module. I'm not having performance issues per se, but the code seems to parse ~15-20 songs per second, which works out to ~20 minutes for the ~19k files in my mp3 library. Not terrible, but I have a feeling it can be faster. I don't notice a CPU or RAM hit while it's running, but I haven't looked closely.

      Mad_Mac
        Just to compare speed: I have an MP3 library on network-attached storage which I access over WiFi, so it is not terribly fast. It contains 5900 MP3 songs in 500 folders and is 35 GB in size. Indexing it with MP3::Find and MP3::Find::DB (using an SQLite database) took 3022 seconds, so that is almost 2 songs per second. I am sure that if the songs were on a local hard disk, the speed would be higher. The processor never maxed out, so the bottleneck is most probably the network and/or hard disk.

        Given these numbers, it therefore seems you do indeed have some room to improve the speed of your script.

        Update: fixed typo.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Improving SQL speed
by gmargo (Hermit) on Nov 01, 2009 at 16:40 UTC

    One alternative is to process all the mp3 files first, and then write all the data in a single call to execute_array, instead of one call to execute per mp3 file. It's more work to set up (you must accumulate the data in parallel arrays and use bind_param_array), but for a large number of rows it will probably be faster.
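
    A minimal sketch of that approach, using the column order from the OP's INSERT (the accumulator arrays here are hypothetical):

        # accumulate one array per column while walking the tree,
        # instead of calling execute() once per file
        my (@track, @title, @artist, @album, @year, @genre, @path);
        # ... push onto these inside the File::Find callback ...

        # bind each array to its placeholder, then run one execute_array
        my @columns = (\@track, \@title, \@artist, \@album, \@year, \@genre, \@path);
        $sth->bind_param_array($_ + 1, $columns[$_]) for 0 .. $#columns;

        my @tuple_status;
        $sth->execute_array({ ArrayTupleStatus => \@tuple_status })
            or die "execute_array failed: " . $sth->errstr;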

      And if ganging up inserts still seems too slow, there's apparently a bulk-load command named ".import" that will load a table from a file in one shot.

      The fact that there's a ".separator" command makes it sound like it's better suited to tsv files than csv, but you'd need to play around with it to see. Most database bulk-loaders seem to work with some version of csv, but frequently you'll need to massage the csv first to get it into the form they expect: csv is a standard without a standard.
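
      In the sqlite3 shell, the bulk load would look something like this (songs.tsv is a hypothetical tab-separated export of the tag data):

          sqlite3 music.sqlite
          sqlite> .separator "\t"
          sqlite> .import songs.tsv Songs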

        Thanks. I'd skimmed some other pages suggesting that for this type of thing it might be faster to write everything out to a CSV and then do a bulk upload, but I hadn't found a good example of the necessary syntax yet. I'll check out those links.
Re: Improving SQL speed
by bart (Canon) on Nov 01, 2009 at 22:19 UTC
    You may look into using transactions, and only commit every N entries, or every M seconds, whichever turns out faster. You may find that the slowness is largely caused by the (implicit or explicit) commit after every insert.
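
    A minimal sketch of the batched-commit idea, reusing the OP's $db, $sth and $count with a made-up batch size of 500:

        $db->{AutoCommit} = 0;    # stop committing after every single INSERT

        # inside the File::Find callback, after the existing $count++:
        $sth->execute(@songinfo);
        $db->commit if $count % 500 == 0;    # commit every 500 rows

        # after find() returns:
        $db->commit;    # flush the final partial batch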
Re: Improving SQL speed
by oha (Friar) on Nov 02, 2009 at 10:22 UTC
    As a general rule, you should try to find out whether your script is CPU bound (in which case there are algorithms to check) or IO bound (waiting for the DB to finish, or waiting for the HDD). Just check the CPU load, or use NYTProf.

    It also seems to me a very scalable task; IOW, you can run a script which finds all the files to be managed, split that list into sublists, then fork once per sublist and process the data. This will hide the latency of the db round-trips and speed things up in the other cases as well.
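
    One way to sketch the forking approach is with Parallel::ForkManager (my choice of module, not something oha named; note that SQLite locks the whole file for writes, so each child needs its own connection and the INSERTs will still serialize):

        use strict;
        use warnings;
        use DBI;
        use MP3::Tag;
        use Parallel::ForkManager;

        my @files   = @ARGV;    # full list of mp3 paths, gathered up front
        my $workers = 4;
        my $pm      = Parallel::ForkManager->new($workers);

        for my $w (0 .. $workers - 1) {
            $pm->start and next;    # parent keeps looping; child falls through

            # DBI handles don't survive fork(), so connect in the child
            my $db  = DBI->connect("dbi:SQLite:music.sqlite", "", "");
            my $sth = $db->prepare('INSERT INTO Songs (Track, Title, Artist,
                Album, Year, Genre, Path) VALUES (?, ?, ?, ?, ?, ?, ?)');

            # simple round-robin split of the file list across workers
            for my $i (grep { $_ % $workers == $w } 0 .. $#files) {
                my $mp3 = MP3::Tag->new($files[$i]) or next;
                $mp3->get_tags();
                if (exists $mp3->{ID3v1}) {
                    $sth->execute($mp3->{ID3v1}->track, $mp3->{ID3v1}->title,
                                  $mp3->{ID3v1}->artist, $mp3->{ID3v1}->album,
                                  $mp3->{ID3v1}->year, $mp3->{ID3v1}->genre,
                                  $files[$i]);
                }
                $mp3->close();
            }
            $pm->finish;
        }
        $pm->wait_all_children;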

Re: Improving SQL speed
by CountZero (Bishop) on Nov 02, 2009 at 21:51 UTC
    Have a look at MP3::Find and MP3::Find::DB, it might give you some good ideas.
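
    For a taste, the basic MP3::Find interface looks roughly like this (adapted from its synopsis; untested here):

        use MP3::Find;

        # returns the matching filenames; a query hash can filter on tag fields
        my @mp3s = find_mp3s(dir => '/users/mine/music');
        print "$_\n" for @mp3s;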

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Sweet!

      Thanks