Win has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Please could somebody suggest a way of improving the efficiency of this bit of code. One idea that I have is to sort the @Geo_areas array and then in the SQL statement use 'between the first and last element of the sorted array'. That would work in the context of the task. Could I execute a block of SQL statements? Would that be more efficient? Anyway perhaps I am being a bit premature in asking this question, as the code does not seem to be running as expected anyway. It is running at the moment but is not printing to an output file.
foreach (@Geo_areas) { $_ or next; my $Return_results = "Select * from Result_storage_keep where Ag +gregated_area LIKE '".$_."' \;"; my $sth_C = $dbh->prepare($Return_results) or die "Couldn't prep +are query: ".$dbh->errstr; $sth_C->execute() or die "Couldn't execute query: ".$sth_C->errs +tr; while (my @row = $sth_C->fetchrow_array) { foreach (@row) { $_ = '' unless defined; } print OUTPUT_FILE join("\t", @row); print OUTPUT_FILE "\n"; undef @row; } }

Replies are listed 'Best First'.
Re: while (my @row = $sth->fetchrow_array)
by dragonchild (Archbishop) on Jan 09, 2006 at 17:57 UTC
    I'll bet that if you benchmark, 99% of the runtime is spent in the database. It almost always is.

    As for why it's slow ... the reasons are very simple and boil down to "programmers don't understand databases". I assume you're using MySQL, but the points are applicable to most other RDBMSes.

    1. I'll bet you don't have an index on the aggregated_area column.
    2. I'll bet that column is TEXT, meaning that even if you are indexing it, you're indexing it improperly. You have to index the first N characters (up to 255 for MySQL).
    3. If you're using LIKE to find substrings, then you'd be better off breaking that column out into a separate table and indexing that table.
    4. If they do have % or _ signs and you have them as the first character, then indexes are completely ignored. (They have to be, if you think about it.)

    The bottom line is LIKE has extremely poor performance characteristics and should be avoided whenever possible. If you're using LIKE, you had better have exhausted all other possibilities, like normalization.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: while (my @row = $sth->fetchrow_array)
by jettero (Monsignor) on Jan 09, 2006 at 14:50 UTC

    First, the whole purpose of prepare/execute is so the sql api only has to parse the statement once -- well, that and the security concerns... So, I moved that up before the foreach().

    map {} () is awesome.... try to get the hang of that. They suggest using as few print statements as possible, so I compounded that. I removed the undef @row, as I believe it's implied at the end of the block.

    Lastly, if nothing is turning up in your output file, check your open() to see if you have the ">" in the filename.

    Why did I rename all those default vars? Take it from someone that's been a perl nerd for a long time... it's better to name your perl vars longwindedly!

    Meh, here's your snipped the way I might do it:

    my $sth_C = $dbh->prepare( "select * from result_storage_keep where aggregated_area like ?") or die "Couldn't prepare query: " . $dbh->errstr; foreach my $area (@Geo_areas) { next unless $area; $sth_C->execute($area) or die "Couldn't execute query: " . $sth_C->errstr; while (my @row = $sth_C->fetchrow_array) { print OUTPUT_FILE join("\t", map {$_ ? $_ : "''"} @row), "\n"; } }

    Also, since you seem to be printing it all to the same file anyway, why not shorten it even more and do something like this:

    # this only works if this array is small-ish my $areas = join(", ", @Geo_areas); # interpolating can be dumb ... # unless you trust the data in @Geo_areas... my $sth_C = $dbh->prepare( "select * from result_storage_keep where aggregated_area in ($areas) order by aggregated_area") or die "Couldn't prepare query: " . $dbh->errstr; while (my @row = $sth_C->fetchrow_array) { print OUTPUT_FILE join("\t", map {$_ ? $_ : "''"} @row), "\n"; }
Re: while (my @row = $sth->fetchrow_array)
by pajout (Curate) on Jan 09, 2006 at 13:10 UTC
    Just some hints:
    1/ check if RaiseError or PrintError attributes are set when db connection established - if no, your dbh related commands could quietly fail
    2/ check if the query really returns some rows - run this query on some sql-console directly.
    3/ This cannot solve your problem, but it it better strategy, imho: Use '?' in queries and bind parameters. I think that
    '..._area LIKE ?;'
    is more readable and avoids potentiality of SQL-injection
Re: while (my @row = $sth->fetchrow_array)
by jesuashok (Curate) on Jan 09, 2006 at 11:10 UTC
    Hi

    You can use order by in the select query so that the Implicit sorting is done in the select query itself.

    "Keep pouring your ideas"
Re: while (my @row = $sth->fetchrow_array)
by jesuashok (Curate) on Jan 09, 2006 at 11:16 UTC
    Please do the following changes:-

    foreach (@Geo_areas) { $_ or next; my $Return_results = "Select * from Result_storage_keep where Ag +gregated_area LIKE '".$_."' \;"; my $sth_C = $dbh->prepare($Return_results) or die "Couldn't prep +are query: ".$dbh->errstr; $sth_C->execute() or die "Couldn't execute query: ".$sth_C->errs +tr; while (my @row = $sth_C->fetchrow_array) { foreach my $val (@row) { $val = '' unless defined; } print OUTPUT_FILE join("\t", @row); print OUTPUT_FILE "\n"; undef @row; } }
    Don'y Use $_ for a list in the same loop for two times.

    "Keep pouring your ideas"
      Using a named variable on foreach loops is often a good idea from a readability standpoint, but the parent reply makes it sound like it will change the behavior. This is not true. foreach automatically localizes the index variable if it is a global. See perlsyn.
Re: while (my @row = $sth->fetchrow_array)
by SamCG (Hermit) on Jan 10, 2006 at 17:33 UTC
    I'd concur with dragonchild, and add that jesuashok's first suggestion, ordering within the sql output, may have issues but has some sense behind it.

    You're going to the database separately for each @Geo_area. Maybe you're just picking out some (and not all) geo_areas, but perhaps then it makes more sense to do your filtering of these rows with a regex after you've done your select statement? Whether this makes sense probably depends on the number of rows being returned and the number of geographic areas (well, I'm guessing that's what they are) being investigated.

    As for it not printing to an output file, check silly things like that the output file is open for writing, and that your query is actually returning any records. I notice you don't have any % signs bracketing your LIKE statement. Is this in the data, or is the LIKE really functioning as an =?
A reply falls below the community's threshold of quality. You may see it by logging in.