Re: Memory Overflow

I suspect that the $dbh->fetchrow_array() statements may be causing you issue, also you had a very significant syntax error that would have led to $h_title_counter ALWAYS being set to 1. Vis:

my @array = ( 'single result value' );
my $wrong = @array;     # SCALAR CONTEXT
my ($right) = @array;   # ARRAY CONTEXT
print "Wrong: $wrong\nRight: $right\n";
__DATA__
Wrong: 1
Right: single result value
[download]

I have cleaned up the code a little, moved stuff out of the loop that does not need to be there and added a bind case and dropped the $line declaration and just use $_ as you only use it once, and Perl has lazy GC. $jump will always equal $. (line count - so that is one less var)

  # we don't need this in the loop as we know when we increment (unles
+s forking)
  # also syntax IS WRONG in original - need array context on LHS to ge
+t val
  # otherwise you are always going to get the val 1
  ($h_title_counter) = $dbh->selectrow_array("SELECT MAX(id) FROM head
+_index");

  # this should be a sth with bind not $dbh->selectrow_array every tim
+e
  my $sthX = $dbh->prepare("SELECT 1 FROM e_match_word WHERE word = ?"
+);

  LONG:
    while (<READ>) {
          next if $. <= $rows2jump;
            $counter2++; #number of lines processed           
            #Lets deal with the sequences head here, marked by > so th
+is will
            # only exec if we have that marker
            if ( s/^>// ) {
                $sth1->execute($_) or print "$DBI::errstr";
                $h_title_counter++;
              next LONG;
            }

            #Here we will deal with the actal sequence
            chop; #To remove the > at the end of the line
            #Input stuff in mysql
             for my $x (0 ... (length($line) - ($word_length)-1)) {
                my $word = substr($_, $x, $word_length);
                #Now we have it with a big cache, hope to speed things
+ up
                if ($most_used_words{$word}) {
                    #UPDATE e_match_word set counter = counter+1 WHERE
+ word = ?
                    $sth3->execute($word) or print "$DBI::errstr";
                    #print "cached result\n";
                    
                } elsif ( $sthX->execute($word)) {
                    #UPDATE e_match_word set counter = counter+1 WHERE
+ word = ?
                    $sth3->execute($word) or print "$DBI::errstr";
                    #print "not cached result\n";
                    
                } else {                    
                    #INSERT INTO e_match_word VALUES (?,?)
                    $sth2->execute($word, 1) or print "$DBI::errstr";
                    $wordcounter++;
                    #print "new word\n";                    
                }           
                #INSERT INTO e_match_info values (?,?,?)
                $sth4->execute($word, $h_title_counter, $x) or print "
+$DBI::errstr";
            }            
            #$percent_done = int(($counter2/$counter) * 100);
            # you are setting status but never use it......
            #$status = "Searching for exact patterns.. processing line
+ $counter2 of $counter"; 
            $mw->update();
            # $mw->update;      <--- duplicate
            # exit if ($stop);  <--- never set in loop
    }
    close (READ);
[download]

Now the other issue you have is runtime. If you are say using MySQL with default settings you may error out with a MySQL server has gone away error unless you are using all you prepared STHs regularly ie within timeout (8 hours default). Probably not an issue but it would be a bit annoying if this died 6.9 days into a run.

Finally I can save you another database query. You can do this (pseudo code)

# currently
if ( DB query word exists in table ) {
    DB do update counter
}
else
    DB insert word with counter val of 1
}

# this will always use 2 DB queries per iter, whereas this code.....

unless ( execute(UPDATE table SET counter = counter + 1 WHERE word = ?
+) ) {
    # this should only fail if the bound word does not exist so 
    # know we know we really needed an INSERT with default value (1)
    # but we have saved 1 DB query - IN A LOOP
    execute(INSERT INTO table VALUES ( word, 1 ) )
}

# will only use one query if it is an update (probably the most common
+ case)
# and still only uses 2 for an insert.
# this can yield savings -> 50% depending on Update:Insert ratio
[download]

The logic is that if we need the insert the update will fail. If not the update succeeds. Thus we save directly verifying if 'word' is in the table or not.

cheers

tachyon

Comment on Re: Memory Overflow Select or Download Code

Replies are listed 'Best First'.
Re: Re: Memory Overflow by Anonymous Monk on Jan 06, 2004 at 07:36 UTC
my ($right) = @array; # ARRAY CONTEXT ? I think you mean LIST CONTEXT.	[reply]
Re: Re: Re: Memory Overflow by tachyon (Chancellor) on Jan 06, 2004 at 10:01 UTC
I see your quible and raise you..... cheers tachyon	[reply]