in reply to Memory Overflow
I suspect that the $dbh->fetchrow_array() statements may be causing you issue, also you had a very significant syntax error that would have led to $h_title_counter ALWAYS being set to 1. Vis:
my @array = ( 'single result value' ); my $wrong = @array; # SCALAR CONTEXT my ($right) = @array; # ARRAY CONTEXT print "Wrong: $wrong\nRight: $right\n"; __DATA__ Wrong: 1 Right: single result value
I have cleaned up the code a little, moved stuff out of the loop that does not need to be there and added a bind case and dropped the $line declaration and just use $_ as you only use it once, and Perl has lazy GC. $jump will always equal $. (line count - so that is one less var)
# we don't need this in the loop as we know when we increment (unles +s forking) # also syntax IS WRONG in original - need array context on LHS to ge +t val # otherwise you are always going to get the val 1 ($h_title_counter) = $dbh->selectrow_array("SELECT MAX(id) FROM head +_index"); # this should be a sth with bind not $dbh->selectrow_array every tim +e my $sthX = $dbh->prepare("SELECT 1 FROM e_match_word WHERE word = ?" +); LONG: while (<READ>) { next if $. <= $rows2jump; $counter2++; #number of lines processed #Lets deal with the sequences head here, marked by > so th +is will # only exec if we have that marker if ( s/^>// ) { $sth1->execute($_) or print "$DBI::errstr"; $h_title_counter++; next LONG; } #Here we will deal with the actal sequence chop; #To remove the > at the end of the line #Input stuff in mysql for my $x (0 ... (length($line) - ($word_length)-1)) { my $word = substr($_, $x, $word_length); #Now we have it with a big cache, hope to speed things + up if ($most_used_words{$word}) { #UPDATE e_match_word set counter = counter+1 WHERE + word = ? $sth3->execute($word) or print "$DBI::errstr"; #print "cached result\n"; } elsif ( $sthX->execute($word)) { #UPDATE e_match_word set counter = counter+1 WHERE + word = ? $sth3->execute($word) or print "$DBI::errstr"; #print "not cached result\n"; } else { #INSERT INTO e_match_word VALUES (?,?) $sth2->execute($word, 1) or print "$DBI::errstr"; $wordcounter++; #print "new word\n"; } #INSERT INTO e_match_info values (?,?,?) $sth4->execute($word, $h_title_counter, $x) or print " +$DBI::errstr"; } #$percent_done = int(($counter2/$counter) * 100); # you are setting status but never use it...... #$status = "Searching for exact patterns.. processing line + $counter2 of $counter"; $mw->update(); # $mw->update; <--- duplicate # exit if ($stop); <--- never set in loop } close (READ);
Now the other issue you have is runtime. If you are say using MySQL with default settings you may error out with a MySQL server has gone away error unless you are using all you prepared STHs regularly ie within timeout (8 hours default). Probably not an issue but it would be a bit annoying if this died 6.9 days into a run.
Finally I can save you another database query. You can do this (pseudo code)
# currently if ( DB query word exists in table ) { DB do update counter } else DB insert word with counter val of 1 } # this will always use 2 DB queries per iter, whereas this code..... unless ( execute(UPDATE table SET counter = counter + 1 WHERE word = ? +) ) { # this should only fail if the bound word does not exist so # know we know we really needed an INSERT with default value (1) # but we have saved 1 DB query - IN A LOOP execute(INSERT INTO table VALUES ( word, 1 ) ) } # will only use one query if it is an update (probably the most common + case) # and still only uses 2 for an insert. # this can yield savings -> 50% depending on Update:Insert ratio
The logic is that if we need the insert the update will fail. If not the update succeeds. Thus we save directly verifying if 'word' is in the table or not.
cheers
tachyon
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Memory Overflow
by Anonymous Monk on Jan 06, 2004 at 07:36 UTC | |
by tachyon (Chancellor) on Jan 06, 2004 at 10:01 UTC |