There were a few issues in your code. I fixed them below.
my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)");
my $count = $dbh->prepare("SELECT count(*) FROM HLDdata WHERE Data = ?");
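foreach my $txt (@TXT) {
    print "$txt\n";
    # Skip files that cannot be opened instead of reading from a dead handle
    open(my $in, '<', $txt) or do { warn("can't open $txt: $!"); next; };
    while (my $line = <$in>) {
        # Skip the line if an identical row is already in the table
        $count->execute($line);
        my ($data) = $count->fetchrow_array();
        print "$data \n";
        if ($data != 0) {
            print "record exists, not adding\n";
            Logit("record exists, not adding");
            next;
        }
        $sth->execute($line);
        # Convert pipe separators to tabs for the output file
        $line =~ s/\|/\t/g;
        # print "$line\n";
        print OUT $line;
    }
    close($in);
}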
Yes, you're probably right that the existence check is taking most of the time. HLDdata has no index on Data, so SQLite has to scan the whole table every time you check whether a row exists. You can create the missing index with:
CREATE UNIQUE INDEX HLDdata_uniq ON HLDdata (Data);
Note that this statement fails if the table already contains duplicate Data values, so remove those first.
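Once the unique index exists, one option is to let SQLite do the duplicate check itself with INSERT OR IGNORE instead of running a separate SELECT. This is a different technique from the check-then-insert loop above, shown only as a minimal sketch: it assumes DBD::SQLite is installed and uses an in-memory database, and insert_unique is a hypothetical helper name.

```perl
use strict;
use warnings;
use DBI;

# In-memory database keeps the sketch self-contained; point this at your real file.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do("CREATE TABLE IF NOT EXISTS HLDdata (Data TEXT)");
# One-time setup: the unique index makes lookups fast and rejects duplicates.
$dbh->do("CREATE UNIQUE INDEX IF NOT EXISTS HLDdata_uniq ON HLDdata (Data)");

my $ins = $dbh->prepare("INSERT OR IGNORE INTO HLDdata (Data) VALUES (?)");

# Hypothetical helper: true if the line was new and inserted, false if skipped.
sub insert_unique {
    my ($line) = @_;
    # execute returns the affected row count ("0E0", numerically zero, if ignored)
    my $rows = $ins->execute($line);
    return $rows > 0;
}

for my $line ("a|b\n", "c|d\n", "a|b\n") {
    my $added = insert_unique($line);
    print(($added ? "added" : "record exists, not adding"), "\n");
}
```

With this approach each line costs one statement instead of two, and the return value still tells you whether to log the duplicate or write the line to OUT.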
There's a trade-off: the index keeps its own copy of every Data value, so you'll use roughly double the disk space. If $Data tends to be big (more than a hundred bytes or so), you are better off storing a hash (SHA-1, MD5, or similar) of $Data in a second column and checking for the hash's existence instead. (Core SQLite has no built-in hashing function such as sha1 or md5, so you'll have to compute the hash on the Perl side.)
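The hash-column idea could look roughly like the following. This is only a sketch under assumptions: the DataHash column, the HLDdata_hash_uniq index name, and the add_if_new helper are all hypothetical, the database is in-memory, and SHA-1 comes from the core Digest::SHA module.

```perl
use strict;
use warnings;
use DBI;
use Digest::SHA qw(sha1_hex);

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: DataHash holds the SHA-1 of Data as 40 hex characters.
$dbh->do("CREATE TABLE HLDdata (Data TEXT, DataHash TEXT)");
# Index the short hash instead of the (possibly long) Data column.
$dbh->do("CREATE UNIQUE INDEX HLDdata_hash_uniq ON HLDdata (DataHash)");

my $exists = $dbh->prepare("SELECT 1 FROM HLDdata WHERE DataHash = ? LIMIT 1");
my $insert = $dbh->prepare("INSERT INTO HLDdata (Data, DataHash) VALUES (?, ?)");

# Hypothetical helper: returns 1 if the line was inserted, 0 if already present.
sub add_if_new {
    my ($line) = @_;
    my $hash = sha1_hex($line);          # computed on the Perl side
    $exists->execute($hash);
    my ($found) = $exists->fetchrow_array;
    $exists->finish;
    return 0 if $found;
    $insert->execute($line, $hash);
    return 1;
}

for my $line ("a|b\n", "a|b\n", "c|d\n") {
    my $added = add_if_new($line);
    print(($added ? "added" : "record exists, not adding"), "\n");
}
```

The index now holds 40 bytes per row regardless of how long Data is. The cost is that two different lines with the same SHA-1 would be treated as duplicates, which is extremely unlikely for ordinary data.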