in reply to Re: Efficient way to handle huge number of records?
in thread Efficient way to handle huge number of records?

Yes, that might actually be the solution to my problem!
I always have the same kind of database file, which is:
>protein1
ASKRKGRKGTTRHTRHTRHTRHTHRHTR
>protein2
REHGTRYUYTJUYJYTJUJUJUJYJUYJUY
>protein3
EWFDEWFEFREFRE

Apparently I'll create a "big" hash with protein1, protein2, protein3, etc. as keys and ASKRKGRKGTTRHTRHTRHTRHTHRHTR, REHGTRYUYTJUYJYTJUJUJUJYJUYJUY, EWFDEWFEFREFRE, etc. as values.
I am a bit confused about the usage of this module, even though I browsed the respective manual page a bit... Once you create the hash, how do you access it? Is it retrieved each time the script is called?
If you can, please supply a brief example of usage!
Thanks a bunch!

Re^3: Efficient way to handle huge number of records?
by Marshall (Canon) on Dec 11, 2011 at 13:47 UTC
    Creating an SQLite DB is not that hard. I haven't tested this code, but this is at least the general idea.
    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my $dbfile = "./DNA.sqlite";

    print "Creating new DNA Database\n";
    if (-e $dbfile) { unlink $dbfile or die "Delete of $dbfile failed! $!\n"; }

    my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "", {RaiseError => 1})
        or die "Couldn't connect to database: " . DBI->errstr;

    $dbh->do("CREATE TABLE dna (
                 id       integer PRIMARY KEY AUTOINCREMENT,
                 protein  varchar(10)   DEFAULT '',
                 sequence varchar(1000) DEFAULT ''
              );");

    $dbh->do('PRAGMA synchronous = 0');     # Non transaction safe!!!
    $dbh->do('PRAGMA cache_size = 200000'); # 200 MB dynamic cache increase
                                            # makes index creation faster
    $dbh->do("BEGIN");
    import_data();
    $dbh->do("COMMIT");

    $dbh->do("BEGIN");
    $dbh->do("CREATE INDEX iprotein_idx ON dna (protein)");
    $dbh->do("COMMIT");

    sub import_data
    {
        my $add = $dbh->prepare("INSERT INTO dna (protein, sequence) VALUES (?,?)");

        # ...your loop to read the data goes here
        # for each protein and sequence
        # {
        #     $add->execute($protein, $sequence);
        # }
    }
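    For the "your loop goes here" part, here is a possible (untested) sketch that reads a FASTA-style file named on the command line; it is meant to replace the import_data() stub above and uses the same $dbh from the enclosing script:
    sub import_data
    {
        my $add = $dbh->prepare("INSERT INTO dna (protein, sequence) VALUES (?,?)");

        my ($protein, $sequence) = ('', '');
        while (my $line = <>)                   # file name(s) taken from @ARGV
        {
            chomp $line;
            if ($line =~ /^>(\S+)/)             # header line, e.g. ">protein1"
            {
                # write out the previous record before starting a new one
                $add->execute($protein, $sequence) if $protein ne '';
                ($protein, $sequence) = ($1, '');
            }
            else                                # sequence line(s)
            {
                $sequence .= $line;
            }
        }
        $add->execute($protein, $sequence) if $protein ne '';   # last record
    }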
    Update:
    Basically when creating the DB, you want to turn all the ACID stuff off.
    That means: don't wait for writes to complete, don't wrap each individual "add" in its own transaction, and run the cache size up from the default of 20 MB to at least 200 MB for more efficient index creation. Run as "lean and mean" as you can.
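    Reading the data back later is then just a SELECT from any other script. A rough, untested sketch using the file and table names from the code above:
    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my $dbh = DBI->connect("dbi:SQLite:dbname=./DNA.sqlite", "", "",
                           {RaiseError => 1})
        or die "Couldn't connect to database: " . DBI->errstr;

    my $get = $dbh->prepare("SELECT sequence FROM dna WHERE protein = ?");
    $get->execute('protein2');
    my ($sequence) = $get->fetchrow_array;
    print defined $sequence ? "$sequence\n" : "protein2 not found\n";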

      I'm not familiar with your PRAGMA statements, but in general this is a possible solution.

      The only thing I'd like to comment on is that DBI provides $dbh->begin_work and $dbh->commit, and you should use those instead of $dbh->do( "BEGIN" ) and $dbh->do( "COMMIT" ).
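      Roughly, in the script above that would be (just a sketch; begin_work expects AutoCommit to be on beforehand, which is the DBI default):
      $dbh->begin_work;    # instead of $dbh->do("BEGIN")
      import_data();
      $dbh->commit;        # instead of $dbh->do("COMMIT")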

        I think that your formulation of "begin transaction" and "commit transaction" is more general and therefore better!

        The PRAGMA stuff is specific to SQLite. It is possible to dynamically increase the size of the memory cache that SQLite uses, and to decrease it again later. I did that to speed up the indexing operation, which is quite complex and builds a large in-memory structure.
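        Something like this sketch, with purely illustrative numbers, is what I mean:
        $dbh->do('PRAGMA cache_size = 200000');   # big cache just for the index build
        $dbh->do('CREATE INDEX iprotein_idx ON dna (protein)');
        $dbh->do('PRAGMA cache_size = 2000');     # drop back to a modest cache afterwards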

Re^3: Efficient way to handle huge number of records?
by mrguy123 (Hermit) on Dec 11, 2011 at 14:37 UTC
    Hi, it is fairly simple. You store the hash in a file, and then retrieve it when you want to use it (see example below). It is retrieved as a hash reference, but you can turn it back into a regular hash by dereferencing it with %{ ... }.
    use Storable;
    store \%table, 'file';
    $hashref = retrieve('file');
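    For your protein file, an untested sketch might look like this (proteins.fa and table.stored are just placeholder names):
    use strict;
    use warnings;
    use Storable;

    # Build the hash once from the FASTA-style file and save it to disk.
    my %table;
    my $name;
    open my $fh, '<', 'proteins.fa' or die "Can't open proteins.fa: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $name = $1;                 # e.g. "protein1"
        }
        elsif (defined $name) {
            $table{$name} .= $line;     # append sequence lines to the current protein
        }
    }
    close $fh;
    store \%table, 'table.stored';

    # Later, in the script that actually uses the data:
    my $hashref = retrieve('table.stored');
    print "protein2: $hashref->{protein2}\n";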
    Good luck
    MrGuy