in reply to Re: many to many join on text files
in thread many to many join on text files
Don't get me wrong; I happen to really like the DBD::SQLite module; I use it myself. But let me give a recent example:
Today I wrote a brief throwaway script to parse a single 3500-record file with two columns per record and convert it to a three-column database table (the first column is a unique key). The processing time on my machine was about four minutes for the insertion of 3500 records. This was OK for me, because I was looking for the advantages that a DB can bring down the road, and didn't care about initial creation time.
3500 records in four minutes is 875 records per minute. Dividing 1,000,000 records by 875 per minute gives about 1,143 minutes, so assuming the rate stays constant, it could take roughly 19 hours to INSERT 1,000,000 lines.
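For anyone who wants to check that estimate, here is a minimal sketch of the arithmetic. The helper name `estimate_hours` is made up for illustration, and it assumes the insert rate scales linearly, which may not hold in practice.

```perl
use strict;
use warnings;

# Hypothetical helper: linear extrapolation of load time, in hours.
# Assumes the records-per-minute rate stays constant as the table grows.
sub estimate_hours {
    my ( $records_done, $minutes_taken, $target_records ) = @_;
    my $records_per_minute = $records_done / $minutes_taken;
    return $target_records / $records_per_minute / 60;
}

# Figures from the post: 3500 records in 4 minutes, scaled to 1,000,000.
printf "%.1f hours\n", estimate_hours( 3500, 4, 1_000_000 );
```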
Here's a pseudo-code description of how I managed to take 4 minutes to INSERT 3500 records into a new table.
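    use strict;
    use warnings;
    use DBI;

    # Reconstructed from memory; the database file and table name are placeholders.
    my $dbh = DBI->connect( "dbi:SQLite:dbname=freq.db", "", "", { RaiseError => 1 } );
    my $sth = $dbh->prepare("INSERT INTO mytable VALUES ( ?, ?, ? )");

    local $/ = "****\n";    # records are separated by a line of four asterisks
    open my $infile, "<", 'inputfile.txt' or die "Bleah.\n$!";
    while ( my $rec = <$infile> ) {
        chomp $rec;         # strip the "****\n" record separator
        my ( $freq, $desc ) = split /\s+=\s+/, $rec;
        $sth->execute( undef, $freq, $desc );    # NULL for the first (key) column
    }
    $sth->finish();
    close $infile;
    $dbh->disconnect();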
Again, that's just some pseudo-code from memory, but I was surprised to see how much longer it took to INSERT three columns into a new table than to simply create a new flat file with three virtual columns per record. Manipulating the same input file and spitting out a flat file took just a few seconds by comparison.
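One likely contributor to the slow load, assuming DBI's default `AutoCommit` was in effect, is that every `execute` gets committed as its own transaction. Below is a rough sketch of the same kind of loop wrapped in a single transaction, which commonly cuts bulk-load time for an on-disk SQLite file. Nothing here was timed against the original data. The table name `freqs`, the sub name `bulk_insert`, the generated rows, and the in-memory database are all assumptions chosen to keep the example self-contained.

```perl
use strict;
use warnings;
use DBI;

# Assumptions: an in-memory SQLite database and a hypothetical table "freqs".
# Swap ":memory:" for a file name to load an on-disk database.
my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 } );
$dbh->do("CREATE TABLE freqs ( id INTEGER PRIMARY KEY, freq TEXT, descr TEXT )");

# Insert every row inside one transaction instead of one commit per row.
sub bulk_insert {
    my ( $dbh, $rows ) = @_;
    my $sth = $dbh->prepare("INSERT INTO freqs VALUES ( ?, ?, ? )");
    $dbh->begin_work;    # suspends AutoCommit until commit or rollback
    eval {
        $sth->execute( undef, @$_ ) for @$rows;    # undef: key is auto-assigned
        $dbh->commit;
        1;
    } or do {
        my $err = $@ || "unknown error";
        $dbh->rollback;                            # undo the partial load
        die "Bulk insert failed: $err";
    };
    return scalar @$rows;
}

# Made-up rows standing in for the parsed ( $freq, $desc ) pairs.
my @rows = map { [ "freq$_", "description $_" ] } 1 .. 3500;
bulk_insert( $dbh, \@rows );

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM freqs");
print "Inserted $count rows\n";
```

The `eval`/`rollback` pair is there so that a failed insert leaves the table unchanged rather than half-loaded.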
On the other hand, queries are lightning fast. And once the DB has been created, additional inserts are much faster than trying to "insert" something into the middle of a flat-file. But if initial creation time is the design factor, the DB solution isn't all that snappy.
Dave
Re: Re: Re: many to many join on text files
by TomDLux (Vicar) on Apr 15, 2004 at 04:26 UTC
by davido (Cardinal) on Apr 15, 2004 at 04:49 UTC
by castaway (Parson) on Apr 15, 2004 at 05:26 UTC

Re: Re: Re: many to many join on text files
by pizza_milkshake (Monk) on Apr 15, 2004 at 17:46 UTC
by davido (Cardinal) on Apr 15, 2004 at 18:44 UTC