in reply to Re: Re: many to many join on text files
in thread many to many join on text files

It's not clear whether your slow insert was with DBD::SQLite or with a 'real' DB. 'Real' databases generally have a mechanism for importing data from a flat file (CSV or tab-delimited values), which runs much faster than any script.

If you're going to have a field which you do not populate, why not configure the DB to provide a default of 'undef', or write the SQL statement so that the field is not a variable but is hard-coded to 'undef'?

--
TTTATCGGTCGTTATATAGATGTTTGCA

Re: Re: Re: Re: many to many join on text files
by davido (Cardinal) on Apr 15, 2004 at 04:49 UTC
    Sorry if I was unclear.

    First, yes, I used DBD::SQLite and got the results described. Individual inserts are obviously faster than rewriting a flat-file every time you want to add something. And queries are very fast. But the act of doing 3000 inserts is a lot slower than writing a plain old flat file. Nothing surprising there; obviously writing a flat file should be quicker on a one-shot deal. But the point that I took from the OP's question was that he was creating a joined file from two separate files, and wanted to do so quickly. While maintaining a database efficiently is probably faster than maintaining a flat file, he seemed to be concerned with the creation time, not the ongoing maintenance time.

    To answer your question about the hard-coded undef, check the documentation for SQLite. My first column in the table is an ID column of type "INTEGER PRIMARY KEY". I didn't mention this before because it was irrelevant to the discussion. The SQLite documentation states that this field can be populated with an autoincrementing unique index number, and to do so, all you have to do is insert a NULL value into that column, and SQLite will handle the rest. And a good way to insert NULL using DBI is to insert undef.
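
    As a rough illustration of the NULL-via-undef behaviour described above, here is a minimal sketch. The table name, column names, and the in-memory database are assumptions made for the example, not the actual schema from this thread.

```perl
use strict;
use warnings;
use DBI;

# In-memory database so the sketch leaves no file behind (an assumption;
# the original used an on-disk SQLite file).
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table; only the INTEGER PRIMARY KEY column matters here.
$dbh->do('CREATE TABLE joined (id INTEGER PRIMARY KEY, name TEXT, value TEXT)');

# One placeholder per column, including the ID column.
my $sth = $dbh->prepare('INSERT INTO joined (id, name, value) VALUES (?, ?, ?)');

# DBI binds undef as NULL, so SQLite assigns the id itself.
$sth->execute(undef, 'alpha', 'one');
$sth->execute(undef, 'beta',  'two');

# Fetch the ids SQLite generated for the two rows.
my $ids = $dbh->selectcol_arrayref('SELECT id FROM joined ORDER BY id');
print "ids: @$ids\n";
```

    Each row should come back with its own distinct, non-NULL id even though the script never supplied one.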

    To your point, it might have been better to hard-code NULL for that field in the SQL passed to my $dbh->prepare() call instead of using one additional placeholder, but the speed gain is truly minimal. Adding one more placeholder isn't my code's bottleneck (I know because I initially created a similar database a couple of days ago without the ID field and witnessed a similar creation time).
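
    For comparison, a sketch of the two alternatives to the extra placeholder: writing NULL directly into the statement, or leaving the ID column out of the column list. The table and column names are again hypothetical.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 });

# Hypothetical table, same shape as the one discussed above.
$dbh->do('CREATE TABLE joined (id INTEGER PRIMARY KEY, name TEXT, value TEXT)');

# Variant A: NULL is part of the SQL text, leaving two placeholders.
my $fixed = $dbh->prepare(
    'INSERT INTO joined (id, name, value) VALUES (NULL, ?, ?)');
$fixed->execute('alpha', 'one');

# Variant B: omit the ID column entirely; SQLite fills it in the same way.
my $omit = $dbh->prepare('INSERT INTO joined (name, value) VALUES (?, ?)');
$omit->execute('beta', 'two');

# Both rows should carry an id that SQLite generated.
my $rows = $dbh->selectall_arrayref('SELECT id, name FROM joined ORDER BY id');
printf "%d %s\n", @$_ for @$rows;
```

    Either variant saves one bound value per execute(), which is why the difference is unlikely to show up in the overall creation time.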


    Dave

Re: Re: Re: Re: many to many join on text files
by castaway (Parson) on Apr 15, 2004 at 05:26 UTC
    I'm not sure what you're implying, since I think SQLite is more of a 'real' DB than some claim. Anyway, SQLite can COPY with the best of 'em.

    C.