Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^3: Index a file with pack for fast access

by sundialsvc4 (Abbot)
on Dec 21, 2011 at 12:48 UTC ( [id://944612]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Index a file with pack for fast access
in thread Index a file with pack for fast access

Interesting.   (Upvoted.)   Of course it is understood that you are answering, not advocating, but I found the answer interesting and informative.

As an aside, one characteristic of SQLite that “bit me bad” at first is the way that this system handles transactions.   Basically, you must have one, because if you don’t, SQLite will physically verify every single disk write by reading the information again.   Which certainly can result in the “hours or days” concern, and then rather dramatically relieve that concern.   I admit that I tend towards the use of SQL-based systems mainly so that I can subsequently run queries against them.   Perhaps I do not use hand-built searching techniques enough.   Thanks for your example.

Replies are listed 'Best First'.
Re^4: Index a file with pack for fast access
by Anonymous Monk on Dec 21, 2011 at 13:39 UTC
Re^4: Index a file with pack for fast access
by BrowserUk (Patriarch) on Dec 21, 2011 at 16:13 UTC
    As an aside, one characteristic of SQLite that “bit me bad” at first is the way that this system handles transactions. Basically, you must have one, because if you don’t, SQLite will physically verify every single disk write by reading the information again. Which certainly can result in the “hours or days” concern, and then rather dramatically relieve that concern.

    Transactions aren't involved when using SQLite's bulk loader. The syntax is simply:

    CREATE TABLE onegb ( alpha varchar, no varchar, hex varchar, bin varch +ar ); .separator "," .import file table

    But if you do that alone on a csv file containing 16 million records, you'll wait days. Try it for yourself.

    And doing it via SQL inserts, even with transactions, will take even longer. Again, try it for yourself.

    However, if you precede the .import with the appropriate bunch of seven PRAGMA commands, then the entire import takes just over 2 minutes. But finding/working out/remembering those 7 pragmas is non-trivial.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      http://erictheturtle.blogspot.com/2009/05/fastest-bulk-import-into-sqlite.html

      commands.txt

      .echo ON .read create_table_without_pk.sql PRAGMA cache_size = 400000; PRAGMA synchronous = OFF; PRAGMA journal_mode = OFF; PRAGMA locking_mode = EXCLUSIVE; PRAGMA count_changes = OFF; PRAGMA temp_store = MEMORY; PRAGMA auto_vacuum = NONE; .separator "\t" .import a_tab_seprated_table.txt mytable BEGIN; .read add_indexes.sql COMMIT; .exit

      sqlite3 mydb.db < commands.txt

        Cool! You found it. But try working it out for yourself from the SQLite docs.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://944612]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (2)
As of 2024-04-24 16:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found