PerlMonks  

Re^4: Perl DBI for MariaDB v SQLite in a Single Use SBC

by Marshall (Canon)
on Mar 21, 2023 at 05:29 UTC ( [id://11151090] )


in reply to Re^3: Perl DBI for MariaDB v SQLite in a Single Use SBC
in thread Perl DBI for MariaDB v SQLite in a Single Use SBC

I did some more investigation and, contrary to previous reports, there are SQLite equivalents to these commands. There are even different variations on the theme (there is more than one way to do this). However, AFAIK the relevant commands are only available via sqlite3, the command-line shell for SQLite. In your example, the input file is read into a table STAGE, then each row in STAGE is split out via SUBSTRING into a new table A.

Because the SQLite command-line shell doesn't come bundled with Perl (at least not with mine), I didn't pursue this idea further. Instead, I would like to show you how to use what you already have via the Perl DBI programming interface, with no need for the command-line interface. As a DB browser I use a separate GUI that I like, so I don't even have the SQLite command-line shell on my computer. SQLite essentially comes with your Perl distribution (I can't imagine a distribution that did not include the DBI). "use DBI;" is all you need - there is nothing else to install.

I really didn't understand what your text format meant, so to write some demo code I just picked 3 substrings that I defined from a Q line. You will need to define the others. I hope you can understand the concept and then extend it further. I took your example data and put it in a file called "testImportFile.txt". The lines were 1019 chars, so I shortened them to the claimed 1017 characters. The first Q line was truncated to produce an error condition: a malformed input record.

Here is the code. Then I will discuss further...

use strict;
use warnings;
use DBI;

my $dbName      = "YourDBname.sqlite";
my $import_file = "testImportFile.txt";
my %attr = ( RaiseError => 1 );   # auto die with error printout

my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbName", "", "", \%attr )
    or die "Couldn't connect to database $dbName: " . DBI->errstr;
open( my $fh_in, '<', $import_file )
    or die "cannot access file:\'$import_file\' $!\n";  # suppress Perl line number

$dbh->do("DROP TABLE IF EXISTS Data");
$dbh->do("CREATE TABLE Data (fieldA TEXT, fieldB TEXT, fieldC TEXT)");
my $insertRow = $dbh->prepare(
    "INSERT INTO Data (fieldA, fieldB, fieldC) VALUES (?, ?, ?)");

$dbh->begin_work;   ### Starts a transaction!!
while (<$fh_in>)
{
    next unless /^Q/;
    chomp;
    my $len = length($_);
    if ( $len != 1017 )
    {
        print "*** ERROR FOLLOWING LINE IS $len CHARS, not 1017! THIS LINE IS REJECTED\n";
        print "$_\n\n";
        next;
    }
    # The number of fields must match the number of columns in CREATE TABLE
    # or the program will stop with a fatal error!
    my @fields = ( substr( $_, 1, 57 ), substr( $_, 59, 32 ), substr( $_, 93, 25 ) );
    $insertRow->execute(@fields);
    print "$_\n" for @fields;   # Just to show data for demo
    print "\n";
}
$dbh->commit;   ### Completes a transaction!!
__END__
*** ERROR FOLLOWING LINE IS 76 CHARS, not 1017! THIS LINE IS REJECTED
Q0001012345678900012345678900012345678900000000GB00000000 01234567890001234

0001012345678900012345678900012345678900000000GB00000000
01234567890001234567890123456789
ABCDEFGHIJKLMNOPQRSTUVWXY

0001012345678900012345678900012345678900000000GB00000000
01234567890001234567890123456789
ABCDEFGHIJKLMNOPQRSTUVWXY
I called my data table "Data" instead of "A". Use whatever name you want. In the CREATE TABLE statement, I used the SQLite data type of TEXT instead of char() or varchar(). SQLite does not enforce the declared width: both char(n) and varchar(n) just get TEXT affinity, so unlike other DBs, where content is limited to 1..n chars, a column can hold more than n characters. If you have those designations in your MySQL code, by all means use them. Just be aware that you can stick 100 chars into a varchar(5)! Here TEXT is pretty generic and works just fine.
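You can see this for yourself in a couple of lines. A minimal sketch (shown with Python's bundled sqlite3 module only because it makes the SQLite behavior easy to demonstrate; the table and column names are made up, and the same SQL behaves identically through Perl DBI): 100 characters go into a varchar(5) without complaint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v VARCHAR(5))")
# Insert 100 characters into a column declared varchar(5)
con.execute("INSERT INTO t VALUES (?)", ("X" * 100,))
stored, n = con.execute("SELECT v, length(v) FROM t").fetchone()
print(n)  # 100 -- SQLite kept all 100 characters
```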

SQLite is a full ACID database. This is a good thing for data consistency, but the bookkeeping involved really slows down an insert. The thing to do is to bundle many inserts together into what is called a transaction. According to ACID principles, a transaction either fully succeeds or fully fails - there is no "in-between". Finalizing a transaction with 100 inserts takes about as long as a transaction with 100,000 inserts. You can do maybe 10 transactions per second - computers vary... I have made transactions with 1 million inserts... works fine. If you try one insert per transaction, performance will auger into the ground!
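The batching pattern itself is the same in any host language. A minimal sketch via Python's bundled sqlite3 module (table and column names are made up): one BEGIN, many INSERTs, one COMMIT - the same shape as the begin_work/commit pair in the Perl demo.

```python
import sqlite3

# isolation_level=None gives manual transaction control
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE data (fieldA TEXT)")

con.execute("BEGIN")                 # one transaction for the whole batch
for i in range(100_000):
    con.execute("INSERT INTO data VALUES (?)", (str(i),))
con.execute("COMMIT")                # one journal finalization, not 100,000

count = con.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)  # 100000
```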

Note that Perl's substr starts indexing at 0, not 1, while SQL's SUBSTRING (and SQLite's substr()) starts at 1. So be aware of that and adjust for the dreaded "off by one" accordingly.
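A tiny demonstration of the off-by-one (again via Python's bundled sqlite3, since Python slices strings from 0 just as Perl's substr does, while SQL's substr counts from 1):

```python
import sqlite3

con = sqlite3.connect(":memory:")
s = "Qabcdef"
# SQL substr() is 1-based: position 2, length 3 skips the leading 'Q'
sql_sub = con.execute("SELECT substr(?, 2, 3)", (s,)).fetchone()[0]
# Host-language slicing is 0-based, so start index 1 gives the same chars
# (Perl: substr($s, 1, 3); Python: s[1:4])
print(sql_sub, s[1:4])  # abc abc
```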

The demo code does not make any intermediate table and goes directly from the input file to the DB. This may even run faster than your import utilities for Maria. In general, you will find SQLite to be quite performant. The big limitation is that there can only be one writer at a time, and acquiring an exclusive file lock for that is slow. There can be many simultaneous readers. This looks perfect for your application. Also note that SQLite now has many "big boy" features like common table expressions, window functions, and triggers.

I assumed that all of these fields were actually stored as characters. If some are binary integers, then there are a few "yeah, but's" which can be discussed if you have trouble.

Have fun! Hope this was helpful...

Update:
To get an idea of performance, I ran one of my applications and timed it. It uses 12,400 input data files, creates a main table with 3.6 million rows, and executes in 182 seconds (3 minutes). My machine was a used, refurbished i5 when it was bought 8-9 years ago - so not a screamer by any means. More than a million rows a minute is a fine data import speed for this application. It is possible to go faster, by defeating more of the ACID features and journaling, but that is not worth it to me.

Re^5: Perl DBI for MariaDB v SQLite in a Single Use SBC
by afoken (Chancellor) on Mar 21, 2023 at 13:02 UTC

    When it comes to handling CSV, the usual hint is to use Text::CSV (and Text::CSV_XS). For the data shown, readline and substr should be sufficient. When the data becomes slightly more complex, Text::CSV is the way to go, because it can handle the crazy edge cases you never even thought of.

    Having said that, there is also DBD::CSV, sitting on top of Text::CSV_XS and providing access to CSV files as if they were any other ordinary SQL database. And at that point, you no longer have to think about the CSV files at all. Just SELECT from the CSV files using one database handle connected to the CSV files "database", and INSERT/UPDATE to SQLite (or any other database) using a second database handle connected to SQLite (or any other database).

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
