in reply to Re^2: MCE: How to access variables globally
in thread MCE: How to access variables globally
I have no idea of what your raw data looks like or what end report you are trying to generate. Showing failed Perl code is not enough. You will have to back up and explain the initial problem (show some abbreviated example data) and some example of an end result. Then perhaps I or other Monks can suggest an appropriate table structure.
Don't underestimate SQLite. One project that I'm working on now has 3 directories with 3,000+ files in each. Creating the table for each directory takes about 8 seconds for 1M rows. Processing each directories worth of data takes <2 seconds. I am telling you that 3M row DB is nothing. How many million lines do you have? and how much data is each line? It very could be that instead of a single complete "de-normalized" table, you will wind up with multiple tables that are related somehow. For one project, I wound up using the URL that the data came from as a link between 2 tables. Didn't have to be that way, but that was sufficient for that project. DB's often use integer values as "keys" that link tables, but it doesn't have to be that way.
I don't know enough to advise further.
UPDATE: re: "UPDATE: I had some luck with DBM::Deep where I can store and retrieve from the database however the write step seems to be very slow although it keeps the memory footprint acceptable."
The "write step" to a DB is fast. The "commit step" is slow and DB journaling does take time. I have never used DBM::Deep. With SQLite as with other SQL db's, you want to: $db->begin_work;, then do millions of inserts, then do $dbh->commit; A single transaction can have millions of rows. I do recommend switching to SQLite.
SQLite is written in C. DBM::Deep is written in pure Perl. SQLite is the most used DB by far in the world - its in your cell phone - its in your browser - its everywhere. There are many performance tweaks that can be done to SQLite - often at the expense of compromising ACID properties. However, sometimes this is completely appropriate. In my example referenced above, I could speed things up by stopping journaling, starting async writes, etc. In my case 30 sec is "fast enough" for me and I don't worry about DB creation. 3 vs 30 seconds is the same for me. The "big lifting" is making a million inserts one transaction. But I could make my app faster if I really wanted to or needed to (which I don't).
Recommendation: Switch to SQLite. Faster, more options, gigantic user base, very stable. It already comes with the DBI module, so it is already on your machine.
Another Update: Re: Memory usage - DBM::Deep is a Perl program. SQLite is a C program and it can and does play by different memory management rules. I am not sure if this limit still exists or not, but at one time SQLite was limited to 2GB of RAM. Its default is way, way less than that. In one application that I targeted for WinXP, I ran ran SQLite's memory up to the extravagant level of 500MB for one "expensive" indexing operation and then back down again after that. A Perl program cannot give memory back to the O/S, but SQLite can. None of this matters for DB creation, but if you needed say to make a histogram of a column with 50M rows, more memory probably would help. I would have to look back at previous code to find the commands for adjusting memory, but they do exist. My XP application was targeted at a machine which only had max of 1-2GB of RAM. For your app, I would run memory footprint up to 2GB and not worry about it after that.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: MCE: How to access variables globally
by biohisham (Priest) on Dec 21, 2021 at 06:27 UTC | |
by Marshall (Canon) on Dec 21, 2021 at 11:07 UTC | |
by Marshall (Canon) on Dec 21, 2021 at 09:28 UTC |