I would recommend looking into SQLite (http://www.sqlite.org/), or more specifically, DBD::SQLite by Matt Sergeant (available on CPAN). This is a self-contained relational database in a single db file (not to be confused with DB_File - they are in no way related).
A link to it was posted here by merlyn a few days ago, and I'm now seriously considering using it in my next project. It is much more manageable and, with the correct indexes, probably faster than rolling your own databases with something like DB_File.
At the very least, the module may be useful for prototyping the concept of storing the info in a relational database, before moving on to a fully-fledged RDBMS if you discover that it is necessary.
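If it helps, here's a rough sketch of what that prototyping could look like with DBI and DBD::SQLite - the file name, table and columns are just made up for illustration, not taken from your data:

    use strict;
    use DBI;

    # connect to (or create) the single-file database
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=prototype.db', '', '',
                            { RaiseError => 1, AutoCommit => 1 } );

    # a throwaway table just to try the idea out
    $dbh->do( 'CREATE TABLE company_info (
                   company    TEXT,
                   start_date TEXT,
                   end_date   TEXT,
                   content    TEXT )' );

    my $ins = $dbh->prepare( 'INSERT INTO company_info VALUES (?, ?, ?, ?)' );
    $ins->execute( 'Acme Corp', '2002-01-01', '2002-12-31', 'some text' );

    # and query it back by date range
    my $rows = $dbh->selectall_arrayref(
        'SELECT company, content FROM company_info
          WHERE start_date <= ? AND end_date >= ?',
        undef, '2002-06-15', '2002-06-15' );

    print "@$_\n" for @$rows;

If the prototype turns out to be a dead end, you just delete the one db file and move on.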
Cheers, -- Dave :-)
$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print
Go for MySQL. Once you have your data files normalized into flat tables, loading them into MySQL will be really simple. But take your time in working out the table design, in order to make it fully normalized -- e.g. although you have just two data files, it looks like you may want at least three tables: one for companies, one for the "datafile1" records that relate to those companies, and a third for the "datafile2" records. The nature of the "other text content" in your data files may suggest other relations that could benefit from being indexed/related via one or more additional tables, apart from these three.
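To make that concrete, here's one way the three tables might be created through DBI (a sketch only - the database name, user and column names are guesses that would need adjusting to your real fields):

    use DBI;

    # assumes a MySQL database called 'research' and a user already set up
    my $dbh = DBI->connect( 'dbi:mysql:database=research', 'user', 'password',
                            { RaiseError => 1 } );

    $dbh->do( 'CREATE TABLE company (
                   company_id INT AUTO_INCREMENT PRIMARY KEY,
                   name       VARCHAR(100) NOT NULL,
                   UNIQUE (name) )' );

    $dbh->do( 'CREATE TABLE datafile1 (
                   company_id INT NOT NULL,
                   start_date DATE,
                   end_date   DATE,
                   content    TEXT,
                   INDEX (company_id),
                   INDEX (start_date, end_date) )' );

    $dbh->do( 'CREATE TABLE datafile2 (
                   company_id  INT NOT NULL,
                   record_date DATE,
                   content     TEXT,
                   INDEX (company_id),
                   INDEX (record_date) )' );

The indexes on the date columns are what make the later range queries cheap.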
Personally, unless datafile2 is static, or you can arrange to have whatever is writing that file write directly to the database, I think that using an RDBMS is likely to involve more trouble than it's worth. The code involved to normalise and insert 80MB+ into the RDBMS is likely to take longer to write and longer to run than processing the data with perl. If the datafile is static, that may be mitigated, but if you have to reprocess or keep munging large volumes of flat file in order to put it into the database, then the overhead doesn't seem worth it to me.
Further, performing a join on tables where one of them is that size, and every record has to be joined with one or more records from the smaller table, is going to need a quite substantial box and is not going to be quick.
I would be of a different opinion if you could arrange to write the bigfile data directly to the RDBMS and you needed to perform one or more additional relational processings of the big file data on a regular basis; then the process of conversion would start to make sense.
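For what it's worth, the plain-perl processing I have in mind needn't be much more than something like this (a rough sketch; it assumes tab-delimited fields and sortable ISO-style dates, since the real layout isn't known):

    use strict;

    # load the smaller file into memory, keyed by company
    my %ranges;
    open my $small, '<', 'datafile1.txt' or die "datafile1: $!";
    while (<$small>) {
        chomp;
        my ( $company, $start, $end, $text ) = split /\t/;
        push @{ $ranges{$company} }, [ $start, $end, $text ];
    }
    close $small;

    # stream the big file once, matching each record against the ranges
    open my $big, '<', 'datafile2.txt' or die "datafile2: $!";
    while (<$big>) {
        chomp;
        my ( $company, $date, $rest ) = split /\t/, $_, 3;
        for my $range ( @{ $ranges{$company} || [] } ) {
            next unless $date ge $range->[0] and $date le $range->[1];
            print "$company $date matches: $range->[2]\n";
        }
    }
    close $big;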
As the popular disclaimer goes, just my 2 cents.
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
Only the first file would need to go into the database, and it would just be a simple table with company, start date, end date, and the text (with maybe an index on start and end date). The second file you would just read sequentially, using its date as an argument to query the datafile1 table.
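Something along these lines, perhaps (just a sketch - the table, column and file names are invented, and a tab-delimited datafile2 with ISO dates is assumed):

    use strict;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=research', 'user', 'password',
                            { RaiseError => 1 } );

    # prepare once, execute for every record in datafile2
    my $sth = $dbh->prepare(
        'SELECT company, content FROM datafile1
          WHERE start_date <= ? AND end_date >= ?' );

    open my $big, '<', 'datafile2.txt' or die "datafile2: $!";
    while (<$big>) {
        chomp;
        my ( $date, $rest ) = split /\t/, $_, 2;    # assumed layout: date first
        $sth->execute( $date, $date );
        while ( my ( $company, $content ) = $sth->fetchrow_array ) {
            # do whatever needs doing with the matching datafile1 rows
            print "$company $date: $content\n";
        }
    }
    close $big;

Preparing the statement once and reusing it for every datafile2 record keeps the per-record cost down to the execute and fetch.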