in reply to Re: Code efficiency / algorithm
in thread Code efficiency / algorithm

mmm... you definitely make a good point there. I would like to experiment with this proposed solution :) I guess my question is, how easy will it be for me to get the input data files loaded into a db? (which db should I use... simple .db files or mySQL?)

Cheers,

David

Re: Re: Re: Code efficiency / algorithm
by DaveH (Monk) on Jan 14, 2003 at 08:53 UTC

    I would recommend looking into SQLite (http://www.sqlite.org/), or more specifically, DBD::SQLite by Matt Sergeant (available on CPAN). This is a self-contained relational database stored in a single db file (not to be confused with DB_File - they are in no way related).

    A link to it was posted here by merlyn a few days ago, and I'm now seriously considering using it in my next project. It is much more manageable and, with the correct indexes, probably faster than rolling your own databases with something like DB_File.

    At the very least, the module may be useful for prototyping the concept of storing the info in a relational database, before moving on to a fully fledged RDBMS if you discover that it is necessary.
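
    Just to make that concrete, here is a rough sketch of what the DBI/DBD::SQLite version might look like. The file name, column names and tab-separated layout are guesses on my part, not anything from your actual data, so adjust to taste:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Create (or open) a single-file SQLite database.
        my $dbh = DBI->connect( 'dbi:SQLite:dbname=companies.db', '', '',
                                { RaiseError => 1, AutoCommit => 0 } );

        # Guessed layout for the datafile1 records.
        $dbh->do(q{
            CREATE TABLE datafile1 (
                company    TEXT,
                start_date TEXT,
                end_date   TEXT,
                content    TEXT
            )
        });

        my $insert = $dbh->prepare(q{
            INSERT INTO datafile1 (company, start_date, end_date, content)
            VALUES (?, ?, ?, ?)
        });

        # Bulk load, committing once at the end rather than per row.
        open my $fh, '<', 'datafile1.txt' or die "datafile1.txt: $!";
        while (<$fh>) {
            chomp;
            my @fields = split /\t/, $_, 4;   # assumes tab-separated fields
            $insert->execute(@fields);
        }
        close $fh;

        $dbh->commit;
        $dbh->disconnect;

    Committing once at the end keeps the bulk load fast, since SQLite flushes to disk on every commit.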

    Cheers,

    -- Dave :-)


    $q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print
Re: Re: Re: Code efficiency / algorithm
by graff (Chancellor) on Jan 14, 2003 at 07:51 UTC
    Go for mySQL. Once you have your data files normalized into flat tables, loading them into mySQL will be really simple.

    But take your time in working out the table design, in order to make it fully normalized -- e.g. although you have just two data files, it looks like you may want at least three tables: one for companies, one for the "datafile1" records that relate to those companies, and the third for the "datafile2" records. The nature of the "other text content" in your data files may suggest other relations that could benefit from being indexed/related via one or more additional tables, apart from these three.
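
    For example (the column names and types below are only a guess at your data, and the connection details are placeholders), the three tables might look something like this, created through DBI:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:mysql:database=mydata', 'user', 'password',
                                { RaiseError => 1 } );

        # One table for the companies themselves ...
        $dbh->do(q{
            CREATE TABLE company (
                company_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                name       VARCHAR(255) NOT NULL,
                UNIQUE KEY (name)
            )
        });

        # ... one for the datafile1 records that refer to them ...
        $dbh->do(q{
            CREATE TABLE datafile1 (
                company_id INT UNSIGNED NOT NULL,
                start_date DATE NOT NULL,
                end_date   DATE NOT NULL,
                content    TEXT,
                KEY (company_id),
                KEY (start_date, end_date)
            )
        });

        # ... and one for the datafile2 records (whether these also carry a
        # company reference depends on your data).
        $dbh->do(q{
            CREATE TABLE datafile2 (
                record_date DATE NOT NULL,
                content     TEXT,
                KEY (record_date)
            )
        });

        $dbh->disconnect;

    Once the flat files are split into columns matching those tables, LOAD DATA INFILE (or the mysqlimport utility) will pull them in very quickly.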

Re: Re: Re: Code efficiency / algorithm
by BrowserUk (Patriarch) on Jan 14, 2003 at 09:51 UTC

    Personally, unless datafile2 is static, or you can arrange to have whatever is writing that file write directly to the database, I think that using an RDBMS is likely to involve more trouble than it's worth. The code needed to normalise and insert 80MB+ into the RDBMS is likely to take longer to write, and longer to run, than processing the data with perl. If the datafile is static that may be mitigated, but if you have to keep reprocessing or munging large volumes of flat file in order to put it into the database, then the overhead doesn't seem worth it to me.

    Further, performing a join on tables where one of them is that size, and where every record has to be joined with one or more records from the smaller table, is going to need quite a substantial box and is not going to be quick.

    I would be of a different opinion if you could arrange to write the bigfile data directly to the RDBMS and needed to perform one or more additional relational processings of the big file data on a regular basis; then the process of conversion would start to make sense.
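
    To give an idea of what the pure perl approach might look like (the file layouts here are only a guess, based on the company / start date / end date description given elsewhere in this thread):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Load the smaller file into memory once.
        # Layout is a guess: company, start date, end date, free text,
        # tab-separated, with dates in YYYYMMDD form so they compare as strings.
        my @ranges;
        open my $small, '<', 'datafile1.txt' or die "datafile1.txt: $!";
        while (<$small>) {
            chomp;
            my ($company, $start, $end, $text) = split /\t/, $_, 4;
            push @ranges, [ $start, $end, $company, $text ];
        }
        close $small;

        # Stream the big file a line at a time, so its size never
        # matters to memory use.
        open my $big, '<', 'datafile2.txt' or die "datafile2.txt: $!";
        while (<$big>) {
            chomp;
            my ($date, $rest) = split /\t/, $_, 2;
            $rest = '' unless defined $rest;

            # Report every datafile1 record whose date range covers this date.
            for my $r (@ranges) {
                next if $date lt $r->[0] || $date gt $r->[1];
                print join( "\t", $date, $r->[2], $rest ), "\n";
            }
        }
        close $big;

    For 80MB+ of datafile2 this runs in a single pass, and the only thing held in memory is the smaller file.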

    As the popular disclaimer goes, just my 2 cents.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      Only the first file would need to go into the database, and it would just be a simple table with company, start date, end date, and the text (with maybe an index on start and end date). The second file you would just read sequentially, using the date as an argument to query the datafile1 table.
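
      Something along these lines, perhaps (SQLite shown, but the same DBI code would work against mySQL; the tab-separated layout of the second file is a guess):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect( 'dbi:SQLite:dbname=companies.db', '', '',
                                  { RaiseError => 1 } );

          # One prepared query, reused for every line of the second file.
          my $sth = $dbh->prepare(q{
              SELECT company, content
              FROM   datafile1
              WHERE  start_date <= ? AND end_date >= ?
          });

          open my $fh, '<', 'datafile2.txt' or die "datafile2.txt: $!";
          while (<$fh>) {
              chomp;
              my ($date, $rest) = split /\t/, $_, 2;   # guessed layout
              $rest = '' unless defined $rest;

              $sth->execute($date, $date);
              while ( my ($company, $text) = $sth->fetchrow_array ) {
                  print join( "\t", $date, $company, $text, $rest ), "\n";
              }
          }
          close $fh;

          $dbh->disconnect;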