in reply to Re: Re: Need to process a tab delimted file *FAST*
in thread Need to process a tab delimted file *FAST*
Second, you're splitting lines on various things. split uses a regexp-like thingy, and that means revving up the regexp engine, which, while well optimized, isn't as fast as non-regexp alternatives. The problem is that your current file format doesn't lend itself well to non-regexp (and non-split) alternatives.
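(For the simplest case, where each record is just one key/value pair, index and substr can sidestep the regexp engine entirely. A minimal sketch, not from the original code; the filename and hash name are placeholders:)

```perl
use strict;
use warnings;

# Hypothetical input: one "key\tvalue" pair per line.
open my $fh, '<', 'data.tsv' or die "Can't open data.tsv: $!";

my %pairs;
while ( my $line = <$fh> ) {
    chomp $line;
    my $tab = index $line, "\t";    # locate the delimiter without a regexp
    next if $tab < 0;               # skip lines with no tab at all
    $pairs{ substr $line, 0, $tab } = substr $line, $tab + 1;
}
close $fh;
```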
If you have control over the datasource, there are a few things you can do for better speed.
One suggestion: instead of using "\n"-delimited records and "\t"-delimited key/value pairs, go with a more "regular" format. One possibility would be fixed-width fields. With that sort of solution, at least you can unpack each record, which is going to be faster than splitting on a RE. If each "line" (or record) is of equal byte length, and each key/value within each record is of fixed width, you can use seek and tell to jump around in the file, and unpack to grab the keys/values from each record. It's pretty hard to beat that for speed, within Perl.
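A minimal sketch of that idea (the 10-byte key and 22-byte value widths are invented for illustration, as is the filename):

```perl
use strict;
use warnings;

# Assumed layout: 10-byte key + 22-byte value + "\n" = 33 bytes per record.
my $rec_len = 33;
my $want    = 41;    # zero-based index of the record to fetch

open my $fh, '<', 'data.fixed' or die "Can't open data.fixed: $!";
seek $fh, $want * $rec_len, 0;    # jump straight to the record's offset
read( $fh, my $record, $rec_len )
    or die "record $want is past the end of the file";

my ( $key, $value ) = unpack 'A10 A22', $record;   # 'A' trims trailing spaces
print "$key => $value\n";
close $fh;
```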
Another possibility is to abandon the flat file and go with a database. You mentioned that you wanted to maintain a single file for your data, though. Ok, no problem. Use DBD::SQLite. It is a pretty fast database implementation that stores the entire database in a single file. There is database overhead to consider, but scalability is good, and you don't need to be as careful about maintaining equal-byte-length records with fixed-width fields.
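Something along these lines (the table and column names here are hypothetical):

```perl
use strict;
use warnings;
use DBI;    # requires DBD::SQLite

# The single file 'data.db' holds the entire database.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=data.db', '', '',
    { RaiseError => 1 } );

$dbh->do('CREATE TABLE IF NOT EXISTS pairs (k TEXT PRIMARY KEY, v TEXT)');

# Store a key/value pair, then look it up again.
$dbh->do( 'INSERT OR REPLACE INTO pairs (k, v) VALUES (?, ?)',
    undef, 'some_key', 'some_value' );

my ($v) = $dbh->selectrow_array( 'SELECT v FROM pairs WHERE k = ?',
    undef, 'some_key' );
print "$v\n";

$dbh->disconnect;
```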
And yet another possibility is to use the Storable module to freeze and thaw your data structures. The module is written in XS (if I'm not mistaken) and is already optimized for speed. It's not as scalable a solution, but speed is pretty good.
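A minimal sketch using Storable's store/retrieve convenience wrappers (the filename is a placeholder):

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

my %data = ( apple => 1, banana => 2 );

# Freeze the whole hash to a single file; nstore writes in network
# byte order, so the file stays portable across machines.
nstore( \%data, 'data.stor' ) or die "nstore failed: $!";

# Thaw it back as a hash reference.
my $thawed = retrieve('data.stor');
print "$thawed->{banana}\n";    # prints 2
```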
Dave
Replies are listed 'Best First'.
- Re: Re: Re: Re: Need to process a tab delimted file *FAST* by tilly (Archbishop) on Mar 03, 2004 at 15:17 UTC
- by davido (Cardinal) on Mar 03, 2004 at 18:19 UTC