Hi all, I'm trying to find a fast way to manipulate fairly large files (anywhere from about 100 KB to 2 GB).
As a quick rundown: the files themselves contain a mini-markup language for driving laser printers. Each line in the file (\n-delimited, MS Windows) is a separate instruction. The lines are grouped into the commands that create a specific page, and the pages are in turn grouped into sets of related pages. (These are all represented by objects that cache the data as it's discovered and make data extraction easier.)
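One cheap way to model the lines/pages/sets hierarchy is to record only line numbers in a single pass and fetch the actual text later, when an object first needs it. Below is a minimal Python sketch under stated assumptions: the post doesn't say which instructions begin a page or a set, so `is_set_start` and `is_page_start` are hypothetical caller-supplied predicates, and `Page`, `PageSet` and `scan_structure` are illustrative names, not an existing module.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    first_line: int
    last_line: int = -1          # inclusive; filled in when the next page starts

@dataclass
class PageSet:
    pages: list = field(default_factory=list)

def scan_structure(lines, is_set_start, is_page_start):
    """Group line numbers into sets of pages in one pass over `lines`.

    Only line numbers are kept, so the text of a page can be loaded
    lazily later (e.g. through a byte-offset line index).
    """
    sets = []
    current = None               # the Page currently being filled
    n = -1
    for n, line in enumerate(lines):
        new_set = is_set_start(line) or not sets   # lines before any marker form a set
        if new_set or is_page_start(line):
            if current is not None:
                current.last_line = n - 1          # close the previous page
            current = Page(first_line=n)
            if new_set:
                sets.append(PageSet())
            sets[-1].pages.append(current)
    if current is not None:
        current.last_line = n                      # close the final page
    return sets
```

For example, if (hypothetically) a set began with a line starting `SET` and a page with one starting `PAGE`, passing an open binary file plus `lambda l: l.startswith(b"SET")` and `lambda l: l.startswith(b"PAGE")` would give each set's pages as `(first_line, last_line)` ranges.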
To cut a long story short, I need a way to navigate around the file as efficiently and quickly as possible (in this case speed matters more than efficiency in terms of memory usage and the like).
Currently I'm using Tie::File, but I'm not sure it's the best approach. The problem is that if I want a line near the start of the file it comes back quickly, but if it's near the end it takes a fair amount of time.
I was thinking about IO::File, but to be able to go directly to a line I'd need to index the file first (otherwise I don't know where to seek to, since the lines are all of variable length).
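The index-then-seek idea can be sketched as follows, in Python (in Perl the same steps map onto `tell`, `seek` and `readline` on a filehandle). It assumes nothing about the markup: one pass records the byte offset where each line starts, after which fetching any line is a single seek plus one read, so a line near the end costs no more than one near the start. `build_line_index` and `read_line` are hypothetical helper names.

```python
from array import array

def build_line_index(path):
    """Scan the file once and return the byte offset at which each line starts.

    Binary mode splits on LF, so both CRLF (Windows) and bare LF endings
    work. An array of unsigned 64-bit ints keeps the index compact even
    for a multi-million-line, 2 GB file.
    """
    offsets = array("Q")
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)   # this line starts where the previous one ended
            pos += len(line)
    return offsets

def read_line(f, offsets, n):
    """Return line n (0-based) from an open binary file, minus its line ending."""
    f.seek(offsets[n])            # jump straight to the line, no scanning
    return f.readline().rstrip(b"\r\n")
```

Building the index still costs one full read of the file, but only once per file (or once per change); every lookup afterwards is a constant-time seek. As far as I know, Tie::File keeps a similar table of record offsets internally but fills it in lazily as records are read, which is consistent with lines near the end being slow the first time they are reached.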
There are a few likely-looking modules on CPAN, but having never used them I'm not familiar with their strengths and weaknesses, so I'd value some opinions.
Any code that can read the file also needs to be able to write to it so that the file can be amended. Currently this is done by hand in something like UltraEdit, which is fairly clunky, so I'm hoping what I'm developing will take some of the pain out of it :)
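Writing is the awkward part with variable-length lines: a line that gets longer or shorter shifts every byte after it, so in general the tail of the file has to be rewritten. Here is a minimal Python sketch of one safe way to do that, assuming CRLF line endings and an `offsets` list of line-start byte positions (such as a line index would produce): copy the part before the line, write the new line, copy the rest into a temporary file in the same directory, then swap it in with `os.replace`. `replace_line` is a hypothetical name.

```python
import os
import shutil
import tempfile

def replace_line(path, offsets, n, new_text):
    """Replace line n (0-based) of path with new_text (bytes, no line ending).

    offsets: line-start byte positions for path; they are stale afterwards.
    """
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else os.path.getsize(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)   # same disk, so the swap is cheap
    try:
        with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
            remaining = start
            while remaining > 0:                     # copy everything before line n
                chunk = src.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                dst.write(chunk)
                remaining -= len(chunk)
            dst.write(new_text + b"\r\n")            # assumes CRLF line endings
            src.seek(end)
            shutil.copyfileobj(src, dst)             # copy everything after line n
        os.replace(tmp_path, path)                   # swap the new file in
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Any line index is stale after this and has to be rebuilt (or have every entry after line n shifted by the length difference). If the replacement is exactly the same length as the original line, the rewrite can be skipped and the bytes overwritten in place through a handle opened for reading and writing, which is much cheaper on a 2 GB file.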
If I haven't covered something here adequately, just let me know and I'll try to clarify :)
This is all based on MS Windows 2000/XP desktops and servers running ActivePerl 5.6.1 (build 633).
Thanks in advance,
Quick aside:
Just wondering if there's any reason why all my replies just got downvoted? :-?
Thanks all for the advice so far. Sticking with Tie::File looks like it means building some kind of index. Is Tie::File the best solution here, though (short of loading the thing into a database, which I would do if I could :)), or are there modules out there better suited to the task? I saw File::RandomAccess, but it doesn't appear to be available via ActiveState PPM, so getting it onto the machines here would be a nightmare.
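If the indexing route is taken, the index need not be rebuilt on every run: it can be saved next to the data file and reused until the file changes. A Python sketch, assuming a sidecar file named `path + ".idx"` (hypothetical) and treating a change in the data file's size or whole-second modification time as the signal to rebuild; `load_or_build_index` is a hypothetical name.

```python
import os
from array import array

def load_or_build_index(path, index_path=None):
    """Return line-start byte offsets for path, reusing a cached index if valid.

    Cache layout (an assumption of this sketch): the data file's size and
    integer mtime, followed by one unsigned 64-bit offset per line.
    """
    index_path = index_path or path + ".idx"
    st = os.stat(path)
    stamp = array("Q", [st.st_size, int(st.st_mtime)])

    if os.path.exists(index_path):
        cached = array("Q")
        with open(index_path, "rb") as f:
            data = f.read()
        # Ignore a truncated or corrupt cache instead of failing in frombytes().
        if len(data) % cached.itemsize == 0:
            cached.frombytes(data)
            if cached[:2] == stamp:        # data file unchanged since caching
                return cached[2:]

    # Cache missing or stale: scan once, recording where each line starts.
    offsets = array("Q")
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)

    with open(index_path, "wb") as f:
        f.write(stamp.tobytes() + offsets.tobytes())
    return offsets
```

Comparing size and mtime is a cheap heuristic rather than a guarantee: an edit that keeps the size identical within the same second would go unnoticed, so a tool that amends the file itself should rebuild or delete the cached index after writing.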
All code is untested unless otherwise stated.
All opinions expressed are my own and are intended as guidance, not gospel; please treat what I say as such and, as Abigail said, "Think for yourself."
If in doubt ask.
In reply to How to get fast random access to a large file? by gothic_mallard