in reply to How to get fast random access to a large file?

Change the file format and maintain an index. Add a header, something like:

    Line Offsets:     20, 55, 66, 99 ...   (bytes)
    Page Offset/Size: 1-4, 5-9 ...         (lines)
    Sets:             1-2-3, 4-5-6 ...     (pages)
If all you're interested in is navigation, this might be enough.
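
Reading the offsets back and seeking is only a few lines. A rough sketch, assuming the index lives on the first line as a plain comma-separated list of absolute byte offsets (an invented layout for illustration, not the header above verbatim):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read the index from the first line, then seek straight to any line.
    # Assumes the header is a comma-separated list of absolute byte
    # offsets, one per data line (invented layout for illustration).
    open my $fh, '<', 'big_file.dat' or die "open: $!";
    my $header = <$fh>;
    chomp $header;
    my @offset = split /,\s*/, $header;

    sub fetch_line {
        my ($n) = @_;                       # 0-based line number
        seek $fh, $offset[$n], 0 or die "seek: $!";
        return scalar <$fh>;
    }

    print fetch_line(42);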

If the file doesn't need to stay hand-editable, and you're interested in manipulation, I'd switch to a database like BerkeleyDB or DBD::SQLite, depending on whether or not SQL is overkill.
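
As a sketch of the SQLite route (table and file names invented): load each line once, keyed by line number, and every later lookup is an indexed fetch.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # One-time load: one row per line, keyed by line number.
    my $dbh = DBI->connect('dbi:SQLite:dbname=lines.db', '', '',
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE line (lineno INTEGER PRIMARY KEY, text TEXT)');

    my $ins = $dbh->prepare('INSERT INTO line (lineno, text) VALUES (?, ?)');
    open my $fh, '<', 'big_file.dat' or die "open: $!";
    while (my $text = <$fh>) {
        chomp $text;
        $ins->execute($., $text);           # $. is the current line number
    }
    $dbh->commit;

    # Random access is now a primary-key lookup.
    my ($text) = $dbh->selectrow_array(
        'SELECT text FROM line WHERE lineno = ?', undef, 1234);
    print "$text\n";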


Re^2: How to get fast random access to a large file?
by gothic_mallard (Pilgrim) on Oct 29, 2004 at 12:53 UTC

    As I replied to fglock, I can't change the file's structure, only its data, i.e. I can add/remove/change lines but I have no control over how they're represented.

    Each line is a fixed-length record in itself, but each record type has a different structure. Say, a "print here" may be:

        x pos  (6 char)
        y pos  (6 char)
        string (300 char)

    and a "new sheet" may be:

        sheet number (4 char)
        stock code   (10 char)

    and so on (those aren't the real structures, but they're similar to the real thing).
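
    Since the records are fixed-length, unpack can split a line into its fields once the record type is known. A sketch using the illustrative widths above (the template names and sample line are made up):

        # Map record type to an unpack template; widths follow the
        # illustrative layouts above, not the real file format.
        my %template = (
            print_here => 'A6 A6 A300',    # x pos, y pos, string
            new_sheet  => 'A4 A10',        # sheet number, stock code
        );

        sub parse_record {
            my ($type, $line) = @_;
            die "unknown record type '$type'" unless exists $template{$type};
            return unpack $template{$type}, $line;
        }

        # e.g. a "print here" record, built here just for the demo:
        my $line = sprintf '%-6s%-6s%-300s', 10, 20, 'Hello';
        my ($x, $y, $string) = parse_record('print_here', $line);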

    I was toying with the idea of creating index files, but that comes with the overhead of having to parse the original file first to create them (if I could do that quickly, it wouldn't be so much of a problem ;-)).
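
    For what it's worth, that parse can be a single sequential pass: note tell before reading each line and write the offsets fixed-width, so the index itself is randomly accessible. A rough sketch (file names invented; pack 'N' limits you to files under 4 GB, use 'Q' on a 64-bit Perl):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Build: one pass, recording the byte offset of every line.
        open my $in,  '<', 'big_file.dat' or die "open: $!";
        open my $idx, '>', 'big_file.idx' or die "open: $!";
        binmode $idx;
        while (1) {
            my $pos = tell $in;
            defined(my $line = <$in>) or last;
            print {$idx} pack 'N', $pos;   # 4-byte offset per line
        }
        close $idx;

        # Use: slot $n of the index holds the offset of line $n (0-based).
        open my $ridx, '<', 'big_file.idx' or die "open: $!";
        binmode $ridx;
        my $n = 1234;
        seek $ridx, $n * 4, 0             or die "seek: $!";
        read $ridx, my $packed, 4         or die "read: $!";
        seek $in, unpack('N', $packed), 0 or die "seek: $!";
        print scalar <$in>;

    Later runs can reuse the .idx file, so the parse cost is only paid once per change to the data file.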

    --- Jay
