in reply to Re: Help performing "random" access on very very large file
in thread Help performing "random" access on very very large file

I believe I could possibly do both: split my one big file into, say, 1000 smaller files, then find the byte offsets and build the additional data structures for each. That way I could also spread the files across several disks, so the random accesses get shared between them. When I write some code, I'll post it up! One problem is just getting an exact number for the line count of the big file; wc -l is taking forever! For now I've estimated the line count from the total size of the file and the number of bytes in a small sample of it.
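
A minimal sketch of that sampling estimate might look like this (the file name and sample size here are made up for illustration, not from the original post):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Estimate the line count of a huge file by reading the first
    # $sample_lines lines and dividing the total file size by the
    # average line length seen in the sample.
    my $big_file     = 'huge_data.txt';   # hypothetical file name
    my $sample_lines = 100_000;           # arbitrary sample size

    open my $fh, '<', $big_file or die "Can't open $big_file: $!";
    my ( $bytes_read, $lines_read ) = ( 0, 0 );
    while ( my $line = <$fh> ) {
        $bytes_read += length $line;
        last if ++$lines_read >= $sample_lines;
    }
    close $fh;

    my $total_bytes = -s $big_file;
    my $avg_len     = $bytes_read / $lines_read;
    my $estimate    = int( $total_bytes / $avg_len );
    print "Roughly $estimate lines (average line length: $avg_len bytes)\n";
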

Re^3: Help performing "random" access on very very large file
by dsheroh (Monsignor) on Jul 16, 2007 at 15:01 UTC
    I'm not sure you really need an exact line count to do this. Just put the first 1024 lines into the first file, the next 1024 into the second file, and so on, ending up with however many files that gives you, and use the disks round-robin to spread the files as evenly as possible. (The number of lines per file is arbitrary, of course, but I'm guessing 1024 would give a manageable number of files, and powers of 2 have the nice property of letting you find the right file with just $line_number >> 10 instead of requiring the CPU to do actual division.)
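
    A rough sketch of that splitting scheme might look like the following; the directory paths, file names, and chunk naming are all invented for illustration, and the chunk directories are assumed to already exist:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Split the big file into chunks of 1024 lines each, writing the
        # chunk files round-robin across a set of directories (one per disk).
        my @disks      = ( '/disk0/chunks', '/disk1/chunks', '/disk2/chunks' );
        my $chunk_size = 1024;   # matches the >> 10 lookup below

        open my $in, '<', 'huge_data.txt' or die "Can't open input: $!";
        my ( $line_no, $out ) = ( 0, undef );
        while ( my $line = <$in> ) {
            if ( $line_no % $chunk_size == 0 ) {
                my $chunk = $line_no >> 10;              # which chunk file
                my $dir   = $disks[ $chunk % @disks ];   # round-robin over disks
                close $out if $out;
                open $out, '>', "$dir/chunk_$chunk.txt"
                    or die "Can't write chunk $chunk: $!";
            }
            print {$out} $line;
            $line_no++;
        }
        close $out if $out;
        close $in;

        # To fetch line N later: chunk file = N >> 10, line within chunk = N & 1023.
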