in reply to Accessing files at certain line number
I see two possibilities for optimizing the speed of your program by reducing the number of file accesses it makes:
In addition to these two points, consider whether you need exactly 1024 lines per batch or whether "roughly" 1024 lines per batch is good enough. If roughly is fine, you can read the first (say) 10_000 lines, compute their average length in bytes, and use that to split the file into byte ranges of about 1024 lines each. Whenever the start of a batch lands in the middle of a line, move it back toward the beginning of the file until it sits on a line start, and do the same with the end position of the batch, so that each batch ends where the next one begins. This saves you a separate pass over the file just to count lines. Whether it is an overall speed gain is less clear, since you still have to read the whole file line by line at least once anyway.
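To make the approximate-batch idea concrete, here is a minimal sketch in Python (chosen only for illustration; the same logic should carry over to any language with seek and read on a binary file handle). The function names `batch_offsets` and `line_start_before`, the 4096-byte scan chunk, and the assumption of `\n` line endings are mine, not part of the original question.

```python
import os

def line_start_before(fh, pos, chunk=4096):
    """Return the offset of the first byte of the line containing byte `pos`."""
    while pos > 0:
        lo = max(0, pos - chunk)
        fh.seek(lo)
        data = fh.read(pos - lo)
        idx = data.rfind(b"\n")        # last newline before `pos`
        if idx != -1:
            return lo + idx + 1        # the line starts right after it
        pos = lo                       # no newline in this chunk, look further back
    return 0

def batch_offsets(path, lines_per_batch=1024, sample_lines=10_000):
    """Return (start, end) byte ranges holding roughly lines_per_batch lines each."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Estimate the average line length from the first sample_lines lines.
        sampled_bytes = 0
        sampled_lines = 0
        for line in fh:
            sampled_bytes += len(line)
            sampled_lines += 1
            if sampled_lines >= sample_lines:
                break
        if sampled_lines == 0:
            return []                  # empty file
        step = max(1, (sampled_bytes * lines_per_batch) // sampled_lines)

        # Guess a boundary every `step` bytes, then move each guess back
        # toward the beginning of the file until it sits on a line start.
        boundaries = [0]
        guess = step
        while guess < size:
            start = line_start_before(fh, guess)
            if start > boundaries[-1]:  # skip guesses that fall inside one very long line
                boundaries.append(start)
            guess += step
        boundaries.append(size)
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Each returned pair can then be processed on its own: seek to `start`, read `end - start` bytes, and split the result into lines. Because the end of one range is the start of the next, no line is read twice or skipped.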