in reply to Re: split large CSV file >9.1MB into files of equal size that can be opened in excel

This would split my long-line scenario more evenly. But it comes down to whether the OP can tolerate an input record (line) being split across multiple files: if so, follow ++davido's advice; otherwise, you will be stuck with potential size imbalances in the output.

And if you weren't expecting a size imbalance in the lines, you need to look at whatever's generating all_tags2.csv, and figure out why one line is significantly longer than the others.
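
To track down such an outlier, it may help to rank the lines by byte length. The following is a minimal Python sketch, not anything posted in this thread: the function name `longest_lines` and the sample data are hypothetical, and it assumes each physical line is one record (no quoted fields containing embedded newlines).

```python
def longest_lines(lines, top=3):
    """Return (line_number, byte_length) pairs for the longest lines, longest first."""
    # Measure each line in bytes, since file size is counted in bytes, not characters.
    sizes = [(number, len(line.encode("utf-8")))
             for number, line in enumerate(lines, start=1)]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical sample standing in for the real file's lines.
sample_lines = ["a,b\n", "x" * 50 + "\n", "cc\n"]
report = longest_lines(sample_lines, top=2)
```

Passing an open file handle for all_tags2.csv instead of the sample list should work the same way, because the function only iterates over its input once.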


Re^3: split large CSV file >9.1MB into files of equal size that can be opened in excel
by davido (Cardinal) on Sep 29, 2016 at 01:10 UTC

    Correct, I think. If I was unclear on the tradeoffs, I did intend to imply that splitting on a delimiter, even one near the middle of the file, will almost never produce two parts of identical size. If you're forced to split into identical sizes and the physical middle of the file is not guaranteed to fall on a record boundary, then you must either give up record-oriented semantics or accept that the record spanning the physical middle will be broken.
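
The broken-record case can be illustrated with a short Python sketch. It is an illustration under assumptions, not code from this thread: `split_bytes` and the sample data are hypothetical names and values, and the split is done purely by byte offset with no knowledge of records.

```python
def split_bytes(data, parts):
    """Split a bytes object into `parts` chunks whose sizes differ by at most one byte."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for index in range(parts):
        # The first `extra` chunks absorb the leftover bytes, one each.
        end = start + base + (1 if index < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


# Hypothetical 45-byte sample; its physical middle lands inside the long record.
sample = b"id,tag\n1,short\n2,a-much-longer-tag-value\n3,x\n"
halves = split_bytes(sample, 2)
```

Here the two halves are 23 and 22 bytes, but the first one ends partway through the third line, so neither output is a valid set of whole records.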

    I suspect that in this individual's case, it's a reasonable tradeoff for each file to be approximately equal in size to the extent that retaining logical record integrity permits. That is, records are more important than exact size.
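
One way to get "approximately equal, records intact" is to assign each whole line to an output chunk according to where the line's midpoint falls in the file. This is a minimal Python sketch under assumptions, not the method anyone in this thread posted: `split_records` and the sample data are hypothetical, each physical line is assumed to be one record, and the header row is not repeated in later chunks.

```python
def split_records(lines, parts):
    """Group whole lines into `parts` chunks of roughly equal byte size, keeping order."""
    sizes = [len(line.encode("utf-8")) for line in lines]
    total = sum(sizes)
    chunks = [[] for _ in range(parts)]
    seen = 0  # bytes consumed by the lines placed so far
    for line, size in zip(lines, sizes):
        if total == 0:
            index = 0
        else:
            # Pick the chunk that contains the byte offset of this line's midpoint.
            index = min(parts - 1, (seen + size // 2) * parts // total)
        chunks[index].append(line)
        seen += size
    return chunks


# Hypothetical sample: one record is much longer than the others.
sample_lines = ["id,tag\n", "1,short\n", "2,a-much-longer-tag-value\n", "3,x\n"]
groups = split_records(sample_lines, 2)
```

With this sample the chunks come out at 15 and 30 bytes: every record stays whole, and the long line accounts for the imbalance.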


    Dave