Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, suppose I have a huge file "test_file.csv" (about 40GB), and I want to add a line at the beginning of the file without reading the whole file and rewriting it (since that takes a LOT of time). I tried "sed -i ..." and "perl -pi" but they are very slow. Finally I came up with something like this:
    $output_filename = "test_file.csv";
    open (FH, "+< $output_filename") or die "can't open";
    my $line = <FH>;
    seek FH, 0, 0;
    print FH "This is my header \n" . $line;
But instead of prepending the line, it is replacing the existing lines!! i.e.
----------test_file.csv---------------
line 1
line 2
line 3
---------------------------------------

=> run the perl code

-------test_file.csv------------------
This is my header
ine 3
------------------------------------------


Any ideas?
Thanks!

Replies are listed 'Best First'.
Re: Prepending header line to HUGE csv file
by Laurent_R (Canon) on Oct 30, 2015 at 14:58 UTC
    What you are trying to do can't work, not because of Perl, but because of the very nature of sequential files.

    I'm afraid you have to bite the bullet and, one way or another (explicitly or implicitly), copy the whole file.

    Edit: fixed a typo (s/byte/bite/). And BTW, perl -pi ... and sed -i ... are just copying the file behind the scenes, which is why you perceive them as very slow.
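    For comparison, the explicit copy that perl -pi does behind the scenes can be written out by hand. A minimal sketch (the filename follows the OP's; the tiny stand-in file is created at the top only so the sketch runs as-is):

```perl
use strict;
use warnings;

# Stand-in for the OP's 40GB file, so the sketch is runnable as-is.
my $old = 'test_file.csv';
open my $demo, '>', $old or die "create: $!";
print {$demo} "line 1\nline 2\nline 3\n";
close $demo;

my $new = "$old.tmp";
open my $in,  '<', $old or die "Can't read $old: $!";
open my $out, '>', $new or die "Can't write $new: $!";

# Write the header first, then copy the rest in large chunks
# rather than line by line.
print {$out} "This is my header\n";
my $buf;
print {$out} $buf while read $in, $buf, 1 << 20;

close $in;
close $out or die "close: $!";
rename $new, $old or die "rename: $!";
```

    The rename at the end makes the replacement atomic on most filesystems, so a crash mid-copy never leaves a half-written "test_file.csv".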

Re: Prepending header line to HUGE csv file
by choroba (Cardinal) on Oct 30, 2015 at 14:59 UTC
    Files grow at the end. You can't easily insert a line anywhere else. You might try playing with the buffer size to speed up the process, though:
    $/ = \65535; # Read fixed-size chunks; you can try different numbers here.
    print {$NEW} $header;
    print {$NEW} $_ while <$OLD>;

    Update: it seems read is a bit faster on my system. YMMV.

    print {$NEW} $_ while read $OLD, $_, 2 ** 30 - 1;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Prepending header line to HUGE csv file
by hippo (Archbishop) on Oct 30, 2015 at 15:12 UTC
    Hello, suppose I have a huge file "test_file.csv" (about 40GB).
    ...
    But instead of prepending the line, it is replacing the existing lines!!

    I suspect most monks reading this would be entirely unsurprised that the data were overwritten rather than prepended. That's because experience has taught us this (not just in Perl, but in using pretty much any language to perform file manipulation). So forgive me if I'm wrong in assuming that you are relatively new to all this.

    If you are new to it then perhaps you could be so good as to justify why you have a CSV file that's 40GB in size? Experience says that this is very large for a CSV file. If you could reduce this size by an order of magnitude or more then the problem effectively vanishes. Of course it is entirely possible that every one of those 40 billion bytes is absolutely essential, just rather unlikely.

Re: Prepending header line to HUGE csv file
by BrowserUk (Patriarch) on Oct 30, 2015 at 18:16 UTC

    Why do you need to prepend the header line? I.e. how, and with what, are you going to process the file?

    The reason I ask is that many csv processing tools and libraries will allow you to pass the header line separately from the rest of the data; thus you avoid the problem entirely.
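    For example, Text::CSV (a CPAN module, assumed to be installed) lets you supply the column names to the parser yourself, so no header row ever needs to exist in the file. A minimal sketch, with invented column names and a tiny stand-in data file so it runs as-is:

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; assumed to be installed

# Tiny stand-in data file so the sketch is runnable.
open my $demo, '>', 'test_file.csv' or die "create: $!";
print {$demo} "1,foo,10\n2,bar,20\n";
close $demo;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Supply the header to the parser instead of writing it into the file.
# These column names are invented for illustration.
$csv->column_names(qw(id name value));

open my $fh, '<', 'test_file.csv' or die "open: $!";
our @ids;
while (my $row = $csv->getline_hr($fh)) {
    # $row is a hashref keyed by the names given above.
    push @ids, $row->{id};
}
close $fh;
print "@ids\n";
```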


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Prepending header line to HUGE csv file
by graff (Chancellor) on Oct 31, 2015 at 05:15 UTC
    Here's the fastest way I know of to prepend a line at the beginning of any file, regardless of its size. It involves the *n*x shell commands "echo" and "cat":
    echo "This is the new first line" | cat - big.file > new.big.file
    Of course, if the line to be prepended is really long or complicated, you can use a text editor or other method to create a small file containing just that first line (e.g. a file called "new.first.line"), and do this:
    cat new.first.line big.file > new.big.file
    I'm pretty sure there's no faster way it can be done. Of course, speed varies according to things like: Is the output being written to a local disk, or to some sort of remote, network-mounted disk? (Local storage is much faster.) What sort of file system is it? ("Journaled" file systems might be slower.) Etc.
Re: Prepending header line to HUGE csv file
by shmem (Chancellor) on Oct 31, 2015 at 11:22 UTC
    But instead of prepending the line, it is replacing the existing lines!

    Read the lines that will be replaced, seek to the beginning, write the header, pad with \0, seek to the end, and append the lines you saved from the beginning. Problem solved.
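    A hedged sketch of that recipe (with a tiny stand-in file so it runs as-is). Note that the displaced lines end up at the end of the file, with NUL padding after the header, so anything reading the file later has to tolerate both:

```perl
use strict;
use warnings;

# Stand-in file so the sketch is runnable.
my $file = 'test_file.csv';
open my $demo, '>', $file or die "create: $!";
print {$demo} "line 1\nline 2\nline 3\n";
close $demo;

my $header = "This is my header\n";

open my $fh, '+<', $file or die "open: $!";

# Save whole lines until we have at least as many bytes as the header.
my $saved = '';
$saved .= scalar <$fh> while length($saved) < length($header) && !eof $fh;

# Overwrite the start with the header, padding the leftover bytes with \0.
seek $fh, 0, 0;
print {$fh} $header, "\0" x (length($saved) - length($header));

# Append the displaced lines at the end.
seek $fh, 0, 2;
print {$fh} $saved;
close $fh or die "close: $!";
```

    This touches only a few bytes at each end of the file, which is why it is fast even at 40GB; the cost is the scrambled line order that the next paragraph warns about.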

    If the order of the lines matters, you probably have an XY Problem. Why are you trying to do this, and to what end? Is this file to be processed later? If so, how? Or is it just an archive?

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'