Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, suppose I have a huge file "test_file.csv" (about 40GB), and I want to add a line at the beginning of the file without reading the whole file and rewriting it (since that takes a LOT of time). I tried "sed -i ..." and "perl -pi" but they are very slow. Finally I came up with something like this:
    $output_filename = "test_file.csv";
    open (FH, "+< $output_filename") or die "can't open";
    my $line = <FH>;
    seek FH, 0, 0;
    print FH "This is my header \n" . $line;
But instead of prepending the line, it is replacing the existing lines!! i.e.
----------test_file.csv---------------
line 1
line 2
line 3
---------------------------------------

=> run the perl code

-------test_file.csv------------------
This is my header
ine 3
------------------------------------------


Any ideas?
Thanks!

Replies are listed 'Best First'.
Re: Prepending header line to HUGE csv file
by Laurent_R (Canon) on Oct 30, 2015 at 14:58 UTC
    What you are trying to do can't work, not because of Perl, but because of the very nature of sequential files.

    I'm afraid you have to bite the bullet and, one way or another (explicitly or implicitly), copy the whole file.

    Edit: fixed a typo (s/byte/bite/). And BTW, perl -pi ... and sed -i ... are just copying the file behind the scenes, which is why you perceive them as very slow.
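    For comparison, the explicit copy that perl -pi does behind the scenes can be written out by hand. A minimal sketch (the filename follows the OP's; the tiny stand-in file is created at the top only so the sketch runs as-is):

```perl
use strict;
use warnings;

# Stand-in for the OP's 40GB file, so the sketch is runnable as-is.
my $old = 'test_file.csv';
open my $demo, '>', $old or die "create: $!";
print {$demo} "line 1\nline 2\nline 3\n";
close $demo;

my $new = "$old.tmp";
open my $in,  '<', $old or die "Can't read $old: $!";
open my $out, '>', $new or die "Can't write $new: $!";

# Write the header first, then copy the rest in large chunks
# rather than line by line.
print {$out} "This is my header\n";
my $buf;
print {$out} $buf while read $in, $buf, 1 << 20;

close $in;
close $out or die "close: $!";
rename $new, $old or die "rename: $!";
```

    The rename at the end makes the replacement atomic on most filesystems, so a crash mid-copy never leaves a half-written "test_file.csv".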

Re: Prepending header line to HUGE csv file
by choroba (Cardinal) on Oct 30, 2015 at 14:59 UTC
    Files grow at the end. You can't easily insert a line anywhere else. You might try playing with the buffer size to speed up the process, though:
    $/ = \65535; # Read fixed-size chunks; you can try different numbers here.
    print {$NEW} $header;
    print {$NEW} $_ while <$OLD>;

    Update: it seems read is a bit faster on my system. YMMV.

    print {$NEW} $_ while read $OLD, $_, 2 ** 30 - 1;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Prepending header line to HUGE csv file
by hippo (Archbishop) on Oct 30, 2015 at 15:12 UTC
    Hello, suppose I have a huge file "test_file.csv" (about 40GB).
    ...
    But instead of prepending the line, it is replacing the existing lines!!

    I suspect most monks reading this would be entirely unsurprised that the data were overwritten rather than prepended. That's because experience has taught us this (not just in Perl, but in using pretty much any language to perform file manipulation). So forgive me if I'm wrong in assuming that you are relatively new to all this.

    If you are new to it then perhaps you could be so good as to justify why you have a CSV file that's 40GB in size? Experience says that this is very large for a CSV file. If you could reduce this size by an order of magnitude or more then the problem effectively vanishes. Of course it is entirely possible that every one of those 40 billion bytes is absolutely essential, just rather unlikely.

Re: Prepending header line to HUGE csv file
by BrowserUk (Patriarch) on Oct 30, 2015 at 18:16 UTC

    Why do you need to prepend the header line? I.e. how, and with what, are you going to process the file?

    The reason I ask is that many csv processing tools and libraries will allow you to pass the header line separately from the rest of the data; thus you avoid the problem entirely.
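    For example, Text::CSV (a CPAN module, assumed to be installed) lets you supply the column names to the parser yourself, so no header row ever needs to exist in the file. A minimal sketch, with invented column names and a tiny stand-in data file so it runs as-is:

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; assumed to be installed

# Tiny stand-in data file so the sketch is runnable.
open my $demo, '>', 'test_file.csv' or die "create: $!";
print {$demo} "1,foo,10\n2,bar,20\n";
close $demo;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Supply the header to the parser instead of writing it into the file.
# These column names are invented for illustration.
$csv->column_names(qw(id name value));

open my $fh, '<', 'test_file.csv' or die "open: $!";
our @ids;
while (my $row = $csv->getline_hr($fh)) {
    # $row is a hashref keyed by the names given above.
    push @ids, $row->{id};
}
close $fh;
print "@ids\n";
```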


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Prepending header line to HUGE csv file
by graff (Chancellor) on Oct 31, 2015 at 05:15 UTC
    Here's the fastest way I know of to prepend a line at the beginning of any file, regardless of its size. It involves the *n*x shell commands "echo" and "cat":
    echo "This is the new first line" | cat - big.file > new.big.file
    Of course, if the line to be prepended is really long or complicated, you can use a text editor or other method to create a small file containing just that first line (e.g. a file called "new.first.line"), and do this:
    cat new.first.line big.file > new.big.file
    I'm pretty sure there's no faster way it can be done. Of course, speed varies according to things like: Is the output being written to a local disk, or to some sort of remote, network-mounted disk? (Local storage is much faster.) What sort of file system is it? ("Journaled" file systems might be slower.) Etc.
Re: Prepending header line to HUGE csv file
by shmem (Chancellor) on Oct 31, 2015 at 11:22 UTC
    But instead of prepending the line, it is replacing the existing lines!

    Read the lines that will be replaced, seek to the beginning, write the header, pad with \0, seek to the end, and append the lines you saved from the beginning. Problem solved.
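    A hedged sketch of that recipe (with a tiny stand-in file so it runs as-is). Note that the displaced lines end up at the end of the file, with NUL padding after the header, so anything reading the file later has to tolerate both:

```perl
use strict;
use warnings;

# Stand-in file so the sketch is runnable.
my $file = 'test_file.csv';
open my $demo, '>', $file or die "create: $!";
print {$demo} "line 1\nline 2\nline 3\n";
close $demo;

my $header = "This is my header\n";

open my $fh, '+<', $file or die "open: $!";

# Save whole lines until we have at least as many bytes as the header.
my $saved = '';
$saved .= scalar <$fh> while length($saved) < length($header) && !eof $fh;

# Overwrite the start with the header, padding the leftover bytes with \0.
seek $fh, 0, 0;
print {$fh} $header, "\0" x (length($saved) - length($header));

# Append the displaced lines at the end.
seek $fh, 0, 2;
print {$fh} $saved;
close $fh or die "close: $!";
```

    This touches only a few bytes at each end of the file, which is why it is fast even at 40GB; the cost is the scrambled line order that the next paragraph warns about.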

    If the order of the lines matters, you probably have an XY Problem. Why are you trying to do this, and to what end? Is this file to be processed later? If so, how? Or is it just an archive?

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'