aquinom has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks, I'm trying to transform some data like so: imagine I have many lines like 1:A,2:B,3:C,4:D,5:E. I want to print everything that starts with 1 onto line 1 of my output file, everything that starts with 2 onto line 2, and so on. Simply storing the data in arrays is not an option, as it would use an astronomical amount of RAM (30 GB+). I suppose I could create 5 temp files and then print them to my output file, but I'm sure there's a more elegant solution that someone's aware of. Any ideas?

Re: write to/append a specific line in output file
by blindluke (Hermit) on Aug 18, 2011 at 19:53 UTC

    This is just a thought, but I think that you could use the Tie::File module for this purpose.

    You could tie the output file to an array, and then use something like this:

    $array[2] .= 'B';

    This will append 'B' to the second line of the output file. The documentation states that the file is not loaded into memory, so this will work even for gigantic files.
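    A minimal sketch of the Tie::File approach, assuming the OP's `key:value,key:value` records, a hypothetical temporary output file, and a single space between values on each output line (my assumption, not the OP's spec). Note that the code maps key 1 to array element 0, i.e. the first line of the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# A temporary file stands in for the real output file (hypothetical).
my ($fh, $outfile) = tempfile(UNLINK => 1);
close $fh;

# Tie the output file to an array: element 0 is line 1, element 1 is line 2, ...
tie my @lines, 'Tie::File', $outfile
    or die "Cannot tie $outfile: $!";

# Hypothetical input records in the OP's "key:value,..." format.
my @input = ('1:A,2:B,3:C', '1:D,2:E,3:F');

for my $record (@input) {
    for my $pair (split /,/, $record) {
        my ($key, $value) = split /:/, $pair, 2;
        my $i   = $key - 1;          # key 1 goes to element 0 (line 1)
        my $old = $lines[$i];        # undef if that line doesn't exist yet
        # Append with a single space between values (an assumed format).
        $lines[$i] = (defined $old && length $old) ? "$old $value" : $value;
    }
}

untie @lines;    # write out pending changes and release the file
```

    With the sample input above, the file ends up with the lines "A D", "B E" and "C F".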

    regards,
    Luke Jefferson

      $array[2] .= 'B';

      This will append 'B' to the second line of the output file.

      $array[2] is actually the third line of the file. Array indexes start at 0.

        You are right, but it could be different if the module ties the file to the array starting at its second index.

        The module's documentation gives mixed information. My suggestion was based on the following quote from the module's synopsis:

        $array[13] = 'blah'; # line 13 of the file is now 'blah'

        The same assumption is repeated throughout the POD (line 17 at index 17, and so on). But elsewhere, the same documentation states:

        The first line of the file is element 0 of the array; the second line is element 1, and so on.

        I think it's best for the OP to try the module and run some tests rather than assume anything.
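        A quick test along those lines, sketched below: it writes a three-line temporary file (contents and names made up for the test) and checks which line element 2 of the tied array returns:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Write a three-line test file to a temporary location.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "first\nsecond\nthird\n";
close $fh or die "close: $!";

tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
my $third = $lines[2];   # element 2 of the tied array
my $count = @lines;      # number of records (lines) in the file
untie @lines;

print "element 2 is '$third'; the file has $count lines\n";
```

        If element 2 comes back as 'third', the tied array is zero-based, just like a normal Perl array.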


        UPDATE:

        jwkrahn is right. Out of curiosity, I ran the tests: everything works like a normal array should, and $array[2] points to the third line of the file. It seems the module documentation could use some corrections.

        regards,
        Luke Jefferson

      Thanks, I think that's what I was looking for.
Re: write to/append a specific line in output file
by pileofrogs (Priest) on Aug 18, 2011 at 20:40 UTC

    Your idea of putting the different sets into different files and then concatenating them together isn't as bad as you seem to think.

    If you're not storing it in RAM, you've got to put it on disk somehow, and that seems like the most straightforward solution.
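    A minimal sketch of the temp-file approach, assuming the hypothetical `key:value` input below, numeric keys, and space-separated values on each output line. It streams the input once, appends each value to a per-key temp file, then concatenates the temp files in key order, copying in 1 MB chunks so no whole output line has to be held in RAM:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample input in the OP's "key:value,key:value,..." format.
my @input = ('1:A,2:B,3:C,4:D,5:E', '1:F,2:G,3:H,4:I,5:J');

my %tmp;    # key => { fh => handle, name => path, n => values written so far }

# Pass 1: stream the input, appending each value to its key's temp file.
for my $record (@input) {
    for my $pair (split /,/, $record) {
        my ($key, $value) = split /:/, $pair, 2;
        unless ($tmp{$key}) {
            my ($fh, $name) = tempfile(UNLINK => 1);
            $tmp{$key} = { fh => $fh, name => $name, n => 0 };
        }
        # Space-separate values within a line (an assumed output format).
        print { $tmp{$key}{fh} } ($tmp{$key}{n}++ ? " $value" : $value);
    }
}
close $_->{fh} for values %tmp;

# Pass 2: concatenate the temp files, one output line per key, in key order.
my ($out, $outfile) = tempfile(UNLINK => 1);
for my $key (sort { $a <=> $b } keys %tmp) {
    open my $in, '<', $tmp{$key}{name} or die "open $tmp{$key}{name}: $!";
    my $buf;
    # Copy in 1 MB chunks so a huge line is never held in memory at once.
    while (read($in, $buf, 1 << 20)) {
        print {$out} $buf;
    }
    close $in;
    print {$out} "\n";
}
close $out or die "close $outfile: $!";
print "wrote $outfile\n";
```

    This keeps one open filehandle per key, which is fine for 5 keys but could run into the operating system's open-file limit if there are thousands of keys.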

    Having said that, what are you doing that requires this strange file format? Often when someone asks a question like this, it turns out there is a better way once we know more about the overall situation. Of course, it also often turns out that you just need a quick fix and don't want to hash out the whole project, which I totally understand.

    --Pileofrogs

      I'm converting genome data to the input format for PLINK, a software tool that takes .ped files, in which an entire genome goes on one line; hence the strange format.
      In fact, I might still do it that way. Tie::File seems to rewrite the entire file every time it adds a new piece of data, so it is extremely slow unless you use "deferred writing", in which case it stores everything in memory and I would have gained nothing by using it.
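      As I read the Tie::File documentation, the deferred-write buffer does not have to grow without bound: the `dw_size` option (or the overall `memory` option) caps it, and Tie::File flushes the buffered changes to disk when the cap is exceeded. Each flush may still have to shift the rest of the file when lines get longer, so this may not beat the temp-file approach. A minimal sketch, assuming the same hypothetical `key:value` input and an arbitrarily chosen 10 MB buffer limit:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);   # hypothetical output file
close $fh;

# dw_size caps the deferred-write buffer (in bytes); 10 MB is an arbitrary choice.
my $obj = tie my @lines, 'Tie::File', $file, dw_size => 10_000_000
    or die "Cannot tie $file: $!";

$obj->defer;    # buffer writes in memory instead of writing each one at once

my @input = ('1:A,2:B,3:C', '1:D,2:E,3:F');   # hypothetical records
for my $record (@input) {
    for my $pair (split /,/, $record) {
        my ($key, $value) = split /:/, $pair, 2;
        my $i   = $key - 1;                   # key 1 -> element 0 (line 1)
        my $old = $lines[$i];                 # reads see deferred changes too
        $lines[$i] = (defined $old && length $old) ? "$old $value" : $value;
    }
}

$obj->flush;    # write all buffered changes to the file
undef $obj;     # drop the extra reference so untie can clean up
untie @lines;
```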