in reply to I'm trying to print the contents of a hash to newly created files

I'm wondering why you need the hash in the first place. Would it not be simpler to open, write and close the output file for each ID as you're reading the input file? Then you don't need to keep anything in memory. Something like:

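use strict;
use warnings;

open my $in_fh, '<', "human...fa" or die $!;
my $out_fh;
while (defined(my $line = <$in_fh>)) {
    if ($line =~ /^>(.*)/) {
        # New header line: close the previous ID's file and start a new one
        my $id = $1;
        if ($out_fh) {
            close $out_fh or die $!;
        }
        open $out_fh, '>', $id or die $!;
        next;
    }
    # Sequence line: copy it into the current ID's file
    if ($out_fh) {
        print $out_fh $line;
    }
}
close $out_fh or die $! if $out_fh;
close $in_fh;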

Note: code above is untested!

Also, you may be better off capturing only known good characters for the ID to avoid special characters (like shell redirection symbols) in filenames.
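Here is a minimal sketch of that suggestion. The helper name safe_id and the allowed character set (a letter, digit or underscore first, then letters, digits, underscores, dots or hyphens) are my own assumptions, so adjust the class to whatever your IDs really contain. Unlike (.*), it keeps only the leading run of safe characters, so a header such as ">chr1 Homo sapiens" yields "chr1", and a header with no acceptable ID returns undef so the caller can skip it or warn.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a filename-safe ID out of a FASTA header line.
# Assumption: a valid ID starts with a letter, digit or underscore and
# continues with letters, digits, underscores, dots or hyphens. Capture
# stops at the first other character (e.g. the space before a description),
# and a leading dot is refused so "." or ".." can never become a filename.
sub safe_id {
    my ($line) = @_;
    return $1 if $line =~ /^>([A-Za-z0-9_][A-Za-z0-9_.-]*)/;
    return;    # no acceptable ID: let the caller skip or warn
}

for my $header (">chr1 Homo sapiens", ">|bad;id", ">NM_000546.6") {
    my $id = safe_id($header);
    if (defined $id) { print "use file '$id'\n" }
    else             { print "skip header: $header\n" }
}
```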

Re^2: I'm trying to print the contents of a hash to newly created files
by tobyink (Canon) on May 08, 2017 at 16:13 UTC

    You're assuming that each id only occurs once per file.

    open $out_fh, '>>', $id or die $!;

    … might be better.

      The replacement of $filename with $id was very useful and it is clear to me, but can you explain to me the significance of '>>' as opposed to '>'? My script seems to be working thanks to your input in conjunction with another Perl Monk. Thank you, sir.

        In simple terms, the single > opens a file for output starting from the beginning: anything already in the file is discarded. The double >> also opens a file for output, but appends whatever you write to the end of anything that already exists. Like >, this write-append mode will also create a file that does not exist yet and start writing from the beginning.
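A small self-contained demonstration of the two modes, writing to a scratch file in a temporary directory. The file name demo.txt and the helper subs write_line and slurp are only for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'demo.txt');

# Write one line to $file using the given open mode ('>' or '>>').
sub write_line {
    my ($mode, $text) = @_;
    open my $fh, $mode, $file or die "Cannot open $file: $!";
    print $fh "$text\n";
    close $fh or die "Cannot close $file: $!";
}

# Return the entire contents of $file as a single string.
sub slurp {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;    # undefined record separator: read the whole file at once
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

write_line('>>', 'first');     # '>>' creates the file, since it does not exist yet
write_line('>',  'second');    # '>' truncates it first: "first" is gone
my $after_truncate = slurp();

write_line('>>', 'third');     # '>>' adds to the end of what is already there
my $after_append = slurp();

print "after '>':\n$after_truncate";
print "after '>>':\n$after_append";
```

The first call shows that >> also creates a missing file; the second shows > throwing away what was there before writing.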

        In your case I would think that write-append mode is dangerous. The files could fill up with repeated, identical sequences if an ID exists in more than one input file, or if you simply run the program a second time, because in write-append mode each run just appends its new data to the end of the existing file.

        There are ways to tell whether you have already encountered an ID and written it to a file. I might do it like this:

        foreach my $id (keys %id2seq) {
            if (-f $id) {
                print $id." already exists. about to overwrite it\n";
            }
            open my $out_fh, '>', $id or die $!;    ## Amendment here
            print $out_fh ($id."\n", $id2seq{$id}, "\n");
            close $out_fh;    ## moved into the foreach loop
        }
        Note the change to write-from-beginning mode (>), and the test to see whether the file already exists. This way any given file would contain only the last sequence found.
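Another way to tell whether an ID has already been seen is a %seen hash kept for the current run. It addresses both concerns in this thread: the first record for an ID in a run opens its file with > (so leftovers from earlier runs are overwritten), and later records with the same ID in that run are appended with >>. This is a sketch under assumptions: split_fasta is a hypothetical name, it reuses the idea of capturing only known good characters for the ID, and the sample input is made up for the demo.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copy each FASTA record read from $in_fh into a file
# named after its ID inside $out_dir (header lines are not copied, as in
# the streaming example earlier in the thread). Within one run the first
# record for an ID opens its file with '>', discarding output from earlier
# runs; later records with the same ID are appended with '>>'.
sub split_fasta {
    my ($in_fh, $out_dir) = @_;
    my %seen;    # IDs already written during this run
    my $out_fh;
    while (defined(my $line = <$in_fh>)) {
        if ($line =~ /^>/) {
            if ($out_fh) {
                close $out_fh or die $!;
                undef $out_fh;
            }
            # Assumption: records whose ID has unexpected characters are skipped.
            next unless $line =~ /^>([A-Za-z0-9_][A-Za-z0-9_.-]*)/;
            my $id   = $1;
            my $mode = $seen{$id}++ ? '>>' : '>';
            my $path = File::Spec->catfile($out_dir, $id);
            open $out_fh, $mode, $path or die "Cannot open $path: $!";
            next;
        }
        print $out_fh $line if $out_fh;
    }
    if ($out_fh) {
        close $out_fh or die $!;
    }
    return sort keys %seen;
}

my $dir   = tempdir(CLEANUP => 1);
my $fasta = ">a\nACGT\n>b\nGGCC\n>a\nTTTT\n";

# Run twice on the same in-memory input: because %seen starts empty on each
# call, the second run overwrites the files instead of doubling them.
my @ids;
for (1 .. 2) {
    open my $in_fh, '<', \$fasta or die $!;
    @ids = split_fasta($in_fh, $dir);
    close $in_fh;
}

# Show what ended up in each per-ID file.
my %contents;
for my $id (@ids) {
    my $path = File::Spec->catfile($dir, $id);
    open my $fh, '<', $path or die "Cannot open $path: $!";
    $contents{$id} = do { local $/; <$fh> };
    close $fh;
    print "$id: ", join(' ', split /\n/, $contents{$id}), "\n";
}
```

Both records for "a" end up in the same file, and running it a second time leaves the contents unchanged rather than repeating them.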

        can you explain to me the significance of '>>' as opposed to '>'?

        See open in perlfunc (it's also explained under "Opening Text Files for Writing" in perlopentut):

        If MODE is >, the file is opened for output, with existing files first being truncated ("clobbered") and nonexisting files newly created. If MODE is >>, the file is opened for appending, again being created if necessary.