in reply to divide multi-column input file into sub-files depending on specific column's value

This does what you asked for in the first screenful of your post. I wrote and tested it before noticing there was a lot more to the post, so here it is anyway; maybe it will be useful to you:

    #! perl -sw
    use strict;

    my %fhs;
    while( <DATA> ) {
        my @bits = split;
        my $abs = abs( $bits[ 3 ] );
        open $fhs{ $abs }, '>', 'output_' . $abs or die $!
            unless exists $fhs{ $abs };
        print { $fhs{ $abs } } $_;
    }
    __DATA__
    -59.077 89.301 115.664 7
    -61.251 77.435 117.760 -6
    -60.950 71.712 116.061 -7
    -56.247 83.685 114.576 1
    -59.263 76.107 112.555 -2
    -59.895 65.296 111.185 3
    -60.141 63.694 111.257 -3
    -61.667 63.707 116.937 2
    -58.722 60.429 111.307 -1
    -57.511 42.922 112.108 6

Produces:

    C:\test>type output_*

    output_1
    -56.247 83.685 114.576 1
    -58.722 60.429 111.307 -1

    output_2
    -59.263 76.107 112.555 -2
    -61.667 63.707 116.937 2

    output_3
    -59.895 65.296 111.185 3
    -60.141 63.694 111.257 -3

    output_6
    -61.251 77.435 117.760 -6
    -57.511 42.922 112.108 6

    output_7
    -59.077 89.301 115.664 7
    -60.950 71.712 116.061 -7
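For anyone who finds the code above a bit dense, here is a more spelled-out sketch of the same hash-of-filehandles idea. The sub name `split_by_abs_column` and its `$prefix` argument are made up for illustration and are not part of the original answer. The logic is meant to mirror it: take the absolute value of the fourth column, open one output file per distinct value the first time it is seen, and print each line to the matching handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): write each line to a
# file named "<prefix><abs value of 4th column>".
# Returns the distinct absolute values seen, sorted numerically.
sub split_by_abs_column {
    my ( $prefix, @lines ) = @_;
    my %fhs;    # absolute value => open filehandle

    for my $line (@lines) {
        my @bits = split ' ', $line;
        next unless @bits >= 4;          # skip short or blank lines
        my $abs = abs $bits[3];          # 7 and -7 both map to key 7

        # First time this value is seen: open its file and keep the handle.
        unless ( exists $fhs{$abs} ) {
            open $fhs{$abs}, '>', $prefix . $abs
                or die "Cannot open $prefix$abs: $!";
        }

        # The braces tell print that the hash element is the filehandle.
        print { $fhs{$abs} } $line;
    }

    close $_ or die "close failed: $!" for values %fhs;
    return sort { $a <=> $b } keys %fhs;
}

# Usage (lines must still end in "\n"):
# my @seen = split_by_abs_column( 'output_', <STDIN> );
```

Keeping the handles in a hash means every file is opened exactly once, however the cluster values are interleaved in the input.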

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: divide multi-column input file into sub-files depending on specific column's value
by angela2 (Sexton) on Jul 05, 2016 at 11:08 UTC

    Hi, thanks for your time. I know, it was a really long post :( I was just testing your answer, it does indeed do part of what I want. My problem is that it's a bit too smart for my perl knowledge :P As you can see I tried to approach it in a very lengthy way because I could understand it better. I'll try to understand exactly what's happening in the code you posted, it is indeed very useful.

    Can I ask one more favour? Could you please maybe explain, when/if you find some time to read the rest of my post, what I'm doing wrong with populating the array? I've been googling for hours and don't seem to be able to get it. All I'm trying to write is that all the $match variables are meant to belong in the @match array, or does that make no sense? In my mind, I'm trying to find a way to print $match and have all the values from -10 to 10 printed. Does that sound correct or am I completely off?

    Update: Ok this is what I did and it's working! :) I modified my code from my original post (code attempt #2) by adding and editing your contribution. I believe it works correctly for my purpose, I'm now confirming that my output files are correct.

        #!/usr/bin/perl
        use warnings;
        use strict;

        my %fhs;
        my $molecule = "1kc4";

        open my $FILE, '<', 'input_file' or die $!;
        while (<$FILE>) {
            chomp;
            my @columns = unpack('a8 a8 a8 a6');
            #print join(" ",map {$_} @columns), "\n";
            #print "@columns[3] \n";
            foreach (@columns[$#columns]) {
                my $abs = abs( $columns[ $#columns ] );
                open $fhs{ $abs }, '>', "${molecule}_cluster_" . $abs or die $!
                    unless exists $fhs{ $abs };
                my $file = $fhs{ $abs };
                my $ID = $_;
                my $IDform = sprintf ("%4s", $ID);
                my $currentline = $.;
                my $currentlineform = sprintf ("%7s", $currentline);##
                my @selection = (@columns[0..$#columns-1]);
                my $layout = "%10s"x(@selection) . "\n";
                printf $file $IDform . $currentlineform . $layout, @selection;
            }
        }

    The output filenames are correct and one of them looks like this:

        8 109 -42.129 -57.475 94.651
        8 110 -45.520 -62.056 90.318
        8 111 -49.196 -63.045 92.577
        8 112 -46.086 -71.753 88.267
        -8 113 -48.146 -76.799 77.638
        8 114 -41.865 -62.567 86.437

    It would be great if you could take a look and let me know if you see something erroneous in my code. Thank you again for your time :)

      Are you not getting loads of warnings when you run your code?

      For example: Scalar value @columns[$#columns] better written as $columns[$#columns] at ...
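A short sketch of what that warning is about, assuming `@columns` holds the four fields of one input line (the sample values are taken from the data earlier in the thread). `@columns[...]` is an array slice and returns a list, while `$columns[...]` is a single element; `$columns[-1]` is simply a shorter way to reach the last one.

```perl
use strict;
use warnings;

# Sample fields from one line of the data used earlier in this thread.
my @columns = ( -59.077, 89.301, 115.664, 7 );

# Single element: scalar sigil. A negative index counts from the end,
# so this is the same element as $columns[$#columns].
my $last = $columns[-1];

# Slice: array sigil, returns a list (here, everything but the last field).
my @coords = @columns[ 0 .. $#columns - 1 ];

print "cluster id: $last\n";
print "coordinates: @coords\n";
```

With a single element there is also no need for a `foreach` around it; the value can be used directly.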



        I was only getting this one warning but I thought it would be ok...? Oops :( I fixed it now.

        I'm now facing another problem - I'm trying to print the formatted input into a new file so that I have the nicely formatted input saved somewhere.

        However when I do

            #!/usr/bin/perl
            use warnings;
            use strict;

            my %fhs; # hash with last_column values

            open my $INPUT, '<', 'input_file' or die $!;
            while (my $line = <$INPUT>) {
                #chomp; # this gives me syntax errors???
                my @columns = $line =~ m/\s*(-?[.\d]+)/g; # split columns
                foreach ($columns[$#columns]) { # foreach element of the last column = cluster ID
                    open my $FORM, '>', 'output.FORM' or die $!;
                    my $columnform = "%10s"x(@columns) . "\n";
                    # printf ($FORM $columnform, @columns); # hangs, creates empty file
                    printf $columnform, @columns; # prints on screen correctly
                    close $FORM;
                }
            }

        When I print on screen, it prints fine. When I try to print to a file, it doesn't. You might notice that I changed the way I split my columns: the fixed-width solution was a bit easier to understand but also a bit dangerous, as I can't be 100% sure that all files will have fixed-width entries. I'm probably a pain, but if you could hint at what's happening with printf I'd be super grateful.

        Edit: I'm also trying this - where the $FORM printf bit is out of the loop, and interestingly it does print something, but it's only one random line.

            #!/usr/bin/perl
            use warnings;
            use strict;

            my %fhs; # hash with last_column values

            open my $INPUT, '<', 'input_file' or die $!;
            while (my $line = <$INPUT>) {
                #chomp; # this gives me syntax errors???
                my @columns = $line =~ m/\s*(-?[.\d]+)/g; # split columns
                open my $FORM, '>', 'output.FORM' or die $!;
                my $columnform = "%10s"x(@columns) . "\n";
                # printf ($FORM $columnform, @columns); # hangs, creates empty file
                printf $columnform, @columns; # prints on screen correctly
                close $FORM;
                foreach ($columns[$#columns]) { # foreach element of the last column = cluster ID
                    # blah
                }
            }
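One likely reason only a single line ends up in output.FORM: in both versions the file is opened with '>' inside the while loop, and '>' truncates the file every time it is opened, so each pass throws away what the previous pass wrote. Below is a minimal sketch of one way around that, assuming the goal is simply to save every formatted line. The sub name `write_formatted` is made up for illustration; the regex and the "%10s" layout are taken from the post above. As a side note, a bare `chomp` acts on `$_`, not on `$line`, so with `while (my $line = ...)` it would need to be `chomp $line`. The sketch does not need it because the regex only captures the numbers.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reformat every line of
# $in_path into right-aligned 10-character columns and write the result
# to $out_path. Returns the number of lines written.
sub write_formatted {
    my ( $in_path, $out_path ) = @_;

    open my $INPUT, '<', $in_path or die "Cannot read $in_path: $!";

    # Open the output ONCE, before the loop, so earlier lines are kept.
    open my $FORM, '>', $out_path or die "Cannot write $out_path: $!";

    my $count = 0;
    while ( my $line = <$INPUT> ) {
        # Same column-splitting regex as in the post above.
        my @columns = $line =~ m/\s*(-?[.\d]+)/g;
        next unless @columns;    # skip blank or non-numeric lines

        my $columnform = '%10s' x @columns . "\n";

        # Braces around the handle keep it clearly separate from the format.
        printf {$FORM} $columnform, @columns;
        $count++;
    }

    close $FORM or die "Cannot close $out_path: $!";
    close $INPUT;
    return $count;
}

# Usage, with the file names from the post:
# write_formatted( 'input_file', 'output.FORM' );
```

If the file really has to be opened inside the loop, opening it in append mode ('>>') would keep earlier lines, at the cost of reopening the file for every input line.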