in thread divide multi-column input file into sub-files depending on specific column's value

Hi, thanks for your time. I know, it was a really long post :( I was just testing your answer, and it does indeed do part of what I want. My problem is that it's a bit too smart for my Perl knowledge :P As you can see, I tried to approach it in a very lengthy way because I could understand it better that way. I'll try to understand exactly what's happening in the code you posted; it is indeed very useful.

Can I ask one more favour? Could you please maybe explain, when/if you find some time to read the rest of my post, what I'm doing wrong with populating the array? I've been googling for hours and don't seem to be able to get it. All I'm trying to write is that all the $match variables are meant to belong in the @match array, or does that make no sense? In my mind, I'm trying to find a way to print $match and have all the values from -10 to 10 printed. Does that sound correct or am I completely off?

Update: Ok this is what I did and it's working! :) I modified my code from my original post (code attempt #2) by adding and editing your contribution. I believe it works correctly for my purpose, I'm now confirming that my output files are correct.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %fhs;
    my $molecule = "1kc4";

    open my $FILE, '<', 'input_file' or die $!;
    while (<$FILE>) {
        chomp;
        my @columns = unpack('a8 a8 a8 a6');
        #print join(" ",map {$_} @columns), "\n";
        #print "@columns[3] \n";
        foreach (@columns[$#columns]) {
            my $abs = abs( $columns[ $#columns ] );
            open $fhs{ $abs }, '>', "${molecule}_cluster_" . $abs or die $! unless exists $fhs{ $abs };
            my $file = $fhs{ $abs };
            my $ID = $_;
            my $IDform = sprintf ("%4s", $ID);
            my $currentline = $.;
            my $currentlineform = sprintf ("%7s", $currentline);
            my @selection = (@columns[0..$#columns-1]);
            my $layout = "%10s"x(@selection) . "\n";
            printf $file $IDform . $currentlineform . $layout, @selection;
        }
    }
The output filenames are correct and one of them looks like this:

       8    109   -42.129   -57.475    94.651
       8    110   -45.520   -62.056    90.318
       8    111   -49.196   -63.045    92.577
       8    112   -46.086   -71.753    88.267
      -8    113   -48.146   -76.799    77.638
       8    114   -41.865   -62.567    86.437

It would be great if you could take a look and let me know if you see something erroneous in my code. Thank you again for your time :)

Re^3: divide multi-column input file into sub-files depending on specific column's value
by BrowserUk (Patriarch) on Jul 05, 2016 at 14:37 UTC

    Are you not getting loads of warnings when you run your code?

    For example: Scalar value @columns[$#columns] better written as $columns[$#columns] at ...
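
    To illustrate what that warning is about: the sigil says what you get back, so `$columns[...]` is one element and `@columns[...]` is a slice (a list). Below is a minimal sketch with made-up column values, not the poster's actual data; `$columns[-1]` is simply a shorter way to write `$columns[$#columns]`.

```perl
use strict;
use warnings;

# Made-up values standing in for one parsed input line.
my @columns = ('-42.129', '-57.475', '94.651', '-8');

# Element access: the $ sigil returns exactly one scalar.
my $last = $columns[-1];        # same element as $columns[$#columns]

# Slice access: the @ sigil returns a list, here two elements.
my @first_two = @columns[0, 1];

my $cluster = abs $last;
print "last=$last cluster=$cluster first_two=@first_two\n";
# prints: last=-8 cluster=8 first_two=-42.129 -57.475
```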


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I was only getting this one warning but I thought it would be ok...? Oops :( I fixed it now.

      I'm now facing another problem - I'm trying to print the formatted input into a new file so that I have the nicely formatted input saved somewhere.

      However when I do

      #!/usr/bin/perl
      use warnings;
      use strict;

      my %fhs; # hash with last_column values

      open my $INPUT, '<', 'input_file' or die $!;
      while (my $line = <$INPUT>) {
          #chomp; # this gives me syntax errors???
          my @columns = $line =~ m/\s*(-?[.\d]+)/g; # split columns
          foreach ($columns[$#columns]) { # foreach element of the last column = cluster ID
              open my $FORM, '>', 'output.FORM' or die $!;
              my $columnform = "%10s"x(@columns) . "\n";
              # printf ($FORM $columnform, @columns); # hangs, creates empty file
              printf $columnform, @columns; # prints on screen correctly
              close $FORM;
          }
      }

      When I print on screen, it prints fine. When I try to print to a file, it doesn't. You might notice that I changed the way I split my columns: the fixed-width solution was a bit easier to understand but also a bit dangerous, as I can't be 100% sure that all files will have fixed-width entries. I'm probably a pain, but if you could hint at what's happening with printf I'd be super grateful.

      Edit: I'm also trying this - where the $FORM printf bit is out of the loop, and interestingly it does print something, but it's only one random line.
      #!/usr/bin/perl
      use warnings;
      use strict;

      my %fhs; # hash with last_column values

      open my $INPUT, '<', 'input_file' or die $!;
      while (my $line = <$INPUT>) {
          #chomp; # this gives me syntax errors???
          my @columns = $line =~ m/\s*(-?[.\d]+)/g; # split columns
          open my $FORM, '>', 'output.FORM' or die $!;
          my $columnform = "%10s"x(@columns) . "\n";
          # printf ($FORM $columnform, @columns); # hangs, creates empty file
          printf $columnform, @columns; # prints on screen correctly
          close $FORM;
          foreach ($columns[$#columns]) { # foreach element of the last column = cluster ID
              # blah
          }
      }

        while (my $line = <$INPUT>) {
        #chomp; # this gives me syntax errors???
        ...
        }

        The chomp; statements in the code in this post operate on the default $_ scalar, which does not seem to be initialized anywhere in the code. Let me suggest that what you are seeing is not a syntax error but a "Use of uninitialized value ..." warning (not an error) because you have very wisely enabled warnings in your code.

        c:\@Work\Perl\monks>perl -wMstrict -le "chomp; "
        Use of uninitialized value $_ in scalar chomp at -e line 1.

        A solution is to chomp something that has been assigned a value, like $line in the while-loop conditional expression:
            chomp $line;
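
        For context, a minimal sketch of the loop with the line read into a named variable. The in-memory filehandle and its two data lines are illustrative stand-ins for the real input file, which is not shown in full in this thread.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for the real input file (illustrative data).
my $data = "8 109 -42.129\n-8 113 -48.146\n";
open my $in, '<', \$data or die $!;

my @lines;
while (my $line = <$in>) {
    # The line was assigned to $line, not to $_, so name it explicitly.
    chomp $line;
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines, last one: '$lines[-1]'\n";
# prints: 2 lines, last one: '-8 113 -48.146'
```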


        Give a man a fish:  <%-{-{-{-<

        I can't see anything wrong with your printf statement, and it works for me:

        open $FORM, '>', 'column.FORM' or die $!;;
        @c = (1.2, 1.3, 1.4, 1.5);
        $t = '%10s'x@c . "\n";
        printf( $FORM $t, @c );;
        close $FORM;;
        ^C
        C:\test>type column.FORM
               1.2       1.3       1.4       1.5

        so what is going wrong I have no idea.

        Equally, your comment "#chomp; # this gives me syntax errors???" makes no sense: chomp; cannot be a syntax error.

        Your comments in an earlier thread, to the effect of "I don't know why it didn't work before but it does now", all suggest that the way you are writing your code, or running your scripts, or some other environmental factor, is causing you to experience problems that are not down to Perl or the code you are posting.

        The upshot is, I'm going to suggest that you try to seek out someone local to you with some programming knowledge to watch you write and run a small, simple program and perhaps he will see the problem that we cannot see when interacting with you this way.

        A final comment on your script above: You do know that $columns[$#columns] is a single value?

        If so, why are you using a foreach loop to iterate over a single value?
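
        One thing worth checking, offered as a possibility rather than a confirmed diagnosis: in the posted scripts, output.FORM is opened with '>' inside the while loop. Opening a file in '>' mode truncates it, so each input line would start the file over, and only what was written after the last open could survive. The sketch below opens the output once before the loop and reads the cluster ID directly, without a foreach. The temporary directory, the sample data and the %per_cluster counter are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Temporary location so the sketch does not touch real files (assumption).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/output.FORM";

# Illustrative input; the last column is the cluster ID.
my $data = "-42.129 -57.475 94.651 8\n-48.146 -76.799 77.638 -8\n";
open my $in, '<', \$data or die $!;

# Open the output once, before the loop. Re-opening with '>' on every
# iteration would truncate the file each time.
open my $form, '>', $path or die "Cannot open $path: $!";

my %per_cluster;    # hypothetical counter: lines seen per cluster
while (my $line = <$in>) {
    chomp $line;
    my @columns = $line =~ m/\s*(-?[.\d]+)/g;
    next unless @columns;

    # The last column is a single value, so no foreach is needed.
    $per_cluster{ abs $columns[-1] }++;

    my $layout = '%10s' x @columns . "\n";
    printf {$form} $layout, @columns;
}
close $form or die $!;
close $in;

# Read the file back to confirm that every line was kept.
open my $check, '<', $path or die $!;
my @written = <$check>;
close $check;
print scalar(@written), " lines written\n";
```

        The `printf {$form} ...` form wraps the filehandle in a block, which makes it visually clear that the handle is not part of the argument list.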

