perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:


Hello. I have a tab-delimited text file with data like that shown below.

ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333

I am trying to get the sample name and size columns. Here is what I used.
#!/usr/bin/perl -w
use strict;

# open the file that has to be read
open(my $in, "filename.txt");

# open a new file which has to be made
open(my $out, ">result.txt");

# In the while loop, put the columns that have to be printed in the new file.
while (<$in>) {
    my @fields = split /\t/;
    my $SAMPLE_NAME = $fields[1];
    my $SIZE = $fields[3];

    # print the columns
    {
        print $out "$SAMPLE_NAME\ $SIZE\n";
    }
}

# close the filehandles
close($in);
close($out);

Output I get:
SAMPLE_NAME SIZE
001


Clearly this is not right. I get the column headings but none of the values, and then I get the first value of the ID column. Can someone see what is going on?
How do I extract only the values that are greater than 2.0 and less than 5.0 from the SIZE column?
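One explanation consistent with this exact output, offered as an assumption rather than a confirmed diagnosis: if filename.txt was saved with bare carriage-return (\r, classic Mac OS) line endings, Perl's default input record separator $/ ("\n") never matches, so the whole file is read as one record. $fields[3] then becomes "SIZE\r001", and a viewer that treats \r as a line break shows "SAMPLE_NAME SIZE" followed by "001". This would also explain why changing the split arguments makes no difference. The sketch below uses hypothetical in-memory data with \r endings and an illustrative helper, extract_pairs, to compare the two separators.

```perl
use strict;
use warnings;

# Hypothetical input: the OP's data with bare "\r" line endings (an assumption).
my $text = join("\r",
    "ID\tSAMPLE_NAME\tEFFECTS\tSIZE",
    "001\tU7654\tNEGATIVE\t5.666",
    "002\tU7345\tPOSITIVE\t4.333",
    "003\tU7674\tNEGATIVE\t1.696",
    "002\tU7845\tPOSITIVE\t-4.333",
) . "\r";

# Read $data record by record using $sep as the input record separator ($/)
# and return one [SAMPLE_NAME, SIZE] pair per record.
sub extract_pairs {
    my ($data, $sep) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = $sep;
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;                  # strips a trailing $sep, if present
        my @fields = split /\t/, $line;
        push @pairs, [ @fields[1, 3] ];
    }
    close $fh;
    return @pairs;
}

my @default = extract_pairs($text, "\n");   # what the OP's loop effectively does
my @by_cr   = extract_pairs($text, "\r");   # treat "\r" as the line ending

printf "%d record(s) with \\n, %d record(s) with \\r\n",
    scalar @default, scalar @by_cr;
```

If this turns out to be the cause, setting `local $/ = "\r";` before the `while (<$in>)` loop, or converting the file's line endings first, would be the corresponding fix; checking the raw bytes (for example with `od -c filename.txt`) before choosing either is advisable.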

Re: extracting columns
by Marshall (Canon) on Mar 22, 2012 at 18:38 UTC
    When you do something other than the default split, you need to specify the variable to split upon.
    my @fields = split /\t/; should be: my @fields = split /\t/,$_;
    Put code tags around your code so that it is formatted correctly in the post: <code>...your code here</code>
    Oh, I see: <\code> should be </code>.

    Before the print, skip the print and go on to the next line if $SIZE doesn't meet the criteria:

    next unless ($SIZE >2.0 and $SIZE < 5.0);
      Hi Marshall, thanks for the reply. However, changing it to my @fields = split /\t/,$_; hasn't resolved the issue. I still get the same output. Thanks for pointing out the <\code> error!
        Thanks for the code tag change! Now I can read it.
        Evidently you do not need the second argument to split() (it defaults to $_); my memory was faulty. Oops.

        When you add in the $SIZE comparison, you need to do something about the line that is not numeric (the header line). Otherwise, with warnings on, the comparison produces an 'Argument "SIZE" isn't numeric' warning and treats "SIZE" as 0. You could read and deal with that first line before the loop; below, I instead added a test in the "if" to check whether $SIZE is numeric before doing the comparison. If $SIZE doesn't start with a digit, a minus sign or a period, the rest of the "if" is skipped and the line is printed.

        I usually add a "skip blank lines" statement in these things, as sometimes a trailing blank line causes trouble; that's purely optional.

        #!/usr/bin/perl -w
        use strict;

        while (<DATA>) {
            next if /^\s*$/;    # skip blank lines
            my ($SAMPLE_NAME, $SIZE) = (split /\s+/)[1,3];

            # print line if $SIZE isn't numeric, e.g. "SIZE",
            # or if it is a number that meets the min, max criteria
            if ($SIZE =~ /^[^0-9-.]/ or ($SIZE > 2 and $SIZE < 5)) {
                print "$SAMPLE_NAME $SIZE\n";
            }
        }

        =prints
        SAMPLE_NAME SIZE
        U7345 4.333
        =cut

        __DATA__
        ID SAMPLE_NAME EFFECTS SIZE
        001 U7654 NEGATIVE 5.666
        002 U7345 POSITIVE 4.333
        003 U7674 NEGATIVE 1.696
        002 U7845 POSITIVE -4.333
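A sketch of the other option mentioned above: handle the header before the loop, and swap the regex test for the core Scalar::Util function looks_like_number. The sub name filter_rows and the array-of-lines input are illustrative assumptions, not code from the thread.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Return the header's SAMPLE_NAME/SIZE pair followed by the data rows
# whose SIZE is strictly between $min and $max.
sub filter_rows {
    my ($min, $max, @lines) = @_;
    my @out;
    my $header = shift @lines;                 # handle the header before the loop
    push @out, join ' ', (split ' ', $header)[1, 3];
    for my $line (@lines) {
        next if $line =~ /^\s*$/;              # skip blank lines
        my ($name, $size) = (split ' ', $line)[1, 3];
        next unless looks_like_number($size);  # ignore non-numeric SIZE values
        push @out, "$name $size" if $size > $min and $size < $max;
    }
    return @out;
}

my @rows = filter_rows(2, 5,
    "ID SAMPLE_NAME EFFECTS SIZE",
    "001 U7654 NEGATIVE 5.666",
    "002 U7345 POSITIVE 4.333",
    "",
    "003 U7674 NEGATIVE 1.696",
    "002 U7845 POSITIVE -4.333",
);
print "$_\n" for @rows;
```

Reading the header first keeps the numeric test focused on data rows, so the loop never has to guess whether a non-numeric SIZE is the header or a bad value.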
Re: extracting columns
by toolic (Bishop) on Mar 22, 2012 at 18:42 UTC
    use warnings;
    use strict;

    while (<DATA>) {
        my ($s, $n) = (split)[1,3];
        print "$s $n\n";
    }

    __DATA__
    ID SAMPLE_NAME EFFECTS SIZE
    001 U7654 NEGATIVE 5.666
    002 U7345 POSITIVE 4.333
    003 U7674 NEGATIVE 1.696
    002 U7845 POSITIVE -4.333

    Prints out:

    SAMPLE_NAME SIZE
    U7654 5.666
    U7345 4.333
    U7674 1.696
    U7845 -4.333
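The bare `split` above relies on two defaults documented for split: the pattern defaults to the single-space string `' '`, which splits on runs of whitespace and ignores leading whitespace, and the string defaults to `$_`. A small comparison with `split /\t/`, using a made-up line with a leading space and an empty tab-separated field:

```perl
use strict;
use warnings;

# Hypothetical line: a leading space and two consecutive tabs (an empty field).
my $line = " 001\tU7654\t\t5.666";

my @ws  = split ' ', $line;   # awk-style: runs of whitespace, leading whitespace ignored
my @tab = split /\t/, $line;  # one field per tab, empty fields kept

print scalar(@ws),  " fields: ", join('|', @ws),  "\n";
print scalar(@tab), " fields: ", join('|', @tab), "\n";
```

For the sample data the two behave the same, but if the real file can contain empty fields or values with embedded spaces, `split /\t/` keeps the column positions (and therefore indices like `[1,3]`) stable.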
Re: extracting columns
by JavaFan (Canon) on Mar 22, 2012 at 18:37 UTC
    perl -anE 'say "@F[1,3]" if $F[3] > 2 && $F[3] < 5' filename.txt
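For readers less familiar with the switches: per perlrun, -n wraps the code in a `while (<>) { ... }` loop, -a splits each line into @F with `split ' '`, and -E enables features such as say. Below is a rough script equivalent, written as a sub over a list of lines so it can be tried without a file; the name filter_lines is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Rough equivalent of:  perl -anE 'say "@F[1,3]" if $F[3] > 2 && $F[3] < 5' filename.txt
# -n => loop over input lines; -a => @F = split ' ' on each line; -E => enables say.
sub filter_lines {
    my @kept;
    for (@_) {
        my @F = split ' ';                     # what -a does for each line
        no warnings 'numeric';                 # the one-liner runs without -w
        push @kept, "@F[1,3]" if $F[3] > 2 && $F[3] < 5;
    }
    return @kept;
}

say for filter_lines(
    "ID SAMPLE_NAME EFFECTS SIZE\n",
    "001 U7654 NEGATIVE 5.666\n",
    "002 U7345 POSITIVE 4.333\n",
    "003 U7674 NEGATIVE 1.696\n",
    "002 U7845 POSITIVE -4.333\n",
);
```

Note that the header line fails the numeric test ("SIZE" compares as 0), so the one-liner prints only the matching data rows and no header.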
Re: extracting columns
by jose_m (Acolyte) on Mar 22, 2012 at 18:54 UTC

    I would read the file into an array; it's cleaner and easier to read.

    #!/usr/bin/perl
    my $file="file.txt";
    open (FH, "< $file") or die "$!";
    while (<FH>) {
        push (@lines, $_);
    }
    close FH or die "$!";
    print "@lines[0]\t@lines[1]\n\n";
      it's cleaner and easier to read

      Why? It's not cleaner because it introduces an extra layer of array handling. Your rendition is not easier to read (for me at least) because the indentation is bad.

      You have also omitted strictures, which the OP++ had, and you don't use lexical file handles, which the OP++ also had. You should use the three-parameter version of open, and it is good to have the die message mention the name of the file being opened or created as well as the system error message.
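A minimal sketch of that advice. The helper name open_or_die is hypothetical (it is not a Perl built-in); the point is the three-argument open and an error message that names the file alongside $!.

```perl
use strict;
use warnings;

# Open $path with the given mode using the three-argument form of open,
# and include both the file name and the system error ($!) in the message.
sub open_or_die {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path
        or die "Cannot open '$path' with mode '$mode': $!\n";
    return $fh;
}

# Hypothetical usage with the OP's file names:
# my $in  = open_or_die('<', 'filename.txt');
# my $out = open_or_die('>', 'result.txt');
```

Keeping the mode separate from the file name also means a name that happens to start with `>` or `<` cannot change how the file is opened.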

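      print "@lines[0]\t@lines[1]\n\n"; should be written print "$lines[0]\t$lines[1]\n\n";, because @lines[0] is a one-element array slice, while $lines[0] is the single element you actually want.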

      Your code does not actually do what the OP wants to achieve. The OP wants to extract selected fields from a data file and generate a new file. Your code reads the whole file into an array and then prints only its first two lines, with a tab prepended to the second one, followed by two blank lines.

      Don't just invent stuff and expect it to work!

      True laziness is hard work
Re: extracting columns
by TJPride (Pilgrim) on Mar 22, 2012 at 20:56 UTC
    <$in>;    ### Remove header line
    while (<$in>) {
        chomp;
        my ($name, $size) = (split /\t/)[1,3];
        print $out "$name\t$size\n";
    }

      or without the need for the intermediate variables:

      <$in>;
      while (<$in>) {
          chomp;
          print $out join("\t", (split /\t/)[1,3]), "\n";
      }
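Putting the pieces from this thread together, a complete version of the OP's task might look like the sketch below. It assumes the file really is tab-delimited with "\n" line endings, and wraps the loop in a sub (extract_columns, a hypothetical name) that takes filehandles so it can be tried on in-memory data instead of filename.txt and result.txt.

```perl
use strict;
use warnings;

# Copy SAMPLE_NAME and SIZE from $in to $out, keeping the header and only
# the rows whose SIZE is greater than 2.0 and less than 5.0.
sub extract_columns {
    my ($in, $out) = @_;
    my $header = <$in>;                       # read the header line first
    return unless defined $header;
    chomp $header;
    print {$out} join("\t", (split /\t/, $header)[1, 3]), "\n";
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/;             # skip blank lines
        my ($name, $size) = (split /\t/, $line)[1, 3];
        next unless defined $size and $size > 2.0 and $size < 5.0;
        print {$out} "$name\t$size\n";
    }
}

# Example run on in-memory data instead of real files.
my $input = join("\n",
    "ID\tSAMPLE_NAME\tEFFECTS\tSIZE",
    "001\tU7654\tNEGATIVE\t5.666",
    "002\tU7345\tPOSITIVE\t4.333",
    "003\tU7674\tNEGATIVE\t1.696",
    "002\tU7845\tPOSITIVE\t-4.333",
) . "\n";
open my $in,  '<', \$input  or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
extract_columns($in, $out);
close $in;
close $out;
print $result;
```

With real files, the same sub would be called with handles from `open my $in, '<', 'filename.txt' or die ...` and `open my $out, '>', 'result.txt' or die ...`.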
      When there are few fields, I usually prefer to avoid slicing and be more direct:
      my (undef, $name, undef, $size) = split /\t/;
      or even name every field:
      my ($id, $name, $effects, $size) = split /\t/;
      I find it a bit more readable but... it's a matter of taste!
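Both styles give the same values; a small check on one row of the sample data:

```perl
use strict;
use warnings;

my $line = "002\tU7345\tPOSITIVE\t4.333";

# List slice: pick fields 1 and 3 from split's result.
my ($name_a, $size_a) = (split /\t/, $line)[1, 3];

# undef placeholders in the assignment list discard fields 0 and 2.
my (undef, $name_b, undef, $size_b) = split /\t/, $line;

print "slice: $name_a $size_a\n";
print "undef: $name_b $size_b\n";
```

One detail from the split documentation: when split is assigned directly to a list of scalars without a LIMIT, Perl uses an implicit limit of one more than the number of variables, so any fields beyond the fourth are left together in a discarded remainder. That is harmless here.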

      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      I understood... but what did you say?
        I find it a bit more readable but... it's a matter of taste!

        Don't declare "my" variables that you do not use.

        Use a regex when you want to "keep" something.
        Use split when you want to "throw away something"

        Use list slice to "throw away" extraneous stuff from a split() or a match "global".
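A small illustration of those three approaches on one row of the sample data. The regex is one possible pattern for a tab-delimited line, chosen for this sketch.

```perl
use strict;
use warnings;

my $line = "003\tU7674\tNEGATIVE\t1.696";

# Regex: describe what to keep and capture it.
my ($name_re, $size_re) = $line =~ /^[^\t]*\t([^\t]*)\t[^\t]*\t([^\t]*)/;

# split: cut on the separator, then use a list slice to drop what is not needed.
my ($name_sp, $size_sp) = (split /\t/, $line)[1, 3];

# Global match in list context, then a list slice.
# Note: [^\t]+ skips empty fields, so indices shift if a field can be empty.
my ($name_g, $size_g) = ($line =~ /([^\t]+)/g)[1, 3];

print "regex: $name_re $size_re\n";
print "split: $name_sp $size_sp\n";
print "match: $name_g $size_g\n";
```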

        my ($id, $name, $effects, $size) = split /\t/; #wrong
        Forget even declaring, for example, $effects if it is not used.
        Focus the code on what is used from the input - forget the stuff that is not used.
        If $effects matters to the overall description of the input file, explain what it means in some kind of comment section; it's not important to this code.

        The use of "undef" instead of list slice is just fine for a case like this.
        List slice is great when you want #12, #3, #1, #50-67 in that order.
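An illustration using a made-up 70-field comma-separated record. Perl indices are zero-based, so index 12 here is the thirteenth field; if the fields are numbered from 1, subtract one from each index.

```perl
use strict;
use warnings;

# Hypothetical record with 70 comma-separated fields named f0 .. f69.
my $record = join ',', map "f$_", 0 .. 69;

# Zero-based list slice: indices 12, 3, 1, then 50 through 67, in that order.
my @picked = (split /,/, $record)[12, 3, 1, 50 .. 67];

print scalar(@picked), " fields: @picked[0 .. 3] ... $picked[-1]\n";
```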