extracting columns

perllearner007 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: extracting columns by Marshall (Canon) on Mar 22, 2012 at 18:38 UTC
When you do something other than the default split, you need to specify the variable to split upon.

`my @fields = split /\t/;` should be: `my @fields = split /\t/, $_;`

Add tags around code so that it appears formatted correctly in the post: `<code>...your code here</code>`

Oh, I see: `<\code>` should be `</code>`.

Before the print, just skip the print and loop to the next line if $SIZE doesn't meet the criteria: `next unless ($SIZE > 2.0 and $SIZE < 5.0);`
Re^2: extracting columns by perllearner007 (Acolyte) on Mar 22, 2012 at 20:24 UTC
Hi Marshall, thanks for the reply. However, changing to `my @fields = split /\t/,$_;` hasn't resolved the issue. I still get the same output. Thanks for pointing out the <\code> error!
Re^3: extracting columns by Marshall (Canon) on Mar 22, 2012 at 22:41 UTC
Thanks for the code tag change! Now I can read it. Evidently you do not need the second argument in the split(); my memory was faulty. Oops.

When you add in the $SIZE comparison, you need to do something about the line that is not numeric (the first header line), otherwise the comparison will fail with "$SIZE not numeric". You could read and deal with that first line before the loop, or, as below, add a test in the "if" to check that it is numeric before doing the comparison. If $SIZE doesn't start with 0-9, a minus sign, or a period, then the rest of the "if" is skipped and the line is printed.

I usually add a "skip blank line" statement in these things, as sometimes there is a trailing blank line that causes trouble. That's purely optional.

```
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    next if /^\s*$/;   # skip blank lines
    my ($SAMPLE_NAME, $SIZE) = (split /\s+/)[1,3];

    # print line if $SIZE isn't numeric, e.g. "SIZE"
    # or if it is a number, then must meet min, max criteria
    if ($SIZE =~ /^[^0-9-.]/ or ( $SIZE > 2 and $SIZE < 5))
    {
        print "$SAMPLE_NAME $SIZE\n";
    }
}

=prints
SAMPLE_NAME SIZE
U7345 4.333
=cut

__DATA__
ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333
```
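As a variation on the numeric test above, here is a minimal sketch that uses `looks_like_number` from the core module Scalar::Util instead of a hand-written character class. The helper name `size_in_range` and the hard-coded rows are made up for illustration; only the 2 and 5 bounds come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns 1 when $size is numeric and strictly
# between $min and $max, otherwise 0.
sub size_in_range {
    my ($size, $min, $max) = @_;
    return 0 unless defined $size && looks_like_number($size);
    return ($size > $min && $size < $max) ? 1 : 0;
}

# Sample rows copied from the thread's __DATA__ section.
my @rows = (
    "ID SAMPLE_NAME EFFECTS SIZE",
    "001 U7654 NEGATIVE 5.666",
    "002 U7345 POSITIVE 4.333",
    "003 U7674 NEGATIVE 1.696",
);

for my $row (@rows) {
    my ($name, $size) = (split ' ', $row)[1, 3];
    # Keep the header (non-numeric SIZE) and any row whose size is in range.
    print "$name $size\n"
        if !looks_like_number($size) || size_in_range($size, 2, 5);
}
```

With these rows it should print the header and the U7345 line only. Whether the header should be passed through or dropped is a design choice; dropping it just means removing the `!looks_like_number($size)` half of the condition.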
Re^4: extracting columns by perllearner007 (Acolyte) on Apr 06, 2012 at 17:36 UTC
Re^4: extracting columns by perllearner007 (Acolyte) on Apr 06, 2012 at 18:27 UTC
Re: extracting columns by toolic (Bishop) on Mar 22, 2012 at 18:42 UTC
```
use warnings;
use strict;

while (<DATA>) {
    my ($s, $n) = (split)[1,3];
    print "$s $n\n";
}

__DATA__
ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333
```

Prints out:

```
SAMPLE_NAME SIZE
U7654 5.666
U7345 4.333
U7674 1.696
U7845 -4.333
```

See also: split, Slices
Re: extracting columns by JavaFan (Canon) on Mar 22, 2012 at 18:37 UTC
`perl -anE 'say "@F[1,3]" if $F[3] > 2 && $F[3] < 5' filename.txt`
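For readers new to the switches: `-n` wraps the code in a line-by-line read loop, `-a` autosplits each line on whitespace into `@F`, and `-E` enables `say`. A rough sketch of the same logic as a script follows; the helper name `pick_fields` and the inline sample lines are assumptions for illustration, not part of the one-liner.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: applies the one-liner's test to a single line and
# returns "NAME SIZE" on a match, or undef otherwise.
sub pick_fields {
    my ($line) = @_;
    my @F = split ' ', $line;      # what -a does: whitespace split into @F
    return unless @F >= 4;
    no warnings 'numeric';         # the header's "SIZE" numifies to 0 here
    return ($F[3] > 2 && $F[3] < 5) ? "@F[1,3]" : undef;
}

# Stand-in for the implicit loop that -n provides over filename.txt.
for my $line (
    "ID SAMPLE_NAME EFFECTS SIZE",
    "001 U7654 NEGATIVE 5.666",
    "002 U7345 POSITIVE 4.333",
) {
    my $out = pick_fields($line);
    say $out if defined $out;
}
```

Note that the header line is dropped here, because "SIZE" compares as 0 and fails the `> 2` test.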
Re: extracting columns by jose_m (Acolyte) on Mar 22, 2012 at 18:54 UTC
I would read the file into an array; it's cleaner and easier to read.

```
#!/usr/bin/perl
my $file="file.txt";
open (FH, "< $file") or die "$!";
while (<FH>) {
push (@lines, $_);
}
close FH or die "$!";
print "@lines[0]\t@lines[1]\n\n";
```
Re^2: extracting columns by GrandFather (Saint) on Mar 22, 2012 at 20:46 UTC
"it's cleaner and easier to read"

Why? It's not cleaner, because it introduces an extra layer of array handling. Your rendition is not easier to read (for me at least) because the indentation is bad. You have also missed strictures, which the OP++ had, and you don't use lexical file handles, which the OP++ also had. You should use the three-parameter version of open, and it is good to have the die mention the name of the file being opened or created as well as showing the system error message.

`print "@lines[0]\t@lines[1]\n\n";` should be written `print "$lines[0]\t$lines[1]\n\n";`.

Your code does not actually do what the OP wants to achieve. The OP wants to extract selected fields from a data file and generate a new file. Your code reads pairs of lines from a file, then prints them out as pairs of lines with two blank lines between them and a tab prepended to the second line of each pair. Don't just invent stuff and expect it to work!

True laziness is hard work
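The points about strictures, lexical file handles, three-parameter open, and naming the file in the die message could be sketched roughly as follows. This is one possible shape, not the OP's actual script; the helper name `extract_columns` and the temporary files are invented so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: writes fields 1 and 3 of every tab-separated line
# in $in_name to $out_name.
sub extract_columns {
    my ($in_name, $out_name) = @_;

    # Three-parameter open, lexical handles, file name in the error message.
    open my $in,  '<', $in_name  or die "Can't open '$in_name' for reading: $!\n";
    open my $out, '>', $out_name or die "Can't create '$out_name': $!\n";

    while (my $line = <$in>) {
        chomp $line;
        print {$out} join("\t", (split /\t/, $line)[1, 3]), "\n";
    }

    close $out or die "Can't close '$out_name': $!\n";
    close $in;
    return;
}

# Demo on throwaway files (removed automatically at exit).
my ($tmp_fh, $tmp_in) = tempfile(UNLINK => 1);
print {$tmp_fh} "ID\tSAMPLE_NAME\tEFFECTS\tSIZE\n001\tU7654\tNEGATIVE\t5.666\n";
close $tmp_fh or die "Can't close '$tmp_in': $!\n";

my (undef, $tmp_out) = tempfile(UNLINK => 1);
extract_columns($tmp_in, $tmp_out);
```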
Re: extracting columns by TJPride (Pilgrim) on Mar 22, 2012 at 20:56 UTC
```
<$in>; ### Remove header line
while (<$in>) {
    chomp;
    my ($name, $size) = (split /\t/)[1,3];
    print $out "$name\t$size\n";
}
```
Re^2: extracting columns by GrandFather (Saint) on Mar 22, 2012 at 21:55 UTC
Or without the need for the intermediate variables:

```
<$in>;
while (<$in>) {
    chomp;
    print $out join ("\t", (split /\t/)[1,3]), "\n";
}
```

True laziness is hard work
Re^2: extracting columns by polettix (Vicar) on Mar 22, 2012 at 23:10 UTC
When there are few fields, I usually prefer to avoid slicing and be more direct:

`my (undef, $name, undef, $size) = split /\t/;`

or even the full stuff:

`my ($id, $name, $effects, $size) = split /\t/;`

I find it a bit more readable but... it's a matter of taste!

perl -ple'$_=reverse' <<<ti.xittelop@oivalf
I understood... but what did you say?
Re^3: extracting columns by Marshall (Canon) on Mar 25, 2012 at 19:02 UTC
"I find it a bit more readable but... it's a matter of taste!"

Don't declare "my" variables that you do not use. Use a regex when you want to "keep" something. Use split when you want to "throw away" something. Use a list slice to "throw away" extraneous stuff from a split() or a global match.

`my ($id, $name, $effects, $size) = split /\t/; #wrong`

Forget even declaring, for example, $effects if it is not used. Focus the code on what is used from the input and forget the stuff that is not used. If it matters to the overall description of the input file, explain what $effects would have meant in some kind of comment section, but it's not important to this code.

The use of "undef" instead of a list slice is just fine for a case like this. A list slice is great when you want #12, #3, #1, #50-67 in that order.
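To make the reordering point concrete, here is a small sketch with a made-up ten-field line (the `f0`..`f9` values are placeholders, not data from the thread). The slice pulls fields out of order and includes a range, while the `undef` form stays tied to input order.

```perl
use strict;
use warnings;

# Made-up tab-separated line: f0, f1, ... f9.
my $line = join "\t", map { "f$_" } 0 .. 9;

# List slice: fields 3 and 1 in that order, followed by the range 5..7.
my @picked = (split /\t/, $line)[3, 1, 5 .. 7];
print "@picked\n";        # f3 f1 f5 f6 f7

# undef placeholders: fine for a few leading fields taken in input order.
my (undef, $name, undef, $size) = split /\t/, $line;
print "$name $size\n";    # f1 f3
```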