Re: extracting columns
by Marshall (Canon) on Mar 22, 2012 at 18:38 UTC
When you do something other than the default split, you need to specify the variable to split upon.
my @fields = split /\t/;
should be:
my @fields = split /\t/,$_;
Add tags around code so that it appears formatted correctly in the post. <code>...your code here</code>
Oh, I see <\code> should be </code>
Before the print, skip to the next line if $SIZE doesn't meet the criteria:
next unless ($SIZE > 2.0 and $SIZE < 5.0);
Hi Marshall,
Thanks for the reply. However, changing to
my @fields = split /\t/,$_;
hasn't resolved the issue. I still get the same output. Thanks for pointing out the <\code> error!
Thanks for the code tag change! Now I can read it. Evidently you do not need the second argument to split(); my memory was faulty. Oops.
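To illustrate the point about the second argument, here is a minimal sketch showing that split with only a pattern operates on $_. The helper name split_both_ways is made up for this example and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-separated line with and without
# the explicit second argument and return both results.
sub split_both_ways {
    my ($line) = @_;
    local $_ = $line;
    my @implicit = split /\t/;        # no second argument: splits $_
    my @explicit = split /\t/, $_;    # the same thing, spelled out
    return (\@implicit, \@explicit);
}

my ($implicit, $explicit) = split_both_ways("001\tU7654\tNEGATIVE\t5.666");
print "implicit: @$implicit\n";
print "explicit: @$explicit\n";
```

Both lines should list the same four fields, so the explicit form is a matter of style rather than correctness.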
When you add the $SIZE comparison, you need to deal with the one line whose SIZE field is not numeric (the header line); otherwise, under -w, Perl warns that the argument "SIZE" isn't numeric. You could read and handle that first line before the loop. Below, I instead added a test in the "if" that checks whether $SIZE looks numeric before doing the comparison: if $SIZE doesn't start with 0-9, a minus sign, or a period, the rest of the "if" is skipped and the line is printed.
I usually add a "skip blank lines" statement in these things, as a trailing blank line sometimes causes trouble. That's purely optional.
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    next if /^\s*$/;    # skip blank lines
    my ($SAMPLE_NAME, $SIZE) = (split /\s+/)[1,3];

    # print line if $SIZE isn't numeric, e.g. "SIZE"
    # or if it is a number, then must meet min, max criteria
    if ($SIZE =~ /^[^0-9.-]/ or ($SIZE > 2 and $SIZE < 5))
    {
        print "$SAMPLE_NAME $SIZE\n";
    }
}
=prints
SAMPLE_NAME SIZE
U7345 4.333
=cut
__DATA__
ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333
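The other option mentioned above, dealing with the header line before the loop, could be sketched roughly as follows. The helper name filter_rows and its $min/$max parameters are assumptions made for this example, and the data is read from an in-memory filehandle so the sketch runs on its own. It assumes every data row has at least four whitespace-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "name size" strings for the header and
# for every row whose SIZE is strictly between $min and $max.
sub filter_rows {
    my ($fh, $min, $max) = @_;
    my @out;

    my $header = <$fh>;                  # consume the header line first
    if (defined $header) {
        push @out, join ' ', (split ' ', $header)[1, 3];
    }

    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;        # skip blank lines
        my ($name, $size) = (split ' ', $line)[1, 3];
        next unless $size > $min and $size < $max;
        push @out, "$name $size";
    }
    return @out;
}

my $text = <<'END';
ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333
END

open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
print "$_\n" for filter_rows($fh, 2, 5);
close $fh;
```

Because the header never reaches the numeric comparison, the loop body needs no "is it numeric" test.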
Re: extracting columns
by toolic (Bishop) on Mar 22, 2012 at 18:42 UTC
use warnings;
use strict;
while (<DATA>) {
    my ($s, $n) = (split)[1,3];
    print "$s $n\n";
}
__DATA__
ID SAMPLE_NAME EFFECTS SIZE
001 U7654 NEGATIVE 5.666
002 U7345 POSITIVE 4.333
003 U7674 NEGATIVE 1.696
002 U7845 POSITIVE -4.333
Prints out:
SAMPLE_NAME SIZE
U7654 5.666
U7345 4.333
U7674 1.696
U7845 -4.333
Re: extracting columns
by JavaFan (Canon) on Mar 22, 2012 at 18:37 UTC
perl -anE 'say "@F[1,3]" if $F[3] > 2 && $F[3] < 5' filename.txt
Re: extracting columns
by jose_m (Acolyte) on Mar 22, 2012 at 18:54 UTC
#!/usr/bin/perl
my $file="file.txt";
open (FH, "< $file") or die "$!";
while (<FH>) {
push (@lines, $_);
}
close FH or die "$!";
print "@lines[0]\t@lines[1]\n\n";
"its cleaner and easier to read"
Why? It's not cleaner because it introduces an extra layer of array handling. Your rendition is not easier to read (for me at least) because the indentation is bad.
You have also omitted strictures, which the OP++ had, and you don't use lexical file handles, which the OP++ also had. You should use the three-parameter version of open, and it is good to have the die mention the name of the file being opened or created as well as the system error message.
print "@lines[0]\t@lines[1]\n\n";
should be written
print "$lines[0]\t$lines[1]\n\n";
Your code does not actually do what the OP wants to achieve. The OP wants to extract selected fields from a data file and generate a new file. Your code reads pairs of lines from a file then prints them out as pairs of lines with two blank lines between them and a tab prepended to the second line of each pair.
Don't just invent stuff and expect it to work!
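For illustration, here is a minimal sketch of the points above: strictures, lexical file handles, three-parameter open, and die messages that name the file. The helper name extract_columns and the temporary files are assumptions made for this example, not the OP's actual code or file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy fields 1 and 3 (zero-based) of a
# tab-separated file into a new file.
sub extract_columns {
    my ($in_name, $out_name) = @_;

    open my $in,  '<', $in_name  or die "Cannot open '$in_name' for reading: $!";
    open my $out, '>', $out_name or die "Cannot create '$out_name': $!";

    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';             # skip empty lines
        print {$out} join("\t", (split /\t/, $line)[1, 3]), "\n";
    }

    close $in;
    close $out or die "Cannot close '$out_name': $!";
    return;
}

# Demo with throwaway files that are removed at program exit.
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
print {$src_fh} "ID\tSAMPLE_NAME\tEFFECTS\tSIZE\n001\tU7654\tNEGATIVE\t5.666\n";
close $src_fh or die "Cannot close '$src_name': $!";

my (undef, $dst_name) = tempfile(UNLINK => 1);
extract_columns($src_name, $dst_name);
```

Naming the file in each die message means a failure points straight at the path that caused it.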
True laziness is hard work
Re: extracting columns
by TJPride (Pilgrim) on Mar 22, 2012 at 20:56 UTC
<$in>; ### Remove header line
while (<$in>) {
    chomp;
    my ($name, $size) = (split /\t/)[1,3];
    print $out "$name\t$size\n";
}
| [reply] [d/l] |
|
|
<$in>;
while (<$in>) {
    chomp;
    print $out join ("\t", (split /\t/)[1,3]), "\n";
}
When there are few fields, I usually prefer to avoid slicing and be more direct:
my (undef, $name, undef, $size) = split /\t/;
or even the full list:
my ($id, $name, $effects, $size) = split /\t/;
I find it a bit more readable but... it's a matter of taste!
perl -ple'$_=reverse' <<<ti.xittelop@oivalf
I understood... but what did you say?
"I find it a bit more readable but... it's a matter of taste!"
Don't declare "my" variables that you do not use.
Use a regex when you want to "keep" something.
Use split when you want to "throw away" something.
Use a list slice to "throw away" extraneous stuff from a split() or a "global" match.
my ($id, $name, $effects, $size) = split /\t/; #wrong
Don't even declare, for example, $effects if it is not used.
Focus the code on what is used from the input and forget the stuff that is not used.
If $effects matters to the overall description of the input file, explain what it means in a comment section; it is not important to this code.
The use of "undef" instead of a list slice is just fine for a case like this. A list slice is great when you want #12, #3, #1, #50-67 in that order.
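A small sketch of the two styles side by side, using a made-up ten-field line (f0 through f9) rather than the OP's data:

```perl
use strict;
use warnings;

# Made-up sample line: fields f0 .. f9, tab-separated.
my $line = join "\t", map { "f$_" } 0 .. 9;

# Few fields: undef placeholders keep the positions visible.
my (undef, $name, undef, $size) = split /\t/, $line;

# Many fields in an arbitrary order: a list slice picks and
# reorders them in one step.
my @picked = (split /\t/, $line)[7, 3, 1, 4 .. 6];

print "$name $size\n";    # f1 f3
print "@picked\n";        # f7 f3 f1 f4 f5 f6
```

With the slice, the order of the indices is the order of the results, which is awkward to express with undef placeholders.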