in reply to Where is my output stream to a file going?
The default split (split with no arguments, which is split ' ', $_) splits on any run of one or more whitespace characters, /\s+/, and it also discards leading whitespace. \s covers five ASCII whitespace characters: space, \t, \n, \r and \f (since Perl 5.18 it also matches vertical tab). From your data, I see no reason to limit the split to \t alone, because you have plain whitespace-separated tokens (no spaces within the desired tokens). Perl is designed to work great with that format! Splitting on one particular kind of whitespace (out of several that you cannot see on the screen) is usually a bad idea; the default split is usually a good idea unless you have a clear reason why it's not. Also note that chomp() is not needed, because \n is one of the whitespace characters and the default split drops trailing empty fields, so the newline never ends up in a field. But chomp() is fast, so this is a nit.
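A small sketch of the difference, using a made-up line (not from your file) that has leading spaces, a mix of tabs and spaces between fields, and its newline still attached:

```perl
use strict;
use warnings;

# Hypothetical line: leading spaces, tabs and spaces mixed as separators,
# trailing newline not chomped.
my $line = "  1\tNC_000001.8  NCRNA00115\t79854\n";

# Default split: split ' ' splits on runs of whitespace, drops the leading
# empty field and the trailing empty fields, so "\n" never reaches a field.
my @default = split ' ', $line;

# Tab-only split: a space is not a separator, so two tokens stay glued
# together, the first field keeps its leading spaces and the last field
# keeps the newline.
my @tabs_only = split /\t/, $line;

# Show newlines as a visible \n so each result stays on one output line.
print scalar(@default), " fields: ", join('|', @default), "\n";
print scalar(@tabs_only), " fields: ",
      join('|', map { (my $f = $_) =~ s/\n/\\n/g; $f } @tabs_only), "\n";
```

The first line reports 4 clean fields; the tab-only split gives 3 fields, one of them two tokens glued together and another still carrying "\n".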
Perl has an amazing feature that is not used often enough: the list slice. There is no need to create an @fields array. Use a list slice to get what you want, i.e.: my $ccds_status = (split)[5];. One of the good things about this is that we've documented what the heck field 5 (counting from zero) is! And now we can refer to it as $ccds_status instead of field #5. In larger programs, this is a significant advantage.
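A list slice can also pull several named fields out of a line in one statement. A minimal sketch, using the SAMD11 record from your data; the variable names are just taken from the header row:

```perl
use strict;
use warnings;

# The SAMD11 record from the posted data, placed in $_ as a while(<DATA>)
# loop would do.
$_ = "1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532\n";

# (split)[...] is a list slice: split $_ on whitespace, then keep only the
# fields at these zero-based positions (gene, ccds_status, cds_from, cds_to).
my ($gene, $ccds_status, $cds_from, $cds_to) = (split)[2, 5, 7, 8];

# From here on the code reads by name, not by field number.
print "$gene is $ccds_status, CDS $cds_from..$cds_to\n";
```

This prints "SAMD11 is Public, CDS 861321..879532". If the column order ever changes, only the index list in the slice needs updating.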
Another variant of "stuff we can't see" happens when there is, say, a blank line in the file that we can't easily spot. When I did the cut and paste, I wound up with a trailing line containing a single space. There are a number of ways of dealing with this. Often my code will have: next if /^\s+$/;, meaning skip blank lines. Below, I used another way: I checked whether $ccds_status was defined, to prevent a "Use of uninitialized value" warning when that trailing blank line is encountered.
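Both approaches can be compared side by side. This is a sketch; the sub names statuses_skip_blank and statuses_check_defined are hypothetical, and the input is two records from your data plus a stray single-space line like the one the cut and paste produced:

```perl
use strict;
use warnings;

# Approach 1 (hypothetical name): skip whitespace-only lines before splitting.
sub statuses_skip_blank {
    my @statuses;
    for (@_) {
        next if /^\s+$/;              # blank line: nothing to split
        push @statuses, (split)[5];
    }
    return @statuses;
}

# Approach 2 (hypothetical name): split every line and act only if the
# wanted field exists. On a whitespace-only line split returns an empty
# list, so the slice leaves $ccds_status undef.
sub statuses_check_defined {
    my @statuses;
    for (@_) {
        my $ccds_status = (split)[5];
        push @statuses, $ccds_status if defined $ccds_status;
    }
    return @statuses;
}

my @lines = (
    "1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433\n",
    "1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532\n",
    " \n",    # stray trailing line holding a single space
);
print join(' ', statuses_skip_blank(@lines)), "\n";
print join(' ', statuses_check_defined(@lines)), "\n";
```

Both lines print "Withdrawn Public". The first approach states the intent up front; the second only guards the field it actually uses.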
Also note that $ccds_status eq 'Withdrawn' would work instead of a regex, and is the stricter test: m/Withdrawn/ matches any status that merely contains "Withdrawn". When dealing with files generated by other computer programs, allowing for variations in case is often not necessary, but that's also a minor nit.
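To make the differences concrete, here is a sketch comparing the regex, eq, and a case-insensitive eq. 'Withdrawn' and 'Public' appear in your data; 'withdrawn' and 'Withdrawn_pending' are made up purely to show where the tests disagree:

```perl
use strict;
use warnings;

# The last two values are hypothetical, not from the posted file.
my @statuses = ('Withdrawn', 'Public', 'withdrawn', 'Withdrawn_pending');

my %hits = (regex => [], eq => [], lc_eq => []);
for my $status (@statuses) {
    # m/Withdrawn/ : case-sensitive, matches anywhere in the string
    push @{ $hits{regex} }, $status if $status =~ m/Withdrawn/;
    # eq : the whole string must be exactly 'Withdrawn'
    push @{ $hits{eq} }, $status if $status eq 'Withdrawn';
    # lc() then eq : whole string, any case
    push @{ $hits{lc_eq} }, $status if lc($status) eq 'withdrawn';
}
print "$_: @{ $hits{$_} }\n" for sort keys %hits;
```

Expected output:

    eq: Withdrawn
    lc_eq: Withdrawn withdrawn
    regex: Withdrawn Withdrawn_pending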
My two main points are:
1. Don't over-restrict the whitespace split.
2. Use a list slice to get values into human-readable variable names instead of using field numbers further on in the program.
```perl
#!/usr/bin/perl -w
use strict;

my $firstline = <DATA>;   # skip the header line (actually optional in this case)

while (<DATA>) {
    my $ccds_status = (split)[5];

    # see discussion re: defined()
    if (defined($ccds_status) and $ccds_status =~ m/Withdrawn/) {
        print;   # lines with Withdrawn; $_ still ends in "\n" (no chomp)
    }
}

# prints:
# 1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433

__DATA__
#chromosome nc_accession gene gene_id ccds_id ccds_status cds_strand cds_from cds_to
1 NC_000001.8 NCRNA00115 79854 CCDS1.1 Withdrawn - 801942 802433
1 NC_000001.10 SAMD11 148398 CCDS2.2 Public + 861321 879532
1 NC_000001.10 NOC2L 26155 CCDS3.1 Public - 880073 894619
1 NC_000001.10 PLEKHN1 84069 CCDS4.1 Public + 901911 909954
1 NC_000001.10 HES4 57801 CCDS5.1 Public - 934438 935352
1 NC_000001.10 ISG15 9636 CCDS6.1 Public + 948953 949857
1 NC_000001.10 C1orf159 54991 CCDS7.2 Public - 1018272 1026922
1 NC_000001.10 TTLL10 254173 CCDS8.1 Public + 1115433 1120521
1 NC_000001.10 TNFRSF18 8784 CCDS9.1 Public - 1138970 1141950
1 NC_000001.10 TNFRSF18 8784 CCDS10.1 Public - 1139223 1141950
1 NC_000001.10 TNFRSF4 7293 CCDS11.1 Public - 1146934 1149506
```