Re: Where is my output stream to a file going?

I suspect that davido is on the right track, re: spaces instead of tabs.

With no arguments, split behaves like split ' ', $_: it splits on any run of one or more whitespace characters, much like /\s+/ except that leading whitespace is skipped. That covers all five of the usual whitespace characters: space, \r, \n, \t and \f. From your data, I see no reason to limit the split to just \t, because you have plain whitespace-separated tokens (no spaces within the desired tokens). Perl is designed to work great with that format! Splitting on one particular kind of whitespace (out of five that you cannot tell apart on the screen) is usually a bad idea; the default split is usually a good idea unless you have a clear reason why it's not. Also note that chomp() is not needed here, because \n is one of the whitespace characters. But chomp() is fast, so this is a nit.
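A quick way to see the difference is to count fields both ways. This is only a sketch: the sample line is made up to resemble your data (separated by runs of spaces, no tabs), and the two helper subs are names I invented for the demo.

```perl
use strict;
use warnings;

# Hypothetical sample line: fields separated by runs of spaces, not tabs.
my $line = "1   NC_000001.8 NCRNA00115  79854   CCDS1.1 Withdrawn\n";

# Split only on a literal tab character.
sub count_fields_tab     { my @f = split /\t/, $_[0]; return scalar @f }

# Split the way a bare split does: on any run of whitespace.
sub count_fields_default { my @f = split ' ', $_[0];  return scalar @f }

print "tab split:     ", count_fields_tab($line),     " field(s)\n";
print "default split: ", count_fields_default($line), " field(s)\n";
```

With no tab in the line, the tab split hands back the whole line as a single field, so index 5 is undefined and nothing ever matches.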

Perl has an amazing feature, the list slice, which is not used often enough. There is no need to create an @fields array. Use a list slice to get what you want, i.e.: my $ccds_status = (split)[5];. One of the good things about this is that we've documented what the heck field 5 is! And now we can refer to it as $ccds_status instead of #5. In larger programs, this is a significant advantage.
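The same idea extends to several fields at once. Here is a small sketch; the sub name gene_id_status is something I made up, and the column positions (2, 4, 5) are read off the header line of the posted data.

```perl
use strict;
use warnings;

# Returns (gene, ccds_id, ccds_status) from one whitespace-separated line.
# Indices 2, 4 and 5 follow the header: chromosome nc_accession gene
# gene_id ccds_id ccds_status ...
sub gene_id_status {
    my ($line) = @_;
    my ($gene, $ccds_id, $ccds_status) = (split ' ', $line)[2, 4, 5];
    return ($gene, $ccds_id, $ccds_status);
}

my ($gene, $ccds_id, $ccds_status) = gene_id_status(
    "1   NC_000001.10    SAMD11  148398  CCDS2.2 Public  +   861321  879532\n");
print "$gene $ccds_id $ccds_status\n";   # SAMD11 CCDS2.2 Public
```

Each variable name documents a column, and no throwaway array is needed.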

Another variant of "stuff we can't see" happens when there is, say, a blank line in the file that is not easy to spot. When I did the cut and paste, I wound up with a trailing line with one space in it. There are a number of ways of dealing with this. Often my code will have: next if /^\s+$/;, meaning skip whitespace-only lines. In the program at the end of this post I went another way: I check that $ccds_status is defined, which prevents an "uninitialized value" warning when that trailing blank line is encountered.
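For illustration, both guards can sit side by side. This is a sketch with invented test lines and a made-up sub name (statuses). I used /^\s*$/ rather than /^\s+$/ here so that a line that has already been chomped down to an empty string is skipped too.

```perl
use strict;
use warnings;

# Returns the ccds_status values found in the given lines, skipping
# whitespace-only lines and lines with fewer than six fields.
sub statuses {
    my @out;
    for (@_) {
        next if /^\s*$/;                   # empty or whitespace-only line
        my $ccds_status = (split)[5];
        next unless defined $ccds_status;  # fewer than six fields
        push @out, $ccds_status;
    }
    return @out;
}

# Invented input: one good line, a one-space line, an empty line, a short line.
my @lines = ("1 a b c d Withdrawn - 1 2\n", " \n", "\n", "short line\n");
print join(",", statuses(@lines)), "\n";   # Withdrawn
```

The defined() check is the more general of the two, since it also covers lines that have some fields but not enough of them.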

Also note that $ccds_status eq 'Withdrawn' instead of a regex would be OK too. When dealing with files generated by other computer programs, allowing for variations in case often is not necessary, but that's also a minor nit.

My two main points are:
1. Don't over-restrict the whitespace split.
2. Use a list slice to get fields into variables with human-readable names instead of using field numbers further along in the program.

#!/usr/bin/perl
use strict;
use warnings;

my $firstline = <DATA>;  # skip the header line (actually optional in this case)
while(<DATA>)
{
    my $ccds_status = (split)[5];
    
    #see discussion re: defined()
    if(defined($ccds_status) and $ccds_status =~ m/Withdrawn/)
    { 
        print;           # lines with Withdrawn ($_ still ends in "\n")
    }
}
#prints:
#1   NC_000001.8 NCRNA00115  79854   CCDS1.1 Withdrawn   -   801942  802433
 
__DATA__
#chromosome nc_accession    gene    gene_id ccds_id ccds_status cds_strand  cds_from    cds_to
1   NC_000001.8 NCRNA00115  79854   CCDS1.1 Withdrawn   -   801942  802433
1   NC_000001.10    SAMD11  148398  CCDS2.2 Public  +   861321  879532
1   NC_000001.10    NOC2L   26155   CCDS3.1 Public  -   880073  894619
1   NC_000001.10    PLEKHN1 84069   CCDS4.1 Public  +   901911  909954
1   NC_000001.10    HES4    57801   CCDS5.1 Public  -   934438  935352
1   NC_000001.10    ISG15   9636    CCDS6.1 Public  +   948953  949857
1   NC_000001.10    C1orf159    54991   CCDS7.2 Public  -   1018272 1026922
1   NC_000001.10    TTLL10  254173  CCDS8.1 Public  +   1115433 1120521
1   NC_000001.10    TNFRSF18    8784    CCDS9.1 Public  -   1138970 1141950
1   NC_000001.10    TNFRSF18    8784    CCDS10.1    Public  -   1139223 1141950
1   NC_000001.10    TNFRSF4 7293    CCDS11.1    Public  -   1146934 1149506