j1n3l0 has asked for the wisdom of the Perl Monks concerning the following question:

Oh Wise Monks,

I have inherited a pipeline process and have found cause to alter it. I am attempting to append to a file that is the output of one step in the pipeline before passing it on to the next step. Sounds easy enough.

However, this component is written in C++ (so I cannot alter it), and each time it dumps its output it adds a header line. So I am getting header lines halfway through my files, like so:

% cat output.txt
Chr Coord chip_id subarray_id gc         NA15510
1   475   5730    5730_1      0.6        0.266
1   505   5730    5730_1      0.63333333 0.422
... ...   ...     ...         ...        ...
Chr Coord chip_id subarray_id gc         NA15510
... ...   ...     ...         ...        ...
1   925   5730    5730_1      0.70666666 0.071
1   960   5730    5730_1      0.70333333 0.036

Here is a summary of the pipeline code, with the offending line (methinks) highlighted:

% cat t.pl
#!/usr/bin/perl -w

use strict;

my $filename = 'output.txt';

my %chr_arms = (
    1 => [ 1, 1000, 2000, 5000 ],
    2 => [ 1, 1000 ],
);

for my $chromosome ( sort { $a <=> $b } keys %chr_arms ) {

    if ( scalar @{ $chr_arms{$chromosome} } >= 2 ) {
        my $arm_start = shift @{ $chr_arms{$chromosome} };
        my $arm_end   = shift @{ $chr_arms{$chromosome} };

        print qq{Running Pipeline for $chromosome:$arm_start-$arm_end\n};

        # a long running process (written in C++)
        print qq{ask_bigdb [options] >> $filename\n};    ### OFFENDING LINE!!!

        redo;    # do the next arm of chromosome ...
    }

    # check output file is not empty
    if ( is_too_short($filename) ) {
        print qq{$filename is empty\n};
        next;
    }

    # a few more loooooooong running processes (written in C++ and Java)
    print qq{GC Normalize ...\n};
    print qq{Table merge ...\n};
    print qq{Median Normalize ...\n};
    print qq{Table merge ...\n};
    print qq{Wave Normalize ...\n};
    print qq{Table merge ...\n\n};
}

exit 0;

# Name    : is_too_short
# Purpose : check if the file has more than one line in it
sub is_too_short {
    my $file = shift;

    # ensure the file exists
    if ( !-e $file ) {
        die qq{$file does not exist};
    }

    # now check its head-line-count
    if ( my $head = qx{head $file | wc -l} <= 1 ) {
        return 1;
    }
}

Is there a quick and easy way to skip the header line of the output stream of ask_bigdb when appending to an existing file? Note that most of the print statements are actually system calls to other pipeline components ... they just weren't installed locally =)

Any help is much appreciated.


Smoothie, smoothie, hundre prosent naturlig!

Replies are listed 'Best First'.
Re: Skiping the first line of data/output stream
by shmem (Chancellor) on Dec 14, 2007 at 11:53 UTC
    print qq{ask_bigdb [options] >> $filename\n}; ### OFFENDING LINE!!!

    Suggestion:

    my @lines = ask_bigdb(@options);    # whatever that ask_bigdb is...
    shift @lines;                       # get rid of header
    open my $outfh, '>>', $filename     # append to file
        or die "Can't append to '$filename': $!\n";
    print $outfh @lines;
    close $outfh
        or die "Can't close '$filename': $!\n";

    Then,
    redo; # do the next arm of chromosome ...

    there's a mismatch between code and comment. Are you sure you want redo here, and not next?
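
To make the redo/next difference concrete, here is a minimal sketch that reuses the %chr_arms data from the question. The collect_arms helper and its $use_redo flag are hypothetical names introduced only for this illustration; it records which start/end pairs each loop control would visit and runs no pipeline step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original pipeline): consume each
# chromosome's coordinates two at a time and record the pairs visited.
# $use_redo chooses between redo (same chromosome again) and next.
sub collect_arms {
    my ( $use_redo, %chr_arms ) = @_;

    # work on copies so the caller's arrays are not consumed
    my %arms = map { $_ => [ @{ $chr_arms{$_} } ] } keys %chr_arms;

    my @seen;
    for my $chromosome ( sort { $a <=> $b } keys %arms ) {
        if ( @{ $arms{$chromosome} } >= 2 ) {
            my $arm_start = shift @{ $arms{$chromosome} };
            my $arm_end   = shift @{ $arms{$chromosome} };
            push @seen, "$chromosome:$arm_start-$arm_end";

            if ($use_redo) { redo }    # re-run the body for the same chromosome
            else           { next }    # move straight on to the next chromosome
        }
    }
    return @seen;
}

my %chr_arms = (
    1 => [ 1, 1000, 2000, 5000 ],
    2 => [ 1, 1000 ],
);

print join( ', ', collect_arms( 1, %chr_arms ) ), "\n";
print join( ', ', collect_arms( 0, %chr_arms ) ), "\n";
```

With redo the loop body runs again for the same key, so the second pair of chromosome 1 (2000-5000) is visited; with next it is never reached.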

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Skiping the first line of data/output stream
by poolpi (Hermit) on Dec 14, 2007 at 11:40 UTC
Re: Skiping the first line of data/output stream
by j1n3l0 (Friar) on Dec 14, 2007 at 12:21 UTC
    Thanks for that. Found another solution (from a colleague) which I have managed to implement. I've just created a few constants and changed the offending line as follows:

    % cat t.pl
    #!/usr/bin/perl -w

    use strict;

    ...

    my $PIPE_STRING  = q{| tail -n +2};    # starts from line 2 of input
    my $EMPTY_STRING = q{};

    for my $chromosome ( sort { $a <=> $b } keys %chromosome_arms ) {

        if ( scalar @{ $chromosome_arms{$chromosome} } >= 2 ) {

            ...

            # a long running process (written in C++)
            my $format = qq{ask_bigdb [options] %s >> $filename\n};

            # generate the command
            my $command = -e $filename              # if file exists
                ? sprintf $format, $PIPE_STRING     # use the tail pipe
                : sprintf $format, $EMPTY_STRING;   # use the empty string

            # run the command here
            print $command;

            ...
        }
    }

    Yes, I do need a redo there; otherwise I would not get the second arm of my chromosomes. This solution also does not require me to hold much in memory. (At least I think not!)
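
For comparison, the same skip-the-header-when-appending idea can be sketched in pure Perl with a pipe open, which also reads one line at a time instead of holding the whole output in memory. This is only a sketch under assumptions: append_output is a hypothetical helper, and because ask_bigdb is not available here, a one-line perl command stands in for it. It tests -s (file exists and is non-empty) rather than -e, so an existing but empty file still receives its header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: append a command's output to $filename line by
# line, dropping the first line (the header) when the file already has
# content. @command stands in for the real ask_bigdb invocation.
sub append_output {
    my ( $filename, @command ) = @_;

    # true only if the file exists and is non-empty
    my $skip_header = -s $filename ? 1 : 0;

    open my $in, '-|', @command
        or die "Can't run '@command': $!\n";
    open my $out, '>>', $filename
        or die "Can't append to '$filename': $!\n";

    while ( my $line = <$in> ) {
        next if $skip_header && $. == 1;    # $. is the line number within this run
        print {$out} $line;
    }

    close $in  or die "Command '@command' failed: $! (exit status $?)\n";
    close $out or die "Can't close '$filename': $!\n";
    return;
}

# Demo with a stand-in command that prints a header and one data line.
my ( $tmp_fh, $filename ) = tempfile();
close $tmp_fh;

my @fake_command = ( $^X, '-e', 'print qq{Chr Coord\n1 475\n}' );
append_output( $filename, @fake_command ) for 1 .. 2;
```

After the two runs the temporary file holds one header line followed by two data lines.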

    Thanks again.


    Smoothie, smoothie, hundre prosent naturlig!
Re: Skiping the first line of data/output stream
by HeatSeekerCannibal (Beadle) on Dec 14, 2007 at 19:29 UTC
    You could also have used a regexp to identify the lines you don't want and simply skip them when the input line matches the regexp.
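
A minimal sketch of that idea, assuming the header always starts with "Chr Coord" as in the sample output from the question (the real header may differ, so the pattern is an assumption). The drop_repeated_headers helper is a hypothetical name; it keeps the first header it sees and skips every later line that matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed header pattern, based on the sample output in the question.
my $HEADER_RE = qr/^Chr\s+Coord\b/;

# Hypothetical helper: return the lines with every header after the
# first one removed.
sub drop_repeated_headers {
    my @lines = @_;

    my $seen_header = 0;
    my @kept;
    for my $line (@lines) {
        if ( $line =~ $HEADER_RE ) {
            next if $seen_header++;    # skip all but the first header
        }
        push @kept, $line;
    }
    return @kept;
}

my @sample = (
    "Chr Coord chip_id subarray_id gc NA15510\n",
    "1 475 5730 5730_1 0.6 0.266\n",
    "Chr Coord chip_id subarray_id gc NA15510\n",
    "1 925 5730 5730_1 0.70666666 0.071\n",
);

print drop_repeated_headers(@sample);
```

This works as a clean-up pass over a file that already contains repeated headers, at the cost of one regexp match per line.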

    Best of luck!

    Heatseeker Cannibal