
Oh Wise Monks,

I have inherited a pipeline process and have found cause to alter it. I am attempting to append to a file that is the output of one step in the pipeline before passing it on to the next step. Sounds easy enough.

However, this component is written in C++ (so I cannot alter it), and each time it dumps its output it adds a header line. So I end up with header lines halfway through my files, like so:

% cat output.txt
Chr    Coord    chip_id    subarray_id    gc    NA15510
1    475    5730    5730_1    0.6    0.266
1    505    5730    5730_1    0.63333333    0.422

...     ...     ...     ...             ...     ...
Chr    Coord    chip_id    subarray_id    gc    NA15510
...     ...     ...     ...             ...     ...

1    925    5730    5730_1    0.70666666    0.071
1    960    5730    5730_1    0.70333333    0.036
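If a file already contains the repeated headers, one way to clean it up after the fact is to keep the first line and drop every later line that is identical to it. This is a minimal sketch that assumes the header text is byte-for-byte the same on every run of the C++ component; `strip_repeated_headers` is a hypothetical helper, not part of the existing pipeline.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first line as the header and drop any
# later line that is exactly equal to it (including the trailing newline).
sub strip_repeated_headers {
    my @lines = @_;
    return () unless @lines;    # nothing to filter

    my $header = $lines[0];
    return ( $header, grep { $_ ne $header } @lines[ 1 .. $#lines ] );
}
```

For example, `print strip_repeated_headers( <$fh> );` would print a cleaned copy of an already-open file, since `<$fh>` is read in list context when passed as sub arguments.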

Here is a summary of the pipeline code, with the offending line (methinks) highlighted:

% cat t.pl
#!/usr/bin/perl -w
use strict;

my $filename = 'output.txt';
my %chr_arms = (
    1 => [ 1, 1000, 2000, 5000 ],
    2 => [ 1, 1000 ],
);

for my $chromosome ( sort { $a <=> $b } keys %chr_arms ) {
    if ( scalar @{ $chr_arms{$chromosome} } >= 2 ) {

        my $arm_start = shift @{ $chr_arms{$chromosome} };
        my $arm_end   = shift @{ $chr_arms{$chromosome} };

        print qq{Running Pipeline for $chromosome:$arm_start-$arm_end\n};

        # a long running process (written in C++)
        print qq{ask_bigdb [options] >> $filename\n};    ### OFFENDING LINE!!!

        redo;    # do the next arm of chromosome ...

    }

    # check output file is not empty
    if ( is_too_short($filename) ) {
        print qq{$filename is empty\n};
        next;
    }

    # a few more loooooooong running processes (written in C++ and Java)
    print qq{GC Normalize ...\n};
    print qq{Table merge ...\n};
    print qq{Median Normalize ...\n};
    print qq{Table merge ...\n};
    print qq{Wave Normalize ...\n};
    print qq{Table merge ...\n\n};

}

exit 0;

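# Name      : is_too_short
# Purpose   : check whether the file has at most one line (i.e. only a header)
sub is_too_short {
    my $file = shift;

    # ensure the file exists
    if ( !-e $file ) {
        die qq{$file does not exist};
    }

    # now check its head-line-count
    # (compare first, then return: "my $x = A <= B" would assign the
    #  result of the comparison, because <= binds tighter than =)
    my $line_count = qx{head $file | wc -l};
    return $line_count <= 1;
}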
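The `is_too_short` check shells out to `head` and `wc`. A pure-Perl equivalent, sketched here as the hypothetical `has_at_most_one_line`, only needs to read the first two lines to decide whether the file holds anything beyond a header. It counts lines as Perl's `readline` sees them, so a final line without a trailing newline still counts (unlike `wc -l`).

```perl
use strict;
use warnings;

# Hypothetical pure-Perl alternative to is_too_short(): returns true when
# $file has at most one line (empty, or header only). Stops reading as
# soon as a second line is seen, so large files stay cheap to check.
sub has_at_most_one_line {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";

    my $count = 0;
    while (<$fh>) {
        last if ++$count > 1;    # a second line means real data follows
    }
    close $fh;

    return $count <= 1;
}
```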

I would like to know whether there is a quick and easy way to skip the header line of ask_bigdb's output stream when appending to an existing file. Note that most of the print statements are actually system calls to other pipeline components ... they just weren't installed locally =)
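A minimal sketch of skipping the header only when appending, assuming the first line of the command's output is always the header: run the command through a piped `open`, use `-s` to check whether the target file already has content, and if so discard the first line read. `append_without_repeat_header` is a hypothetical name; the list form of the piped `open` runs the command without going through a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd and append its standard output to $file.
# If $file already exists and is non-empty, the first line of the
# command's output (assumed to be the header) is discarded.
sub append_without_repeat_header {
    my ( $file, @cmd ) = @_;
    my $skip_header = -s $file ? 1 : 0;    # -s is false if missing or empty

    open my $in,  '-|', @cmd or die "Cannot run @cmd: $!\n";
    open my $out, '>>', $file or die "Cannot append to $file: $!\n";

    while ( my $line = <$in> ) {
        next if $skip_header && $. == 1;    # $. is the line number of $in
        print {$out} $line;
    }

    close $out or die "Cannot close $file: $!\n";
    close $in  or die "@cmd exited with status " . ( $? >> 8 ) . "\n";
    return;
}
```

In `t.pl` the offending line would then become something like `append_without_repeat_header( $filename, 'ask_bigdb', @options );`, where `@options` is a placeholder for the real (elided) arguments. From the shell, the append case can get the same effect by piping through `tail -n +2`, which prints its input starting at line 2.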

Any help is much appreciated.

Smoothie, smoothie, one hundred percent natural!

In reply to Skiping the first line of data/output stream by j1n3l0
