utpalmtbi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have an input file like this:

>or1|384514307

---ATGAAGAGAATTAACTGGCAACGCTTAGCA ACCATTGGCTTGT

>or2|29377532

ATGAAGAGAATTAACTGGCAACGCTTAGCA

... ..

I want to trim the header lines so that the output file becomes:

>or1

---ATGAAGAGAATTAACTGGCAACGCTTAGCA ACCATTGGCTTGT

>or2

ATGAAGAGAATTAACTGGCAACGCTTAGCA

For this, I use the following script:

    while (<>) {
        if (/^(>\S+)/) {
            print "$1\n";
        }
        else {
            print;
        }
    }
But it keeps the whole header line. How can I delete the pipe and the string of numbers after it?

Please help. Thanks!

Re: trim header lines in files
by choroba (Cardinal) on Aug 05, 2013 at 13:09 UTC
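    Your pattern \S+ matches any run of non-whitespace characters, and the pipe is not whitespace, so the whole header is captured. Stop at the pipe instead: you can mention it in the regex, and you do not even have to backslash it, because it appears in a character class: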
    while (<>) {
        if (/^(>[^|]*)/) {
            print "$1\n";
        }
        else {
            print;
        }
    }
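A minimal sketch contrasting the two patterns on single lines, assuming headers look like the sample (`>tag|digits`); the sub names `trim_nonspace` and `trim_to_pipe` are hypothetical. One detail worth knowing: `[^|]` also matches the newline, so on a header with no pipe at all the capture would swallow the `\n` and `print "$1\n"` would add a blank line. The sketch therefore excludes `\n` from the class as well, which is a small variation on the pattern above.

```perl
use strict;
use warnings;

# Pattern from the question: \S+ grabs every non-whitespace character,
# and neither '|' nor the digits are whitespace, so nothing is trimmed.
sub trim_nonspace {
    my ($line) = @_;
    return $line =~ /^(>\S+)/ ? "$1\n" : $line;
}

# Character-class version: stop at the first '|'. Excluding "\n" too
# (a variation, not in the reply) keeps a pipe-less header from
# getting a second newline.
sub trim_to_pipe {
    my ($line) = @_;
    return $line =~ /^(>[^|\n]*)/ ? "$1\n" : $line;
}

for my $line (">or1|384514307\n", ">or3\n", "ATGAAGAGAATT\n") {
    print "before: $line";
    print "  \\S+    : ", trim_nonspace($line);
    print "  [^|\\n]*: ", trim_to_pipe($line);
}
```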
Re: trim header lines in files
by rjt (Curate) on Aug 05, 2013 at 13:07 UTC

    You can just chop off the pipe and anything thereafter before you print:

    while (<>) { s/^>.+?\K\|.*$//; print; }
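How the substitution works: `\K` keeps everything matched before it (the `>` and the tag) out of what gets replaced, so only `|...` is deleted. Because `.` does not match a newline and `$` can match just before the trailing `\n`, the line ending survives. A minimal sketch applying the same substitution to an in-memory copy of the sample input so it runs without a file; the sub name `trim_line` is hypothetical, and `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Delete the first '|' in a header line and everything after it.
# \K (Perl 5.10+) keeps ">" and the tag out of the replaced text;
# .* stops before "\n" and $ matches just before it.
sub trim_line {
    my ($line) = @_;
    $line =~ s/^>.+?\K\|.*$//;
    return $line;
}

my $fasta = <<'END';
>or1|384514307
---ATGAAGAGAATTAACTGGCAACGCTTAGCA
>or2|29377532
ATGAAGAGAATTAACTGGCAACGCTTAGCA
END

# Read the string through a filehandle so the loop mirrors while (<>) { ... }
open my $in, '<', \$fasta or die "cannot open in-memory input: $!";
while (my $line = <$in>) {
    print trim_line($line);
}
close $in;
```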

    If the non-matching lines are quite long but tags like "or1" are short, specifying a bounded quantifier in the substitution will speed things up (Edit: disregard this optimization if non-matching lines never start with '>', because the anchored ^> then fails on them immediately):

        s/^>.{1,10}\K\|.*$//; # Tag is between 1..10 chars
    use strict; use warnings; omitted for brevity.
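One consequence of the `{1,10}` bound worth checking before relying on it: a header whose tag is longer than 10 characters no longer matches at all, so it passes through untrimmed. A small sketch comparing the two substitutions; the sub names and the long tag `long_tag_name1` are made up for illustration, and `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Unbounded: the tag may be any length.
sub trim_lazy {
    my ($line) = @_;
    $line =~ s/^>.+?\K\|.*$//;
    return $line;
}

# Bounded: at most 10 tag characters are tried before giving up.
# Cheaper on long '>' lines without an early pipe, but a header
# whose tag is 11 or more characters is left as it is.
sub trim_bounded {
    my ($line) = @_;
    $line =~ s/^>.{1,10}\K\|.*$//;
    return $line;
}

# "long_tag_name1" (hypothetical) is 14 characters long.
for my $line (">or1|384514307\n", ">long_tag_name1|42\n") {
    print "lazy:    ", trim_lazy($line);
    print "bounded: ", trim_bounded($line);
}
```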
Re: trim header lines in files
by mtmcc (Hermit) on Aug 05, 2013 at 15:17 UTC
    Or you could split, but it's probably not as efficient:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = $ARGV[0];
    my @line;
    open (my $input, "<", $file) || die "$file not available\n\n";
    while (<$input>) {
        if ($_ =~ /^>/) {
            @line = split (/\|/, $_);
            print STDOUT "$line[0]\n";
        }
        else {
            print STDOUT "$_";
        }
    }
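A variation on the split approach, as a sketch: `chomp` the header first, because in the script above a header with no pipe keeps its `\n` in `$line[0]` and is printed with an extra blank line; and pass a limit of 2 to `split` so Perl stops splitting after the first pipe. The sub name `trim_header` is hypothetical.

```perl
use strict;
use warnings;

# Return a header cut at the first '|', or any other line unchanged.
sub trim_header {
    my ($line) = @_;
    return $line unless $line =~ /^>/;
    chomp(my $header = $line);
    # Limit of 2: split only at the first '|'; the remainder is discarded.
    my ($tag) = split /\|/, $header, 2;
    return "$tag\n";
}

for my $line (">or1|384514307\n", "ATGAAGAGAATT\n", ">or3\n") {
    print STDOUT trim_header($line);
}
```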