trim header lines in files

utpalmtbi has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks..

I have an input file like this:

>or1|384514307

---ATGAAGAGAATTAACTGGCAACGCTTAGCA ACCATTGGCTTGT

>or2|29377532

ATGAAGAGAATTAACTGGCAACGCTTAGCA

... ..

I want to trim the header lines so that the output file becomes:

>or1

---ATGAAGAGAATTAACTGGCAACGCTTAGCA ACCATTGGCTTGT

>or2

ATGAAGAGAATTAACTGGCAACGCTTAGCA

For this, I use the following script:

while (<>) {
  if (/^(>\S+)/) {
    print "$1\n";
  } else {
    print;
  }
}

But it retains the full input header. How can I delete the pipe and the string of digits after it?

Please help. Thanks.


Re: trim header lines in files by choroba (Cardinal) on Aug 05, 2013 at 13:09 UTC
The pipe character is not whitespace, so `\S+` matches it along with the digits that follow. You can exclude it in the regex, and you do not even have to backslash it, because it appears in a character class:

while (<>) {
  if (/^(>[^|]*)/) {
    print "$1\n";
  } else {
    print;
  }
}

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
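To make the difference concrete, here is a minimal sketch that runs both patterns against one of the sample headers from the question. The variable names `$header`, `$greedy` and `$trimmed` are made up for illustration and are not part of the original scripts.

```perl
use strict;
use warnings;

# Sample header taken from the question.
my $header = ">or1|384514307\n";

# \S+ matches every non-whitespace character, and '|' is not whitespace,
# so the capture runs through the pipe and the digits.
my ($greedy) = $header =~ /^(>\S+)/;

# [^|]* stops at the first pipe, so only the tag is captured.
my ($trimmed) = $header =~ /^(>[^|]*)/;

print "$greedy\n";   # >or1|384514307
print "$trimmed\n";  # >or1
```

One caveat, in case some headers contain no pipe at all: `[^|]*` also matches the trailing newline, so such a header would be printed with an extra blank line after it. A class like `[^|\s]*` would avoid that.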
Re: trim header lines in files by rjt (Curate) on Aug 05, 2013 at 13:07 UTC
You can just chop off the pipe and anything thereafter before you print:

while (<>) {
  s/^>.+?\K\|.*$//;
  print;
}

If the non-matching lines are quite long but the "or1" tags are short, it will speed things up to specify a quantifier on the substitution (Edit: disregard this optimization if non-matching lines never start with `'>'`):

s/^>.{1,10}\K\|.*$//; # Tag is between 1..10 chars

`use strict; use warnings;` omitted for brevity.
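As a rough sketch of how the `\K` substitution behaves on header and sequence lines, the snippet below wraps it in a helper. The name `trim_header` is hypothetical and only used for this illustration. Note that `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper; wraps the substitution from the reply above.
sub trim_header {
    my ($line) = @_;
    # \K keeps everything matched so far out of the replaced text,
    # so only the pipe and the rest of the line are removed.
    # '.' does not match the newline, so the line ending survives.
    $line =~ s/^>.+?\K\|.*$//;
    return $line;
}

print trim_header(">or1|384514307\n");                  # header: pipe and digits removed
print trim_header("ATGAAGAGAATTAACTGGCAACGCTTAGCA\n");  # sequence line: unchanged
```

Lines that do not start with `>` never match, so sequence data passes through untouched, as do headers that contain no pipe.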
Re: trim header lines in files by mtmcc (Hermit) on Aug 05, 2013 at 15:17 UTC
Or you could split, but it's probably not as efficient:

#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
my @line;

open (my $input, "<", $file) || die "$file not available\n\n";
while (<$input>) {
  if ($_ =~ /^>/) {
    @line = split (/\|/, $_);
    print STDOUT "$line[0]\n";
  } else {
    print STDOUT "$_";
  }
}
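A possible variation on the split approach, assuming some headers might have no pipe: chomping first avoids printing a doubled newline in that case, and a limit of 2 tells `split` to stop after the first pipe. The helper name `header_tag` is invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part of a header line before the first pipe.
sub header_tag {
    my ($line) = @_;
    chomp $line;                       # drop the newline so pipe-less headers gain no blank line
    my ($tag) = split /\|/, $line, 2;  # limit 2: stop splitting after the first pipe
    return $tag;
}

print header_tag(">or1|384514307\n"), "\n";  # >or1
print header_tag(">or3\n"), "\n";            # >or3
```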