in reply to Substitution on a sequence

Update: Credited the wrong person.

I'd suggest a slight modification to ww's method. As this is a FASTA file, I'd read the file record by record (sequence by sequence), rather than line by line. The following one-liner ought to work, but is untested.

perl -e"BEGIN{$/=qq[\n>]}" -wpe"s[[-*]|DL;][]g" theFile > theOutput


Re^2: Substitution on a sequence
by ww (Archbishop) on Jan 29, 2007 at 16:55 UTC

    Tested BrowserUK's one-liner (albeit non-rigorously) with a data file, very modestly varied from the OP's:

    >DL;H1_ENSP00000194530_chr2_202024 CCCC---GCCTTCTCGCTGCCCAGC--CCCGGGGAGGGAGG*
    ">DL;H2_ENSP00000194530_chr2_202024 CCCC---GCCTTCTCGCTGCCCAGC--CCCGGGGAGGGAGG*" line 2
    >DL;H1_ENSP00000194530_chr2_202024 CCCC---GCCTTCDLTCGCTGCCCAGC--CCCGGGGAGGGAGG* line3
    >DL;H1_ENSP00000194530_chr2_202024 CCCC---GCCTTCTCGCTGCCCAGC--CCCGGGGAGGGAGG line 4
    >DL;H1_ENSP00000194530_chr2_202024 CCCC---GCCTTCTCGCTGCCCAGC--CCCGG*GGAGGGAGG line 5

    and output is:

    >H1_ENSP00000194530_chr2_202024 CCCCGCCTTCTCGCTGCCCAGCCCCGGGGAGGGAGG
    ">H2_ENSP00000194530_chr2_202024 CCCCGCCTTCTCGCTGCCCAGCCCCGGGGAGGGAGG" line 2
    >H1_ENSP00000194530_chr2_202024 CCCCGCCTTCDLTCGCTGCCCAGCCCCGGGGAGGGAGG line3
    >H1_ENSP00000194530_chr2_202024 CCCCGCCTTCTCGCTGCCCAGCCCCGGGGAGGGAGG line 4
    >H1_ENSP00000194530_chr2_202024 CCCCGCCTTCTCGCTGCCCAGCCCCGGGGAGGGAGG line 5

    Nice, BrowserUK; ++

    Update: Fixed the mis-attribution. Give BrowserUK another ++ and I'll do penance in the dungeon; the more so, since it was he who answered a brain_dead question about his code.