in reply to Re^3: Reading files n lines a time
in thread Reading files n lines a time
>nameofsequence\n
ATCGTACGTTGCTE\n
>anothername\n
GTCTGT\n
so that a line starting with > and containing a sequence name is followed by a line containing that sequence's nucleotide information.
I am thinking of reading them in four lines at a time, because I have reason to suspect that, due to certain previous operations, there might be sequences directly following each other with different names (on the >sequencename\n line) but exactly the same sequence information (on the following ATGCTGT\n line). Right now I'm looking to identify and remove such duplicates, but I might later use scripts that do many kinds of comparison, extraction, etc. on neighbouring sequences in my files. (Two neighbours means four lines.)
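A minimal sketch of the duplicate check described above, written in Python purely for illustration. It assumes every record is exactly two lines (a > name line followed by one sequence line), as in the sample. Note that it swaps in a slightly different technique from fixed four-line blocks: it reads one record (two lines) at a time and remembers the previous sequence, so a duplicate pair that would straddle the boundary between two four-line blocks is still caught. The function names `read_records` and `drop_adjacent_duplicates` are hypothetical.

```python
from itertools import zip_longest


def read_records(lines):
    """Yield (header, sequence) pairs, consuming two lines per record.

    Assumes the strict two-line layout from the sample: a '>' name line
    followed by exactly one sequence line.
    """
    it = iter(lines)
    # Passing the same iterator twice makes zip_longest pull lines in pairs;
    # a missing final sequence line is filled with an empty string.
    for header, seq in zip_longest(it, it, fillvalue=""):
        yield header.rstrip("\n"), seq.rstrip("\n")


def drop_adjacent_duplicates(lines):
    """Keep only the first record of each run of neighbours with equal sequences."""
    previous_seq = None
    for header, seq in read_records(lines):
        if seq != previous_seq:
            yield header, seq
        previous_seq = seq


if __name__ == "__main__":
    sample = [">first\n", "ATCG\n", ">second\n", "ATCG\n", ">third\n", "GTCTGT\n"]
    for header, seq in drop_adjacent_duplicates(sample):
        print(header)
        print(seq)
```

With the sample list above, the record named >second is dropped because its sequence equals that of its neighbour >first. A file handle can be passed in place of the list, since the functions only iterate over lines.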
Re^5: Reading files n lines a time
by ww (Archbishop) on Dec 07, 2012 at 19:22 UTC
by naturalsciences (Beadle) on Dec 07, 2012 at 19:51 UTC