in reply to Re: Compare fasta files
in thread Compare fasta files

I am posting the header, with its sequence data, from one of the .fasta files:
>167440 TCONS_00167441 scaffold_2269+ 284-1043
AGGGCTCAAGCTTTATTTCACGTAGCTGACTTTACCGTCAGCTCAATTGGAATAGTTTTT
CGCTATGTTCGCAGGCAAGTGAGACGATCCATCAATGCCCTTATCTGCTTCGAAAGAACC
GGTGTCATCCAAACATGGTGAAGAGGTGGCAACTGGATCAATAATAGCTGAAACTTCTAC
TGTACAGGGTTCGGCTTGCCCAACTGTCCAAGCTTGAGATCTATTTTAGAATATGCTTAA
CACAACACATGCAATTCGAACGTTGTTTTCTCGGAAAGATTTGAAAGTAACTCCGTTGGG
TTCAATGCCCGCTAGTCCCATGCATCCTTTCTGTTGGTCAACAACCAACCACAAGTCAAT
CGAATGAATTCTTCAAGACTCCGGACTCTCTTTCTGTCCGGAGGGAATCATTGTTTCTCA
ATCAATCATGCCTCAACTGGATAAATTCACTTATTTCACACAATTTTTCTGGTCATGCCT
TTTCCTCTTTACTTTTTATATTCCCATATGCAATGATGGAGATGGAGTACTTGGGATCAG

Re^3: Compare fasta files
by GotToBTru (Prior) on Nov 20, 2016 at 15:38 UTC

    Your code does not account for the header line, and it would not work on the sequence data either, because the sequence contains no white space. The replies to "How to get non-redundant DNA sequences from a FASTA file?" might give you some good insight into how to work with your files. You might also find the BioPerl modules Bio::Perl and Bio::SeqIO useful.
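To illustrate the point about header lines, here is a minimal sketch in plain Perl (no BioPerl required) that treats every line starting with ">" as a header and joins the lines that follow into one sequence. The sub name read_fasta, the sample records, and the choice to compare by sequence content rather than by header are assumptions made for illustration; they are not taken from the original poster's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from an open filehandle.
# Returns a hash reference mapping each header line (without the
# leading '>') to its sequence, with the sequence lines joined.
sub read_fasta {
    my ($fh) = @_;
    my (%seq_for, $header);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;               # strip trailing newline and whitespace
        next if $line eq '';              # skip blank lines
        if ($line =~ /^>(.*)/) {
            $header = $1;                 # a header line starts a new record
            $seq_for{$header} = '';
        }
        elsif (defined $header) {
            $seq_for{$header} .= $line;   # a sequence may span several lines
        }
    }
    return \%seq_for;
}

# Hypothetical sample data, read through in-memory filehandles so the
# sketch runs without any files on disk.
my $fasta_a = ">seq1 demo\nACGT\nTTGA\n>seq2 demo\nGGCC\n";
my $fasta_b = ">other name\nACGTTTGA\n";

open my $fh_a, '<', \$fasta_a or die "Cannot open in-memory data: $!";
open my $fh_b, '<', \$fasta_b or die "Cannot open in-memory data: $!";
my $records_a = read_fasta($fh_a);
my $records_b = read_fasta($fh_b);
close $fh_a;
close $fh_b;

# Index the second data set by sequence so the first can be checked against it.
my %seen_in_b = map { $_ => 1 } values %$records_b;

for my $header (sort keys %$records_a) {
    my $status = $seen_in_b{ $records_a->{$header} } ? 'also in B' : 'only in A';
    print "$header: $status\n";
}
```

With real data you would open the two .fasta files instead of the in-memory strings. Whether to match records by header, by sequence, or by both depends on what "compare" means for your files, so treat the sequence-based match here as one possible choice.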

    In general, I strongly suggest you use Super Search and search for FASTA. I'd suggest restricting the search to root nodes (there are radio buttons to exclude replies). See what your colleagues have been asking, because questions about FASTA files come up pretty regularly here.

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)