Re: comparing two fasta files

Hi nemo2, I am a beginner too, but I want to suggest some tips that will help you in this forum:

- First, read the posts about the posting rules, formatting tips, etc. If you follow the rules, you will get more responses.

- Second, explain your problem as if you were talking to computer people, not to biologists. I believe most people here know nothing about the FASTA format, sequences, or gene names, and they don't care; they don't need to know anything about biology to help you. To them, a FASTA header is just a line that starts with ">", and a sequence is just a string spread over one or more lines below that header line. Show the actual format of your text file in your post, so people know what you are talking about.
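To make that description concrete, here is a minimal Perl sketch, not a complete solution. It assumes the record name is the first word after ">" and uses a made-up two-record sample where the second sequence spans two lines. A regular expression captures each header, and the following sequence lines are joined into one string:

```perl
use strict;
use warnings;

# Hypothetical sample input: two records, the second sequence spans two lines.
my $fasta = <<'END';
>seq1 some description
ACGTACGT
>seq2
GGCC
TTAA
END

my %seq;     # record name => full sequence
my $name;    # name of the record currently being read
for my $line (split /\n/, $fasta) {
    if ($line =~ /^>(\S+)/) {      # header line: capture the first word after ">"
        $name = $1;
        $seq{$name} = '';
    }
    elsif (defined $name) {        # sequence line: append it to the current record
        $line =~ s/\s+//g;         # drop any stray whitespace
        $seq{$name} .= $line;
    }
}

print "$_\t$seq{$_}\n" for sort keys %seq;
```

This prints one tab-separated line per record, with `seq2` reassembled as `GGCCTTAA`.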

Now, what you have to do is not hard, but you need to know some Perl, such as:

- reading input files and writing to output files

- using regular expressions to "capture" the FASTA headers and the corresponding sequences

- using hashes to store each sequence name and its sequence as a key-value pair, which gives you a lookup table

- and probably some more...
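Putting those pieces together, a rough sketch could look like the script below. Your post does not say what "comparing" should mean, so this assumes you want to match records by name (first word of the header) and then check whether the sequences are identical. The sub names `read_fasta` and `compare_fasta`, the command-line arguments, and the report format are all hypothetical choices for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle into a hash
# of record name (first word after ">") => full sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {        # header line: capture the name
            $name = $1;
            $seq{$name} = '';
        }
        elsif (defined $name) {          # sequence line(s): concatenate
            $line =~ s/\s+//g;
            $seq{$name} .= $line;
        }
    }
    return \%seq;
}

# Hypothetical helper: use the second hash as a lookup table to classify names.
sub compare_fasta {
    my ($first, $second) = @_;
    my (@same, @different, @only_first);
    for my $name (sort keys %$first) {
        if (!exists $second->{$name})               { push @only_first, $name }
        elsif ($first->{$name} eq $second->{$name}) { push @same, $name }
        else                                        { push @different, $name }
    }
    my @only_second = grep { !exists $first->{$_} } sort keys %$second;
    return (\@same, \@different, \@only_first, \@only_second);
}

# Usage (assumed): perl compare_fasta.pl file1.fa file2.fa report.txt
if (@ARGV == 3) {
    my ($file1, $file2, $out_file) = @ARGV;
    open my $in1, '<', $file1 or die "Cannot open $file1: $!";
    open my $in2, '<', $file2 or die "Cannot open $file2: $!";
    my ($s1, $s2) = (read_fasta($in1), read_fasta($in2));
    close $in1;
    close $in2;

    my ($same, $diff, $only1, $only2) = compare_fasta($s1, $s2);
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} "same_sequence\t$_\n"  for @$same;
    print {$out} "diff_sequence\t$_\n"  for @$diff;
    print {$out} "only_in_$file1\t$_\n" for @$only1;
    print {$out} "only_in_$file2\t$_\n" for @$only2;
    close $out;
}
```

The hash lookup (`exists $second->{$name}`) is what keeps this fast: each name is checked in constant time instead of scanning the whole second file for every record of the first.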

A good introductory book on Perl for biologists, showing how to use Perl in bioinformatics tasks, is "Beginning Perl for Bioinformatics" by James Tisdall (O'Reilly). It covers what you need to get started with Perl in bioinformatics, using examples from biology.

I hope this helps.
