First, your DNA sequences are less than 15 characters long. I'll assume this is just a mistake in your example, but it may need further clarification.
Assuming that you want to:
- omit lines starting with ">" (headers);
- remove lines longer than 30 or shorter than 15 characters; and
- count the number of occurrences of each individual sequence,
you could do something like this:
    use strict;
    use warnings;

    my %count_seq;
    while (<DATA>) {
        chomp;
        next if /^>/;                                 # discard headers
        next if length($_) > 30 or length($_) < 15;   # discard unwanted sizes
        $count_seq{$_}++;                             # count occurrences
    }
    print "$_\t$count_seq{$_}\n" for keys %count_seq;

    __DATA__
    >dfbdbgf_356dfbdf
    ATGGCTGGATATCGATT
    >sdgthhr_478364df
    ATGGCTATGGATCAGATT
    >dfbdbgf_356dfbdf
    ATGGCTATCGATT
    >dfbdbgf_356dfbdg
    ATGGCTGGATATCGATT
    >sdgthhr_478364df
    ATGGCTATGGATCGATT
    >dfbdbgf_356dfbdg
    ATGGCTGGATATCGATT
    >sdgthhr_478364df
    ATGGCTATGGATCGATT
    TGCATGCGCTATTAGCG
    ATGGCTATGGATCGATT
    TGCATGCGCTATTAGCG
    ATGGCTATGGATCGATT
    TGCATGCCCTATTAGCG

This gives me the following result:

    $ perl dna_seq.pl
    TGCATGCGCTATTAGCG	2
    ATGGCTATGGATCAGATT	1
    ATGGCTGGATATCGATT	3
    ATGGCTATGGATCGATT	4
    TGCATGCCCTATTAGCG	1
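Note that Perl does not guarantee any particular order for hash keys, so the lines of the result may come out in a different order from one run to the next. If a stable order matters, a minimal sketch along the following lines might help. It assumes you want the most frequent sequences first; the `count_sequences` helper, its parameters and the shortened sample data are my own illustration, not something from your original question.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the script above):
# count the sequences whose length lies between $min and $max inclusive,
# skipping FASTA header lines.
sub count_sequences {
    my ($lines, $min, $max) = @_;
    my %count;
    for my $line (@$lines) {
        (my $seq = $line) =~ s/\s+\z//;    # strip newline and trailing whitespace
        next if $seq =~ /^>/;              # discard headers
        my $len = length $seq;
        next if $len < $min or $len > $max;    # discard unwanted sizes
        $count{$seq}++;                    # count occurrences
    }
    return \%count;
}

# Shortened sample input, taken from the data used above
my @lines = (
    ">dfbdbgf_356dfbdf", "ATGGCTGGATATCGATT",
    ">sdgthhr_478364df", "ATGGCTATGGATCAGATT",
    ">dfbdbgf_356dfbdf", "ATGGCTATCGATT",
    ">dfbdbgf_356dfbdg", "ATGGCTGGATATCGATT",
);

my $count = count_sequences(\@lines, 15, 30);

# Most frequent first; ties are broken alphabetically so the order is stable
for my $seq (sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count) {
    print "$seq\t$count->{$seq}\n";
}
```

To run this against a real file rather than an in-memory list, you could read the lines from a filehandle and pass a reference to them in the same way.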
In reply to Re: How to format fasta file
by Laurent_R
in thread How to format fasta file
by andyBio