in reply to Compare hash with arrays and print

Bio::SeqIO from BioPerl provides a powerful way to handle I/O on biological data files in FASTA format, or in any other supported format for that matter. In the full script below, one input object and three output objects are created. Each object knows the format to read or write and the file name to read from or write to. If an output file doesn't exist, it is created for you.

Using the appropriate BioPerl interface eliminates the need to construct regexes to detect sequence identifiers and sequence strings, and it lets you move flexibly between different biological data formats on the go. That saves time, so you can focus on the data manipulation task rather than on implementation details.
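As a rough illustration of moving between formats, here is a minimal sketch that reads FASTA and writes GenBank. It assumes BioPerl is installed; the output name Sample.gb is hypothetical and chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Input: the same FASTA file used elsewhere in this post.
my $in = Bio::SeqIO->new(-file => '<Sample.fa', -format => 'fasta');

# Output: hypothetical file name; only the -format value selects the writer.
my $out = Bio::SeqIO->new(-file => '>Sample.gb', -format => 'genbank');

# No regexes needed: each record arrives as a sequence object.
while (my $seq = $in->next_seq) {
    $out->write_seq($seq);
}
```

Changing the -format values is all it takes to target another format that Bio::SeqIO supports.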

#!/usr/local/bin/perl
#title "Compare hash with arrays and print"
use strict;
use warnings;
use Bio::SeqIO;

my %hash = (
    aw1  => 10,
    qs2  => 20,
    dd3  => 30,
    de4  => 10,
    hg5  => 30,
    dfd6 => 20,
    gf4  => 20,
    hgh5 => 30,
    hgy3 => 10,
);

my $file   = "Sample.fa";
my $file10 = "10.fa";
my $file20 = "20.fa";
my $file30 = "30.fa";

my $seq = Bio::SeqIO->new(-file => "<$file", -format => 'fasta');    # input object

# output objects
my $seqOut10 = Bio::SeqIO->new(-file => ">$file10", -format => 'fasta');
my $seqOut20 = Bio::SeqIO->new(-file => ">$file20", -format => 'fasta');
my $seqOut30 = Bio::SeqIO->new(-file => ">$file30", -format => 'fasta');

while (my $seqIn = $seq->next_seq()) {
    for my $key (keys %hash) {
        if ($seqIn->id eq $key && $hash{$key} == 10) {
            $seqOut10->write_seq($seqIn);
        }
        elsif ($seqIn->id eq $key && $hash{$key} == 20) {
            $seqOut20->write_seq($seqIn);
        }
        elsif ($seqIn->id eq $key && $hash{$key} == 30) {
            $seqOut30->write_seq($seqIn);
        }
    }
}
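The inner loop over the keys is not strictly needed, because a hash can be queried directly by sequence ID. The following is a sketch of that variant, not a drop-in replacement: it assumes Perl 5.10 or later (for the //= operator), and the %out_for name is mine. It uses the same %hash and file names as the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my %hash = (
    aw1 => 10, qs2  => 20, dd3 => 30,
    de4 => 10, hg5  => 30, dfd6 => 20,
    gf4 => 20, hgh5 => 30, hgy3 => 10,
);

my $in = Bio::SeqIO->new(-file => '<Sample.fa', -format => 'fasta');

# One output stream per distinct value: 10.fa, 20.fa and 30.fa.
my %out_for;
for my $group (values %hash) {
    $out_for{$group} //= Bio::SeqIO->new(-file => ">$group.fa", -format => 'fasta');
}

while (my $seq = $in->next_seq) {
    my $group = $hash{ $seq->id };    # direct lookup instead of scanning all keys
    next unless defined $group;       # ID not listed in the hash: skip the record
    $out_for{$group}->write_seq($seq);
}
```

With this layout, adding a new group value to the hash needs no extra elsif branch, since the output stream is created from the value itself.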


Excellence is an Endeavor of Persistence. A Year-Old Monk :D .

Re^2: Compare hash with arrays and print
by richardwfrancis (Beadle) on Jul 13, 2010 at 14:13 UTC

    I'd just like to second this solution.

    If you work with sequence data a lot, it's definitely worth investing some time in learning the ways of BioPerl.

    In addition to handling a huge variety of data formats, it provides modules to interact with and run numerous sequence analysis packages such as BLAST, EMBOSS and RepeatMasker.

    Learning the potential of BioPerl combined with the Ensembl Perl API has stood me in very good stead over the years.

    Best of luck with your project.