Hi there,
I have a script which extracts sequences from a file containing thousands of FASTA sequences and creates a separate file for each of them. Here is my script:
#!/usr/bin/perl
use strict;
use warnings;

my %id2seq = ();
my $id = '';

open File, "human_hg19_circRNAs_putative_spliced_sequence.fa" or die $!;

# Collect each sequence under its full header line (without the leading '>').
while (<File>) {
    chomp;
    if ($_ =~ /^>(.+)/) {
        $id = $1;
    } else {
        $id2seq{$id} .= $_;
    }
}

# Write one file per header.
foreach $id (keys %id2seq) {
    if (-f $id) {
        print $id." Already exists, about to override it","\n";
    }
    open my $out_fh, '>>', "$id.fa" or die $!;
    print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
    close $out_fh;
}
close File;
Now, the human_hg19_circRNAs_putative_spliced_sequence.fa file which I am working on contains sequences like this:
>hsa_circ_0000001|chr1:1080738-1080845-|None|None
ATGGGGTTGGGTCAGCCGTGCGGTCAGGTCAGGTCGGCCATGAGGTCAGGTGGGGTCGGCCATGAAGGTGGTGGGGGTCATGAGGTCACAAGGGGGTCGGCCATGTG

My script captures each sequence header as the key of a hash and the sequence itself as the corresponding value. The problem is that I want to name the files with only part of $id rather than the whole of it, i.e. hsa_circ_0000001. Is there a simple way to do this? Or do I have to create a new hash to extract the filenames?
Pete.
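
One possible approach, as a minimal sketch: assuming the part wanted for the filename is always everything before the first `|` in the header, it can be split off at the point where the file is opened, so no second hash is needed. The helper name `short_name` is hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the part of a FASTA header before the first '|'.
# Assumes the short name never contains a '|' itself.
sub short_name {
    my ($id) = @_;
    # Limit of 2: split only at the first '|' and keep the leading field.
    my ($name) = split /\|/, $id, 2;
    return $name;
}

my $id = 'hsa_circ_0000001|chr1:1080738-1080845-|None|None';
print short_name($id), ".fa\n";   # prints "hsa_circ_0000001.fa"
```

Inside the existing foreach loop this would mean opening `short_name($id) . ".fa"` instead of `"$id.fa"`, while still printing the full `$id` as the header line. If two headers happened to share the same short name they would end up in the same file, so that case may be worth checking for.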
In reply to An overlapping regex capture by Peter Keystrokes