Peter Keystrokes has asked for the wisdom of the Perl Monks concerning the following question:
Hi there,
I have a script which extracts sequences from a file containing thousands of FASTA sequences and creates a separate file for each of them. Here is my script:
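#!/usr/bin/perl
use strict;
use warnings;

my %id2seq = ();
my $id = '';

open File, "human_hg19_circRNAs_putative_spliced_sequence.fa" or die $!;
while (<File>) {
    chomp;
    if ($_ =~ /^>(.+)/) {
        $id = $1;               # header line: remember the current ID
    } else {
        $id2seq{$id} .= $_;     # sequence line: append to the current record
    }
}
close File;

foreach $id (keys %id2seq) {
    # Check the file that is actually written ("$id.fa", not "$id")
    if (-f "$id.fa") {
        print $id . ".fa already exists, about to append to it", "\n";
    }
    # '>>' appends to an existing file rather than overwriting it
    open my $out_fh, '>>', "$id.fa" or die $!;
    print $out_fh (">" . $id . "\n", $id2seq{$id}, "\n");
    close $out_fh;
}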
The human_hg19_circRNAs_putative_spliced_sequence.fa file I am working on contains sequences like this:
>hsa_circ_0000001|chr1:1080738-1080845-|None|None
ATGGGGTTGGGTCAGCCGTGCGGTCAGGTCAGGTCGGCCATGAGGTCAGGTGGGGTCGGCCATGAAGGTGGTGGGGGTCATGAGGTCACAAGGGGGTCGGCCATGTG
My script captures each sequence header as a hash key and the sequence itself as the corresponding value. The problem is that I want to name the files with only part of $id rather than the whole of it, i.e. hsa_circ_0000001. Is there a simple way to do this, or do I have to create a new hash to extract the filenames?
Pete.
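For illustration, here is a minimal sketch of one way to get the short name, assuming the wanted part is always everything before the first '|' in the header. The helper name short_id is hypothetical and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the part of a
# FASTA header before the first '|', or the whole header if it has no '|'.
sub short_id {
    my ($header) = @_;
    my ($short) = $header =~ /^([^|]+)/;   # inside [...] the '|' is literal
    return $short;
}

my $id = 'hsa_circ_0000001|chr1:1080738-1080845-|None|None';
print short_id($id), "\n";   # prints hsa_circ_0000001
```

The same character class could probably be used directly when reading the file, e.g. /^>([^|]+)/ in place of /^>(.+)/, so no second hash is needed. This only works if the short prefix is unique per record; if two headers share a prefix, their sequences would be concatenated under the same key.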
Re: An overlapping regex capture
by Discipulus (Canon) on Jun 21, 2017 at 19:27 UTC
by choroba (Cardinal) on Jun 21, 2017 at 19:59 UTC
by Peter Keystrokes (Beadle) on Jun 21, 2017 at 20:20 UTC
by Discipulus (Canon) on Jun 21, 2017 at 20:49 UTC
by Peter Keystrokes (Beadle) on Jun 22, 2017 at 11:52 UTC
by poj (Abbot) on Jun 22, 2017 at 15:33 UTC
by 1nickt (Canon) on Jun 22, 2017 at 12:01 UTC
by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:11 UTC
by Discipulus (Canon) on Jun 21, 2017 at 21:21 UTC
by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:47 UTC
by 1nickt (Canon) on Jun 21, 2017 at 22:11 UTC
by hexcoder (Curate) on Jun 22, 2017 at 11:22 UTC
by Peter Keystrokes (Beadle) on Jun 22, 2017 at 09:54 UTC
Re: An overlapping regex capture
by kcott (Archbishop) on Jun 22, 2017 at 06:25 UTC
by Peter Keystrokes (Beadle) on Jun 22, 2017 at 09:50 UTC