Re^3: An overlapping regex capture
by Discipulus (Canon) on Jun 21, 2017 at 20:49 UTC
> how would I integrate the split function into my loop
Since you have two loops, you can integrate it either when you grab $id or just before opening the output filehandle.
But looking more closely at your code, there are several things wrong:
my %id2seq = ();  # this is the verbose form of: my %id2seq;
my $id = '';      # this is the wrong place to declare this variable! Declare it
                  # where you need it, i.e. inside the while(<File>){ block
# the mode is missing: always state it, even though it defaults to '<'
open File,"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
# better to use a lexical filehandle, as in: open my $fh, '<', $filepath or die
# a bareword is still accepted, but by convention it is UPPERCASE, so no "open File..."
while(<File>){
    chomp;
    # here you are capturing something: if you want just the part before | you
    # have the possibility to get it here: /^>([\w\d]+\|)/ as a starting option?
    if($_ =~ /^>(.+)/){
        # or here:
        $id = $1;
        # cutting $1 as in: $id = (split /\|/, $1)[0]
...
# AHHH! This is an error! Are your use strict; use warnings just make-up?
# it must be foreach my $id ..
# (or does it really not raise a warning for the scope you gave to $id ??? If so, that is
# even worse!!)
# in short: pay attention to the scope of your variables
foreach $id (keys %id2seq){
    # here is the last good possibility to cut $id:
    # $id = (split /\|/, $id)[0];
    if (-f $id){ # this is a lie..
        print $id." Already exists, about to override it","\n"
    }            # .. because you are going to append, not to overwrite
    open my $out_fh, '>>', "$id.fa" or die $!;
    # here the parens are unneeded and probably nasty
    print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
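The two ways of cutting the header mentioned in the comments above can be tried in isolation. This is only a minimal sketch: the helper names id_via_split and id_via_regex are hypothetical, and the regex variant uses /^>([^|]+)/, which is an assumption about what "the part before |" should mean rather than the pattern quoted above.

```perl
use strict;
use warnings;

# Option 1: capture the whole header line, then keep the first '|'-separated field.
sub id_via_split {
    my ($line) = @_;
    return undef unless $line =~ /^>(.+)/;
    my $header = $1;                       # copy $1 before running another regex
    my $first  = (split /\|/, $header)[0];
    return $first;
}

# Option 2: capture only the characters between '>' and the first '|'.
sub id_via_regex {
    my ($line) = @_;
    return $line =~ /^>([^|]+)/ ? $1 : undef;
}

# Header line taken from the thread.
my $line = '>hsa_circ_0000001|chr1:1080738-1080845-|None|None';
print id_via_split($line), "\n";
print id_via_regex($line), "\n";
```

Both helpers return the same short identifier for this header; the split form keeps the full header available in $header if it is needed later.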
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
But the thing is, I want to be able to capture the full fasta sequence title (>hsa_circ_0000001|chr1:1080738-1080845-|None|None) along with the sequence and print that into a file and I want to capture a part of the sequence title and use it to name the file (hsa_circ_0000001) excluding '>'.
If I use your method surely I will not be able to print the full fasta sequence title followed by a newline and then the sequence in my newly created file, because $id will become (hsa_circ_0000001), whereas I need to be able to capture both (hsa_circ_0000001) and (>hsa_circ_0000001|chr1:1080738-1080845-|None|None) and apply them seamlessly as the new files are created in my loop.
Pete.
If so, create a new variable for the filename just before creating the file, as in:
foreach my $id (keys %id2seq){
    # here is the last good possibility to cut $id:
    my $filename = (split /\|/, $id)[0];
    ...
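To make the idea concrete, here is a hedged sketch of an output loop that keeps the full header in $id and derives only the file name from it. The sub name write_records, the $dir parameter and the use of '>' (overwrite) rather than '>>' are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write one FASTA file per record into directory $dir.
# Keys of %$id2seq are full header lines without the leading '>'.
sub write_records {
    my ($dir, $id2seq) = @_;
    my @written;
    foreach my $id (sort keys %$id2seq) {
        # Only the file name is cut down to the first '|'-separated field.
        my $filename = (split /\|/, $id)[0] . '.fa';
        my $path = File::Spec->catfile($dir, $filename);
        open my $out_fh, '>', $path or die "Could not open $path: $!";
        # $id is untouched, so the full header can still be printed.
        print {$out_fh} ">$id\n", $id2seq->{$id}, "\n";
        close $out_fh or die "Could not close $path: $!";
        push @written, $path;
    }
    return @written;
}
```

Called as write_records('.', \%id2seq), this would create hsa_circ_0000001.fa in the current directory with the complete header as its first line.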
Okay, I fixed up the items you suggested I should, and so my script is looking like this:
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<',"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
my %id2seq;
while(<$fh>){
    my $id = '';
    chomp;
    if($_ =~ /^>(.+)/){
        $id = $1;
    }else{
        $id2seq{$id} .= $_;
    }
}
foreach my $id (keys %id2seq){
    my $filename = (split /\|/, $id)[0];
    open my $out_fh, '>>', "$filename" or die $!;
    print $out_fh ">".$id."\n",$id2seq{$id}, "\n";
    close $out_fh;
}
close $fh;
How do I integrate the value I've split and extracted into the naming of the file, because it's stating that it's uninitialised?
Although, I thought that it was clearly initialised/defined here:
my $filename = (split /\|/, $id)[0];
open my $out_fh, '>>', "$filename" or die $!;
Or maybe I'm just misunderstanding the scope? Where do I place the $filename in the loop?
Pete.
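The problem is in this loop: $id is declared inside it, so it is reset to '' on every line and every sequence line ends up under the empty key.
while(<$fh>){
    my $id = '';
    chomp;
    if ($_ =~ /^>(.+)/){
        $id = $1;
    } else {
        $id2seq{$id} .= $_;
    }
}
Try declaring $id before the loop instead: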
#!/usr/bin/perl
use strict;
use warnings;
my $id;
my %id2seq;
my $infile = 'human_hg19_circRNAs_putative_spliced_sequence.fa';
open my $fh,'<',$infile
    or die "Could not open $infile : $!";
while (<$fh>){
    if ( /^>(.+)/ ){
        $id = (split /\|/, $1)[0];
    }
    $id2seq{$id} .= $_;
}
foreach my $id (keys %id2seq){
    my $filename = $id.'.fa';
    print "Creating $filename\n";
    open my $out_fh,'>', $filename
        or die "Could not open $filename : $!";
    print $out_fh $id2seq{$id};
    close $out_fh;
}
close $fh;
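Where $id is declared matters more than it may look. The following is a minimal sketch, not the original script: the sub names parse_inner, parse_outer and filename_for are hypothetical, and the second record in @lines is made-up sample data. It compares a loop that declares $id inside the body with one that declares it before the loop, and shows what split returns for an empty key.

```perl
use strict;
use warnings;

# First header is from the thread; the second record is invented sample data.
my @lines = (
    '>hsa_circ_0000001|chr1:1080738-1080845-|None|None',
    'ACGT',
    '>hsa_circ_0000002|chrN:1-100+|None|None',
    'TTGA',
);

# Variant A: $id is declared inside the loop, so it is '' again on every line.
sub parse_inner {
    my %id2seq;
    for my $line (@_) {
        my $id = '';
        if ($line =~ /^>(.+)/) { $id = $1 }
        else                   { $id2seq{$id} .= $line }
    }
    return \%id2seq;
}

# Variant B: $id is declared before the loop and survives from line to line.
sub parse_outer {
    my %id2seq;
    my $id;
    for my $line (@_) {
        if ($line =~ /^>(.+)/) { $id = $1 }
        else                   { $id2seq{$id} .= $line }
    }
    return \%id2seq;
}

# Splitting an empty string yields an empty list, so element [0] is undef.
sub filename_for {
    my ($id) = @_;
    my $name = (split /\|/, $id)[0];
    return $name;
}

my $inner = parse_inner(@lines);
my $outer = parse_outer(@lines);
printf "inner keys: %d, outer keys: %d\n", scalar(keys %$inner), scalar(keys %$outer);
```

In variant A all sequence lines are stored under the single key '', and filename_for('') returns undef, which is consistent with an "uninitialized value" warning when $filename is used in the output loop.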
poj
foreach $id (keys %id2seq){
    my $filename = (split /\|/, $id)[0];
    open my $out_fh, '>>', "$filename.fa" or die $!;
    print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
    close $out_fh;
}
But I got an error saying that $filename was uninitialized at the 3rd line down --- open my $out_fh, '>>', "$filename.fa" or die $!;
So I'm just trying to figure out where I should be declaring $filename
Pete.
First, follow the advice given. Go through your script and correct *all* the items Discipulus pointed out. Second, when in doubt, print out the value of your variables.
use Data::Dumper;
...
foreach my $id (keys %id2seq) {
    warn "ID: $id";
    my @segments = split /\|/, $id;
    warn "Segments: " . Dumper \@segments;
    my $filename = $segments[0];
    ...
}
Once the code is working right, remove the debug statements.
The way forward always starts with a minimal test.
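Note that the warning is about $filename itself, not about "$filename.fa".
A '.' cannot be part of a variable name, so Perl interpolates $filename and then appends ".fa"; the value that is uninitialized is $filename.
The syntax "${filename}.fa" delimits the variable name explicitly and is harmless here, but it is only required when the next character could be read as part of the name, as in "${filename}_fa".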
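How a variable name is delimited inside a double-quoted string is easy to check directly. This is a minimal sketch; the value assigned to $filename is an assumption standing in for the first field of a FASTA header.

```perl
use strict;
use warnings;

# Hypothetical value standing in for the first field of a FASTA header.
my $filename = 'hsa_circ_0000001';

# '.' cannot be part of a variable name, so $filename is interpolated
# and '.fa' is appended literally.
my $plain = "$filename.fa";

# Braces delimit the name explicitly; the result is the same here.
my $braced = "${filename}.fa";

# Braces are required when the next character could continue the name:
# without them, "_fa" would be read as part of a different variable name.
my $suffix = "${filename}_fa";

print "$plain\n$braced\n$suffix\n";
```

The first two strings are identical, so braces make a difference only for cases such as the underscore suffix.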
Good luck, hexcoder