Re^3: An overlapping regex capture

> how would I integrate the split function into my loop

Since you have two loops, you can integrate it either when you grab $id or just before opening the filehandle (both spots are marked in the code below).

But, looking more closely at your code, there are many things wrong:

my %id2seq = (); # this is the verbose form of my %id2seq;
my $id = '';     # this is the wrong place to declare this var! declare it when you need it, i.e.
                 # inside the while(<File>){ block
         # missing the mode: always put it, even if it defaults to '<'
open File,"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
         # better to use a lexical filehandle, as in: open my $fh, '<', $filepath or die
         # a bareword is still accepted, but by convention it is UPPERCASE, so not open File...

    while(<File>){
        chomp;
                 # here you are capturing something: if you want just the part before | you
                 # can get it right here: /^>(\w+)\|/ as a starting option?
        if($_ =~ /^>(.+)/){
                 # or here:
                 $id = $1;
                 # cutting $1 like in: $id = (split /\|/, $1)[0]
...
           # AHHH! This is an error! Are your use strict; use warnings just make-up?
           # it must be foreach my $id ..
           # (or does it really not raise a warning, given the scope you gave to $id??? if so, it is
           #  even worse!!)
           # in short: pay attention to the scope of your variables
   foreach $id (keys %id2seq){
                 # here the last good possibility to cut $id:
                 # $id = (split /\|/, $id)[0];
        if (-f $id){                             # this is a lie..
            print $id." Already exists, about to override it","\n"
        }                 # .. because you are going to append, not to overwrite
        open  my $out_fh, '>>', "$id.fa" or die $!; 
                     # here parens are unneeded and probably nasty
        print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
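A minimal sketch of the two spots mentioned above for cutting the ID, assuming the header format Pete shows further down the thread (`>hsa_circ_0000001|chr1:1080738-1080845-|None|None`). The helper `file_name_for` is a hypothetical name used only for this illustration; it takes a list slice of `split`, so only the first field survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the part of a FASTA title before the first '|'.
# split takes a regex, so the pipe must be escaped; (...)[0] is a list slice.
sub file_name_for {
    my ($title) = @_;
    my $name = (split /\|/, $title)[0];
    return $name;
}

my $line = '>hsa_circ_0000001|chr1:1080738-1080845-|None|None';

# Option 1: capture the whole title, cut it later (e.g. just before open).
if ($line =~ /^>(.+)/) {
    my $title = $1;
    print "title: $title\n";
    print "name:  ", file_name_for($title), "\n";
}

# Option 2: capture only the part before the first '|' while matching.
# \w already includes digits, so [\w\d] is not needed.
if ($line =~ /^>(\w+)\|/) {
    print "name:  $1\n";
}
```

Both options should give `hsa_circ_0000001` as the name; the split version has the advantage that the full title is still around for the `>` line of the output file.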

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 11:52 UTC
Okay, I fixed the items you suggested, and my script now looks like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<',"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
    my %id2seq;
    while(<$fh>){
        my $id = '';
        chomp;
        if($_ =~ /^>(.+)/){
            $id = $1;
        }else{
            $id2seq{$id} .= $_;
        }
    }
    foreach my $id (keys %id2seq){
        my $filename = (split /\|/, $id)[0];
        open my $out_fh, '>>', "$filename" or die $!;
        print $out_fh ">".$id."\n",$id2seq{$id}, "\n";
        close $out_fh;
    }
    close $fh;

How do I use the value I've split out as the file name? Perl says it's uninitialised, although I thought it was clearly initialised/defined here:

    my $filename = (split /\|/, $id)[0];
    open my $out_fh, '>>', "$filename" or die $!;

Or maybe I'm misunderstanding the scope? Where do I place `$filename` in the loop?

Pete.
Re^5: An overlapping regex capture by poj (Abbot) on Jun 22, 2017 at 15:33 UTC
The problem is earlier: `$id` is set on the `>` lines but then cleared on the subsequent sequence lines, because `my $id = ''` creates a fresh, empty `$id` on every iteration:

    while(<$fh>){
        my $id = '';
        chomp;
        if ($_ =~ /^>(.+)/){
            $id = $1;
        } else {
            $id2seq{$id} .= $_;
        }
    }

Try this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $id;
    my %id2seq;
    my $infile = 'human_hg19_circRNAs_putative_spliced_sequence.fa';
    open my $fh,'<',$infile or die "Could not open $infile : $!";
    while (<$fh>){
        if ( /^>(.+)/ ){
            $id = (split /\|/, $1)[0];
        }
        $id2seq{$id} .= $_;
    }
    foreach my $id (keys %id2seq){
        my $filename = $id.'.fa';
        print "Creating $filename\n";
        open my $out_fh,'>', $filename or die "Could not open $filename : $!";
        print $out_fh $id2seq{$id};
        close $out_fh;
    }
    close $fh;

poj
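To see the scope problem in isolation, here is a minimal sketch that runs the same header/sequence loop twice over a small in-memory sample. Assumptions: the second record and both sequences are made up for illustration (only the first title comes from this thread), and `parse` with its `$inner_scope` flag is a hypothetical helper. With the ID declared inside the loop body, every sequence line sees a fresh empty ID, so everything lands under the key `''`; with the ID declared once before the loop, it survives from the header line to the sequence lines that follow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample in FASTA layout; only the first title comes from the thread.
my $fasta = <<'END';
>hsa_circ_0000001|chr1:1080738-1080845-|None|None
ACGTACGT
ACGT
>hsa_circ_0000002|made_up|None|None
GGCC
END

# Hypothetical helper: parse FASTA text into { name => sequence }.
# If $inner_scope is true, the ID is re-declared on every line (the bug).
sub parse {
    my ($text, $inner_scope) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my %id2seq;
    my $outer_id;                  # declared once, keeps its value across lines
    while (my $line = <$fh>) {
        chomp $line;
        my $inner_id = '';         # fresh on every iteration, like the OP's code
        if ($line =~ /^>(.+)/) {
            my $title = $1;
            my $name  = (split /\|/, $title)[0];
            $inner_id = $name;     # lost as soon as this iteration ends
            $outer_id = $name;
        }
        else {
            my $key = $inner_scope ? $inner_id : $outer_id;
            $id2seq{$key} .= $line;
        }
    }
    close $fh;
    return \%id2seq;
}

my $buggy = parse($fasta, 1);
my $fixed = parse($fasta, 0);
print "buggy keys: ", join(', ', map { "'$_'" } sort keys %$buggy), "\n";
print "fixed keys: ", join(', ', map { "'$_'" } sort keys %$fixed), "\n";
```

The in-memory filehandle (`open ... '<', \$text`) is only there so the sketch runs without the real `.fa` file; with a real file the loop is the same.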
Re^5: An overlapping regex capture by 1nickt (Canon) on Jun 22, 2017 at 12:01 UTC
Did you try printing the values of your variables as I suggested?

The way forward always starts with a minimal test.
Re^6: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 18:17 UTC
Yes, I tried it and got the following:

    ID: at seqextractor.pl line 23, <$fh> line 130.
    Segments: $VAR1 = [];

from using this:

    foreach my $id (keys %id2seq){
        warn "ID: $id";
        my @segments = split /\|/, $id;
        warn "Segments: " . Dumper \@segments;
        my $filename = $segments[0];
    }

Does it mean that the array is empty?
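The output above can be reproduced in isolation. The `ID:` line shows an empty ID, which fits poj's explanation that every sequence line ends up under the key `''` (the ` at seqextractor.pl line 23, <$fh> line 130.` suffix is simply added by `warn` because the message does not end in a newline). `split` on an empty string returns an empty list, hence `$VAR1 = [];`, and indexing that empty array gives `undef`, which is what later triggers the "uninitialized" warning. A minimal sketch, with variable names mirroring the post above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $id = '';                       # the key every sequence ends up under in the buggy loop
my @segments = split /\|/, $id;    # splitting an empty string gives an empty list
print "Segments: ", Dumper(\@segments);    # prints: Segments: $VAR1 = [];
print "count: ", scalar(@segments), "\n";  # prints: count: 0

my $filename = $segments[0];       # there is no element 0, so this is undef
print defined $filename ? "defined\n" : "undef\n";   # prints: undef
```

So yes: the array is empty, and the real fix is upstream, in how `$id` is kept between lines.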
Re^7: An overlapping regex capture by 1nickt (Canon) on Jun 22, 2017 at 21:24 UTC
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:11 UTC
But the thing is, I want to capture the full FASTA sequence title (>hsa_circ_0000001|chr1:1080738-1080845-|None|None) along with the sequence and print that into a file, and I also want to capture part of the title (hsa_circ_0000001, excluding '>') and use it to name the file. If I use your method, surely I will not be able to print the full title followed by a newline and then the sequence in my newly created file, because `$id` will become hsa_circ_0000001. I need both hsa_circ_0000001 and >hsa_circ_0000001|chr1:1080738-1080845-|None|None, and to apply them seamlessly as the new files are created in my loop.

Pete.
Re^5: An overlapping regex capture by Discipulus (Canon) on Jun 21, 2017 at 21:21 UTC
If so, create a new variable for the file name just before creating the file, as in:

    foreach $id (keys %id2seq){
        # here the last good possibility to cut $id:
        my $filename = (split /\|/, $id)[0];
        ...

L*
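Putting this suggestion together with Pete's requirement (full title inside the file, short name for the file name), one way it could look is sketched below. Assumptions: the input is an in-memory sample instead of the real `human_hg19_circRNAs_putative_spliced_sequence.fa`, the sequence is made up, output goes into a temporary directory from the core File::Temp module, and `write_records` is a hypothetical name. The full title is kept for the `>` line and only the part before the first `|` names the file. Opening with `'>'` overwrites an existing file, whereas `'>>'` would append to it on every run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write each FASTA record to "<name>.fa" in $dir,
# where <name> is the title up to the first '|'. Returns the file paths.
sub write_records {
    my ($fh, $dir) = @_;
    my (%title_of, %seq_of, @order);
    my $name;                                  # survives across lines
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.+)/) {
            my $title = $1;                    # full title, without '>'
            $name = (split /\|/, $title)[0];   # short name for the file
            push @order, $name unless exists $title_of{$name};
            $title_of{$name} = $title;
        }
        elsif (defined $name) {                # skip anything before the first header
            $seq_of{$name} .= $line;
        }
    }
    my @paths;
    for my $n (@order) {
        my $path = File::Spec->catfile($dir, "$n.fa");
        open my $out_fh, '>', $path or die "Could not open $path: $!";
        print $out_fh ">$title_of{$n}\n", ($seq_of{$n} // ''), "\n";
        close $out_fh or die "Could not close $path: $!";
        push @paths, $path;
    }
    return @paths;
}

my $fasta = ">hsa_circ_0000001|chr1:1080738-1080845-|None|None\nACGT\nACGT\n";
open my $in, '<', \$fasta or die "Cannot open in-memory file: $!";
my $dir = tempdir(CLEANUP => 1);
print "Created $_\n" for write_records($in, $dir);
close $in;
```

Keeping the title and the sequence in two hashes keyed by the short name is just one design choice; a single hash of `[title, sequence]` pairs would work equally well.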
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 09:54 UTC
Thank you for this advice, it's golden; I will try to implement all your points today.

Pete.
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:47 UTC
Okay, so I tried to implement that like so:

    foreach $id (keys %id2seq){
        my $filename = (split /\|/, $id)[0];
        open my $out_fh, '>>', "$filename.fa" or die $!;
        print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
        close $out_fh;
    }

But I got an error saying that `$filename` was uninitialized on the third line down:

    open my $out_fh, '>>', "$filename.fa" or die $!;

So I'm trying to figure out where I should be declaring `$filename`.

Pete.
Re^5: An overlapping regex capture by 1nickt (Canon) on Jun 21, 2017 at 22:11 UTC
First, follow the advice given: go through your script and correct all the items Discipulus pointed out. Second, when in doubt, print out the values of your variables:

    use Data::Dumper;
    ...
    foreach my $id (keys %id2seq) {
        warn "ID: $id";
        my @segments = split /\|/, $id;
        warn "Segments: " . Dumper \@segments;
        my $filename = $segments[0];
        ...
    }

Once the code is working right, remove the debug statements.
Re^5: An overlapping regex capture by hexcoder (Curate) on Jun 22, 2017 at 11:22 UTC
No, you got the error saying that `$filename.fa` is uninitialized. So perl saw that as a variable name, while you intended to use only `$filename`. To delimit the variable name, you can use this syntax: `"${filename}.fa"`.

Good luck,
hexcoder
Re^6: An overlapping regex capture by haukex (Archbishop) on Jun 22, 2017 at 12:40 UTC
> No, you got the error saying that $filename.fa is uninitialized. So perl saw that as a variable name ...

Sorry, but that's not correct in this case: a dot does end the variable name being interpolated. And since the OP is using strict, that would have caught the error anyway. Your advice does apply to other characters, though:

    $ perl -wMstrict -le 'my $fn="x"; print "$fn.y"'
    x.y
    $ perl -wMstrict -le 'my $fn="x"; print "$fn_y"'
    Global symbol "$fn_y" requires explicit package name (did you forget to declare "my $fn_y"?) at -e line 1.
    Execution of -e aborted due to compilation errors.
    $ perl -w -le 'my $fn="x"; print "$fn_y"'
    Name "main::fn_y" used only once: possible typo at -e line 1.
    Use of uninitialized value $fn_y in string at -e line 1.
    $ perl -wMstrict -le 'my $fn="x"; print "${fn}_y"'
    x_y