Re: An overlapping regex capture

Re^2: An overlapping regex capture by choroba (Cardinal) on Jun 21, 2017 at 19:59 UTC
The first parameter to `split` is a regex, not a string (except for a single space). `|` in a regex has a special meaning (alternation), so you need to backslash it to match a literal pipe symbol:

    my @pargs = split /\|/, $id;

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
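A minimal sketch of the difference, using a hypothetical `$id` modelled on the FASTA title quoted later in this thread. A quoted string passed as the first argument is still treated as a pattern, so `'|'` behaves like `/|/`, an alternation of two empty patterns:

```perl
use strict;
use warnings;

# Hypothetical ID, modelled on the FASTA title quoted in this thread.
my $id = 'hsa_circ_0000001|chr1:1080738-1080845-|None|None';

# Escaped pipe: split on the literal '|' character.
my @fields = split /\|/, $id;

# Unescaped pipe: the pattern matches the empty string,
# so the input is broken into single characters.
my @chars = split '|', $id;

print scalar(@fields), " fields, the first is '$fields[0]'\n";
print scalar(@chars), " pieces without the backslash\n";
```

With the backslash the first field is the short ID; without it, every character becomes its own element.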
Re^2: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 21, 2017 at 20:20 UTC
Hmm, but how would I integrate the split function into my loop in order to name the file I create for each sequence?
Re^3: An overlapping regex capture by Discipulus (Canon) on Jun 21, 2017 at 20:49 UTC
> how would I integrate the split function into my loop

Since you have two loops, you can integrate it either when you grab `$id` or just before opening the filehandle.

But, looking more closely at your code, you have many things wrong:

    my %id2seq = (); # this is the verbose form of my %id2seq;
    my $id = '';     # this is the wrong place to declare this var! declare it when you need it, ie
                     # inside the while(<File>){ block

    # missing the mode: always put it, even if it defaults to '<'
    open File,"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
                     # better use a lexical filehandle, like in open my $fh, '<', $filepath or die
                     # bareword is still accepted but by convention is UPPERCASE, so no open File...
    while(<File>){
        chomp;
                     # here you are capturing something: if you want just the part before | you
                     # have here the possibility to get it: /^>([\w\d]+\|)/ as starting option?
        if($_ =~ /^>(.+)/){
                     # or here:
            $id = $1; # cutting $1 like in: $id = (split /\|/, $1)[0]
    ...
    # AHHH! this is an error! are your use strict; use warnings just make-up?
    # it must be foreach my $id ..
    # (or does it really not raise a warning for the scope you gave to $id ??? if so it is even
    # worse!!)
    # in short: pay attention to the scope of your variables
    foreach $id (keys %id2seq){
        # here is the last good possibility to cut $id:
        # $id = (split /\|/, $id)[0];
        if (-f $id){
            # this is a lie..
            print $id." Already exists, about to override it","\n"
        }
        # .. because you are going to append, not to overwrite
        open my $out_fh, '>>', "$id.fa" or die $!;
        # here parens are unneeded and probably nasty
        print $out_fh (">".$id."\n",$id2seq{$id}, "\n");

L*

There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
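The remark about appending versus overwriting can be sketched as follows. This is only an illustration: `write_record` and its arguments are hypothetical names, not part of the original script. It tests for the file that is actually written (`"$name.fa"`, not `$name`) and opens it with `>`, so the message about overwriting is true:

```perl
use strict;
use warnings;

# Hypothetical helper: write one FASTA record to "$name.fa".
# Mode '>' truncates an existing file; '>>' would append to it instead.
sub write_record {
    my ($name, $header, $seq) = @_;
    my $path = "$name.fa";

    # Test the path that is really opened below, not the bare name.
    warn "$path already exists, about to overwrite it\n" if -f $path;

    open my $out_fh, '>', $path or die "Cannot open $path: $!";
    print {$out_fh} ">$header\n$seq\n";
    close $out_fh or die "Cannot close $path: $!";
    return $path;
}
```

If appending is what you really want, keep `>>` and change the message instead.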
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:11 UTC
But the thing is, I want to capture the full FASTA sequence title (`>hsa_circ_0000001|chr1:1080738-1080845-|None|None`) along with the sequence and print both into a file, and I also want to capture a part of the title (`hsa_circ_0000001`, without the `>`) and use it to name that file.

If I use your method, surely I will not be able to print the full title followed by a newline and then the sequence in my newly created file, because `$id` will become `hsa_circ_0000001`. I need both `hsa_circ_0000001` and `>hsa_circ_0000001|chr1:1080738-1080845-|None|None`, and I need to apply them as the new files are created in my loop.

Pete.
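One way to get both values from a single match is to nest the capture groups: the outer group takes the whole title, the inner group takes only the part before the first pipe. A minimal sketch, assuming every title line starts with `>`; `parse_title` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return (full title, short ID) for a FASTA title line,
# or an empty list if the line is not a title line.
sub parse_title {
    my ($line) = @_;
    # $1 = everything after '>', $2 = the part of $1 before the first '|'.
    return $line =~ /^>(([^|]+).*)/ ? ($1, $2) : ();
}

my ($title, $name) = parse_title('>hsa_circ_0000001|chr1:1080738-1080845-|None|None');
print "file name: $name.fa\n";
print "title:     >$title\n";
```

Keeping the full title as the hash key and deriving the short name with `split` only when the file is opened would work just as well.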
Re^5: An overlapping regex capture by Discipulus (Canon) on Jun 21, 2017 at 21:21 UTC
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 11:52 UTC
Okay, I fixed the items you suggested, and my script now looks like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<',"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
    my %id2seq;
    while(<$fh>){
        my $id = '';
        chomp;
        if($_ =~ /^>(.+)/){
            $id = $1;
        }else{
            $id2seq{$id} .= $_;
        }
    }

    foreach my $id (keys %id2seq){
        my $filename = (split /\|/, $id)[0];
        open my $out_fh, '>>', "$filename" or die $!;
        print $out_fh ">".$id."\n",$id2seq{$id}, "\n";
        close $out_fh;
    }
    close $fh;

How do I integrate the value I've split and extracted into the naming of the file? Perl says it's uninitialised, although I thought it was clearly initialised/defined here:

    my $filename = (split /\|/, $id)[0];
    open my $out_fh, '>>', "$filename" or die $!;

Or maybe I'm just misunderstanding the scope? Where do I place `$filename` in the loop?

Pete.
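A likely cause, judging only from the script as posted: `my $id = ''` sits inside the `while` block, so `$id` is reset to the empty string on every line. Every sequence line is then stored under the key `''`, and `(split /\|/, '')[0]` yields `undef`, which is where the uninitialised warning comes from. The sketch below declares `$id` once, before the loop. `read_fasta` is a hypothetical name, and the in-memory filehandle with made-up sequence data stands in for the real file:

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from a filehandle into a hash
# keyed by the full title line (without the leading '>').
sub read_fasta {
    my ($fh) = @_;
    my %id2seq;
    my $id;    # declared outside the loop so it survives from line to line
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.+)/) {
            $id = $1;
        }
        elsif (defined $id) {
            $id2seq{$id} .= $line;
        }
    }
    return %id2seq;
}

# Stand-in data instead of the real .fa file.
my $data = ">hsa_circ_0000001|chr1:1080738-1080845-|None|None\nACGT\nTTGA\n";
open my $fh, '<', \$data or die $!;
my %id2seq = read_fasta($fh);
close $fh;

for my $id (keys %id2seq) {
    my $filename = (split /\|/, $id)[0];    # short ID for the file name
    print "$filename.fa gets: >$id\n$id2seq{$id}\n";
}
```

The hash key stays the full title, so it can still be printed into the file, while `$filename` holds only the part before the first pipe.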
Re^5: An overlapping regex capture by poj (Abbot) on Jun 22, 2017 at 15:33 UTC
Re^5: An overlapping regex capture by 1nickt (Canon) on Jun 22, 2017 at 12:01 UTC
Re^6: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 18:17 UTC
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 21, 2017 at 21:47 UTC
Okay, so I tried to implement that like so:

    foreach $id (keys %id2seq){
        my $filename = (split /\|/, $id)[0];
        open my $out_fh, '>>', "$filename.fa" or die $!;
        print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
        close $out_fh;
    }

But I got an error saying that `$filename` was uninitialized at the 3rd line down:

    open my $out_fh, '>>', "$filename.fa" or die $!;

So I'm just trying to figure out where I should be declaring `$filename`.

Pete.
Re^5: An overlapping regex capture by 1nickt (Canon) on Jun 21, 2017 at 22:11 UTC
Re^5: An overlapping regex capture by hexcoder (Curate) on Jun 22, 2017 at 11:22 UTC
Re^6: An overlapping regex capture by haukex (Archbishop) on Jun 22, 2017 at 12:40 UTC
Re^4: An overlapping regex capture by Peter Keystrokes (Beadle) on Jun 22, 2017 at 09:54 UTC
Thank you for this advice, it's golden. I will try to implement all your points today.

Pete.