in reply to An overlapping regex capture

You can use split to split $id at every | occurrence and use the first field:

    my @parts = split /\|/, $id;   # ouch! trivial error: I first wrote split '|', $id, see choroba below
    my $filename = $parts[0];
    # or simply
    my $filename = (split /\|/, $id)[0];
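Here is a runnable sketch of the approach above. It uses the header from this thread, without the leading '>', as sample data. The optional third argument to split (LIMIT) is an addition not used in the thread. With a limit of 2, Perl stops after the first pipe, which is enough when only the first field is needed.

```perl
use strict;
use warnings;

# Sample header from this thread, without the leading '>'
my $id = 'hsa_circ_0000001|chr1:1080738-1080845-|None|None';

# split takes a regex, so the pipe has to be escaped to match literally
my @parts    = split /\|/, $id;
my $filename = $parts[0];

# the same in one step: a list slice on the result of split;
# LIMIT 2 stops splitting after the first pipe
my $short = ( split /\|/, $id, 2 )[0];

print "$filename\n";    # prints hsa_circ_0000001
```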

PS: as a general rule, try to isolate your problem/question as much as you can and then choose a meaningful title. "An overlapping regex capture" does not really describe what you are asking.. or I'm totally missing the point..

HtH

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.

Re^2: An overlapping regex capture
by choroba (Cardinal) on Jun 21, 2017 at 19:59 UTC
    The first parameter to split is a regex, not a string (the one exception is a single space ' ', which splits on whitespace like awk). | has a special meaning in a regex (alternation), so you need to backslash it to match a literal pipe symbol:
    my @parts = split /\|/, $id;
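To see the difference choroba describes, this small script compares the two calls on the header from this thread. The \Q...\E variant (quotemeta) is an extra that the thread does not use. It escapes metacharacters when the separator is held in a variable.

```perl
use strict;
use warnings;

my $id = 'hsa_circ_0000001|chr1:1080738-1080845-|None|None';

# '|' is compiled as a regex: an empty alternation that matches the empty
# string, so split breaks the string into single characters
my @wrong = split '|', $id;

# escaping the pipe matches a literal '|'
my @right = split /\|/, $id;

# \Q...\E (quotemeta) escapes metacharacters in a variable separator
my $sep    = '|';
my @quoted = split /\Q$sep\E/, $id;

printf "%d vs %d fields\n", scalar @wrong, scalar @right;    # prints "48 vs 4 fields"
```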
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re^2: An overlapping regex capture
by Peter Keystrokes (Beadle) on Jun 21, 2017 at 20:20 UTC
    Hmm, but how would I integrate the split function into my loop in order to name the file I create for each sequence?
      > how would I integrate the split function into my loop

      Since you have two loops, you can integrate it either when you grab $id or just before opening the filehandle (both places are marked in the comments of the code below).

      But looking more closely at your code, there are many things wrong:

      my %id2seq = ();   # this is the verbose form of: my %id2seq;
      my $id = '';       # this is the wrong place to declare this var! declare it when you need it, i.e.
                         # inside the while(<File>){ block
      # missing the mode: always put it, even if it defaults to '<'
      open File,"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
      # better use a lexical filehandle, like in: open my $fh, '<', $filepath or die
      # a bareword is still accepted, but by convention it is UPPERCASE, so no: open File...
      while(<File>){
          chomp;
          # here you are capturing something: if you want just the part before | you
          # have the chance to get it here: /^>([\w\d]+)\|/ as a starting option?
          if($_ =~ /^>(.+)/){
              # or here:
              $id = $1;
              # cutting $1 like in: $id = (split /\|/, $1)[0]
      ...
      # AHHH! this is an error! are your use strict; use warnings just make-up?
      # it must be: foreach my $id ..
      # (or does it really not raise a warning for the scope you gave to $id??? if so it is even
      # worse!!)
      # in short: pay attention to the scope of your variables
      foreach $id (keys %id2seq){
          # here is the last good place to cut $id:
          # $id = (split /\|/, $id)[0];
          if (-f $id){   # this is a lie..
              print $id." Already exists, about to override it","\n"
          }
          # .. because you are going to append, not to overwrite
          open my $out_fh, '>>', "$id.fa" or die $!;
          # here parens are unneeded and probably nasty
          print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
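Here is one way to put the review points together: a lexical filehandle, a three-argument open, and a `my` in the `foreach`. The reading part is wrapped in a hypothetical `read_fasta` sub so it can be tried on any filehandle. Unlike the comment above, this sketch declares `$id` once before the `while` loop, not inside it. It has to remember the last header while the following sequence lines are read, and a `my $id` inside the loop would reset it on every line. The sub name and the skipping of blank lines are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle into a hash
# mapping the full header (without the leading '>') to the joined sequence.
sub read_fasta {
    my ($fh) = @_;
    my %id2seq;
    my $id;    # declared before the loop so it keeps the current header
               # while the following sequence lines are read
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        if ( $line =~ /^>(.+)/ ) {
            $id = $1;
            $id2seq{$id} = '';
        }
        elsif ( defined $id ) {
            $id2seq{$id} .= $line;
        }
    }
    return \%id2seq;
}

# Three-argument open with a lexical filehandle; the input file name is
# taken from the command line (an assumption of this sketch)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $id2seq = read_fasta($fh);
    close $fh;
    for my $id ( sort keys %$id2seq ) {    # 'my' keeps $id local to this loop
        printf "%s\t%d bp\n", $id, length $id2seq->{$id};
    }
}
```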

      L*

        But the thing is, I want to capture the full FASTA sequence title (>hsa_circ_0000001|chr1:1080738-1080845-|None|None) along with the sequence and print both into a file. I also want to capture part of the sequence title (hsa_circ_0000001, excluding '>') and use it to name the file.

        If I use your method, surely I won't be able to print the full FASTA sequence title, followed by a newline and then the sequence, in my newly created file, because $id will become hsa_circ_0000001. I need to capture both hsa_circ_0000001 and >hsa_circ_0000001|chr1:1080738-1080845-|None|None and use them together as the new files are created in my loop.
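Here is a sketch of how both can coexist. The full header stays as the hash key and on the '>' line, and split cuts it only when the file name is built. The sub name `write_records`, its output-directory argument, and the use of '>' (overwrite) rather than '>>' (append) are assumptions of this sketch, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: write each record to its own file. The full header is
# kept for the '>' line; only the part before the first '|' names the file.
sub write_records {
    my ( $id2seq, $dir ) = @_;
    my @written;
    for my $header ( sort keys %$id2seq ) {
        my $name = ( split /\|/, $header, 2 )[0];    # e.g. hsa_circ_0000001
        die "Cannot derive a file name from header '$header'\n"
            unless defined $name && length $name;
        my $path = "$dir/$name.fa";
        warn "$path already exists, overwriting it\n" if -e $path;
        # '>' overwrites an existing file; '>>' would append to it
        open my $out, '>', $path or die "Cannot write $path: $!";
        print $out ">$header\n", $id2seq->{$header}, "\n";
        close $out or die "Cannot close $path: $!";
        push @written, $path;
    }
    return @written;
}
```

For example, calling write_records(\%id2seq, '.') after the input loop has filled %id2seq writes each record to a file such as ./hsa_circ_0000001.fa whose first line is the full header.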

        Pete.

        Okay, I fixed the items you suggested, so my script now looks like this:
        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $fh, '<',"human_hg19_circRNAs_putative_spliced_sequence.fa",or die $!;
        my %id2seq;
        while(<$fh>){
            my $id = '';
            chomp;
            if($_ =~ /^>(.+)/){
                $id = $1;
            }else{
                $id2seq{$id} .= $_;
            }
        }
        foreach my $id (keys %id2seq){
            my $filename = (split /\|/, $id)[0];
            open my $out_fh, '>>', "$filename" or die $!;
            print $out_fh ">".$id."\n",$id2seq{$id}, "\n";
            close $out_fh;
        }
        close $fh;

        How do I use the value I've split off to name the file? Perl keeps warning that it's uninitialised.

        But I thought it was clearly initialised/defined here:

        my $filename = (split /\|/, $id)[0];
        open my $out_fh, '>>', "$filename" or die $!;

        Or maybe I'm just misunderstanding the scope? Where do I place the $filename in the loop?
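Reading the script above, the warning seems to come from the scope of `$id` rather than of `$filename`. Because `my $id = '';` sits inside the `while` loop, `$id` is reset to '' on every line. When a sequence line arrives, the header from the previous line is already gone, so every sequence is appended to `$id2seq{''}`. In the `foreach` loop the only key is then the empty string. Splitting an empty string returns an empty list, so `(split /\|/, $id)[0]` is undef. The demonstration below uses an in-memory filehandle and two made-up records; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Two made-up records, read through an in-memory filehandle
my $fasta = ">hsa_circ_A|chr1|None|None\nACGU\n>hsa_circ_B|chr2|None|None\nGGCA\n";

# As in the script above: $id is re-declared (and reset to '') on every line
sub keys_with_inner_my {
    my ($text) = @_;
    open my $fh, '<', \$text or die $!;
    my %id2seq;
    while (<$fh>) {
        my $id = '';
        chomp;
        if (/^>(.+)/) { $id = $1 } else { $id2seq{$id} .= $_ }
    }
    return sort keys %id2seq;
}

# Fixed: $id is declared once, before the loop, so it keeps the last header
sub keys_with_outer_my {
    my ($text) = @_;
    open my $fh, '<', \$text or die $!;
    my %id2seq;
    my $id = '';
    while (<$fh>) {
        chomp;
        if (/^>(.+)/) { $id = $1 } else { $id2seq{$id} .= $_ }
    }
    return sort keys %id2seq;
}

my @broken = keys_with_inner_my($fasta);    # a single empty-string key
my @fixed  = keys_with_outer_my($fasta);    # the two full headers

# splitting an empty string gives an empty list, so the slice is undef
my $filename = ( split /\|/, $broken[0] )[0];
print defined $filename ? "defined\n" : "undef\n";    # prints undef
```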

        Pete.

        Okay, so I tried to implement it like this:

        foreach $id (keys %id2seq){
            my $filename = (split /\|/, $id)[0];
            open my $out_fh, '>>', "$filename.fa" or die $!;
            print $out_fh (">".$id."\n",$id2seq{$id}, "\n");
            close $out_fh;
        }
        But I got a message saying that $filename was uninitialized on the third line: open my $out_fh, '>>', "$filename.fa" or die $!;

        So I'm just trying to figure out where I should be declaring $filename.

        Pete.

        Thank you for this advice, it's golden. I will try to implement all your points today.

        Pete.