PerlMonks  

Re^5: How to match more than 32766 times in regex?

by FreeBeerReekingMonk (Deacon)
on Dec 01, 2015 at 21:04 UTC


in reply to Re^4: How to match more than 32766 times in regex?
in thread How to match more than 32766 times in regex?

OK, now I understand. First off, read about the index function: it will give you positions. I _think_ you want something like this. Granted, it only takes one given sequence and finds the duplicated copies that come right after it. You also have not told us whether gaps are allowed, or whether you really have 1's and 0's instead of GATC's. Python still seems the better way to go; just read this: http://codereview.stackexchange.com/questions/12522/simple-dna-sequence-finder-w-mismatch-tolerance
Meanwhile, this finds sequences with copies, without allowing gaps:

use strict;
use warnings;
use Term::ANSIColor;
use Data::Dumper;

my $X = "100100100010010110110101100100000"; # or use File::Slurp
my $s = "100";       # my pattern $s
my $L = length($s);  # length of pattern
my @C;               # store colors for later
my $counter = 0;
my $baseposition = 0;
my $newindex = 0;
my $subsequenceposition = 0;

# index returns -1 when nothing is found, which ends the loop
while (($newindex = index($X, $s, $baseposition)) >= $baseposition) {
    # ok, found something, now checking subsequences
    print "From $baseposition, found '$s' at position $newindex\n";
    push(@C, {pos=>$newindex, length=>$L, color=>'black on_yellow'});
    $subsequenceposition = $newindex + $L;
    print "iterations will start from $subsequenceposition, seeking...\n";
    while (substr($X, $subsequenceposition, $L) eq $s) {
        $counter++;
        push(@C, {pos=>$subsequenceposition, length=>$L, color=>'black on_green'});
        print "Found recurrence at $subsequenceposition ($counter recurrences found so far)\n";
        $subsequenceposition += $L;
    }
    print colored("Found sequence at $newindex. With $counter recurrences", 'blue on_white') . "\n";
    # now after the last recurrence, keep searching for our $s
    $baseposition = $subsequenceposition;
    $counter = 0;
    print "Searching for more starting at $baseposition\n";
}
print "DONE\n";

# now print my sequence with colors
# (work from the end of the string backwards, so earlier positions stay valid;
# the string is printed after every insertion, the last line is fully colored)
for my $p (sort { $b->{pos} <=> $a->{pos} } @C) {
    substr($X, $p->{pos} + $p->{length}, 0) = color('reset');
    substr($X, $p->{pos}, 0) = color($p->{color});
    print $X . "\n";
}
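If you would rather have the positions as data than as colored output, the same index/substr idea can be wrapped in a sub. This is only a rough sketch: the name tandem_runs and the [start, copies] return shape are my own invention, not something from a module. Note that it counts the first hit as copy number 1, whereas $counter above only counts the copies after it.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return shape are assumptions for illustration):
# returns every run of back-to-back copies of $pattern in $text
# as [start, copies] pairs, no gaps allowed.
sub tandem_runs {
    my ($text, $pattern) = @_;
    my $len = length $pattern;
    return () unless $len;    # an empty pattern would never advance
    my @runs;
    my $from = 0;
    while ((my $start = index($text, $pattern, $from)) >= 0) {
        my $copies = 1;
        my $next   = $start + $len;
        # count further copies that follow immediately
        while (substr($text, $next, $len) eq $pattern) {
            $copies++;
            $next += $len;
        }
        push @runs, [$start, $copies];
        $from = $next;        # resume searching after the whole run
    }
    return @runs;
}

for my $run (tandem_runs("100100100010010110110101100100000", "100")) {
    printf "start %d: %d copies\n", @$run;
}
```

Since this is a plain loop over index and substr, there is no regex quantifier involved, so the 32766 repeat limit from the thread title should not come into play.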
