Re^5: How to match more than 32766 times in regex?

Ok, now I understand. First off, read about the index function; it will give you positions. So I _think_ you want something like the code below. Granted, it only takes one sequence and looks for copies of that same sequence directly after it. And as long as you do not tell us whether gaps are allowed, or whether you really have 1's and 0's instead of GATC's, this is the best guess I can make. Python still seems the better way to go, just read this: http://codereview.stackexchange.com/questions/12522/simple-dna-sequence-finder-w-mismatch-tolerance
Meanwhile, this finds sequences with copies, without allowing gaps:

use strict; 
use warnings; 
use Term::ANSIColor;
use Data::Dumper;

my $X = "100100100010010110110101100100000"; # or use File::Slurp
my $s = "100"; # my pattern $s
my $L = length($s); # length of pattern
my @C; # store colors for later

my $counter = 0;
my $baseposition = 0;
my $newindex = 0;
my $subsequenceposition = 0;

while(($newindex=index($X,$s,$baseposition))>=$baseposition){
  # ok, found something, now checking subsequences
  print "From $baseposition, found '$s' at position $newindex\n";
  push(@C, {pos=>$newindex,length=>$L,color=>'black on_yellow'});
  $subsequenceposition = $newindex + $L;
  print "iterations will start from $subsequenceposition, seeking...\n";
  
  while(substr($X,$subsequenceposition,$L) eq $s){
    $counter++; 
    push(@C, {pos=>$subsequenceposition,length=>$L,color=>'black on_green'});
    print "Found recurrence at $subsequenceposition ($counter recurrences found so far)\n";
    $subsequenceposition += $L;
  }
  
  print colored("Found sequence at $newindex with $counter recurrences", 'blue on_white') . "\n";
  
  # now after the last recurrence, keep searching for our $s
  $baseposition = $subsequenceposition; 
  $counter = 0;
  print "Searching for more starting at $baseposition\n";
}

print "DONE\n";

# now print my sequence with colors
# insert from the highest position down so the earlier offsets stay valid
for my $p (sort {$b->{pos} <=> $a->{pos} } @C){
  substr($X, $p->{pos}+$p->{length}, 0) = color('reset');
  substr($X, $p->{pos}, 0) = color($p->{color});
}
print $X . "\n";
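
If you would rather have the results as data than as printed messages, the same index/substr loop can be wrapped in a sub. This is only a sketch of the idea: find_tandem_runs and the pos/copies keys are names I made up for illustration, and it still assumes exact copies with no gaps. The count lives in a plain integer, so no regex quantifier is involved at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns one hash ref
# per run of back-to-back copies of $pattern found in $text.
sub find_tandem_runs {
    my ($text, $pattern) = @_;
    my $len = length $pattern;
    return () if $len == 0;    # an empty pattern would never advance

    my @runs;
    my $from = 0;
    # index() returns the zero-based position of the next match at or
    # after $from, or -1 when there is none.
    while ((my $start = index($text, $pattern, $from)) >= 0) {
        my $next   = $start + $len;
        my $copies = 1;
        # Count the copies that follow immediately, with no gap.
        while (substr($text, $next, $len) eq $pattern) {
            $copies++;
            $next += $len;
        }
        push @runs, { pos => $start, copies => $copies };
        $from = $next;    # resume searching after the whole run
    }
    return @runs;
}

for my $run (find_tandem_runs("100100100010010110110101100100000", "100")) {
    print "pos $run->{pos}: $run->{copies} copies\n";
}
```

On the sample string this reports runs at positions 0, 10 and 24 with 3, 1 and 2 copies. Note that copies counts the first hit too, while $counter in the script above only counts the repeats after it.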