in reply to Regex KungFu help needed

Rather than using such advanced approaches, you can allow overlapping matches using look-around assertions. Specifically, match on the first letter, and require that it be followed by the others of interest. Because the look-ahead consumes no characters, the next search resumes right after that first letter. You can then use standard techniques for counting matches.

#!/usr/bin/perl
use strict;
use warnings;

my @real_count = (0,0,0,0);
my $sequence = "GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA";
my @pattern;
$pattern[0] = "A(?=AAAA)";
$pattern[1] = "G(?=GGGG)";
$pattern[2] = "G(?=GAGA)";
$pattern[3] = "G(?=AAGG)";

foreach my $i (0..$#pattern) {
    $real_count[$i]++ while ($sequence =~ /$pattern[$i]/g);
}

foreach (@real_count) {
    print "$_\n";
}
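To make the difference concrete, here is a minimal sketch contrasting a plain global match with the look-ahead form. The helper names count_plain and count_overlapping are made up for illustration and are not part of the original code; the sketch also assumes the pattern is a fixed string rather than a general regex.

```perl
use strict;
use warnings;

# Illustrative helpers (hypothetical names, not from the thread).

# Plain match: each hit consumes the whole word, so the next search
# starts after it and overlapping occurrences are skipped.
sub count_plain {
    my ($seq, $word) = @_;
    my $n = 0;
    $n++ while $seq =~ /\Q$word\E/g;
    return $n;
}

# Look-ahead match: only the first letter is consumed; the rest of the
# word is checked by (?=...) without advancing the match position.
sub count_overlapping {
    my ($seq, $word) = @_;
    my $first = substr $word, 0, 1;
    my $rest  = substr $word, 1;
    my $n = 0;
    $n++ while $seq =~ /\Q$first\E(?=\Q$rest\E)/g;
    return $n;
}

my $sequence = "GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA";
printf "%s: plain=%d overlapping=%d\n",
    $_, count_plain($sequence, $_), count_overlapping($sequence, $_)
    for qw(AAAAA GGGGG);
```

On a short string such as "AAAAAAA", the plain version finds AAAAA once, while the look-ahead version finds it three times (starting at offsets 0, 1 and 2).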

Note that I also swapped your error-prone for loop for a foreach loop with the range operator.

Re^2: Regex KungFu help needed
by johngg (Canon) on Oct 02, 2009 at 15:12 UTC

    You can put the whole term in the look-ahead to make things a bit simpler, and you could take advantage of the $scalar = () = $string =~ m{$pattern}g; idiom rather than successive incrementing, wrapping the whole thing in a map.

    $ perl -Mstrict -wle '
    > my $seq = q{GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA};
    > my @pats = qw{ AAAAA GGGGG GGAGA GAAGG };
    > my @cts = map {
    >     my $re = qr{(?=\Q$_\E)};
    >     my $ct = () = $seq =~ m{$re}g;
    > } @pats;
    > print qq{@cts};'
    11 3 1 1
    $
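For anyone unfamiliar with the = () = idiom, the following sketch spells out the two steps it combines. The variable names @hits and $count are only for illustration.

```perl
use strict;
use warnings;

my $seq = 'GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA';

# Step 1: in list context, m//g with no capture groups returns one
# element per match. A bare look-ahead is zero-width, so each element
# is an empty string, but the list still has one entry per position.
my @hits = $seq =~ m{(?=GGGGG)}g;

# Step 2: assigning to the empty list () forces list context on the
# match, and a list assignment evaluated in scalar context yields the
# number of elements on its right-hand side, i.e. the match count.
my $count = () = $seq =~ m{(?=GGGGG)}g;

print scalar(@hits), " $count\n";
```

Both numbers should agree with the count reported for GGGGG in the output above; the idiom simply avoids the temporary array.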

    I hope this is of interest.

    Cheers,

    JohnGG

      As a further step, associating patterns with their counts and (cached) regex objects in a hash may be worthwhile:
      >perl -wMstrict -le
      "my $sequence = 'GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA';
       my %patterns =
         map { $_ => { count => 0, regex => qr{ (?= \Q$_\E) }xms } }
         qw(AAAAA GGGGG GGAGA GAAGG)
         ;
       $patterns{$_}{count} =()= $sequence =~ m{ $patterns{$_}{regex} }xmsg
         for keys %patterns;
       print qq{$_: $patterns{$_}{count}} for sort keys %patterns;
      "
      AAAAA: 11
      GAAGG: 1
      GGAGA: 1
      GGGGG: 3
      or
      >perl -wMstrict -le
      "my $sequence = 'GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA';
       my %patterns =
         map { $_ => { count => 0, regex => qr{ (?= \Q$_\E) }xms } }
         qw(AAAAA GGGGG GGAGA GAAGG)
         ;
       $_->{count} =()= $sequence =~ m{ $_->{regex} }xmsg
         for values %patterns;
       print qq{$_: $patterns{$_}{count}} for sort keys %patterns;
      "
      AAAAA: 11
      GAAGG: 1
      GGAGA: 1
      GGGGG: 3

        Nice, ++

        If you don't need to refer to the pattern again, you can reduce that to a single step with the uncompiled pattern as key and count as value.

        $ perl -Mstrict -wle '
        > my $seq = q{GGGGGGGAGAAAAAAAAAAAAAAAGAAGGA};
        > my %pats = map {
        >     $_, scalar( () = $seq =~ m{(?=\Q$_\E)}g );
        > } qw{ AAAAA GGGGG GGAGA GAAGG };
        > print qq{$_: $pats{ $_ }} for sort keys %pats;'
        AAAAA: 11
        GAAGG: 1
        GGAGA: 1
        GGGGG: 3
        $

        I expect some Monks could golf that down to seven bytes and a nybble :-)

        Cheers,

        JohnGG

        Update: Extra parentheses inside the map were unnecessary, removed!

Re^2: Regex KungFu help needed
by Anonymous Monk on Oct 02, 2009 at 18:25 UTC
    Huh, this is interesting, I will play around with this a bit more. Of course my @patterns are generated on the fly, but a little substitution regex would get me what you have done with the (?=...) inside the patterns... Thanks for showing me something totally new.
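A possible sketch of that substitution, assuming the generated patterns are plain letter strings (no regex metacharacters) of at least two characters. The helper name to_lookahead is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a plain word as "first letter followed by
# a look-ahead for the rest", e.g. AAAAA becomes A(?=AAAA).
# Assumes the word contains only literal letters.
sub to_lookahead {
    my ($word) = @_;
    (my $pat = $word) =~ s/^(.)(.+)$/$1(?=$2)/;
    return $pat;
}

my @words    = qw(AAAAA GGGGG GGAGA GAAGG);
my @patterns = map { to_lookahead($_) } @words;

print "$_\n" for @patterns;
```

For these four words, the result should be the same pattern strings that were written out by hand in the first reply. Wrapping the whole word in the look-ahead, as suggested further up, avoids the substitution altogether.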