umeboshi has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I need to perform a search for all strings of lengths, say, L = [8 ... 13] characters. There are only 4 characters which can be used to make up the strings, say, [a b c d].

In other words, for a size L I need to count the matches for each of the permutations (obviously with repetition). I.e. I need to count the matches for each of the 4^L permutations.

I know there is something like Algorithm::Permute out there but I don't think it solves my problem.

Thanks for any hints.

Hadassa

Replies are listed 'Best First'.
Re: Permutation with repetitions
by moritz (Cardinal) on Jun 07, 2010 at 09:35 UTC
    I'm not sure I've understood correctly what you want to do, but if you want to count matches of a certain characteristic in a given string, this snippet might help you:
    my %matches; while ($str =~ /([abcd]{8,13})/g) { $matches{$1}++; } use Data::Dumper; print Dumper \%matches;
    Perl 6 - links to (nearly) everything that is Perl 6.
      Hi,

      Thanks for replying. I don't think your code is what I am looking for. I'll try to explain again:

      For the sake of illustration let's say the size of the substring I am looking for is 3. Ideally, what I would like in the end is a list for all possible permutations with repetitions and a count how often they occur. For this example, the list would look something like that:

      aaa 0
      aab 2
      aac 1
      aad 0
      aba 2
      abb 0
      abc 1
      abd 0
      etc.
      How can this be achieved?
      Thanks
      Hadassa
        Ideally, what I would like in the end is a list for all possible permutations with repetitions and a count how often they occur.

        How often they occur... where? in which set/string/whatever?

        If you mean in the list of possible permutations, why are some of them 0?

        Perl 6 - links to (nearly) everything that is Perl 6.
Re: Permutation with repetitions
by Sinistral (Monsignor) on Jun 07, 2010 at 13:53 UTC
      Yes, indeed, I am dealing with DNA sequence. However, I could not find a suitable tool in BioPerl (I might have missed it, since this seems a pretty straight forward problem I have). Thanks for pointing this out.
Re: Permutation with repetitions
by umeboshi (Acolyte) on Jun 08, 2010 at 12:26 UTC
    Thanks everyone for their comments. They were very helpful

    An additional complexity to the problem is that I have some characters, say "x", in the string which are not to be counted, i.e.

    aaaabccbaxxxcbddadcc...

    So if n=3 as in the example any substring of size 3 that includes an x needs to be ignored and not counted.

    Any ideas on that?
    Thanks again!

      Presumably, you don't want to count substrings that would be formed by the removal of the Xs?

      Also, you mentioned earlier having all the permutations, including those with zero occurances, in the results hash. The problem with that is that by the time you get to n=13, you'd have 67 million entries in the hash, most of which would be 0.

      Anyway, this might be of some use:

      $s = 'aaaabccbaxxxcbddadcc';; ++$h{ $1 } while $s =~ m[(?=([abcd]{3})).]g;; pp \%h;; { aaa => 2, aab => 1, abc => 1, adc => 1, bcc => 1, bdd => 1, cba => 1, cbd => 1, ccb => 1, dad => 1, dcc => 1, dda => 1, }

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Yes, forgot to mention that of course I will not store the one which have count zero.

        Your answer was perfect! I guess I will have to catch up on some regular expressions!


        Great!
        Thanks a lot!
        Very interesting solution. May I ask why you put two ;; at the end of each line and what pp in pp \%h" does ?
      next if $substr =~ /x/;
      Perl 6 - links to (nearly) everything that is Perl 6.
Re: Permutation with repetitions
by JavaFan (Canon) on Jun 07, 2010 at 14:07 UTC
    my $L = 4; my @chars = qw[a b c d]; $_ = "aaaabdbbdaccdacdaaabdcd"; my %seen = map {$_, 0} glob do {local $" = ","; "{@chars}" x $L}; {local $" = ""; use re 'eval'; /([@chars]{$L})(?{$seen{$1}++})(*FAIL)/ +;} while (my ($n, $v) = each %seen) { say "$n -> $v"; }
Re: Permutation with repetitions
by umeboshi (Acolyte) on Jul 07, 2010 at 14:17 UTC

    In my previous question in this thread I asked to count the number of occurrences of all permutations with repetitions. What if I only wanted to not have the count but a binary answer (i.e. 0 or 1) if a specific permutation is in my string? There must be a way to just change the regexp in a minor way.

    Up until now I used the suggested solution of:
    $n=4; $str="abababbcad"; %count; while($str =~ m[(?=([abcd]{$n})).]g){ ++$count{ $1}; }
    I know I could set
    $count{$1}=1
    but this is not desirable since I would like to add to the hash the counts of unique occurrences of other strings. For example:
    $str1="abad"; $str2="aaab"; $n=2;
    should give the following list:
    aa 1
    ab 2
    ad 1
    ba 1
    Thanks!