One-liners are great, but not for readability. When you want people to be able to help you easily, smashing the code into a one-liner is not the best way to present a Short, Self-Contained, Correct Example. Also, Use strict and warnings.

if i 1st assign $+{repeat} to a variable, i get the substring

The Variables related to regular expressions are reset by each successful regex operation and are dynamically scoped. If you want to use them later, you should store them in other variables, and only use them if the match succeeded in the first place. (A related recent thread on the scoping of these variables: Re: why is $1 cleared at end of an inline sub?)
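To illustrate the point about copying captures, here is a minimal sketch. The helper name first_repeat and the sample strings are made up for this example and are not from your code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first repeated run of 3+ word characters,
# or nothing (undef in scalar context) if there is no such run.
sub first_repeat {
    my ($s) = @_;
    # Only look at %+ if the match succeeded, and copy the capture right away.
    if ( $s =~ /(?<repeat>\w{3,})\w*\g{repeat}/ ) {
        my $found = $+{repeat};
        # A later successful match overwrites %+, but not our copy.
        'unrelated' =~ /(?<repeat>rel)/ or die "inner match failed";
        return $found;
    }
    return;
}

print first_repeat('AGCTACCCAGCTA') // 'none', "\n";
print first_repeat('ABCDEF')        // 'none', "\n";
```

Because $found is an ordinary lexical, it survives any number of later matches, whereas $+{repeat} would reflect whichever match succeeded last in the current scope.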

adding //g back gets back to the non-functional state

As per Regexp Quote Like Operators:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.

The thing to note here is that pos is per string. You're matching against $s in the while's condition, but then also matching against $s again inside the while's body, each operation using and affecting $s's pos.
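A small sketch of that shared position, using a made-up string and a made-up sub name. Each scalar-context m//g below runs against the same variable, so each one picks up where the previous one stopped, regardless of where in the code it appears.

```perl
use strict;
use warnings;

# Hypothetical demo: record pos() after three consecutive scalar-context
# m//g matches against one and the same string.
sub shared_pos_demo {
    my $s = 'aXbXc';
    my @seen;
    my $outer = $s =~ /X/g;  push @seen, pos($s);   # first X found
    my $inner = $s =~ /X/g;  push @seen, pos($s);   # continues after the first X
    my $third = $s =~ /X/g;  push @seen, pos($s);   # fails, so pos is reset
    return @seen;
}

print join( ', ', map { $_ // 'undef' } shared_pos_demo() ), "\n";
```

If the inner match really needs to scan the whole string independently, one option is to run it against a copy of the string, since the copy has its own pos.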

As far as I can tell, what your current algorithm is trying to do is count the occurrences of repeated substrings immediately, each time you find them. This seems quite inefficient.

You've got a few other issues in your code. First, your first two examples produce the warning "Useless use of hash element in void context", because you just have $+{repeat}; all on its own. Second, $x=int(1000*rand()) and then using $x as an index to @a is going to cause a ton of nonexistent array elements to be picked. Also, random strings are not usually a good idea for testing during development, since tests should be repeatable.
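If you do want random input at some point, here is one possible sketch of the last two points. The array @bases, its four-letter alphabet, and the sub random_string are assumptions for illustration, standing in for your @a. Using rand @bases keeps the index inside the array, and a fixed srand seed makes a run repeatable.

```perl
use strict;
use warnings;

my @bases = qw(A C G T);    # assumed alphabet, stands in for your @a

# Hypothetical helper: build a random string of $len characters from @bases.
sub random_string {
    my ($len) = @_;
    # rand @bases yields a number in [0, scalar(@bases)); as an array index
    # it is truncated to an integer, so it always refers to an existing element.
    return join '', map { $bases[ rand @bases ] } 1 .. $len;
}

srand(42);                  # fixed seed: the same sequence on every run
my $s = random_string(20);
print "$s\n";
```

For the actual tests, hand-written strings with known expected results are still easier to reason about.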

Another issue that I see is that your current regex will consume not only the match (?<repeat>\w{3,}), but also all characters between that match (\w*) and the repetition itself (\g{repeat}), so none of those latter characters will be checked for repetitions. This can be solved with zero-width Lookaround Assertions. However, you haven't specified what should happen if the sequences overlap and so on. That's why test cases are important.
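To make the difference concrete, here is a small comparison sketch. The two sub names and the sample string are made up for this example. In 'ABCxDEFyABCzDEF', the consuming version swallows everything up to and including the second ABC, so it never gets a chance to start a match at the first DEF.

```perl
use strict;
use warnings;

# Consuming version: the match includes the gap and the repetition.
sub repeats_consuming {
    my ($s) = @_;
    my @found;
    while ( $s =~ /(?<repeat>\w{3,})\w*\g{repeat}/g ) {
        push @found, $+{repeat};
    }
    return @found;
}

# Lookahead version: only the first occurrence is consumed.
sub repeats_lookahead {
    my ($s) = @_;
    my @found;
    while ( $s =~ /(?<repeat>\w{3,})(?=\w*\g{repeat})/g ) {
        push @found, $+{repeat};
    }
    return @found;
}

my $str = 'ABCxDEFyABCzDEF';
print "consuming: @{[ repeats_consuming($str) ]}\n";
print "lookahead: @{[ repeats_lookahead($str) ]}\n";
```

With the lookahead, the search resumes right after the first ABC, so the DEF that sits between the two ABCs is also examined.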

Anyway, here's a starting point for what I think you might want. Note how I'm simply using a hash to count occurrences.

use warnings;
use strict;
use Test::More;

sub count_reps {
    my $data = shift;
    my %seqs;
    while ( $data =~ m{ (?<repeat>\w{3,}) (?= \w* \g{repeat} ) }xg ) {
        $seqs{ $+{repeat} }++;
    }
    return \%seqs;
}

is_deeply count_reps('AGCAGC'), { AGC => 1 };
is_deeply count_reps('AATGCAATCGCAGCAGCA'), { AAT => 1, GCA => 3 };
is_deeply count_reps('AGCTACCCAGCTAGGGAGCTA'), { AGCTA => 2 };

done_testing;

Minor edits for clarity.


In reply to Re: count backrefenence regex by haukex
in thread count backrefenence regex by tmolosh
