in reply to Find number of short words in long word

The =()= 'operator' should do:

my $count =()= 'abracadabra' =~ /br/g;
print "Found $count instances\n"; # prints Found 2 instances
Big Update:
Based on all the comments below, this might be handy for automatically generating the lookaheads that allow overlapping results based on a search string:
sub countOverlappingMatches {
    my $stringRef  = shift;    # Likely to be huge, don't make a copy
    my $patternRef = shift;
    my $count =()= $$stringRef =~ /(?=$$patternRef)/g;
    return $count;
}

print countOverlappingMatches(\'abrabrabrabra', \'rabra'); # 3
Or skip the sub and use its one working line directly:
for my $pattern (@listOfPatterns) {
    my $count =()= $string =~ /(?=$pattern)/g;
    # Play with $count
}

Re^2: Find number of short words in long word
by ikegami (Patriarch) on Jul 14, 2009 at 20:42 UTC
    my $count =()= 'AAAAA' =~ /AAAA/g;
    print "Found $count instances\n";

    It prints 1, but I believe the OP wants 2. More specifically, from ATGCTGTACTG, I believe the OP wants

    ACTG: 1
    ATGC: 1
    CTGT: 1
    GCTG: 1
    GTAC: 1
    TACT: 1
    TGCT: 1
    TGTA: 1
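
A tally like the one above can also be produced without a regex, by sliding a fixed-width window along the sequence with substr. This is only a sketch of a different technique from the =()= approach in the parent post; the helper name count_kmers and its arguments are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tally every overlapping
# window of $k letters in $sequence.
sub count_kmers {
    my ($sequence, $k) = @_;
    my %count;
    # The last window starts at length - $k; the range is empty
    # when the sequence is shorter than $k.
    for my $start (0 .. length($sequence) - $k) {
        $count{ substr($sequence, $start, $k) }++;
    }
    return \%count;
}

my $counts = count_kmers('ATGCTGTACTG', 4);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

Because every start position is visited, overlapping words are counted automatically: on 'AAAAA' with a width of 4, the sketch counts AAAA twice.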
Re^2: Find number of short words in long word
by psini (Deacon) on Jul 14, 2009 at 20:43 UTC

    I don't know if it is important for the OP, but your code doesn't find overlapping instances:

    my $count =()= 'abrabrabrabra' =~ /brabra/g;
    print "Found $count instances\n"; # prints Found 2 instances (not 3)

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

      ... and if it's something you need to fix, you can do so easily by packing everything but the first character into a look-ahead:
      my $count =()= 'abrabrabrabra' =~ /b(?=rabra)/g;

      Of course, in this case you know that the first possible overlap starts at the second b, so even /bra(?=bra)/ works fine.

      (I haven't benchmarked it, but I suppose that non-look-around literals are a bit faster, due to optimizations regarding the match length).

        There's no need to split the search string into first character and remaining characters if the whole pattern, capture group included, is wrapped in a look-ahead.
        >perl -wMstrict -le
        "my $string = 'aBrabRabrAbra';
         my $pattern = qr{ brabra }xmsi;
         my $count =()= $string =~ m{ (?= ($pattern)) }xmsg;
         print $count;
         my @matches = $string =~ m{ (?= ($pattern)) }xmsg;
         print qq{@matches};
        "
        3
        BrabRa bRabrA brAbra
      Thanks greatly for the start on this... All *possible* combinations are important: across 20K sequences, they will likely all appear at least a couple of times. Overlapping instances are important too, so "AGCTGT" would need to be scored:
              AGCT  GCTG  CTGT  TGTA  etc.
      AGCTGT  1     1     1     0
      and so on...

        Overlapping instances of different patterns would match fine, because you'd be searching for each pattern separately.

        The problem is when the last few characters of a search pattern are the same as the first few characters, and two matches of the same pattern could overlap... it is those cases where you need the lookaheads.


        'ACTACTA' for example; when searching for 'ACTA', should that score two matches or just one?

        If you want it to be two, you need the lookaheads. If you want it to be just one match, then the regex pattern is simply 'ACTA', but it sounds like you want the lookaheads.
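
The two answers for 'ACTACTA' are easy to check side by side. A minimal sketch: the helper names count_plain and count_overlapping are hypothetical, and the \Q...\E quoting assumes the search word is meant as a literal string rather than as a regex (the code earlier in the thread interpolates it as a pattern).

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the thread.

# Non-overlapping count: each match consumes its characters, so the
# next search starts after the end of the previous match.
sub count_plain {
    my ($string, $word) = @_;
    my $count = () = $string =~ /\Q$word\E/g;
    return $count;
}

# Overlapping count: the look-ahead is zero-width, so the engine
# retries at every position in the string.
sub count_overlapping {
    my ($string, $word) = @_;
    my $count = () = $string =~ /(?=\Q$word\E)/g;
    return $count;
}

print count_plain('ACTACTA', 'ACTA'), "\n";        # prints 1
print count_overlapping('ACTACTA', 'ACTA'), "\n";  # prints 2
```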