Re: Find number of short words in long word

The =()= 'operator' should do:

my $count =()= 'abracadabra' =~ /br/g;
print "Found $count instances\n";
# prints Found 2 instances

Big Update:
Based on all the comments below, this might be handy for automatically generating the lookaheads that allow overlapping results based on a search string:

sub countOverlappingMatches
{
  my $stringRef = shift;  #Likely to be huge, don't make a copy
  my $patternRef = shift;

  my $count =()= $$stringRef =~ /(?=$$patternRef)/g;
  return $count;
}

print countOverlappingMatches(\'abrabrabrabra', \'rabra');  # 3
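One caveat with the sub above: the search string is interpolated as a regex, so any metacharacters in it are treated as pattern syntax. If a literal string is what you want, a minimal sketch might look like this (countOverlappingLiteral is a hypothetical name, not something from the original post):

```perl
use strict;
use warnings;

# Hypothetical variant: treat the search string as literal text.
sub countOverlappingLiteral
{
  my ($stringRef, $literal) = @_;

  # \Q...\E escapes regex metacharacters. The lookahead matches at each
  # starting position without consuming anything, so overlapping hits count.
  my $count =()= $$stringRef =~ /(?=\Q$literal\E)/g;
  return $count;
}

print countOverlappingLiteral(\'a.a.a.a', 'a.a'), "\n";   # 3
print countOverlappingLiteral(\'aXaXaXa', 'a.a'), "\n";   # 0
```

For plain DNA letters the escaping changes nothing, so this only matters if the search strings can contain characters such as `.` or `*`.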

Or just inline the one line that does the actual work:

for my $pattern (@listOfPatterns)
{
  my $count =()= $string =~ /(?=$pattern)/g;
  # Play with $count
}


Replies are listed 'Best First'.
Re^2: Find number of short words in long word by ikegami (Patriarch) on Jul 14, 2009 at 20:42 UTC
my $count =()= 'AAAAA' =~ /AAAA/g;
print "Found $count instances\n"

It prints 1, but I believe the OP wants 2. More specifically, from `ATGCTGTACTG`, I believe the OP wants

ACTG: 1
ATGC: 1
CTGT: 1
GCTG: 1
GTAC: 1
TACT: 1
TGCT: 1
TGTA: 1
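If every overlapping word of a fixed length is wanted, one option is to skip the regex and slide a window over the string. This is only a sketch under that assumption; countWords and its parameters are hypothetical names, not something the OP posted:

```perl
use strict;
use warnings;

# Count every overlapping word of length $k in the string behind $seqRef.
sub countWords
{
  my ($seqRef, $k) = @_;
  my %count;

  # Slide a window of width $k along the string, one character at a time.
  # If the string is shorter than $k the range is empty and nothing is counted.
  for my $i (0 .. length($$seqRef) - $k)
  {
    $count{ substr($$seqRef, $i, $k) }++;
  }
  return \%count;
}

my $seq = 'ATGCTGTACTG';
my $counts = countWords(\$seq, 4);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

For `ATGCTGTACTG` this prints the eight words listed above with a count of 1 each, and for `AAAAA` it gives `AAAA: 2`.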
Re^2: Find number of short words in long word by psini (Deacon) on Jul 14, 2009 at 20:43 UTC
I don't know if it is important for the OP, but your code doesn't find overlapping instances:

my $count =()= 'abrabrabrabra' =~ /brabra/g;
print "Found $count instances\n"
# prints Found 2 instances (not 3)

Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."
Re^3: Find number of short words in long word by moritz (Cardinal) on Jul 14, 2009 at 20:50 UTC
... and if it's something you need to fix, you can do so easily by packing everything but the first character into a look-ahead:

my $count =()= 'abrabrabrabra' =~ /b(?=rabra)/g;

Of course, in this case you know that the first possible overlap starts at the second `b`, so even `/bra(?=bra)/` works fine. (I haven't benchmarked it, but I suppose that non-look-around literals are a bit faster, due to optimizations regarding the match length.)
Re^4: Find number of short words in long word by AnomalousMonk (Archbishop) on Jul 14, 2009 at 23:37 UTC
No need to split the string-to-be-searched-for up into first character/rest of the characters if the capture group is wrapped in a look-ahead.

>perl -wMstrict -le
"my $string = 'aBrabRabrAbra';
 my $pattern = qr{ brabra }xmsi;
 my $count =()= $string =~ m{ (?= ($pattern)) }xmsg;
 print $count;
 my @matches = $string =~ m{ (?= ($pattern)) }xmsg;
 print qq{@matches};
"
3
BrabRa bRabrA brAbra
Re^3: Find number of short words in long word by sedm1000 (Initiate) on Jul 14, 2009 at 20:54 UTC
Thanks greatly for the start on this... All possible combinations are important - certainly over 20K sequences they'll likely all appear at least a couple of times. Overlapping instances are important - so "AGCTGT" would need to be scored:

        AGCT  GCTG  CTGT  TGTA  etc.
AGCTGT  1     1     1     0

and so on...
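A table like that could be built by running the lookahead count once per word for each sequence. A rough sketch, assuming the words and sequences are already held in arrays (@words, @sequences and scoreRow are hypothetical names, and only the single example row is used as data):

```perl
use strict;
use warnings;

my @words     = qw(AGCT GCTG CTGT TGTA);   # the columns from the example
my @sequences = qw(AGCTGT);                # one row here; real data has many

# Return the overlapping-match count of each word in $seq, in column order.
sub scoreRow
{
  my ($seq, @columns) = @_;
  return map { my $n =()= $seq =~ /(?=\Q$_\E)/g; $n } @columns;
}

# Header line, then one line of counts per sequence.
printf "%-8s%s\n", '', join(' ', map { sprintf '%-5s', $_ } @words);
for my $seq (@sequences)
{
  printf "%-8s%s\n", $seq, join(' ', map { sprintf '%-5d', $_ } scoreRow($seq, @words));
}
```

With 20K sequences and every possible word as a column, one regex pass per word per sequence may get slow; counting all words in a single pass over each sequence and then looking the columns up in a hash is probably the better fit at that scale.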
Re^4: Find number of short words in long word by SuicideJunkie (Vicar) on Jul 14, 2009 at 22:50 UTC
Overlapping instances of different patterns would match fine, since you'd be searching for them separately.

The problem is when the last few characters of a search pattern are the same as the first few characters, so that two matches of the same pattern could overlap. Those are the cases where you need the lookaheads.

Take 'ACTACTA', for example: when searching for 'ACTA', should that score two matches or just one? If you want it to be two, you need the lookaheads. If you want it to be just one match, then the regex pattern is simply 'ACTA', but it sounds like you want the lookaheads.
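The 'ACTACTA' case can be checked directly. A minimal sketch comparing the two forms (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $string = 'ACTACTA';

# Plain pattern: each match consumes its characters, so the second
# ACTA (which shares an A with the first) is never seen.
my $plain =()= $string =~ /ACTA/g;

# Lookahead: nothing is consumed, so every starting position is tested.
my $overlap =()= $string =~ /(?=ACTA)/g;

print "without lookahead: $plain\n";   # 1
print "with lookahead: $overlap\n";    # 2
```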