Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can you test whether a paragraph contains more than one ! consecutively (i.e. !!, !!!!!!!!, !!!!) and count how many ! were found in each consecutive run? It shouldn't pick up on just a single ! at any time.
my $paragraph = "This is a cool paragraph! Yes it sure is!!! Isn't it neat?!!";
I'm not so good with regular expressions yet. Any advice?

Re: Tracking consecutive characters
by Zaxo (Archbishop) on Sep 27, 2004 at 03:29 UTC

    $_ = 'Wow!!!!!';
    /(!{2,})/ and print length $1;
    __END__
    5
    The curlies make a quantifier meaning two or more.

    After Compline,
    Zaxo

      Here are variants that match multiple times:

      $_ = 'Wow!!!!! simply wow!!!';
      @lengths = map { length } /(!{2,})/g;

      or

      $_ = 'Wow!!!!! simply wow!!!';
      push(@lengths, length($1)) while (/(!{2,})/g);

        And if you just want to know the length of the longest string of consecutive !:

        $max = ( sort {$b<=>$a} map{length} /(!{2,})/g )[0];
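For what it's worth, here is a minimal sketch that puts the variants above together and runs them against the paragraph from the original question. The variable names (@runs, @lengths, $max) are just illustrative choices, and the loop for the maximum is simply one alternative to the sort shown above, not a correction of it.

```perl
use strict;
use warnings;

# The paragraph from the original question.
my $paragraph = "This is a cool paragraph! Yes it sure is!!! Isn't it neat?!!";

# In list context, /g returns every captured run of two or more '!'.
my @runs    = $paragraph =~ /(!{2,})/g;
my @lengths = map { length } @runs;

# Longest run, or 0 when no run was found (illustrative alternative to sort).
my $max = 0;
for my $len (@lengths) {
    $max = $len if $len > $max;
}

printf "%d run(s) found, lengths: %s, longest: %d\n",
    scalar @runs, join(', ', @lengths), $max;
```

The single ! after "paragraph" is skipped, so this should report two runs, of lengths 3 and 2.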
Re: Tracking consecutive characters
by Aristotle (Chancellor) on Sep 27, 2004 at 03:29 UTC

    The naive way is, of course, !!+. There's also the {MIN,MAX} syntax that lets you explicitly specify a quantity, where the comma and the MAX value are optional, so you can write !{2,}. Since MAX defaults to "infinite" if left out, that will match when there are at least two exclamation marks. If you leave out the comma as in .{2}, that sets both MIN and MAX, i.e. it means "exactly this many".

    If you put the entire expression in capturing parentheses, you get to look at the length of the captured string.

    So you probably want something like m/(!{2,})/g.

    Makeshifts last the longest.
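As a rough illustration of the quantifier forms described above, the sketch below runs three spellings of "two or more" over the same string and then shows what the exact-count form does. The sample string and variable names are made up for the example, and the lookaround pattern at the end is my own addition rather than something proposed in the node.

```perl
use strict;
use warnings;

# Made-up sample with runs of 1, 2, 3 and 4 exclamation marks.
my $text = 'a! b!! c!!! d!!!!';

# Three spellings of "two or more exclamation marks".
my %at_least_two = (
    '!!+'   => qr/(!!+)/,
    '!{2,}' => qr/(!{2,})/,
    '!!!*'  => qr/(!!!*)/,
);

my %lengths_for;
for my $name (sort keys %at_least_two) {
    my @lengths = map { length } $text =~ /$at_least_two{$name}/g;
    $lengths_for{$name} = "@lengths";
    print "$name: @lengths\n";
}

# !{2} means "exactly two per match", so longer runs are chopped into pairs.
my @pairs = $text =~ /(!{2})/g;
print "pairs found by !{2}: ", scalar(@pairs), "\n";

# To find runs that are exactly two long, the neighbours must not be '!'.
my @exact_runs = $text =~ /((?<!!)!{2}(?!!))/g;
print "runs of exactly two: ", scalar(@exact_runs), "\n";
```

All three "two or more" spellings should agree on the run lengths. The bare !{2} is worth a second look: it constrains each match, not the whole run, which is why the lookarounds are needed if you really mean "a run of exactly two".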

      Careful there: only the second number is optional. If you miss out the first number perl will fail to recognise the braces as a quantifier, and so it gets treated as a literal string to match instead:

      zen% perl -wle 'print "miss" if "aa" !~ /^a{,3}$/'
      miss
      zen% perl -wle 'print "match" if "aa" =~ /^a{0,3}$/'
      match
      zen% perl -wle 'print "match" if "a{,3}" =~ /^a{,3}$/'
      match
      zen%

      Hugo

        D'oh! I've never actually used the curlies quantifier that way, so I didn't even know (in fact I've hardly used curlies ever, period). Thanks for the note, I updated my node accordingly.

        Makeshifts last the longest.

      The naive way is, of course, !!+

      It may be the naive way, but sometimes that's the best way (and why not in this case?).

        I didn't say it wasn't. :-) In fact, with most regex engines, the star quantifier is optimized better than the plus quantifier which in turn is often optimized better than the curlies quantifier, so it may even be better to use !!!* here if performance is what you need. (Caveat benchmark etc.)

        Makeshifts last the longest.
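Since the advice above is to benchmark rather than guess, here is one way that comparison might be set up with the core Benchmark module. This is only a sketch: the sample text, the labels and the iteration count are arbitrary assumptions, and the timings will depend on your perl version and your real data, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample text; real input will behave differently.
my $text = 'Wow!!!!! simply wow!!! Is it neat?!! Yes! ' x 20;

# Each sub counts the runs of two or more '!' with a different spelling.
my %count_for = (
    plus    => sub { my $n = () = $text =~ /!!+/g;   $n },
    curlies => sub { my $n = () = $text =~ /!{2,}/g; $n },
    star    => sub { my $n = () = $text =~ /!!!*/g;  $n },
);

# Sanity check first: every spelling must find the same number of runs.
my %runs = map { $_ => $count_for{$_}->() } keys %count_for;
print "$_ found $runs{$_} runs\n" for sort keys %runs;

# Fixed iteration count to keep the run short; raise it for steadier numbers.
cmpthese(20_000, \%count_for);
```

Checking that all three subs return the same count before timing them guards against benchmarking patterns that don't actually do the same job.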

Re: Tracking consecutive characters
by SciDude (Friar) on Sep 27, 2004 at 03:30 UTC