ramprasad_gk has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone tell me how to match repeated elements in a string using a Perl regexp? For example, given the string "It was an intelligent stori", I want to match the I in "It", the two i's in "intelligent", and the i in "stori". Any thoughts? Thanks in advance.

Replies are listed 'Best First'.
Re: regexp to match repeated charater
by SuicideJunkie (Vicar) on Aug 28, 2007 at 13:33 UTC

    Use parentheses to memorize a chunk, and then \1, \2, etc. to refer to the memorized chunk inside the regex ($1, $2 outside it).

    For example, to search for words which have the same letter twice:

    my $string = <>;
    $string =~ /(\w*(\w)\w*\2\w*)/;
    # The beginning of the word, then a notable letter,
    # maybe some middle letters, then that same notable letter as before,
    # then the rest of the word.
    print "Found word '$1', with duplicated letter '$2'\n";

    This generates:

    >image imagination dream
    Found word 'imagination', with duplicated letter 'n'
    >

    If you want to find the duplicated letter which appears first rather than last, change the first \w* to the non-greedy form \w*?. It should be easy to extend this principle to whole sentences/strings.
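    As a rough sketch of that extension (not part of the original reply), the non-greedy form can be applied to each word of a string in turn. The variable names and the per-word anchoring with ^ are assumptions made for illustration:

```perl
use strict;
use warnings;

my $string = 'image imagination dream';
my @found;

for my $word (split ' ', $string) {
    # \w*? is non-greedy, so the engine tries the earliest candidate
    # letter first; \1 must then reappear somewhere later in the word.
    if ($word =~ /^\w*?(\w)\w*\1/) {
        push @found, "$word:$1";
    }
}

print "$_\n" for @found;
```

    Note that \1 is case-sensitive here; adding the /i modifier would let "I" and "i" count as the same letter.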

Re: regexp to match repeated charater
by grinder (Bishop) on Aug 28, 2007 at 13:41 UTC

    If I have understood the question correctly, you're looking for the goats.cx operator, =()= (I originally wrote "spaceship" here; see the correction below).

    #! /usr/bin/perl
    use strict;
    use warnings;

    my $str = 'It was an intelligent stori';
    for my $word (split / /, $str) {
        my $nr =()= $word =~ /(i)/gi;
        if ($nr > 0) {
            print "$word has $nr $1\n";
        }
    }

    (No, there's not really such an operator; it's more an emergent effect. Assigning the global match to (), the empty list, puts the match in list context, and evaluating that list assignment in scalar context yields the number of elements the match returned.)
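    A minimal sketch of the idiom in isolation, assuming we only want the count and do not need the captured text (the variable names are illustrative):

```perl
use strict;
use warnings;

my $word = 'intelligent';

# The inner "() = ..." runs the /g match in list context, returning one
# element per match; the outer scalar assignment then receives the
# number of elements on the right-hand side of that list assignment.
my $count = () = $word =~ /i/gi;

print "$word contains $count i's\n";
```

    Without the intermediate (), a scalar assignment would put the match itself in scalar context, and a /g match there only reports whether the next match succeeded rather than how many there are.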

    • another intruder with the mooring in the heart of the Perl

      Your first sentence confused me. The "spaceship" operator is <=> (numeric comparison operator). I couldn't figure out what you were going to do with that. Then, after glancing at the code, reading your note at the bottom, wondering "What do you mean, there really isn't a spaceship operator?", and looking back at the code more closely, I saw you meant =()= to be the "spaceship" operator. Same name, different usage.

      Ivan Heffner
      Sr. Software Engineer
      WhitePages.com, Inc.

        Oops. You are quite right. I did of course mean the infamous goats.cx operator. I leave it to your imagination to figure out why =()= was so named. I shall amend the parent node accordingly. Thanks.

        • another intruder with the mooring in the heart of the Perl

Re: regexp to match repeated charater
by Anno (Deacon) on Aug 28, 2007 at 13:57 UTC
    This captures all alphabetic characters that appear more than once in a string, ignoring case:
    $_ = 'It was an intelligent stori';
    my @multiple = /([[:alpha:]])(?=.*\1)/ig;
    Anno
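    Because a letter is captured once for every occurrence that has a later twin, the resulting list can contain the same letter several times. One possible way to reduce it to distinct letters, sketched here with a %seen hash (an addition for illustration, not part of the reply above):

```perl
use strict;
use warnings;

my $str = 'It was an intelligent stori';

# Each letter is captured if the same letter (ignoring case) occurs
# again somewhere later in the string.
my @multiple = $str =~ /([[:alpha:]])(?=.*\1)/ig;

# Fold case, then keep only the first appearance of each letter.
my %seen;
my @unique = grep { !$seen{$_}++ } map { lc } @multiple;

print "@unique\n";
```

    For this sample string the distinct repeated letters should come out as i, t, a, s, n, e and l, in order of first appearance.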
Re: regexp to match repeated charater
by swampyankee (Parson) on Aug 28, 2007 at 14:33 UTC

    When you refer to "repetitive elements," do you mean a single character occurring more than once in a string, or a more complex entity, such as multiple instances of a substring? For example, in the string "It was the best of times, it was the worst of times," (Dickens, Tale of Two Cities), would you be interested in the number of occurrences of the letter "t," the substring "st," the word "was," the fragment "it was," or all four? Do you care about case? I'm presuming you are not including whitespace; are you including punctuation?

    What have you tried to do? Do you have any code? Test cases? Results?


    Addendum:

    I can see from your title that you were looking for single characters occurring more than once in a string, not groups of characters. I'm still unsure whether your definition of "repeated character" is case-sensitive (that is, whether "I" and "i" are distinct). Also, a minor quibble: frequently in English usage, "repeated character" means that multiple instances of a given character occur together. In the string "abbcdefga," for example, the letter "a" would not be considered a "repeated character," even though it appears twice, but "b" would be, because it occurs twice with no intervening characters.
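    The two readings lead to different patterns. A small sketch of the difference on the "abbcdefga" example, with illustrative variable names:

```perl
use strict;
use warnings;

my $str = 'abbcdefga';

# Adjacent repeats only: a character immediately followed by itself.
my @adjacent = $str =~ /(.)\1/g;

# Repeats anywhere: a character that occurs again later in the string.
my @anywhere = $str =~ /(.)(?=.*\1)/g;

print "adjacent: @adjacent\n";
print "anywhere: @anywhere\n";
```

    Under the "adjacent" reading only "b" qualifies; under the "anywhere" reading both "a" and "b" do.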


    emc

    Information about American English usage here and here.

    Any New York City or Connecticut area jobs? I'm currently unemployed.

Re: regexp to match repeated charater
by SFLEX (Chaplain) on Aug 28, 2007 at 13:44 UTC
    Here is one way to do it...
    #!/usr/bin/perl
    my $string = 'It was an intelligent stori';
    my $matches = 0;
    my (@chr) = split(//, $string);
    foreach (@chr) {
        $matches++ if ($_ =~ m/[iI]/); # match it & count
    }
    print "Content-type: text/html\n\n";
    print "<html>String: $string<hr>Matches: $matches</html>\n";
    Good Luck ^^