Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to solve a regex problem. I want to use a backreference but don't know how. My documentation says I should use \1 (or \l?) for that. Information on the internet says it should be \g1, \g2, etc. But neither seems to work. Here is an example. I wrote this regex to print all 3-letter words:

$tekst = "hoii hoi and oi";
while ($tekst =~ /\b(\w\w\w)\b/g) {
    print "$1\n";
}

I want to change it into all 3-letter words that occur more than once. So something like:

/\b(\w\w\w)\b\g1/g
Maybe someone can tell me the proper way to do this. Thank you in advance.

Re: regex backreference
by Corion (Patriarch) on Nov 30, 2013 at 12:37 UTC

    Your regex is syntactically sound, at least if by "information on the internet", you mean the documentation that comes with Perl for regexes, like perlre.

    But there seems to be a logic error in your regular expression. You say "all 3-letter words that occur more than once". Maybe you can give us some examples of what should match and what shouldn't.

    Your regular expression, as is, never matches "words", because there is no "word" (\w) that can have a "word boundary" (\b) in its middle. Maybe you want something else to appear between the two "words"?

    Consider the following data and tell us which "words" on the lines should match, and which ones shouldn't.

    foofoo
    foo foo
    foo bar foo
    foo bar foo bar
    foooof
    foo oof
    foo and bar
    foo foo and and bar bar
    foo and bar foo and bar
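
    To make the point about the word boundary concrete, here is a minimal sketch. The sub names and the sample strings are made up for illustration, and the second pattern is only one guess at what "something else between the two words" could mean (one or more non-word characters). It assumes Perl 5.10 or later for \g1.

```perl
use strict;
use warnings;

# Original attempt: the backreference directly follows \b. The character before
# that \b and the first character of \g1 are both word characters, so the
# boundary assertion can never hold and the pattern cannot match.
sub adjacent_repeat {
    my ($text) = @_;
    return $text =~ /\b(\w{3})\b\g1/ ? 1 : 0;
}

# Assumed variant: require at least one non-word character between the copies.
sub separated_repeat {
    my ($text) = @_;
    return $text =~ /\b(\w{3})\b\W+\g1\b/ ? 1 : 0;
}

for my $text ('foofoo', 'foo foo', 'hoii hoi and oi') {
    printf "%-17s adjacent=%d separated=%d\n",
        "'$text'", adjacent_repeat($text), separated_repeat($text);
}
```

    Only 'foo foo' is reported by the second pattern, and nothing is ever reported by the first. Note that the variant only finds a word repeated immediately after itself, which may or may not be what "occur more than once" is meant to cover.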
Re: regex backreference
by oiskuu (Hermit) on Nov 30, 2013 at 16:44 UTC
    All three-letter words in a line that repeat? Test if the following will do:
    my @words = $txt =~ m/\b(\w{3})\b(?=.*\b\1\b)(?!.*\b\1\b.*\b\1\b)/g;

    The (?!...) is a negative look-ahead assertion. If you leave that out, duplicates appear for words occurring more than twice. And (?=...) is the positive look-ahead assertion.
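
    A small sketch of the difference the negative look-ahead makes, using an invented input line in which "foo" occurs three times, "bar" twice and "and" once. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical input line: "foo" x3, "bar" x2, "and" x1.
my $txt = 'foo bar foo and bar foo';

# Positive look-ahead only: a word matches whenever a later copy exists,
# so a word occurring three times is returned twice.
my @with_dups = $txt =~ m/\b(\w{3})\b(?=.*\b\1\b)/g;

# With the negative look-ahead added: a word matches only when exactly one
# later copy remains, so each repeated word is returned once.
my @unique = $txt =~ m/\b(\w{3})\b(?=.*\b\1\b)(?!.*\b\1\b.*\b\1\b)/g;

print "without (?!...): @with_dups\n";
print "with    (?!...): @unique\n";
```

    On this input the first list is "foo bar foo" and the second is "bar foo". The second pattern effectively picks the second-to-last occurrence of each repeated word, which is why the order can differ from the order of first appearance.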

Re: regex backreference
by Kenosis (Priest) on Nov 30, 2013 at 16:49 UTC

    Perhaps the following will be helpful:

    use strict;
    use warnings;

    my %seen;

    while (<DATA>) {
        !$seen{ $. . $1 }++ and print "Line $.: $1\n"
            while /(\b\w{3}\b)(?=.*\1)/g;
    }

    __DATA__
    foofoo
    foo foo
    foo bar foo
    foo bar foo bar
    foooof
    foo oof
    foo and bar
    foo foo and and bar bar
    foo and bar foo and bar
    and or and or say or say or say or say

    Output:

    Line 2: foo
    Line 3: foo
    Line 4: foo
    Line 4: bar
    Line 8: foo
    Line 8: and
    Line 8: bar
    Line 9: foo
    Line 9: and
    Line 9: bar
    Line 10: and
    Line 10: say
Re: regex backreference
by Anonymous Monk on Nov 30, 2013 at 17:35 UTC

    Thank you very much.

Re: regex backreference
by Anonymous Monk on Dec 01, 2013 at 11:57 UTC

    Sorry to trouble you once more. I have changed my regex to:

    $tekst=~/(\b\w{3}\b)(?=.*\b\1\b)/g

    And it appears to be working. Can someone explain the look-ahead assertion? The part:

     ?=.*\b\1\b

    It seems to me this ought to be ?=\1.

    I don't understand why it is written like this. Thank you.

      Maybe trying it against some data helps you understand why word boundaries are necessary.

      foo foo kungfoo foo foobar foo fool
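
      For anyone who wants to run the comparison, here is a rough sketch with three variants of the look-ahead. The helper name repeated() and the three test lines are invented for illustration; only the third pattern is the one from the post above.

```perl
use strict;
use warnings;

# Return, as one space-joined string, every 3-letter word in $line that the
# given pattern reports as having a later copy.
sub repeated {
    my ($line, $re) = @_;
    my @hits = $line =~ /$re/g;
    return "@hits";
}

my %pattern = (
    # (?=\1): the copy would have to start right where the word ends, but the
    # trailing \b rules out a word character there, so this never matches.
    immediate => qr/(\b\w{3}\b)(?=\1)/,
    # (?=.*\1): the copy may appear anywhere later, even inside a longer word.
    loose     => qr/(\b\w{3}\b)(?=.*\1)/,
    # (?=.*\b\1\b): the later copy must itself be a whole word.
    strict    => qr/(\b\w{3}\b)(?=.*\b\1\b)/,
);

for my $line ('foo foo', 'foo kungfoo', 'foo foobar') {
    for my $name (qw(immediate loose strict)) {
        printf "%-12s %-10s [%s]\n", $line, $name, repeated($line, $pattern{$name});
    }
}
```

      The .* is what lets the look-ahead skip over the text between the word and its later copy, and the \b pair is what stops "foo" from being counted as repeated just because "kungfoo" or "foobar" follows it.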
Re: regex backreference
by Anonymous Monk on Dec 01, 2013 at 12:38 UTC

    Never mind, I figured it out. Thanks for responding.