Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: regex at word boundary

by QM (Parson)
on Dec 07, 2005 at 19:49 UTC ( [id://515012]=note: print w/replies, xml ) Need Help??


in reply to regex at word boundary

As others have mentioned, you aren't going to find palindromes with that. Single word palindromes are problematic without some code magic. You really need regex recursion, which is what Regexp::Common::lingua does.

There are ways to do it with embedded regex ??{code}, without being as fancy as in R::C::l, but I'd go with R::C::l. If you really want to do it yourself, it's easy enough to Google for these, so I'll leave that up to you.

As far as I know, there's not an easy way to skip over whitespace, without explicitly stripping it out first (but maybe I'm just not being creative enough today).

-QM
--
Quantum Mechanics: The dreams stuff is made of

2005-12-12 Retitled by jdporter: s/boundry/boundary/
Original title: 'regex at word boundry'

Replies are listed 'Best First'.
Re^2: regex at word boundary
by davido (Cardinal) on Dec 08, 2005 at 04:27 UTC

    You're right about the (??{code}) construct facilitating the use of regular expressions in detecting palindromes. Behold:

    use strict; use warnings; my $string = "God dog"; my $re = qr/^ (.+) (??{ ( length $^N > 1 and lc $^N eq reverse lc $^N ) ? '' : '1\b2' }) $/ix; print "$string is", ($string =~ $re) ? "" : "n't", " a palindrome.\n";

    One trick here; since (??{code}) must spawn a regular subexpression that will then be evaluated for truth, the code block I used plays a little trick. If the boolean test of reversability succeeds, the (??{code}) expression returns an empty string (which, in other words, adds no additional criteria to be matched). If the reversability test fails, the code returns a regular subexpression that can never succeed; a search for '1\b2' (in other words, the literal number one, followed by the literal number two, but with a word boundry sandwiched between; an impossibility). That way the outcome of the reversability test can force a failure to match for the entire regular expression.

    Also, the use of $^N is a convenience described in perlvar and perlre. It contains the most recent parenthetical capture. That way you don't need to count capturing parens, not that it's an issue in this case.

    By the way, in my solution I chose to accept palindromes regardless of whether they contain only alpha characters or not. Why? Wikipedia says, "A palindrome is a word, phrase, number or any other sequence of units (like a strand of DNA) which has the property of reading the same in either direction..." It also says that the position of spaces can usually be adjusted as necessary. My regexp doesn't accomodate that possibility; it treats space like any other character. But why not; it's just a proof of concept. ;)


    Dave

      Very good.

      '1\b2'
      I would point out the existence of (?!), which I'm told always fails, though I'm not sure I understand it.

      Update 1:

      By the way, in my solution I chose to accept palindromes regardless of whether they contain only alpha characters or not.
      Yes. The OP was looking for multi-word palindromes, perhaps more along the lines of the "interesting to humans" variety, which seems to be what started the thread in the first place.

      Update 2: After further examination, your idea could be adapted for intervening whitespace (or indeed any noise characters) if the regex engine was re-entrant (is that with or without the hyphen?). Something like:

      (??{ local $N; ($N = $^N) =~ s/\w+//g; (lc $N eq reverse lc $N) ? '' : (?!) })
      which might be further streamlined to
      (??{ local $N; ($N = $^N) =~ s/\w+//g; (?!) if (lc $N ne reverse lc $N) })
      I think length $^N > 1 is superfluous, as length $^N is sufficient as a test, and (.+) would always be positive anyway (or is there a zero length character that would match?)

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        (?!) explained:

        From perlre: "A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar"." So in other words, (?!) is a negative lookahead assertion that "must not match nothing" (an impossibility).

        YAPE::Regex::Explain describes it simply like this:

        perl -MYAPE::Regex::Explain -e "printYAPE::Regex::Explain->new( qr/(?! +)/ )->explain();" The regular expression: (?-imsx:(?!)) matches as follows: NODE EXPLANATION ---------------------------------------------------------------------- (?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- (?!) fail ---------------------------------------------------------------------- ) end of grouping ----------------------------------------------------------------------

        Dave

        As you point out, the regexp engine is not re-entrant. But tr/// isn't a regular expression, and that can be used to strip away whitespace.

        use strict; use warnings; use vars qw/ $N /; my $string = "God dog"; my $re = qr/^ (..+) (??{ local $N = lc $^N; $N =~ tr!\t \f!!d; ( $N eq reverse $N ) ? '' : '(?!)' }) $/ix; print "$string is", ( $string =~ $re ) ? "" : "n't" , " a palindrome\n";

        Dave

        Regarding the length $^N > 1 test, I added that because while refining this RE I discovered that it was capable of considering a single-character string to be palindromic (new word?), since "a" in reverse is still "a".

        Update: Now that I think about it, the length $^N > 1 test is better written as ..+ earlier in the RE, like this:

        qr/(..+) (??{ ( lc $^N eq reverse lc $^N ) ? '' : '(?!)' }) /ix

        ...thus forcing at least two characters to be tested in the first place, rather than hand-coding a test for two or more characters via 'length'. Duh!


        Dave

Palindrome regex (was: regex at word boundary)
by Roy Johnson (Monsignor) on Dec 12, 2005 at 17:10 UTC
    I came up with this recursive-regex code. It's a bit simpler than most of the others posted here. It finds the leftmost longest palindrome in each line.
    my $palin_re; $palin_re = qr/(([a-z]) # First letter [^a-z]* # any irrelevant chars (?:(??{$palin_re})|[a-z]?) # The interior [^a-z]* # any other irrelevant chars \2) # the last letter matches the first /xi; while (<DATA>) { if (/\b$palin_re\b/) { print "Matched $1\n"; } else { print "No palindrome found in '$_'\n"; } } __DATA__ oo madam in eden, I'm Adam, prince of eternia is god a dog? nested testest detsen nested i prefer pi ip referp nothing to see here, folks.

    Caution: Contents may have been coded under pressure.
Re^2: regex at word boundary
by mikeraz (Friar) on Dec 07, 2005 at 20:32 UTC

    You're right, the (corrected) code won't find palindromes. It wasn't intended to. It was the first step in finding a palindrome candidate. After the candidate is found I can strip out spaces and check to see if the word group does qualify as a palindrome.

    Ultimately I don't care about finding palindromes. It is just a little puzzle to apply some time to. Some people do soduko puzzles, some crosswords, I look for little problems that I don't know the answer to and explore the possibilities.

    Be Appropriate && Follow Your Curiosity
      You're right, the (corrected) code won't find palindromes. It wasn't intended to. It was the first step in finding a palindrome candidate.
      OK, fair enough. So what did you want it to do? Did you just want to check for repeated characters at the proper end of word boundaries?

      [Insert "Lightbulb Over Head" graphic here ]

      You just want a filter to grab candidates, then strip whitespace, then test for palindrome-ness.

      I think you'd do us a favor by coming up with an appropriate regex with ??{code}, something near:

      /(\b (([a-z]) # at least one char ?{local @suffix; push @suffix, $^N}) # build up suffix (([a-z) ?{local @suffix; push @suffix, $^N} | \S+)+ [a-z]? # possible odd char (([a-z]) (?{if ($^N eq $suffix[-1]) {pop @suffix}}) | \S+)+) (([a-z]) (?{if ($^N eq $suffix[-1]) {pop @suffix}}) \b) /x
      Though I don't think that works, and it's untested. (Hey, I have to leave something up to you.) Besides, there's a lot left out of the perlre with ?{code} and ??{code}, so please come back and tell us what it should say :)

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://515012]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2024-04-19 11:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found