Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re^3: regex at word boundary

by QM (Parson)
on Dec 08, 2005 at 04:50 UTC ( [id://515127]=note: print w/replies, xml ) Need Help??


in reply to Re^2: regex at word boundary
in thread regex at word boundary

Very good.

'1\b2'
I would point out the existence of (?!), which I'm told always fails, though I'm not sure I understand it.

Update 1:

By the way, in my solution I chose to accept palindromes regardless of whether they contain only alpha characters or not.
Yes. The OP was looking for multi-word palindromes, perhaps more along the lines of the "interesting to humans" variety, which seems to be what started the thread in the first place.

Update 2: After further examination, your idea could be adapted for intervening whitespace (or indeed any noise characters) if the regex engine was re-entrant (is that with or without the hyphen?). Something like:

(??{ local $N; ($N = $^N) =~ s/\w+//g; (lc $N eq reverse lc $N) ? '' : (?!) })
which might be further streamlined to
(??{ local $N; ($N = $^N) =~ s/\w+//g; (?!) if (lc $N ne reverse lc $N) })
I think length $^N > 1 is superfluous, as length $^N is sufficient as a test, and (.+) would always be positive anyway (or is there a zero length character that would match?)

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re^4: regex at word boundary
by davido (Cardinal) on Dec 08, 2005 at 05:03 UTC

    (?!) explained:

    From perlre: "A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar"." So in other words, (?!) is a negative lookahead assertion that "must not match nothing" (an impossibility).

    YAPE::Regex::Explain describes it simply like this:

    perl -MYAPE::Regex::Explain -e "printYAPE::Regex::Explain->new( qr/(?! +)/ )->explain();" The regular expression: (?-imsx:(?!)) matches as follows: NODE EXPLANATION ---------------------------------------------------------------------- (?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- (?!) fail ---------------------------------------------------------------------- ) end of grouping ----------------------------------------------------------------------

    Dave

      "must not match nothing"
      I understand that at some level, but my brain is yelling "double negative!!". It might be better to say:
      (?!) is the negative look-ahead assertion for the null string. Since the null string matches anything and everything (including nothing), (?!) always fails.
      [I was tempted to find a symbolic representation for the null string, but I thougt that would only confuse the issue.]

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re^4: regex at word boundary
by davido (Cardinal) on Dec 08, 2005 at 05:10 UTC

    Regarding the length $^N > 1 test, I added that because while refining this RE I discovered that it was capable of considering a single-character string to be palindromic (new word?), since "a" in reverse is still "a".

    Update: Now that I think about it, the length $^N > 1 test is better written as ..+ earlier in the RE, like this:

    qr/(..+) (??{ ( lc $^N eq reverse lc $^N ) ? '' : '(?!)' }) /ix

    ...thus forcing at least two characters to be tested in the first place, rather than hand-coding a test for two or more characters via 'length'. Duh!


    Dave

      It must be the late hour, because I missed that (.+) would give trivial palindromes of length 1.

      I prefer (.{2,}) over (..+), as those dots tend to fade into the background.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re^4: regex at word boundary
by davido (Cardinal) on Dec 08, 2005 at 05:57 UTC

    As you point out, the regexp engine is not re-entrant. But tr/// isn't a regular expression, and that can be used to strip away whitespace.

    use strict; use warnings; use vars qw/ $N /; my $string = "God dog"; my $re = qr/^ (..+) (??{ local $N = lc $^N; $N =~ tr!\t \f!!d; ( $N eq reverse $N ) ? '' : '(?!)' }) $/ix; print "$string is", ( $string =~ $re ) ? "" : "n't" , " a palindrome\n";

    Dave

      Does this find all non-trivial palindrome phrases on each input line?

      Does it fail on some interesting input?

      #!/your/perl/here # find all palindrome phrases on each line # only phrases of two or more alpha # a phrase starts and ends on a word boundary # # nested palindrome phrases will also be found # (including single words) use strict; use warnings; our $N; my $re = qr/( # start $1 \b # left word edge ([a-z].*[a-z]) # at least 2 alpha \b # right word edge (??{ # start code local $N = lc $^N; # save capture $N =~ tr!a-zA-Z!!dc; # remove non-alpha # fail if not pal '(?!)' if (lc($N) ne reverse lc($N)); }) # end code ) # end $1 /ix; while (<DATA>) { my $found; while ( /$re/g ) { print "line $.:\n" unless $found; pos = pos() - length($1); # find nested pals print "(",pos,") \"$1\"\n"; $found = 1; # \n between groups pos = pos() + 1; # advance one } print "\n" if $found; } exit; __DATA__ god dog alpha beta gamma stop pots wonka wonka wonka bookkeeper raisinhead what is a bookkeeper pop repeekkoob tomorrow? boob tube A man, a plan, a canal, Panama! kook peep aha aba abba abbba aabbbaa abbba abba aba
      produces
      __OUTPUT__ line 1: (0) "god dog" line 2: (17) "stop pots" line 4: (10) "bookkeeper pop repeekkoob" (21) "pop" line 5: (0) "boob" (10) "A man, a plan, a canal, Panama" (42) "kook" (47) "peep" (52) "aha" line 6: (0) "aba abba abbba aabbbaa abbba abba aba" (4) "abba abbba aabbbaa abbba abba" (10) "abbba aabbbaa abbba" (17) "aabbbaa" (25) "abbba" (31) "abba" (36) "aba"
      BTW, if '(?!)' is changed to (?!) (no single quotes), I get a "panic: top_env" from the compiler. Is this a known bug?

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

        From perldiag:
        Panic: top_env
        The compiler attempted to do a goto, or something wierd like that.

        "Panic" messages fall into that category of errors that "you should never see1"

        I've seen that one before, and it seems to me I was mucking around with (??{code}) constructs at the time. They don't call them "highly experimental" for nothing. ;)


        Dave

        This breaks on overlapping palindromes:

        adding these two lines to DATA:
        nested testest detsen nested
        i prefer pi ip referp
        
        produces this output:
        line 7:
        (0) "nested testest detsen nested"
        (7) "testest detsen nested"
        (15) "detsen nested"
        (22) "nested"
        
        line 8:
        (0) "i prefer pi ip referp"
        (2) "prefer pi ip referp"
        (9) "pi ip referp"
        (12) "ip referp"
        (15) "referp"
        

        Be Appropriate && Follow Your Curiosity

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://515127]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (3)
As of 2024-04-24 02:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found