Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re: A (non) reg-ex question

by hv (Prior)
on Mar 20, 2006 at 14:16 UTC ( [id://537950]=note: print w/replies, xml ) Need Help??


in reply to A (non) reg-ex question

Here's one approach:

!/[^01]/ && !/10/ && ($_ ^ reverse $_) !~ /[^\001]/

That is: a) check that there are no characters that are not 0 or 1, b) check that no 1s precede 0s, c) XOR the string with its reverse - it should give chr(1) for every character.

Another approach, this time destructive:

1 while s/(?<!1)01(?!0)//g; length($_) == 0;

That is: delete "01" from the centre as often as you can, and verify that this deletes all characters from the string. The lookarounds guard against deleting from 101 or 010.

And another, using a single recursive regexp:

our $re; $re = qr{(?:0(??{$re})1)?}; /^$re\z/;

That is: MATCH := "" | "0" MATCH "1".

Update: added missing '/' in s///, per johngg.

Hugo

Replies are listed 'Best First'.
Re^2: A (non) reg-ex question
by tilly (Archbishop) on Mar 21, 2006 at 04:36 UTC
    For amusement, here is the recursive regexp modified to match the case where there are as many 1's as 0's (in any order):
    my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )* }x; /^$re\z/;
    A random note. For amusement I tried to optimize this, but that failed. I believe the following should be fast, but it isn't, and that's probably a bug (in my understanding, or in the RE engine):
    my $re; $re = qr{ (?: 0 # 0 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 1 # 1 | 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )* }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
    Given the performance problems, I would not recommend using this regular expression in real life.

    UPDATE: It was a bug in my understanding. The following works and is fast:

    my $re; $re = qr{ (?: 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (?> # If I get here, do not backtrack! (??{$re}) # Balanced 1's and 0's ) 0 # 0 )*? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
    And so is this:
    my $re; $re = qr{ (?> # Avoid backtracking 0 # 0 (??{$re}) # Balanced 1's and 0's 1 # 1 | 1 # 1 (??{$re}) # Balanced 1's and 0's 0 # 0 )*? }x; for (qw(1001 11100 000111 00111), "101"x50) { if (/^$re\z/) { print "$_ matched\n"; } else { print "$_ did not match\n"; } }
    The key is that you need to avoid having to backtrack on success.
Re^2: A (non) reg-ex question
by johngg (Canon) on Mar 20, 2006 at 16:25 UTC
    I think you may have missed the last "/" in your global substitute.

    Cheers,

    JohnGG

Re^2: A (non) reg-ex question
by johngg (Canon) on Mar 20, 2006 at 21:24 UTC
    A supplementary reply, this time in the form of a question.

    I am trying to understand your third method that uses the recursive regular expression. What I think it is doing is finding balanced "01" pairs in the same way as the Camel book example finds balanced "()" pairs. What I want to know is, how does the regular expression know when to stop recursing? Is the "?" quantifier something to do with it? It seems to me that the regular expression is consuming characters from both ends of the string at the same time. Am I missing something obvious?

    Cheers,

    JohnGG

      I think my edition of the Camel predates qr{} so I doubt I have that example to hand, but I'd expect it to be much the same except that balanced parens can usually look like "()(())(()())" and the like.

      Except for certain special optimisations, a regular expression is always traversed from left to right. The critical thing to stop infinite recursion is to ensure that at least one character must be matched each time before recursing, ie that there be something to the left of the nested $re.

      In this case, ($re = qr{(?:0(??{$re})1)?}), we require a '0' to be matched before recursing. So "0011" =~ /^$re\z/ will be executed like:

      anchor to start try $re match '0' try $re match '0' try $re fail to find '0' group is optional, matched zero times $re matched '' match '1' group is optional, matched one time $re matched '01' match '1' group is optional, matched one time $re matched '0011' anchor to end match complete: matched '0011' from start

      Similarly, you could (foolishly) replace /0+/ with a recursive regexp like:

      $re = qr{0(??{$re})?};
      which would work, but the following would fall into infinite recursion:
      $re = qr{(??{$re})?0};

      Hope this helps,

      Hugo

        Thank you. That explains it very clearly. What a powerful feature. I think I'll go away and practice before I try to use this in anger.

        I have the Third Edition of the Camel book in which qr{} and (??{$re}) appear. I have to say that your explanation is clearer than the book.

        Cheers,

        JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://537950]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2024-04-19 10:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found