Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re: Regex refresher

by Nemp (Pilgrim)
on Sep 12, 2002 at 11:38 UTC ( [id://197193] : note . print w/replies, xml ) Need Help??

in reply to Regex refresher


Cool thread :) ... but can somebody help me out please? I think my understanding/knowledge of regex's is sorely lacking because I'm not able to follow the descriptions in the answers above! To explain my problem...

First of all, the original answers I came up with on paper before looking at anyone else's replies were...
"", 01, 11, 0001, 1001, 00101, 10101, 000001, 100001, 001101
... which I'm assuming are the incorrect 10 from looking at the posts above and the amount of experience the posters share compared to me! :)

I wrote a little program which spits out the first answers in agreement with abell's answer above, but not in agreement with dws's.

I think my stumbling block is understanding how the regex /(0|1(01*0)*1)*/ matches the 00? Could somebody explain this to me? I've read a lot of the references but in my mind I'm looking at it like this...

<Incorrect, Please Explain Why?>
1) I take the 0|1 and figure that (except for the empty string) it will match any string that begins 0 or 1.
2) I take the (01*0)* and figure that it will match nothing, 00 or 0x0 where x is an arbitrary number of 1's.
3) I assume the last 1 means there has to be a 1 at the end of the regex?! I know this is wrong from the answers I'm getting back but I don't know why! argh!
</Incorrect, Please Explain Why?>

Scratching his head and hoping for help,

UPDATE: I think zakb has hit the nail on the head for me with his answer :) ... Wow! Do I feel silly now ;)
I *think* if you add the brackets that I was interpreting around the 0|1 combination it makes the answers I came up with correct :) Of course it doesn't help you spot the pattern that no longer exists ;)

Replies are listed 'Best First'.
Re: Re: Regex refresher
by zakb (Pilgrim) on Sep 12, 2002 at 12:05 UTC

    Unless I'm hopelessly wrong, for starters I think you have step one wrong.

    I think step one should be something like match either 0 or 1(01*0)*1. The final asterisk matches zero or more of one of the matched branch or atom so that's how it matches 00.

    If this was your stumbling block, hopefully I've helped!

      I think I'm missing something as well. My logic follows yours, and I got *almost* the same answers (in my head, anyway) as dws'. Ours differ firstly because he repeated 1111 ;), but more importantly on the number 101. I can't see that number matching, as 1(01*0)*1 as far as I can tell would match 1, then choosing to continue into the middle group, it would at the very least match 00, not? The only quantifier is on the 1, so as I see it, 00 needs to be matched, then the final 1, so that choosing to include the middle group would yield at minimum 1001. Thusly, I came up with:
      "" 0 00 11 000 0000 1001 1111 00000 10101
      Yet, I know that's not right (in hindsight). After looking over some other answers, I tried out 0011, 1100 and 0110, which all work out in practice, yet I can't figure out why. Is there some precedence issue that I'm missing?
      Eh..I've got some time to figure it out though, I'm only a 3rd year undergrad ;)

        See FoxtrotUniform's answer above. 101 is not matched.

        As for your other problems, here's a solution to one of them which should help with the others:

        For reference, the original regex was:


        Let's take your first example: 0011. 0 matches the first branch, and the final asterisk right at the end makes that greedy, so it also matches the second 0. 1 matches the second branch, the (01*0) matches nothing - which is ok because of the asterisk right after it. This effectively reduces the second branch to 1*1, which matches 11.

        The others (1100, 0110) are variations on this theme.