in reply to Why do zero width assertions care about lookahead/behind?

Your description of \b is flawed.

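\b matches only where the two sides differ: a word character (\w) on one side and, on the other, either a non-word character (\W) or the edge of the string. (That edge is the trivial case where there is only one side, but note that \b still won't match in the empty string.) It's already looking on both sides!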
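
A quick way to check this is to count the positions where \b matches. The sketch below assumes plain Perl 5 with no modules; count_boundaries is just an illustrative helper name.

```perl
use strict;
use warnings;

# Count the positions in a string where \b matches.
sub count_boundaries {
    my ($text) = @_;
    # List assignment in scalar context yields the number of matches.
    my $count = () = $text =~ /\b/g;
    return $count;
}

print count_boundaries("cat"),     "\n";  # 2: before "c" and after "t"
print count_boundaries("cat dog"), "\n";  # 4: both edges of each word
print count_boundaries(""),        "\n";  # 0: no word character on either side
print count_boundaries("..."),     "\n";  # 0: \W on both sides everywhere
```

The string edges count as boundaries only when the neighbouring character is a word character, which is why the empty string and "..." give zero.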

On the other hand, if you only want \W\b\w, but not \w\b\W, you'll have to do that yourself. [I suppose someone could come up with the backslash equivalents for these.]
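
One way to do that yourself is with a pair of lookarounds. This is only a sketch: $word_start and $word_end are made-up names, and because they use negative lookarounds they also match at the edges of the string, which a literal \W\b\w would not.

```perl
use strict;
use warnings;

# Hypothetical names; Perl has no built-in escapes for these.
my $word_start = qr/(?<!\w)(?=\w)/;   # boundary with the word on the right
my $word_end   = qr/(?<=\w)(?!\w)/;   # boundary with the word on the left

my $text = "one two";

# Mark each kind of boundary separately to show they are one-directional.
(my $starts = $text) =~ s/$word_start/[/g;
(my $ends   = $text) =~ s/$word_end/]/g;

print "$starts\n";  # [one [two
print "$ends\n";    # one] two]
```

Both patterns are zero-width, so unlike \W\b\w they consume no characters on either side.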

In a complementary vein, if you want (?=...) to look both ways, how do you ask for only lookahead, or only lookbehind?

-QM

Re: Re: Why do zero width assertions care about lookahead/behind?
by davido (Cardinal) on Oct 08, 2003 at 21:51 UTC
    Yes, I noticed the error in my description of \b after posting, and despite recent discussions, laypeople still can't edit parent nodes. It would have been more accurate for me to say that \b doesn't care whether it's being used at the beginning of a word or at the end of a word. And my point was: why should (?=...) care whether it's being used at the beginning or the end of the text it anchors its assertion to?

    You are right that \b looks at both sides. I think that merlyn provided the perfect clarification: \b has alternation built into it. It looks like either (?<=\W)(?=\w) or (?<=\w)(?=\W), depending on whether it's being used at the beginning or the end of a word. (Strictly, since the start and end of the string count as \W for this purpose, the exact equivalents are (?<!\w)(?=\w) and (?<=\w)(?!\w).)

    So \b is not a simple lookahead or lookbehind assertion; it is a lookbehind/lookahead pair in alternation with the opposite lookbehind/lookahead pair.
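
The equivalence can be checked by comparing match positions. This is a sketch rather than a proof: match_positions is an illustrative helper, and the spelled-out pattern uses negative lookarounds so that the ends of the string behave like \W.

```perl
use strict;
use warnings;

# \b spelled out with explicit lookarounds. Negative lookarounds are used
# so that the start and end of the string count as "not a word character".
my $boundary = qr/(?<!\w)(?=\w)|(?<=\w)(?!\w)/;

# Return the offsets (space-separated) at which a zero-width pattern matches.
sub match_positions {
    my ($re, $text) = @_;
    my @positions;
    while ($text =~ /$re/g) {
        push @positions, pos($text);
    }
    return "@positions";
}

for my $text ("hello, world", "a_b c-d", "", " x ") {
    my $builtin = match_positions(qr/\b/, $text);
    my $spelled = match_positions($boundary, $text);
    my $verdict = $builtin eq $spelled ? "same" : "different";
    print "'$text': $verdict ($builtin)\n";
}
```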

    Similar assertions could be custom written too. Say, for example, I wanted to create \x (my new metacharacter that means a boundary between space and nonspace). Well, I can't name it \x, but I suppose I could name it $x. Whatever I call it, the definition would be: (?:(?<=\s)(?=\S)|(?<=\S)(?=\s))
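
As a sketch of how that might look in practice, assuming the assertion is stored with qr// (which wraps the alternation in its own group, so it can be embedded in a larger pattern safely):

```perl
use strict;
use warnings;

# Boundary between whitespace and non-whitespace, in either order.
my $x = qr/(?<=\s)(?=\S)|(?<=\S)(?=\s)/;

# Splitting on a zero-width boundary keeps the whitespace runs as fields.
my @pieces = split /$x/, "foo  bar baz";
print join("|", map("<$_>", @pieces)), "\n";
# <foo>|<  >|<bar>|< >|<baz>
```

Unlike \b, this version never matches at the very start or end of the string, because each positive lookaround needs a real character to look at.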

    As far as what direction (?=...) looks, I didn't really want it to look both ways at once. I just thought it a little confusing that it could only look ahead. Without the prior benefit of merlyn's reply, thus not fully understanding the \b example, it seemed odd that (?=...) should be incapable of being used for lookbehind just as easily as lookahead. I understand that distinction now.
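
The distinction shows up when the same check is written both ways. In this sketch, first_match is an illustrative helper that reports the matched text and its starting offset.

```perl
use strict;
use warnings;

# Return "matched_text@offset" for the first match, or "no match".
sub first_match {
    my ($re, $text) = @_;
    return "no match" unless $text =~ $re;
    return substr($text, $-[0], $+[0] - $-[0]) . '@' . $-[0];
}

# Lookahead: match "foo", then check that "bar" follows.
print first_match(qr/foo(?=bar)/, "foobar"), "\n";   # foo@0

# Lookbehind: check that "foo" precedes, then match "bar".
print first_match(qr/(?<=foo)bar/, "foobar"), "\n";  # bar@3

# A lookahead put where a lookbehind is wanted tests the wrong side:
# the text at the current position would have to be "foo" and "bar" at once.
print first_match(qr/(?=foo)bar/, "foobar"), "\n";   # no match
```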

    As for my other comment about (?<=...) having to be fixed-width: I understand, as liz stated, that it would be a backtracking nightmare if that were not the case, but I still don't fully understand why that is so. I'll have to re-read the section in the Owl book about regex engines (NFA vs. DFA) and backtracking. Eventually it will sink in. ;)


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein