
Thanks. The problem is that I really want to match *test* when it is on a word boundary. So I want to match /*test*/, for example, or --*test--. That's why I thought \b\*test\*\b would do the trick. I thought that \b would stop matching when it reaches the end of a word boundary or a character that is explicitly specified. Since * is part of a word boundary but is also the next literal in the regex, I thought \b would stop there. I guess \b sucks in anything that matches, even if it is explicitly specified as something to be matched.

Re: Re: Re: regex question
by davido (Cardinal) on Sep 25, 2003 at 17:34 UTC
    Now you've lost me. You earlier said that you wanted your regexp to match the literal string '*test*'. And you provided the regexp /\b\*test\*\b/, thus spelling out the absolute need for an asterisk to precede and follow the word test in order for the match to occur.

    But now you've said that you want to match both '*test*' and '--*test--'. What made you think that '--*test--' would match against a regexp that specifies '\*test\*'?

    Also, \b is a zero-width assertion that specifies that there must be a word boundary at that particular position. A word boundary is the point where word characters and non-word characters meet. There is no word character on either side of ' *test* ' at the positions where your original regexp places its boundary assertions, and that's why your regexp fails. In your original question, you went looking for a word boundary at the junction between a space character and an asterisk. That's not a word boundary. A word boundary is, again, a zero-width assertion. '*' is not part of a word boundary. '*' is, if next to a word character, the non-word character that creates a word boundary in the zero-width space between the word character and the asterisk. But word boundaries don't have parts; they don't consume a character. \b doesn't suck anything in.
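
    To make the zero-width point concrete, here is a small sketch. The helper name boundary_positions is made up for this illustration; it only collects the offsets at which \b holds. The 'a*test*b' string is likewise just an assumed example of input where the original regexp would succeed.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $text where \b holds.
# \b matches an empty string, so pos() reports a position, not a character.
sub boundary_positions {
    my ($text) = @_;
    my @positions;
    while ( $text =~ /\b/g ) {
        push @positions, pos($text);
    }
    return @positions;
}

# In ' *test* ' the only boundaries sit between '*' and 't' (offset 2)
# and between 't' and '*' (offset 6). None sits between ' ' and '*'.
print join( ' ', boundary_positions(' *test* ') ), "\n";    # prints "2 6"

# The original regexp asks for a boundary just outside each asterisk,
# so it fails here but succeeds when word characters touch the asterisks.
print "space-delimited: ", ( ' *test* ' =~ /\b\*test\*\b/ ? "match" : "no match" ), "\n";
print "word-delimited:  ", ( 'a*test*b' =~ /\b\*test\*\b/ ? "match" : "no match" ), "\n";
```

    The first test prints "no match" and the second prints "match", which is the behaviour described above: the asterisk only produces a boundary when a word character is next to it.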

    Perhaps what you are saying is that you want 'test' to match as long as it is surrounded by word boundaries. That's easy. However, the following example will also match 'test' at the very beginning of a string, even if nothing comes before it, because the beginning of the string can be a word boundary too:

    my $string = '--*test--';
    if ( $string =~ /\btest\b/ ) { print "$string matched.\n" }
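
    A quick way to check that claim is to run the same pattern over a few strings. This is only a sketch; the helper name has_word_test and the sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when 'test' appears as a whole word.
sub has_word_test {
    my ($string) = @_;
    return $string =~ /\btest\b/ ? 1 : 0;
}

for my $string ( '--*test--', 'test', 'test--', 'testing', 'contest' ) {
    printf "%-12s %s\n", "'$string'", has_word_test($string) ? 'matched' : 'did not match';
}
# '--*test--', 'test' and 'test--' match: the start and end of the string
# count as boundaries when a word character sits next to them.
# 'testing' and 'contest' do not match: a word character touches 'test'.
```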

    If you want to match both the word test and the actual non-word characters that precede and follow it, which are themselves required to be there, that's also easy:

    my $string = "--*test--";
    if ( $string =~ /\W+test\W+/ ) { print "$string matched.\n" }

    Here there's really no need for the \b, because a word boundary is implicit in the fact that you've said that one or more non-word characters must precede and follow the word 'test'.
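
    The practical difference between the two patterns is what ends up in the match. In this sketch, matched_text is a made-up helper that returns the text a pattern actually consumed, or undef when it does not match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the substring consumed by $regex, or undef.
sub matched_text {
    my ( $string, $regex ) = @_;
    return $string =~ /($regex)/ ? $1 : undef;
}

my $string = '--*test--';

# \W+ consumes the surrounding non-word characters ...
print "with \\W+: '", matched_text( $string, qr/\W+test\W+/ ), "'\n";   # '--*test--'

# ... while \b consumes nothing, so only the word itself is matched.
print "with \\b:  '", matched_text( $string, qr/\btest\b/ ), "'\n";     # 'test'

# \W+ also requires those characters to exist, so a bare 'test' fails.
print "bare 'test' with \\W+: ",
    ( defined matched_text( 'test', qr/\W+test\W+/ ) ? 'match' : 'no match' ), "\n";
```

    So \W+ is the one to use when the surrounding punctuation must be present and should be part of the match, and \b is the one to use when only the position matters.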

    I'm still a little foggy on what you're saying in your followup question; it redefines the problem to a degree, and actually has unresolvable conflicts within its own assertions.

    I really think that you would benefit from having a look at the appropriate perldocs: perlrequick, perlretut, perlre, and the FAQ on regular expressions, perlfaq6. If you have Perl, you have those documents. I know it looks like a lot of reading, but the time I've taken in trying to compose a conscientious answer to your question is about equal to the time it would take you to read a couple of those documents in their entirety yourself. You can appreciate my frustration when, after I put together a thorough and complete answer yesterday, your followup question today changes everything and is still ambiguous, conflicted, and vague. Why did I bother in the first place if you're not going to do a little homework yourself?

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein