in reply to Re: Regular Expression To Extract Multiple Matches Pattern
in thread Regular Expression To Extract Multiple Matches Pattern

I must be missing something: why are we using the character class [a-z] in place of \w? Using \w would make the resulting code a lot more readable, e.g.:

while ($teststring =~ /\b(\w+-\w+-\w+)\b/gi) { print "$1\n"; }

Also, the boundary markers \b suggested in Kanji's reply have merit and, I think, warrant inclusion.

 

Update

As busunsl rightly points out, \w also matches the underscore character, which was not specified for inclusion ... [\w[^_]] anyone? :-)

 

perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

Re: Re: Re: Regular Expression To Extract Multiple Matches Pattern
by busunsl (Vicar) on Jan 07, 2002 at 16:20 UTC
    Perhaps because \w includes the underscore and that was not asked for.
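To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample string and the helper name `triples` are assumptions made for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the thread.
my $teststring = 'abc-def-ghi a_b-c_d-e_f x1-y2-z3';

# Hypothetical helper: collect every capture of $1 for a compiled pattern.
sub triples {
    my ($re, $text) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

# \w accepts letters, digits and the underscore.
my @with_w     = triples(qr/\b(\w+-\w+-\w+)\b/, $teststring);

# [a-z] with /i accepts ASCII letters only.
my @with_alpha = triples(qr/\b([a-z]+-[a-z]+-[a-z]+)\b/i, $teststring);

print "\\w:    @with_w\n";
print "[a-z]: @with_alpha\n";
```

On this input the \w version returns all three tokens, including the ones containing underscores and digits, while the [a-z] version returns only abc-def-ghi.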
Re: Re: Re: Regular Expression To Extract Multiple Matches Pattern
by blakem (Monsignor) on Jan 08, 2002 at 00:23 UTC
    [\w[^_]]
    Nested character classes aren't implemented yet... That will parse something like this:
    [      # start char class
      \w   # any word char
      [    # or a literal '['
      ^    # or a literal '^'
      _    # or an underscore (redundant...)
    ]      # end char class
    ]      # followed by a literal ']'
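The parse described above can be checked with a short sketch. The test strings are made up for illustration; the point is only that the pattern requires a trailing literal ']'.

```perl
use strict;
use warnings;

# The pattern as written in the post: one character from the class
# [\w[^_] followed by a literal ']'.
my $re = qr/[\w[^_]]/;

my %result;
for my $s ('a', 'a]', '[]', '^]', '-]') {
    $result{$s} = ($s =~ $re) ? 'matches' : 'no match';
    printf "%-3s %s\n", $s, $result{$s};
}
```

A lone 'a' does not match because nothing supplies the closing ']', whereas 'a]', '[]' and '^]' all do. The string '-]' fails because neither '-' nor ']' is a member of the class.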
    If you want a character class consisting of all the word chars except underscore, you need the somewhat non-intuitive double negative:
    [^\W_]
    This matches a character that is not a non-word char (i.e. a word char) and not an underscore.
    % perl -le '/[^\W_]/ && print for qw(a b _ c d)'
    a
    b
    c
    d
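As a rough sketch, the same class can be dropped into the extraction loop from earlier in the thread. The sample string and the variable name `$word` are assumptions for illustration, and note that [^\W_] still admits digits, which [a-z] does not.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the thread.
my $teststring = 'abc-def-ghi a_b-c_d-e_f x1-y2-z3';

# A run of word characters with the underscore excluded.
my $word = qr/[^\W_]+/;

my @found;
while ($teststring =~ /\b(${word}-${word}-${word})\b/g) {
    push @found, $1;
}

print "$_\n" for @found;
```

Here the token containing underscores is skipped, while abc-def-ghi and x1-y2-z3 are both extracted.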

    -Blake