zangetsu has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to match/get all non-word characters (using \W) except for a few, for example space, hyphen, and so on. Is this possible as a one-liner, or do I really have to list them explicitly?

Replies are listed 'Best First'.
Re: Exclude some non-word char from Regexp \W
by kyle (Abbot) on Aug 25, 2008 at 19:37 UTC

    Do you mean this? /[^\w\s-]/

Re: Exclude some non-word char from Regexp \W
by jethro (Monsignor) on Aug 25, 2008 at 19:37 UTC
    /[^\w:;]/

    should get all non-word chars except for : and ; if I'm not mistaken
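
A minimal sketch of the two negated character classes suggested above, run in list context with /g to collect every match. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "foo-bar baz: qux; 100% (ok)!";

# kyle's class: anything that is not a word char, whitespace, or hyphen.
my @no_space_hyphen = $text =~ /[^\w\s-]/g;

# jethro's class: anything that is not a word char, colon, or semicolon.
my @no_colon_semi = $text =~ /[^\w:;]/g;

print "[", join('', @no_space_hyphen), "]\n";
print "[", join('', @no_colon_semi), "]\n";
```

The hyphen sits last inside the brackets so that it is taken literally rather than as a range operator.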

Re: Exclude some non-word char from Regexp \W
by bart (Canon) on Aug 25, 2008 at 19:42 UTC
    Apart from the negative character class, there's also negative lookahead:
    (?![;:])\W
    Remember that if you want to add quantifiers (like "*" and "+"), you will have to group this:
    (?:(?![;:])\W)+

    I haven't benchmarked it, but I expect the negated character class that the others have shown you is likely faster.
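
A small sketch, assuming a made-up sample string, that puts the grouped lookahead form next to the equivalent negated character class. It only checks that both capture the same runs; it says nothing about speed.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "a?!:b;--c";

# Runs of non-word characters that never include ':' or ';'.
# The lookahead is grouped with \W so the quantifier applies to both.
my @lookahead = $text =~ /((?:(?![;:])\W)+)/g;

# The same runs using a negated character class.
my @negclass = $text =~ /([^\w;:]+)/g;

print join('|', @lookahead), "\n";
print join('|', @negclass), "\n";
```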

Re: Exclude some non-word char from Regexp \W
by moritz (Cardinal) on Aug 25, 2008 at 19:39 UTC
    kyle and jethro are right, and their answer is IMHO the best. But timtowtdi, so here's another way to do it:
    m/(?![ -])\W/
Re: Exclude some non-word char from Regexp \W
by zangetsu (Initiate) on Aug 25, 2008 at 19:59 UTC
    umm.. i got it.. :) i posted a thank you to you guys too.. i don't know if it went through.. so i'm posting it again.. thanks a bunch guys.. you rule! :)
Re: Exclude some non-word char from Regexp \W
by gone2015 (Deacon) on Aug 25, 2008 at 19:45 UTC

    I think that "\W" is the same as "^\w", so \W less a few chosen characters would be ^\w -, wouldn't it?

    Rats. I meant, of course: \W is the same as [^\w], so \W less space and hyphen would be [^\w -].
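
One way to convince yourself of this is a quick brute-force check. The sketch below only walks the ASCII range (an assumption made to keep it short) and the variable names are illustrative.

```perl
use strict;
use warnings;

# Collect any ASCII code point where the claimed equivalences fail.
my @mismatch;
for my $code (0 .. 127) {
    my $ch = chr $code;

    my $upper_w   = $ch =~ /\W/      ? 1 : 0;   # \W
    my $neg_class = $ch =~ /[^\w]/   ? 1 : 0;   # [^\w]
    my $trimmed   = $ch =~ /[^\w -]/ ? 1 : 0;   # [^\w -]

    # What "\W less space and hyphen" should give for this character.
    my $expected = ($upper_w && $ch ne ' ' && $ch ne '-') ? 1 : 0;

    push @mismatch, $code
        if $upper_w != $neg_class || $trimmed != $expected;
}

print @mismatch ? "mismatch at: @mismatch\n" : "no mismatches in ASCII\n";
```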

Re: Exclude some non-word char from Regexp \W
by JadeNB (Chaplain) on Aug 26, 2008 at 18:07 UTC
    I know the solutions given so far are great for excluding (or including, if one looks at it that way) 'a few more' things from a negated character class, but is there any general way to perform arithmetic of character classes? That is, is there any general way to get, say, the intersection of two character classes that's as natural as /[[:class1:][:class2:]]/ for union? (I realise that one can do /(?=[[:class1:]])[[:class2:]]/, but that's not quite what I mean.)
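
For reference, a sketch of the lookahead-based intersection mentioned in the parenthesis, with [[:xdigit:]] and [[:alpha:]] standing in for class1 and class2. Both stand-ins and the sample string are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $text = "0xDEADbeef, size=42G";

# Intersection: the lookahead requires a hex digit, the class requires
# a letter, so only a-f and A-F can match.
my @both = $text =~ /(?=[[:xdigit:]])[[:alpha:]]/g;

print join('', @both), "\n";
```

Swapping (?= for (?! turns the same pattern into a set difference (letters that are not hex digits).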