in reply to Regexing my Search Terms

Your regex seems strange, specifically the \W? between the two occurrences of .*?. That changes how the regex matches when the first quote is immediately followed by another quote: the greedy \W? consumes the second quote instead of letting it close an empty phrase. I'm not sure why you'd want that.
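The original pattern isn't quoted in this reply, so the sketch below uses two hypothetical stand-ins, '(.*?\W?.*?)' and '(.*?)', only to illustrate the effect. On the input ''foo' the plain lazy pattern captures an empty string, while the \W? version swallows the second quote and keeps going to the next one.

```perl
use strict;
use warnings;

# Hypothetical patterns: assumptions for illustration, not the original regex.
my $with_w    = qr/'(.*?\W?.*?)'/;
my $without_w = qr/'(.*?)'/;

my $input = q{''foo'};    # first quote immediately followed by another quote

my ($got_with)    = $input =~ $with_w;
my ($got_without) = $input =~ $without_w;

print "with \\W?:    [$got_with]\n";       # [ 'foo ] : greedy \W? ate the 2nd quote
print "without \\W?: [$got_without]\n";    # [] : lazy .*? stopped at the 2nd quote
```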

Two other things to note. There's an assignment from $1 to $cat, but $1 is used instead of $cat in the rest of the loop; since $1 is overwritten by every later successful match, $cat is the safer one to use. Also, compressing whitespace is redundant before a split that already splits on runs of whitespace.
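Both points are easy to check in isolation. The search string and variable names below are made up for the demonstration: splitting on /\s+/ yields the same fields whether or not whitespace was squeezed first, and a later successful match replaces $1, which is why copying it into $cat straight away matters.

```perl
use strict;
use warnings;

# Hypothetical search string with irregular whitespace.
my $keywords = "  perl   +regex\t-java  ";

# Squeezing whitespace first, then splitting on runs of whitespace ...
(my $squeezed = $keywords) =~ s/\s+/ /g;
my @after_squeeze = split /\s+/, $squeezed;

# ... gives the same fields as splitting the original string directly.
my @direct = split /\s+/, $keywords;

print join('|', @direct), "\n";          # |perl|+regex|-java
print join('|', @after_squeeze), "\n";   # |perl|+regex|-java

# $1 refers to the most recent successful match, so copy it at once.
my ($cat, $dollar1_later);
if ('+regex' =~ /^([-+])/) {
    $cat = $1;                   # '+'
    'java' =~ /(\w+)/;           # any later match in the loop body ...
    $dollar1_later = $1;         # ... has replaced $1 with 'java'
}
print "cat=$cat, later \$1=$dollar1_later\n";
```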

With some alternation and $+ (the text of the highest-numbered capture group that actually matched, here whichever of the three alternatives succeeded), this can be done with a single loop:

```perl
while ($keywords =~ /([-+]?)(?:'([^']*)'|"([^"]*)"|(\S+))/g) {
    my $cat     = $1;
    my $keyword = $+;    # whichever of the three alternatives matched
    if (length $keyword) {    # skip empty quotes such as '' (but keep "0")
        if ($cat eq '+') {
            push @KeysNeed, $keyword;
        }
        elsif ($cat eq '-') {
            push @KeysAvoid, $keyword;
        }
        else {
            push @Keys, $keyword;
        }
    }
}
```
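To see the single loop at work, here is a small self-contained harness. The wrapper name parse_keywords and the sample input are hypothetical; the body is the same alternation-plus-$+ loop, with length used as the emptiness test so that an empty quoted phrase is skipped but a keyword of 0 is kept.

```perl
use strict;
use warnings;

# parse_keywords is a hypothetical wrapper around the loop; it returns
# references to the "need", "avoid" and plain keyword lists.
sub parse_keywords {
    my ($keywords) = @_;
    my (@need, @avoid, @plain);
    while ($keywords =~ /([-+]?)(?:'([^']*)'|"([^"]*)"|(\S+))/g) {
        my $cat     = $1;
        my $keyword = $+;                 # the alternative that matched
        next unless length $keyword;      # skip empty quotes such as ''
        if    ($cat eq '+') { push @need,  $keyword }
        elsif ($cat eq '-') { push @avoid, $keyword }
        else                { push @plain, $keyword }
    }
    return (\@need, \@avoid, \@plain);
}

my ($need, $avoid, $plain) =
    parse_keywords(q{perl +"regular expressions" -'java script' '' 0 +cgi});

print "need:  @$need\n";     # regular expressions cgi
print "avoid: @$avoid\n";    # java script
print "plain: @$plain\n";    # perl 0
```

Quoted phrases keep their internal spaces, and the +/- prefix applies to a quoted phrase just as it does to a bare word.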

Re: Re: Regexing my Search Terms
by George_Sherston (Vicar) on Jan 07, 2002 at 22:32 UTC
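    The .*?\W?.*? bit was meant to kick out any phrases that were one hundred percent non-word characters, which I now realise it would not actually do. What I needed was a requirement of at least one word character (\w); .*?\W+?.*? would have been no better, since it only requires at least one non-word character. But that wasn't the only thing wrong!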

    § George Sherston
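A quick way to compare the candidate filters is to run them over a few sample phrases; the phrases below are hypothetical. Unanchored, .*?\W+?.*? succeeds exactly when a phrase contains at least one non-word character, so it accepts '!!!' rather than rejecting it, whereas a plain /\w/ test is the one that separates all-non-word phrases from real ones.

```perl
use strict;
use warnings;

# Hypothetical sample phrases, as they might appear between quotes.
my @phrases = ('regex', '!!!', 'foo-bar', '');

my %result;
for my $p (@phrases) {
    my $nonword_pat = ($p =~ /.*?\W+?.*?/) ? 1 : 0;   # needs at least one \W
    my $word_check  = ($p =~ /\w/)         ? 1 : 0;   # needs at least one \w
    $result{$p} = [$nonword_pat, $word_check];
    printf "%-10s \\W+? pattern: %d   \\w check: %d\n",
        "'$p'", $nonword_pat, $word_check;
}
```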