Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

"huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND biden AND clinton OR sebelius OR mccain"
I'm trying to take a string such as the one shown above, and to surround with parentheses all word groups that are connected by " OR ".
A word group is defined as all the characters between " AND " or " OR ".
For instance, in the string shown above, "jfkadlsfj fdsfldfj06329 OR reagan" and "clinton OR sebelius OR mccain" would each be ensconced in an opening and a closing parenthesis.
The code I have tried below has simply put the entire string in parentheses. Any ideas? Thanks in advance.
    $string = "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND biden AND clinton OR sebelius OR mccain";
    $string =~ s/(^|.+ OR |.+ AND )(.* OR .*)*($| OR .+| AND .+)/$1($2)$3/g;
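One possible reading of why the attempt wraps everything: `^` lets the first group match the empty string at the start, the greedy `.*` before " OR " in the second group runs up to the last " OR ", the `.*` after it takes the rest, and `$` in the third group then matches at once. The sketch below only prints what each capture group holds after one match; the helper name `capture_groups` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of the three capture groups
# of the pattern from the question after a single match attempt.
sub capture_groups {
    my ($text) = @_;
    # In list context a match returns ($1, $2, $3), or an empty list on failure.
    my @groups = $text =~ /(^|.+ OR |.+ AND )(.* OR .*)*($| OR .+| AND .+)/;
    return @groups;
}

my $string = "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND biden AND clinton OR sebelius OR mccain";
my @groups = capture_groups($string);

# Show each group in brackets so empty matches are visible.
printf "group %d: [%s]\n", $_, $groups[$_ - 1] for 1 .. 3;
```

Groups 1 and 3 should come out empty and group 2 should hold the whole string, which is why the substitution yields one pair of parentheses around everything.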

Re: Building a boolean search engine
by moritz (Cardinal) on Aug 12, 2009 at 22:02 UTC
    split and join to the rescue:
        my $string = "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND biden AND clinton OR sebelius OR mccain";
        my @chunks = split / AND /, $string;
        @chunks = map { /OR/ ? "($_)" : $_ } @chunks;
        print join(' AND ', @chunks), "\n";
        __END__
        huckabee AND (jfkadlsfj fdsfldfj06329 OR reagan) AND biden AND (clinton OR sebelius OR mccain)
      >perl -wMstrict -le
      "print '------ output ------';
       my $s = shift;
       print $s;
       my @chunks = split / AND /, $s;
       @chunks = map { /OR/ ? qq{($_)} : $_ } @chunks;
       print join ' AND ', @chunks;
      "
      "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain"
      ------ output ------
      huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain
      huckabee AND (jfkadlsfj fdsfldfj06329 OR reagan) AND (OReilly) AND (clinton OR sebelius OR mccain)
      (Note the  '... AND (OReilly) AND ...' mis-conversion.)
      (Sorry for the line wrap!)

      A slight improvement that avoids confusion with embedded 'OR' substrings (it also tolerates variable whitespace around the 'AND' connective, putting back just what it finds):

      >perl -wMstrict -le
      "print '------ output ------';
       my $s = shift;
       print $s;
       my @chunks = split m{ (\s+ AND \s+) }xms, $s;
       @chunks = map { m{ \b OR \b }xms ? qq{($_)} : $_ } @chunks;
       print join '', @chunks;
      "
      "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain"
      ------ output ------
      huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain
      huckabee AND (jfkadlsfj fdsfldfj06329 OR reagan) AND OReilly AND (clinton OR sebelius OR mccain)
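The one-liner above appears to be quoted for the Windows command shell; in a Unix shell the double quotes would let the shell expand `$s`. The same split-and-join idea can be sketched as a script file, which sidesteps the quoting question. The sub name `group_or_chunks` is an assumption made for this example, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parenthesize every AND-separated chunk that
# contains a standalone OR.
sub group_or_chunks {
    my ($query) = @_;

    # The capturing group makes split return the separators as well,
    # so joining with '' restores the original whitespace around AND.
    my @chunks = split /(\s+AND\s+)/, $query;

    # \b keeps words such as 'OReilly' from counting as an OR connective.
    return join '', map { /\bOR\b/ ? "($_)" : $_ } @chunks;
}

print group_or_chunks(
    "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain"
), "\n";
```

Because the separators come back from split unchanged, a query with two spaces around AND keeps both spaces in the result.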
      Excellent. Thank you!
Re: Building a boolean search engine
by AnomalousMonk (Archbishop) on Aug 13, 2009 at 01:39 UTC
    Rather than moritz's split-and-join approach in Re: Building a boolean search engine, my preference would be for a more 'factored' regex. While more verbose, it is also, IMHO, more comprehensible, flexible and maintainable. YPMV.
    >perl -wMstrict -le
    "my $and = qr{ \b AND \b }xms;
     my $or  = qr{ \b OR  \b }xms;
     my $not_connective = qr{ (?! $and | $or) }xms;
     my $term     = qr{ \b (?: $not_connective \w)+ \b }xms;
     my $terms    = qr{ $term (?: \s+ $term)* }xms;
     my $or_terms = qr{ $terms (?: \s+ $or \s+ $terms)+ }xms;
     print '------ output ------';
     my $s = shift;
     $s =~ s{ ($or_terms) }{($1)}xmsg;
     print $s;
    "
    "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain AND biden OR ANDERSON COOPER"
    ------ output ------
    huckabee AND (jfkadlsfj fdsfldfj06329 OR reagan) AND OReilly AND (clinton OR sebelius OR mccain) AND (biden OR ANDERSON COOPER)
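A one-liner leaves no room for comments, so here is the same factored regex sketched as a script file, with a note on what each piece is meant to match. The sub name `group_or_terms` is invented for this example; the patterns themselves are the ones from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: parenthesize each run of terms joined by OR.
sub group_or_terms {
    my ($query) = @_;

    # The two connectives, as whole words only.
    my $and = qr{ \b AND \b }xms;
    my $or  = qr{ \b OR  \b }xms;

    # Succeeds only where no connective starts at the current position.
    my $not_connective = qr{ (?! $and | $or) }xms;

    # One word that is not itself a connective ('OReilly' and
    # 'ANDERSON' still qualify, because \b fails inside them).
    my $term = qr{ \b (?: $not_connective \w)+ \b }xms;

    # One or more such words separated by whitespace: a word group.
    my $terms = qr{ $term (?: \s+ $term)* }xms;

    # Two or more word groups joined by OR.
    my $or_terms = qr{ $terms (?: \s+ $or \s+ $terms)+ }xms;

    # Work on a copy so the caller's string is left untouched.
    (my $grouped = $query) =~ s{ ($or_terms) }{($1)}xmsg;
    return $grouped;
}

print group_or_terms(
    "huckabee AND jfkadlsfj fdsfldfj06329 OR reagan AND OReilly AND clinton OR sebelius OR mccain AND biden OR ANDERSON COOPER"
), "\n";
```

Adding another connective should then mostly be a matter of defining one more `qr{}` and listing it in `$not_connective`, though that extension is untested here.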