Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How does one search for words not in quotes? Take, for instance,

SELECT "SELECT * FROM table1 WHERE ..." SQL FROM table2 WHERE ...

The regex /(\bFROM\b.*?\bWHERE\b)/ will capture "FROM table1 WHERE" from within the quote and not the unquoted "FROM table2 WHERE".

Replies are listed 'Best First'.
Re: REGEX search for words not in quotes.
by Corion (Patriarch) on Apr 08, 2010 at 18:13 UTC

    Use a proper parser for that instead of regular expressions. Like SQL::Statement.

      Thank you -- long term this has to be the proper choice.
Re: REGEX search for words not in quotes.
by almut (Canon) on Apr 08, 2010 at 18:26 UTC

    Another variant, using Text::Balanced, which is also always useful for parsing stuff with quoted, parenthesized, etc. parts:

    #!/usr/bin/perl -l use strict; use warnings; use Text::Balanced qw(extract_delimited); my $text = 'SELECT "SELECT * FROM table1 WHERE ..." SQL FROM table2 WH +ERE ...'; my (undef, $remainder) = extract_delimited($text, '"', '[^"]*'); my ($extracted) = $remainder =~ /(\bFROM\b.*?\bWHERE\b)/; print $extracted; # FROM table2 WHERE
      Thank you for your suggestion.
Re: REGEX search for words not in quotes.
by rmcgowan (Sexton) on Apr 08, 2010 at 19:31 UTC

    Some good answers to your question have already been made, but I think there is some work to do on understanding the regular expression you propose.

    First, refer to the part '*?' that you used. The intent of the '?' modifier is to change the match from 'as many as possible' to 'as few as possible'. So, you get FROM table1 WHERE, the shortest match meeting the requirements.

    If you remove the question mark, you would get 'FROM table1 WHERE ..." SQL FROM table2 WHERE' instead. Note that this has nothing to do with the quotes, they are just another character in the string.

    So, if you want to do the shortest match and find all possible matches, you need to iterate through the string. You do that using the 'global' modifier in a scalar context:

    $str='SELECT "SELECT * FROM table1 WHERE ..." SQL FROM table2 WHERE'; print "$1\n" while $str =~ /(\bFROM\b.*?\bWHERE\b)/g;

    The result is two lines:

    FROM table1 WHERE FROM table2 WHERE

Re: REGEX search for words not in quotes.
by BrowserUk (Patriarch) on Apr 08, 2010 at 18:19 UTC

    $a = qq[SELECT "SELECT * FROM table1 WHERE ..." SQL FROM table2 WHERE +...];; $a =~ /(?:"[^"]+")?.*(\bFROM\b.*?\bWHERE\b)/g and print $1;; FROM table2 WHERE

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks! I worked up this mess
      #! use Smart::Comments; use strict; use warnings; my $np; $np=qr/ (?:\( # Capture the opening "(" (?: # '(?:[^']|\')*?' #' a single quote string |"(?:[^']|\")*?" #" a double quote string | [^()] # not a parentheses | (??{$np}) )* \)) # and the closing ")" /ix; my $string; my $re; $string=<<'__EOString__'; SELECT "SELECT * FROM table1 WHERE ...",'FROM WHERE' SQL FROM +table2 join(select from here to where eternity) WHERE ... __EOString__ $re=qr/ ^([^'"(]* (?:"(?:[^']|\")*?"|'(?:[^']|\')*?'|$np)? )* [^'"(]* (\bFROM\b ([^'"(]* (?:"(?:[^']|\")*?"|'(?:[^']|\')*?'|$np)? )* [^'"(]*? \bWHERE\b) /ix; # works ### $string while ($string =~ m{$re}ig) { ### @- ### @+ ### $1 ### $2 ### $3 }; exit;
      yielding ...
      ### $string: ' SELECT "SELECT * FROM table1 WHERE ..." "FROM WH +ERE"SQL FROM table2 join(select my mother from here to where eternity +) WHERE ...' Complex regular subexpression recursion limit (32766) exceeded at RE.p +l line 45. ### @-: '0', ### '64', ### '64', ### '127' ### @+: '132', ### '64', ### '132', ### '127' ### $1: '' ### $2: 'FROM table2 join(select my mother from here to where eternity +) WHERE' ### $3: ''

        Is that:

        SELECT "SELECT * FROM table1 WHERE ...",'FROM WHERE' SQL FROM table2 j +oin(select from here to where eternity) WHERE ...

        valid syntax anywhere?

        I'm betting the answer is no. So why the hell are you trying to cater for it?

        My house could be struck by a meteorite--it happens--but even the most risk adverse, paranoid people on the planet--insurance companies--do not require me to indemnify against it.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.