bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

Still rather new to the amazing world of RegExp and racking my brain on parsing a user-entered boolean search string:
#!/usr/bin/perl -wT use strict; my $searchstring = "aaaa OR bbbb OR cccc OR dddd"; $searchstring =~ s/(\w+) ?(OR|NOT|AND)/\(fieldname \= '$1'\) $2/g; print "SELECT * FROM tablename WHERE $searchstring\n";
I'm getting:
SELECT * FROM tablename WHERE (fieldname = 'aaaa') OR (fieldname = 'bb +bb') OR (fieldname = 'cccc') OR dddd
And I want:
SELECT * FROM tablename WHERE (fieldname = 'aaaa') OR (fieldname = 'bb +bb') OR (fieldname = 'cccc') OR (fieldname = 'dddd')
How do I grab that last "dddd"?

Thanks, fellow monks.

Replies are listed 'Best First'.
Re: Parsing a boolean search string for SQL query
by herveus (Prior) on Sep 01, 2003 at 17:05 UTC
    Howdy!

    The reason you miss the last term is because the regexp requires the OR/AND/NOT. Put a ? after that group and it picks up the last term (at the expense of an "use of uninitialized value in concatenation" warning).

    Alternately, you could split on whitespace and modify the words that aren't OR/AND/NOT...see my tested code below...

    #!/usr/bin/perl -w use strict; my $searchstring = "aaaa OR bbbb OR cccc OR dddd"; my @words = split(/\s+/, $searchstring); $searchstring =~ s/(\w+) ?(OR|NOT|AND)?/\(fieldname \= '$1'\) $2/g; foreach (@words) { next if /^(OR|AND|NOT)$/; $_ = "(fieldname = '$_')"; } my $searchstring2 = join(' ', @words); print "regex: $searchstring\nsplit/join:$searchstring2\n";

    yours,
    Michael

      This is not the first time where i have seen the split solution be more robust than the substitute solution. I just ran your code with the following string:
      my $searchstring = "aaaa OR bbbb OR NOT cccc AND dddd";
      
      And was happy to see the following results (formatted for the Monastery):
      regex:
         (fieldname = 'aaaa') OR 
         (fieldname = 'bbbb') OR 
         (fieldname = 'NOT')
         (fieldname = 'cccc') AND
         (fieldname = 'dddd')
      
      split/join:
         (fieldname = 'aaaa') OR 
         (fieldname = 'bbbb') OR NOT 
         (fieldname = 'cccc') AND 
         (fieldname = 'dddd')
      

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
Re: Parsing a boolean search string for SQL query
by gjb (Vicar) on Sep 01, 2003 at 17:05 UTC

    It would be much simpler to work the other way round, i.e. by replacing all query terms 'x' by 'fieldname = x'. This would mean something like:

    s/(\w+)/$1 ne 'OR' && $1 ne 'AND' && $1 ne 'NOT' ? "fieldname = '$1' +" : $1/eg;

    Hope this helps, -gjb-

      Brilliant gjb! The incredible power of reg exp's is again demonstrated. I had no idea you could use the implied if/then/else like that. Oh, I love the Monastery.
Re: Parsing a boolean search string for SQL query
by jeffa (Bishop) on Sep 01, 2003 at 17:11 UTC
    And just exactly how are you going to account for NOT? Remember, the syntax for NOT is something like:
    SELECT * FROM movie WHERE title LIKE 'A%' AND title NOT LIKE 'Ab%'
    And the fact that i chose to use LIKE instead of = says something as well ... your interface is going to get hairy with the approach you have chosen.

    Instead of parsing some user-given string, you should restrict their interface to lists and check boxes/radio buttons. Then you can create your SQL statement with something like SQL::Abstract.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      I love people who put LIKE '$user_query%' into their SQL, especially when they don't restrict the query charset and/or result set size properly. It makes data mining really easy. You just query for % and generally get a database dump.....

      Not suggesting that jeffa would for a moment, just a relevant note. Here is one example where you can easily extract the database. Here is another where the lack of server side checking lets you ask for the entire database (with a correctly crafted POST) and get it. For example this snippet of HTML will dump the ENTIRE braintrack database into your browser (the thank you for attending our source code is kinda funny in context).....

      <FORM action="http://www.braintrack.com/search.htm" METHOD=POST> <INPUT TYPE="hidden" NAME="term1" value="%"> <INPUT TYPE="hidden" NAME="term2" value=""> <INPUT TYPE="hidden" NAME="term3" value=""> <INPUT TYPE="hidden" NAME= "join" value="and"> <INPUT TYPE="hidden" NAME="top" value="7000"> <INPUT TYPE=SUBMIT VALUE="Search"> </FORM>

      I doubt that this is what was intended when they put the DB online. For the record I emailed braintrack.com about this issue some months ago but they have not done anything about it (or actually had the courtesy to reply, for that matter).

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Nope. You got me: http://dvd.unlocalhost.com/?list_m=%25&go_movie=By+Movie

        That should only show movies that start with some string ... but i did not check for the fact that the user could provide their own GET query and specify only '%' ... of course, for this application ... it is OK, but that still does not excuse the potential security hole i dug.

        For the record, i am rewiting that interface with Template and Class::DBI. It's slow going, because i have tables in the 3rd Normal Form, and Class::DBI currently does not handle selections across tables as well as i would like. But as soon as i am finished, i will share it with the Monastery (i have not written a large tutorial in a while ...).

        UPDATE: (for anyone interested in such)
        My DVD CGI search form uses POST requests. This (in the past) meant that web bot scripters needed to look at the source and munge out all the parameters in order to re-create the proper form query. I used the Web Developer Plugin for the Firebird browser to literally convert POST requests to GET requests on a loaded HTML page. Then, when you submit, the form parameters are sent as a GET request, allowing you to cut-n-paste the URL such as i did above. Of course, this only works if the CGI script you are querying accepts both GET and POST requests. Not all do. (*cough* *cough*)

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: Parsing a boolean search string for SQL query
by Abigail-II (Bishop) on Sep 01, 2003 at 21:38 UTC
    You're almost there:
    s/(\w+) ?(OR|NOT|AND|$)/\(fieldname \= '$1'\) $2/g;

    should do it. But that's just for this particular example. If you want to be able to process something more complicated, I'd recommend using a parser. For instance, Parse::Recdescent, or a regular expression using (??{ }) constructs.

    Abigail

Re: Parsing a boolean search string for SQL query
by zby (Vicar) on Sep 01, 2003 at 16:36 UTC
    Here is my solution:
    #!/usr/bin/perl -w use strict; my $searchstring = "aaaa OR bbbb OR cccc OR dddd"; $searchstring =~ s/(\w+) ?(OR|NOT|AND)|(\w+)$/\(fieldname \= '$1$3'\) +$2/g; print "SELECT * FROM tablename WHERE $searchstring\n"; __OUTPUT__ Use of uninitialized value in concatenation (.) or string at a.pl line + 7. Use of uninitialized value in concatenation (.) or string at a.pl line + 7. Use of uninitialized value in concatenation (.) or string at a.pl line + 7. Use of uninitialized value in concatenation (.) or string at a.pl line + 7. Use of uninitialized value in concatenation (.) or string at a.pl line + 7. SELECT * FROM tablename WHERE (fieldname = 'aaaa') OR (fieldname = 'bb +bb') OR (fieldname = 'cccc') OR (fieldname = 'dddd')
    But it is not perfect as you can see.
Re: Parsing a boolean search string for SQL query
by dmitri (Priest) on Sep 01, 2003 at 20:12 UTC
    Implementing my own parser for simple SQL-like query language was easy with Parse::RecDescent. Then you can build a nested structure with callbacks so you can even test the statement with different values.
Re: Parsing a boolean search string for SQL query
by Anonymous Monk on Sep 01, 2003 at 18:47 UTC