rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

hi monks,
i'm trying to write something that will pass a string as such:
"( foo ) AND ( bar OR boo ) NOT ( far )"
into a hash which i can then use to run some "tests".
so far this is the code i have:
#!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $term = "( foo ) AND ( bar OR boo ) NOT ( far )"; # This regex isn't right my (@matches) = ( $term =~ /( )?(AND|OR|NOT)?( )?(\((.*)\))?/g ); # Testing print $_,"\n" foreach (@matches); # My plan was pull out all the parens and their contents # + the preceeding condition # then modify those result to put into a hash in the same # structure as below. # This is what the results should come out as my %tests = (); $tests{1} = { condition => "", terms => [ "foo" ] }; $tests{2} = { condition => "AND", terms => [ "bar", "OR foo" ] }; $tests{3} = { condition => "NOT", terms => [ "far" ] }; print Dumper(\%tests); exit;
before i went any further i just wanted to see if anyone could suggest a better way of dealing with it?
if not, could anyone help me out with the regex?
cheers,
reagen

Replies are listed 'Best First'.
Re: Logical expressions
by Zaxo (Archbishop) on Jun 05, 2007 at 03:59 UTC

    Would a little boolean logic parser help? You could construct a Parse::RecDescent parser from standard BNL with actions to populate the hash.

    What is the benefit of poking a hash full of this? Is it a parse tree to be executed, like perl constructs in its compile time? In that case it might be convenient to retain order by using arrays instead of hashes. Your (1,2,3) keys suggest you want to order them.

    After Compline,
    Zaxo

      my thinking was to break the string down to run a series of tests on the articles it is examining.
      the idea behind the hash and ordering is i can work out which tests to run first and if the important ones fail first (AND|NOT), there is no need to continue running the rest of the tests.
Re: Logical expressions
by GrandFather (Saint) on Jun 05, 2007 at 04:10 UTC

    Ultimately you are writing a parser and regexen are only part of the answer. The heavy weight answer is Parse::RecDescent. A (perhaps) slightly easier path for a very simple grammar is to manually write a recursive decent parser. However the easiest way to do that is first write a formal grammar, then write a sub for each element of the grammar. Since the first part is essentially the input to Parse::RecDescent and the second part is what the module does for you, you don't gain a lot by rolling your own. ;)


    DWIM is Perl's answer to Gödel
Re: Logical expressions
by jZed (Prior) on Jun 05, 2007 at 05:01 UTC
    Here's a bass-ackwards way to do it ... use SQL::Statement which supports logical statements of the kind you show. You can either use it directly and come out with a nested structure of conditions, or you can use one of the DBDs it supports (e.g. DBD::CSV ) which would allow you to treat a single string or single array or an array of arrays or an array of hashes as a temporary table and run queries against it. Alternatively you could rip out the code that does the parsing/evaluation and modify it to suit your needs.
Re: Logical expressions
by ysth (Canon) on Jun 05, 2007 at 08:14 UTC
    What does a binary NOT operator mean? I would have expected AND or OR before the NOT in your example.

      Looks like a search query to me, so NOT would mean "and doesn't contain" (like the - operator in Google queries). Also, if queries meaning "documents not containing term" are unsupported (again like Google), then there's no need for a unary negation operator.

      sorry for the late reply.
      yep, the NOT defines any terms, that the article it is searching, doesnt contain said terms.
Re: Logical expressions
by ikegami (Patriarch) on Jun 05, 2007 at 14:58 UTC

    Here's the work I've done so far. I can't test this at the moment (since I can't get Parse::RecDescent at this time), but I intend to come back to this.Tested. I'll even post a version that doesn't use Parse::RecDescent after testing this version.

    My code should output a structure of the form

    # ( a OR b ) AND ( c ) NOT ( d ) AND ( e OR f ) [ [ 'a', 'b' ], [ AND => 'c' ], [ NOT => 'd' ], [ AND => 'e', 'f' ], ]

    Parser creator:

    # make_parser.pl use strict; use warnings; use Parse::RecDescent (); my $grammar = <<'__END_OF_GRAMMAR__'; { use strict; use warnings; } parse : expr /^\Z/ { $item[1] } expr : terms expr_ { [ $item[1], $item[2] ] } expr_ : AND terms expr_ { [ [ $item[1], @{$item[2]} ], @{$item[3]} + ] } | NOT terms expr_ { [ [ $item[1], @{$item[2]} ], @{$item[3]} + ] } | { [] } terms : '(' terms_ ')' { $item[2] } terms_ : IDENT OR terms_ { [ $item[1], @{$item[3]} ] } | IDENT { [ $item[1] ] } IDENT : /\S+/ AND : IDENT { $item[1] eq 'AND' ?1:undef } { $item[1] } NOT : IDENT { $item[1] eq 'NOT' ?1:undef } { $item[1] } OR : IDENT { $item[1] eq 'OR' ?1:undef } { $item[1] } __END_OF_GRAMMAR__ Parse::RecDescent->Precompile($grammar, "QueryParser") or die("Bad grammar\n");

    Sample application:

    # query_parser.pl use strict; use warnings; use Data::Dumper qw( ); use QueryParser qw( ); sub display { my ($expr, $tree) = @_; local $Data::Dumper::Indent = 0; print(Data::Dumper->Dump([$tree], [$expr]), "\n"); } { my $parser = QueryParser->new(); foreach my $expr ( '( keyword1 )', '( keyword1 ) AND ( keyword2 )', '( keyword1 ) AND ( keyword2 ) AND ( keyword3 )', '( keyword1 ) NOT ( keyword2 )', '( keyword1 ) NOT ( keyword2 ) NOT ( keyword3 )', '( keyword1 ) AND ( keyword2 ) NOT ( keyword3 )', '( keyword1 ) NOT ( keyword2 ) AND ( keyword3 )', '( keyword1 OR keyword2 )', '( keyword1 ) AND ( keyword2 OR keyword3 )', '( keyword1 ) NOT ( keyword2 OR keyword3 )', ) { my $tree = $parser->parse($expr); display($expr, $tree); } }

    It could use some better error reporting. I'll probably only do that in the version that doesn't use Parse::RecDescent (since PRD is so slow).

    It could be optimized a bit.

    I'm kinda curious about the parens. They are superfluous.

    Update: Tested. I made a couple of small fixes.

Re: Logical expressions
by jbert (Priest) on Jun 05, 2007 at 09:51 UTC
    As well as writing a parser, you might be writing a language - and that can be hard and/or annoying.

    Can I ask if you are also determining the format of the string, i.e. the logical language itself?

    Also, may I ask how are you going to apply the hash you've built up?

    It may be that there is an already existing sub-language which meets your needs.

      It may be that there is an already existing sub-language which meets your needs.

      sounds to me like there is an already existing full language which meets your needs: Prolog.

      yes it is a real language... we developed a fully general symbolic agenda-driven Earley parser in Prolog winter quarter :D

      Here is a perl-based interpreter (which I haven't used) and here is my favorite prolog interpreter.

      i am creating the string out of a set of terms which will not change. basically just adding them in with a bunch of AND|OR|NOT and parens.

      The idea is some articles contain a list of keywords and i'm trying to run the search string over them to find which ones match. it's not too hard until i got to recursive parens, which was where i decided to break it down to a hash of "rules" which are run individually (kinda from the inside out) making sure each one "passes" on the way.
        One problem you can find with 'little languages' on their own is that over time you get things like "it would be nice if you could use variables, or loops, or...".

        I've seen it a few times with web templating systems, test case specification languages, etc.

        If you can possibly make use of an existing 'full' language, then that may be better. One option is - perl (you have it to hand and know it well...)

        You can put together strings like "$foo and ($bar or $boo)". Instead of writing a language parser and evaluator, you write the code which sets up the $foo, $bar variables and then calls "eval $logical_expression".

        Such an approach would require less code (you'll have to set up the same foo/bar/boo whichever approach you take) and you also potentially have the full power of perl there if you ever need it. You're basically using 'eval' as your parser and evaluator.

        Some more of the rationale behind this approach is suggested in the documentation to Text::Template. You could even use that module as a way of helping set up your foo/bar/baz, since your logical expression could be seen simply as a text template evaluating to 1 or 0. :-)

        People sometimes talk about using Domain Specific Languages (DSLs) - and they *are* often a good idea. One point which sometimes gets missed is that DSLs are *embedded* in the full language they are created in, not just *implemented*. You can still escape to the full language and so don't have to re-invent precedence rules, loops, conditionals, variables and all that good stuff.