Here is how that might look with Regexp::Grammars:
#!/usr/bin/perl --
use strict;
use warnings;

my $s = q[dogs OR cats OR "flying fish" OR (shrimp AND squid)];

my $parser = do {
    use Regexp::Grammars;
    qr{
        # <logfile: - >
        <[TERM]>*

        <rule: TERM>
            <OP> | <MATCH=IDENT> | <MATCH=STRING> | <LIST>

        <rule: STRING>
            "([^"]+?)"

        <rule: OP>
            AND|OR

        <rule: IDENT>
            \w+

        <rule: LIST>
            \( <[TERM]>* \)
    }xs
};

if($s =~ $parser){
    my(%rash) = %/;#bah for scite lexer /#
    undef %/;# bah for scite lexer /#
    use Data::Dumper();
    print Data::Dumper->new([\%rash])->Indent(1)->Useqq(1)->Dump,"\n";
    kek(\%rash); # kill empty key
    print Data::Dumper->new([\%rash])->Indent(1)->Useqq(1)->Dump,"\n";
    my $rash = reorder_terms(\%rash); # consumes %rash
    print Data::Dumper->new([$rash])->Indent(1)->Useqq(1)->Dump,"\n";
}

sub reorder_terms {
    my( $ref ) = @_;
    if( $$ref{TERM}){
        my @term;
        my @op;
        for my $t( @{$$ref{TERM}} ){
            if( ref $t ){
                if( $$t{OP} ){
                    push @op, delete $$t{OP};
                }elsif( $$t{LIST} ){
                    push @term, reorder_terms(delete $$t{LIST} );
                }else{
                    die "uh oh, no OP or LIST key";
                }
            } else {
                push @term, $t;
            }
        }
        undef %$ref;
        #return [@op, @term ];
        return [$op[0], @term ];
    }
    die "uh oh, no TERM key";
}

sub kek {
    my ($ref) = @_;
    my $typ = ref $ref;
    if( $typ eq 'HASH'){
        delete $$ref{""};
        for my $val( values %$ref){
            ref $val and kek($val);
        }
    }
    if( $typ eq 'ARRAY'){
        for my $val( @$ref){
            ref $val and kek($val);
        }
    }
    return;
}
__END__
$VAR1 = {
  "" => "dogs OR cats OR \"flying fish\" OR (shrimp AND squid)",
  "TERM" => [
    "dogs",
    {
      "" => " OR",
      "OP" => "OR"
    },
    "cats",
    {
      "" => " OR",
      "OP" => "OR"
    },
    "\"flying fish\"",
    {
      "" => " OR",
      "OP" => "OR"
    },
    {
      "" => " (shrimp AND squid)",
      "LIST" => {
        "" => "(shrimp AND squid)",
        "TERM" => [
          "shrimp",
          {
            "" => " AND",
            "OP" => "AND"
          },
          "squid"
        ]
      }
    }
  ]
};

$VAR1 = {
  "TERM" => [
    "dogs",
    {
      "OP" => "OR"
    },
    "cats",
    {
      "OP" => "OR"
    },
    "\"flying fish\"",
    {
      "OP" => "OR"
    },
    {
      "LIST" => {
        "TERM" => [
          "shrimp",
          {
            "OP" => "AND"
          },
          "squid"
        ]
      }
    }
  ]
};

$VAR1 = [
  "OR",
  "dogs",
  "cats",
  "\"flying fish\"",
  [
    "AND",
    "shrimp",
    "squid"
  ]
];
Uncomment <logfile: - > for some debug. See also KinoSearch::Docs::Cookbook::CustomQueryParser, Text::Query.

Re^3: Break string into array
by ikegami (Patriarch) on Sep 18, 2009 at 14:59 UTC

    Not only does your solution allow

    my $s = q[dogs OR cats AND "flying fish" OR (shrimp AND squid)];

    it parses that to the same tree as

    my $s = q[dogs OR cats OR "flying fish" OR (shrimp AND squid)];

    And it also allows

    my $s = q[OR dogs OR cats OR "flying fish" OR (shrimp AND squid)];

    Finally, quoted strings are left quoted. A parser shouldn't return literals. If you want to differentiate between quoted and unquoted terms, you'll need to add to the parse tree.

    $VAR1 = [
      "OR",
      [ term => "dogs" ],
      [ term => "cats" ],
      [ phrase => "flying fish" ],
      [
        "AND",
        [ term => "shrimp" ],
        [ term => "squid" ],
      ]
    ];
      That is all true, but it does satisfy the OP's requirement/example :)
        No. Allowing AND and OR at the same level is explicitly disallowed by his requirements. It's clearly wrong to use AND when the string contains OR and vice versa, even if it wasn't explicitly stated. I suppose there's no harm in allowing a leading OR or AND. Finally, the parser also didn't do its job wrt quoted strings. A parser that doesn't parse is buggy.
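
For illustration, here is a minimal sketch of a parser that addresses the objections raised in this thread. It is a hand-written recursive-descent parser in core Perl, not a Regexp::Grammars grammar, and the sub names (tokenize, parse_list, parse_item, parse_query) are hypothetical. It assumes the grammar implied by the example: terms, double-quoted phrases, and parenthesized lists joined by a single operator per level. It dies on mixed AND/OR at one level, strips the quotes from phrases, and tags each leaf as term or phrase. It also rejects a leading operator, although the thread considers that harmless either way.

```perl
use 5.010;
use strict;
use warnings;
use Data::Dumper;

# Split the query into tokens of the form [ type, value ].
# Quotes are stripped here, so the tree never carries the literal.
sub tokenize {
    my ($s) = @_;
    my @tok;
    pos($s) = 0;
    while ( pos($s) < length $s ) {
        if    ( $s =~ /\G\s+/gc )        { next }
        elsif ( $s =~ /\G\(/gc )         { push @tok, ['('] }
        elsif ( $s =~ /\G\)/gc )         { push @tok, [')'] }
        elsif ( $s =~ /\G"([^"]*)"/gc )  { push @tok, [ phrase => $1 ] }
        elsif ( $s =~ /\G(AND|OR)\b/gc ) { push @tok, [ op => $1 ] }
        elsif ( $s =~ /\G(\w+)/gc )      { push @tok, [ term => $1 ] }
        else { die "Unexpected character at offset " . pos($s) . "\n" }
    }
    return @tok;
}

# list := item ( OP item )*   where every OP in one list must be the same
sub parse_list {
    my ($tok) = @_;
    my @items = ( parse_item($tok) );
    my $op;
    while ( @$tok && $tok->[0][0] eq 'op' ) {
        my $this = ( shift @$tok )->[1];
        $op //= $this;
        die "Cannot mix $op and $this at the same level\n" if $this ne $op;
        push @items, parse_item($tok);
    }
    # A lone item needs no operator node around it.
    return defined $op ? [ $op, @items ] : $items[0];
}

# item := term | phrase | '(' list ')'
sub parse_item {
    my ($tok) = @_;
    my $t = shift @$tok or die "Unexpected end of query\n";
    my ( $type, $value ) = @$t;
    return [ $type => $value ] if $type eq 'term' || $type eq 'phrase';
    if ( $type eq '(' ) {
        my $list  = parse_list($tok);
        my $close = shift @$tok;
        die "Missing closing parenthesis\n"
            unless $close && $close->[0] eq ')';
        return $list;
    }
    die "Unexpected token '" . ( $value // $type ) . "'\n";
}

sub parse_query {
    my ($s)  = @_;
    my @tok  = tokenize($s);
    my $tree = parse_list( \@tok );
    die "Unexpected trailing input\n" if @tok;
    return $tree;
}

my $tree = parse_query(q[dogs OR cats OR "flying fish" OR (shrimp AND squid)]);
print Data::Dumper->new( [$tree] )->Indent(1)->Useqq(1)->Dump;
```

For the original example string this should give a tree of the shape proposed above, with "OR" at the top level and a nested "AND" list. Wrapping parse_query in eval lets a caller report the error message for input such as q[dogs OR cats AND fish] instead of silently picking one operator.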