in reply to Implementing a text filter on some dataset

Cos the problem intrigued me :)

#! perl -slw
use strict;

## Convert query syntax to evalable Perl code
sub buildQuery {
    local $_ = uc shift;
    while( m[ \( (?: AND | OR ) | != | = ]x ) {
        ## KEY=value / KEY!=value  =>  [$key eq 'VALUE'] / [$key ne 'VALUE']
        s< ( [^!=()\s]+ ) ( != | = ) ( [^()\s]+ ) >{
            my $op = ( $2 eq '!=' ? 'ne' : 'eq' );
            "[\$\L$1\E $op '$3']";
        }xge;
        ## (AND [a] [b])  =>  [[a] && [b]]
        s< \( \s* ( AND ) \s+ ( \[ [^()]+ \] ) \s+ ( \[ [^()]+ \] ) \s* \) >
         { "[$2 && $3]" }xge;
        ## (OR [a] [b])  =>  [[a] || [b]]
        s< \( \s* ( OR ) \s+ ( \[ [^()]+ \] ) \s+ ( \[ [^()]+ \] ) \s* \) >
         { "[$2 || $3]" }xge;
    }
    ## [] stood in for () so that [^()] only matches already-converted groups
    tr/[]/()/;
    return $_;
}

## Read data and UPPER case
my $data = do{ local $/; uc <DATA> };

## Some variables
my( $author, $profit, $publisher, $book );

## And a regex to populate them from each record
my $re = qr[
    PAGE \s+ \d+ \s+
    AUTHOR: \s+ ( [^\n]+ ) (?{ $author = $^N }) \s+
    PROFIT: \s+ ( [^\n]+ ) (?{ $profit = $^N }) \s+
    PUBLISHER: \s+ ( [^\n]+ ) (?{ $publisher = $^N }) \s+
    BOOK: \s+ ( [^\n]+ ) (?{ $book = $^N }) \s+
]x;

## Convert the query. NOTE: (AND ) syntax is required
## where the example used implicit AND
my $query = buildQuery( <<EOQ );
(AND
    (OR
        (AND AUTHOR=John PROFIT=90% )
        (AND AUTHOR=Matt PROFIT=80% )
    )
    PUBLISHER=OReilly
)
EOQ

print "\nQuery: $query\n";

## Test the condition and print the record if it matches
## For each record
eval "$query" and print $1 while $data =~ m[ ( $re ) ]xg;

## Same again for another query
## Note: != is also accepted.
my $query2 = buildQuery( <<EOQ );
(OR
    (AND AUTHOR=John PROFIT!=90% )
    (AND AUTHOR=Matt PUBLISHER!=OReilly )
)
EOQ

print "\nQuery: $query2\n";
eval "$query2" and print $1 while $data =~ m[ ( $re ) ]xg;

__DATA__
Page 1
AUTHOR: John
PROFIT: 20%
PUBLISHER: TMH
BOOK: OPERATING SYSTEMS

Page 2
AUTHOR: John
PROFIT: 90%
PUBLISHER: OREILLY
BOOK: ALGORITHMS

Page 3
AUTHOR: Matt
PROFIT: 80%
PUBLISHER: TMH
BOOK: COMPUTER NETWORKS

Page 4
AUTHOR: Matt
PROFIT: 80%
PUBLISHER: OREILLY
BOOK: COMMUNICATION SYSTEMS

Outputs:

[ 9:25:03.05]C:\test>670477

Query: (((($author eq 'JOHN') && ($profit eq '90%')) || (($author eq 'MATT') && ($profit eq '80%'))) && ($publisher eq 'OREILLY'))

PAGE 2
AUTHOR: JOHN
PROFIT: 90%
PUBLISHER: OREILLY
BOOK: ALGORITHMS

PAGE 4
AUTHOR: MATT
PROFIT: 80%
PUBLISHER: OREILLY
BOOK: COMMUNICATION SYSTEMS

Query: ((($author eq 'JOHN') && ($profit ne '90%')) || (($author eq 'MATT') && ($publisher ne 'OREILLY')))

PAGE 1
AUTHOR: JOHN
PROFIT: 20%
PUBLISHER: TMH
BOOK: OPERATING SYSTEMS

PAGE 3
AUTHOR: MATT
PROFIT: 80%
PUBLISHER: TMH
BOOK: COMPUTER NETWORKS
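The record regex above fills `$author`, `$profit` etc. through `(?{ ... })` code blocks and `$^N`. For comparison, here is a minimal sketch of a plainer alternative (not the method used above) that parses each record into a hash. It assumes every record starts with a `Page N` line and that fields are `KEY: value` lines; `$text`, `@records` and `@hits` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical sample input, laid out like the __DATA__ section above.
my $text = <<'END';
Page 1
AUTHOR: John
PROFIT: 20%
PUBLISHER: TMH
BOOK: OPERATING SYSTEMS

Page 2
AUTHOR: John
PROFIT: 90%
PUBLISHER: OREILLY
BOOK: ALGORITHMS
END

# Split the text in front of every "Page N" line, then turn each
# "KEY: value" line of a record into an upper-cased hash entry.
my @records;
for my $chunk ( split /^(?=Page\s+\d+)/m, $text ) {
    my %rec;
    while ( $chunk =~ /^(\w+):[ \t]*(.*?)[ \t]*$/mg ) {
        $rec{ uc $1 } = uc $2;
    }
    push @records, \%rec if %rec;
}

# Plain hash lookups take the place of the $author/$profit lexicals.
my @hits = map  { $_->{BOOK} }
           grep { $_->{AUTHOR} eq 'JOHN' && $_->{PROFIT} eq '90%' } @records;
print "$_\n" for @hits;    # prints ALGORITHMS
```

A generated query string could then test `$rec->{AUTHOR}` and friends instead of the shared lexicals, at the cost of losing the single-pass regex.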


Re^2: Implementing a text filter on some dataset
by grizzley (Chaplain) on Feb 27, 2008 at 13:06 UTC

    Bravo! :)

    Another approach, assuming that we can "upgrade" the filter a little:

    $_=<<END_OF_FILTER;
    (AND
      (OR
        (AND
          (COND AUTHOR=John)
          (COND PROFIT=90%)
        )
        (AND
          (COND AUTHOR=Matt)
          (COND PROFIT=80%)
        )
      )
      (COND PUBLISHER=OReilly)
    )
    END_OF_FILTER

    %hash = (
      AUTHOR=> 'John',
      PROFIT=> '90%',
      PUBLISHER=> 'OReilly',
      BOOK=> 'OPERATING SYSTEMS'
    );

    sub AND {
      print "and debug: [@_]";
      for(@_) { return 0 if ! $_ }
      return 1
    }
    sub OR {
      print "or debug: [@_]";
      for(@_) { return 1 if $_ }
      return 0
    }
    sub COND {
      my ($key, $value)=@_;
      $ret = 0 | ($hash{$key} eq $value);
      print "cond debug: @_ -> $ret";
      return $ret
    }

    # add commas between ) and ( braces
    s/\)(\s*)\(/),$1(/g;
    # convert COND a little; I hope you don't have a name like O'Reilly... :)
    s/\(COND (\w+)=([^)]+)\)/(COND '$1','$2')/g;
    # the parens are already in place, just move them
    s/\((AND|OR|COND)\b/$1(/g;
    # see how it looks now
    print $_;
    # and voila!
    print "ok" if eval;
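Both replies end by turning the filter into Perl source and handing it to `eval`. As a swapped-in alternative (not grizzley's or BrowserUk's method), here is a minimal recursive-descent sketch that walks the original prefix syntax directly and checks it against a record hash. `eval_filter` and `parse_expr` are hypothetical names; it assumes values contain no whitespace or parentheses and compares case-insensitively, mirroring the upper-casing above.

```perl
use strict;
use warnings;

# Evaluate "(AND|OR expr...)" / "KEY=value" / "KEY!=value" against a hash.
sub eval_filter {
    my ( $filter, $rec ) = @_;
    my @tokens = $filter =~ /\(|\)|[^()\s]+/g;    # "(", ")" or a bare word
    my $result = parse_expr( \@tokens, $rec );
    die "trailing tokens: @tokens\n" if @tokens;
    return $result;
}

sub parse_expr {
    my ( $tokens, $rec ) = @_;
    my $tok = shift(@$tokens) // die "unexpected end of filter\n";
    if ( $tok eq '(' ) {
        my $op = uc( shift(@$tokens) // '' );
        die "unknown operator '$op'\n" unless $op eq 'AND' || $op eq 'OR';
        my @results;
        while ( @$tokens && $tokens->[0] ne ')' ) {
            push @results, parse_expr( $tokens, $rec );
        }
        shift(@$tokens) // die "missing ')'\n";
        return $op eq 'AND'
            ? ( grep( !$_, @results ) ? 0 : 1 )   # any false operand => 0
            : ( grep( $_,  @results ) ? 1 : 0 );  # any true operand  => 1
    }
    # Leaf condition: KEY=value or KEY!=value.
    my ( $key, $neg, $value ) = $tok =~ /^(\w+)(!?)=(.+)$/
        or die "bad condition '$tok'\n";
    my $equal = uc( $rec->{ uc($key) } // '' ) eq uc($value);
    my $ok = $neg ? !$equal : $equal;
    return $ok ? 1 : 0;
}

my %record = ( AUTHOR => 'Matt', PROFIT => '80%', PUBLISHER => 'OReilly' );
my $filter = '(AND (OR (AND AUTHOR=John PROFIT=90%) '
           . '(AND AUTHOR=Matt PROFIT=80%)) PUBLISHER=OReilly)';
print eval_filter( $filter, \%record ) ? "match\n" : "no match\n";  # match
```

Because nothing is passed to `eval`, a value such as `O'Reilly` cannot break the generated code; the trade-off is writing a small parser by hand.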

    Update: The only remaining question is how to add the (COND ...) blocks to the original filter with a single regexp. Perhaps by searching for '=', like this?

    s/\w+=\S+/(COND $&)/g

    Or can some values contain whitespace, on which this regexp fails? Only legend knows the format of the conditions, so the answer (and a good regexp) lies with legend :)
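To see what the one-regexp idea from the update does, here is a minimal sketch that applies `s/\w+=\S+/(COND $&)/g` to the original (COND-less) filter, plus one hypothetical filter whose value contains a space. The variable names `$filter`, `$wrapped` and `$broken` are illustrative only.

```perl
use strict;
use warnings;

# The original filter syntax, without COND blocks.
my $filter = <<'END';
(AND
  (OR
    (AND AUTHOR=John PROFIT=90% )
    (AND AUTHOR=Matt PROFIT=80% )
  )
  PUBLISHER=OReilly
)
END

# Wrap every KEY=value token in a (COND ...) block with one regexp.
( my $wrapped = $filter ) =~ s/\w+=\S+/(COND $&)/g;
print $wrapped;

# A value containing whitespace is cut at the first space: only
# "BOOK=OPERATING" is wrapped and "SYSTEMS" is left dangling.
( my $broken = '(AND BOOK=OPERATING SYSTEMS AUTHOR=John )' )
    =~ s/\w+=\S+/(COND $&)/g;
print "$broken\n";
```

So the single regexp is enough as long as values never contain whitespace; otherwise the filter format itself would need quoting or a delimiter.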