
Is this headed in the right direction?
That is, does this code produce the output you want for your single test case? Program specifications are hard to write, so I think the best way forward here is refinement by example.
#!/usr/bin/perl -w
use strict;

my $input = '"non$volatile display" and ((timer oR count$3 Or display) + near5 hour).ccls. NOT (LCD).ab.';
my @omissions = qw(terms and or not with near same xor adj);
my $omit = join("|",@omissions);

$input =~ s/\..*?\.//g;
$input =~ s/$omit//ig;
my @searchterms = ($input =~ m/".+?"|[a-zA-Z][\w\$]+/gi);

print "@searchterms";

#prints:
#"non$volatile display" timer count$3 display hour LCD
update:
-I assume that you want these .xyz. terms deleted?
-The code above does not take the terms in @omissions absolutely literally: any regex metacharacters in them would be interpreted as regex syntax. I need to look in Larry's book for the syntax. But this does show a dynamically built regex. The omit words should probably also be matched on word boundaries (\b, look in Larry's book), so that only whole words are removed and not words within words.
-Each substitution rewrites the string, so running several of them can take some time, but this may or may not matter time-wise.
-The question right now: is this the "right" output for this single input case?
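On the second point, here is a minimal sketch of how the omit regex could be hardened. It assumes that quotemeta is the literal-matching syntax being looked for and that a trailing count such as the 5 in near5 belongs to the operator. The sub names build_omit_regex and extract_terms are made up for illustration; they are not from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one regex that matches any omission word as a whole
# word, with regex metacharacters escaped.
sub build_omit_regex {
    my @words = @_;
    # quotemeta escapes metacharacters so each word is matched literally
    my $alternation = join '|', map { quotemeta } @words;
    # \b restricts matches to whole words; \d* is an assumption that lets
    # "near5" be removed together with its count
    return qr/\b(?:$alternation)\d*\b/i;
}

# Hypothetical wrapper around the same three steps as the original script.
sub extract_terms {
    my ($input, @omissions) = @_;
    my $omit = build_omit_regex(@omissions);
    $input =~ s/\..*?\.//g;    # drop field tags such as .ccls. and .ab.
    $input =~ s/$omit//g;      # drop operator words (the /i is inside $omit)
    return $input =~ m/".+?"|[a-zA-Z][\w\$]+/g;
}

my @terms = extract_terms(
    '"non$volatile display" and ((timer oR count$3 Or display) + near5 hour).ccls. NOT (LCD).ab.',
    qw(terms and or not with near same xor adj),
);
print "@terms\n";
```

With the word boundaries in place, a term such as "order" or "sandbox" is no longer damaged by the "or" and "and" entries. Operator words inside a quoted phrase are still removed, as in the original.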

As a general approach, I try to break these complex things into multiple easier steps. Get the right output, then tweak it if performance is not adequate.
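To illustrate the step-by-step idea, here is one possible breakdown, offered as a sketch rather than a recommendation. It swaps the omit substitution for a different technique: tokenize first, then filter the tokens with a hash lookup. The sub names are hypothetical, and stripping trailing digits to recognize near5 is an assumption about the query syntax.

```perl
use strict;
use warnings;

# Step 1: drop field tags such as .ccls. and .ab.
sub strip_field_tags {
    my ($text) = @_;
    $text =~ s/\..*?\.//g;
    return $text;
}

# Step 2: pull out quoted phrases and bare words (same pattern as above)
sub tokenize {
    my ($text) = @_;
    return $text =~ m/".+?"|[a-zA-Z][\w\$]+/g;
}

# Step 3: discard operator words with a hash lookup instead of a substitution
sub drop_operators {
    my ($tokens, $omissions) = @_;
    my %is_operator = map { ( lc($_) => 1 ) } @$omissions;
    return grep {
        my $word = lc $_;
        $word =~ s/\d+\z//;    # assumption: "near5" counts as "near"
        !$is_operator{$word};
    } @$tokens;
}

my $input = '"non$volatile display" and ((timer oR count$3 Or display) + near5 hour).ccls. NOT (LCD).ab.';
my @searchterms = drop_operators(
    [ tokenize( strip_field_tags($input) ) ],
    [ qw(terms and or not with near same xor adj) ],
);
print "@searchterms\n";
```

Each step can be checked on its own, and because a quoted phrase is a single token, operator words inside it are left alone.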

Re^5: Regex with multiple pattern omissions
by jhoop (Acolyte) on Jan 10, 2011 at 14:28 UTC
    Initial success! I need to do some studying to figure out how this works, but it does appear to do what I need. The only issue is that the quotes are included in the output around "non$volatile display", so it appears first in my list after sorting. I'll work to decipher your method and tweak as necessary. Thank you!