in reply to Re: Regex with multiple pattern omissions
in thread Regex with multiple pattern omissions

The immediate issue is that, in the current output, oR, Or, and NOT should have been omitted (in this case "and" is the only control word from the input string that was correctly omitted). Also, "non" and "volatile" should remain together in the output.

Re^3: Regex with multiple pattern omissions
by jhoop (Acolyte) on Jan 09, 2011 at 01:08 UTC
    Eep. I meant "non$volatile display" should remain together in the output.
      Is this headed in the right direction?
      That is, does this code produce the output that you want (given your single test case)? Program specifications are hard to write, and I think the best way forward here is just refinement by example.
      #!/usr/bin/perl -w
      use strict;

      my $input = '"non$volatile display" and ((timer oR count$3 Or display) + near5 hour).ccls. NOT (LCD).ab.';
      my @omissions = qw(terms and or not with near same xor adj);
      my $omit = join("|",@omissions);

      $input =~ s/\..*?\.//g;
      $input =~ s/$omit//ig;

      my @searchterms = ($input =~ m/".+?"|[a-zA-Z][\w\$]+/gi);
      print "@searchterms";

      #prints:
      #"non$volatile display" timer count$3 display hour LCD
      update:
      -I think that you mean for these .xyz. terms to be deleted?
      -The code above doesn't treat the terms in @omissions absolutely literally; I need to look in Larry's book for the syntax. But it does show a dynamic regex. The omit words probably also need to match on boundaries (whole words, not words within words; that's \b, again see Larry's book).
      -Running several substitute operations can take some time, since the string is modified after each one, but this may or may not matter time-wise.
      -The question right now: is this the "right" output? I mean for this given single input case?
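
      The two open points in the update (taking the omit words literally, and matching them only as whole words) could be sketched roughly like this. quotemeta and \b are standard Perl; the sub name extract_terms is just something I made up for illustration, and the \d* after the omit word is my assumption that a proximity operator like "near5" should be dropped together with its number, as it is in the output above. Note that omit words inside a quoted phrase would still be removed by this version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the search terms left after dropping
# .xyz. field tags and the given control words.
sub extract_terms {
    my ($input, @omissions) = @_;

    # quotemeta escapes regex metacharacters, so each word is matched literally
    my $omit = join '|', map { quotemeta } @omissions;

    # remove the .xyz. field tags
    $input =~ s/\..*?\.//g;

    # \b ... \b matches whole words only; \d* (an assumption) also swallows
    # a trailing count such as the 5 in "near5"
    $input =~ s/\b(?:$omit)\d*\b//ig;

    # quoted phrases, or words starting with a letter (may contain $ and digits)
    return $input =~ m/".+?"|[a-zA-Z][\w\$]+/g;
}

my $input = '"non$volatile display" and ((timer oR count$3 Or display) + near5 hour).ccls. NOT (LCD).ab.';
my @omissions = qw(terms and or not with near same xor adj);

my @searchterms = extract_terms($input, @omissions);
print "@searchterms\n";
# prints:
# "non$volatile display" timer count$3 display hour LCD
```

      With the word boundaries in place, an input such as 'order and border' keeps "order" and "border" intact, whereas the plain alternation would cut the "or" out of both.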

      As a general approach, I try to break these complex things into multiple easier steps. Get the right output, then tweak it if performance is not adequate.

        Initial success! I need to do some studying to figure out how this works, but it does appear to do what I need. The only issue is that the double quotes are included in the output around "non$volatile display", so it appears first in my list after sorting. I'll work to decipher your method and tweak as necessary. Thank you!
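
        One possible tweak for the quote issue, as a sketch only: strip the surrounding double quotes from each term before sorting. The sub name clean_and_sort is hypothetical, and the case-insensitive sort is an assumption about the ordering that is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strips one leading and one trailing double quote
# from each term, then sorts case-insensitively.
sub clean_and_sort {
    my @terms = @_;            # work on a copy, leave the caller's list alone
    s/^"|"$//g for @terms;     # drop the surrounding quotes, if any
    return sort { lc($a) cmp lc($b) } @terms;
}

my @searchterms = ('"non$volatile display"', 'timer', 'count$3', 'display', 'hour', 'LCD');

print "$_\n" for clean_and_sort(@searchterms);
# prints, one per line:
# count$3, display, hour, LCD, non$volatile display, timer
```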