Hello, and thanks to all for the wealth of knowledge here; it has been invaluable. I have an issue that I can't seem to crack with my limited knowledge or by searching. I am trying to parse search strings and extract all the non-control words. My input search string might look like this:

"non$volatile display" and ((timer oR count$3 Or display) near5 hour).ccls. NOT (LCD).ab.

I would like to parse strings like this one, extracting the search terms and ignoring the control words (regardless of case) and everything between two periods, like .ccls. I would also like to preserve the wildcards and anything in double quotes, like "non$volatile display" (the $ can stand for anything, so I could just keep the $), storing anything between the quotes as a single string in the output. The output would be an array of the extracted substrings. Also, if there is a more efficient way to remove the dupes in this routine, I'm all ears.

My code so far is below. It manages to pull out all the alphabetic substrings, ignoring lower-case control words and anything between periods. Any thoughts?

sub extract_terms() {
    my $input = shift;
    chomp $input;
    my @searchterms = ($input =~ m/\b(?!\.)[a-z]+(?!\.)\b/gi);
    my @omissions = qw(terms and or not with near same xor adj);
    my %h;
    @h{@omissions} = undef;
    @searchterms = grep {not exists $h{$_}} @searchterms;
    return @searchterms;
}

Which outputs (after sorting):

count, display, hour, LCD, non, NOT, Or, oR, timer, volatile,
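
One possible way to meet the requirements described above, offered as a rough sketch rather than a definitive solution: tokenize the string with a single alternation that tries quoted phrases first, then period-delimited qualifiers such as .ccls., then bare words that may contain $. The sketch makes several assumptions that are not stated in the post: control words may carry a trailing number (as in near5), qualifiers contain only letters, and duplicates are detected case-insensitively with a %seen hash. The names $token, $control and %seen are made up for illustration.

```perl
use strict;
use warnings;

sub extract_terms {
    my ($input) = @_;

    # Control words, matched case-insensitively. The optional trailing
    # digits are an assumption, to cover proximity forms such as near5.
    my $control = qr/\A(?:terms|and|or|not|with|near|same|xor|adj)\d*\z/i;

    # One token per match; the order of the alternatives matters.
    my $token = qr{
          "([^"]*)"           # capture 1: quoted phrase, kept whole
        | \.[A-Za-z]+\.       # qualifier between two periods, skipped
        | ([\$A-Za-z0-9]+)    # capture 2: bare word, wildcards kept
    }x;

    my (@terms, %seen);
    while ($input =~ /$token/g) {
        my ($phrase, $word) = ($1, $2);
        my $term;
        if (defined $phrase) {
            $term = $phrase;              # never filtered as a control word
        }
        elsif (defined $word && $word !~ $control) {
            $term = $word;
        }
        next unless defined $term;        # qualifier or control word
        # Keep the first occurrence only (case-insensitive, by assumption).
        push @terms, $term unless $seen{lc $term}++;
    }
    return @terms;
}

my $query = '"non$volatile display" and ((timer oR count$3 Or display) '
          . 'near5 hour).ccls. NOT (LCD).ab.';
print join(', ', extract_terms($query)), "\n";
# prints: non$volatile display, timer, count$3, display, hour, LCD
```

Filtering duplicates with a hash while the terms are collected keeps them in their original order and avoids a separate pass. A limitation of this sketch: any run of letters that sits between two periods, such as the "g" in "e.g.", would also be skipped.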


In reply to Regex with multiple pattern omissions by jhoop
