in reply to refine regex

The process of unifying two regular expressions can be done in the following steps:

  1. Start with an empty regex: m!!
  2. Extract the common prefix: m! NOT !
  3. Extract the alternating parts A and B from both regexes and put them into (?:A|B): m! NOT (?:".*?"\[MESH\]|\(.*?\))!
  4. Go back to step 2 if there is more of the regex(es) left: m! NOT (?:".*?"\[MESH\]|\(.*?\)) ?!
  5. Compare your new regex against the old regular expressions to confirm that it matches exactly what the old ones matched.

That way, I end up with m! NOT (?:".*?"\[MESH\]|\(.*?\)) ?!.
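
Step 5 can be checked mechanically. The following is only a minimal sketch: the helper names strip_old and strip_new and the sample inputs are made up for illustration, and the substitutions assume the two original regexes were used with s///g to delete the NOT clauses.

```perl
use strict;
use warnings;

# The two original patterns and the unified one from the steps above.
my @old = ( qr/ NOT ".*?"\[MESH\] ?/, qr/ NOT \(.*?\) ?/ );
my $new = qr/ NOT (?:".*?"\[MESH\]|\(.*?\)) ?/;

# Apply the old substitutions one after the other.
sub strip_old {
    my ($s) = @_;
    $s =~ s/$_//g for @old;
    return $s;
}

# Apply the unified substitution in a single pass.
sub strip_new {
    my ($s) = @_;
    $s =~ s/$new//g;
    return $s;
}

# Hypothetical sample inputs; replace them with real queries.
my @samples = (
    '"A"[MESH] NOT "B"[MESH]',
    '("A"[MESH] OR "B"[MESH]) NOT ("C"[MESH] OR "D"[MESH]) AND "E"[MESH]',
    '"A"[MESH] AND "B"[MESH]',
);

# Report every input on which the old and new results differ.
for my $s (@samples) {
    my ( $was, $now ) = ( strip_old($s), strip_new($s) );
    print $was eq $now ? "same: " : "DIFF: ", $s, "\n";
}
```

The sample list is deliberately small. With /g the result can depend on what an earlier match has already consumed, so inputs with several NOT clauses in a row are worth adding to it.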

Re^2: refine regex
by fglock (Vicar) on Nov 26, 2004 at 11:50 UTC

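    Step 6: Test it! A small fix was necessary: the leading space has to be optional, because with /g the trailing ' ?' of one match can consume the space in front of the next NOT.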

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sample query: a single long line, including two NOT clauses in a row.
    my $string_original = <<"END";
    ("Immunologic and Biological Factors"[MESH] OR "Immunosuppressive Agents"[MESH] OR "Transplantation Immunology"[MESH] OR "Allergy and Immunology"[MESH] OR "Graft vs Host Disease"[MESH]) NOT ("Foo"[MESH] OR "Bar"[MESH]) AND ("Kidney Transplantation"[MESH] OR "Liver Transplantation"[MESH] OR "Heart Transplantation"[MESH]) NOT ("My Term"[MESH] OR "Blah"[MESH]) NOT "foobar"[MESH]
    END

    # original: two separate substitutions
    {
        my $string = $string_original;
        $string =~ s/ NOT ".*?"\[MESH\] ?//g;
        $string =~ s/ NOT \(.*?\) ?//g;
        print $string, "\n";
    }

    # Corion's: both patterns unified into one alternation
    {
        my $string = $string_original;
        $string =~ s! NOT (?:".*?"\[MESH\]|\(.*?\)) ?!!g;
        print $string, "\n";
    }

    # fixed: the leading space is now optional
    {
        my $string = $string_original;
        $string =~ s! ?NOT (?:".*?"\[MESH\]|\(.*?\)) ?!!g;
        print $string, "\n";
    }

      You don't need the trailing ' ?'.
      $string =~ s! ?NOT (?:".*?"\[MESH\]|\(.*?\))!!g;
      works fine.
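
      To see what dropping the trailing ' ?' changes, here is a small sketch. The shortened query and the helper name strip_not are hypothetical, chosen only to keep the output readable; the two patterns are the ones discussed in this thread.

      ```perl
      use strict;
      use warnings;

      # Hypothetical short query, shaped like the one in the test script.
      my $query = '("A"[MESH]) NOT ("B"[MESH]) AND ("C"[MESH]) NOT "d"[MESH]';

      # Remove every match of the given pattern from a copy of the string.
      sub strip_not {
          my ( $s, $re ) = @_;
          $s =~ s/$re//g;
          return $s;
      }

      my $with_trailing    = qr/ ?NOT (?:".*?"\[MESH\]|\(.*?\)) ?/;
      my $without_trailing = qr/ ?NOT (?:".*?"\[MESH\]|\(.*?\))/;

      print strip_not( $query, $with_trailing ),    "\n";  # ("A"[MESH])AND ("C"[MESH])
      print strip_not( $query, $without_trailing ), "\n";  # ("A"[MESH]) AND ("C"[MESH])
      ```

      On this sample, the version with the trailing ' ?' also removes the space in front of AND, while the shorter version leaves it in place.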

      Yeah, I picked that up too when I tested :)
      Cheers.
Re^2: refine regex
by rsiedl (Friar) on Nov 26, 2004 at 11:40 UTC
    Thanks Corion. Great explanation!
Re^2: refine regex
by exussum0 (Vicar) on Nov 27, 2004 at 01:20 UTC
    Or you can convert it to an NFA and remove useless transitions etc ;) But let's not get into language theory. Good job.

    ----
    Then B.I. said, "Hov' remind yourself nobody built like you, you designed yourself"