Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,
1. We have a string "Olympus xModel 3.1mp". In this
string we want the regex to remove the characters "mp"
that appear after 3.1, but not the "mp" that is
part of the brand name Olympus.
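Here is a minimal sketch of why a plain pattern is not enough. The variable names are illustrative only. A substitution with no surrounding context matches the first "mp" in the string, which is the one inside "Olympus":

```perl
use strict;
use warnings;

my $text = "Olympus xModel 3.1mp";

# A naive substitution removes the FIRST "mp" it finds,
# which is the one inside the brand name.
(my $naive = $text) =~ s/mp//;
print "$naive\n";    # Olyus xModel 3.1mp

# With /g every "mp" goes, including the one we want to keep.
(my $global = $text) =~ s/mp//g;
print "$global\n";   # Olyus xModel 3.1
```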

2. We have another string, "Olympus yModel 3x Zoom
Optical 5x Zoom Digital". As you can see, "Zoom" appears
twice in this string. Here we want to remove both
"Zoom"s and also the word preceding each of them, i.e. 3x
and 5x. But before removing 3x and 5x, we also want to
make sure that the word contains the character "x" and
that all its other characters are numeric. "Zoom" could
also appear more than twice in the string, so the regex
should remove every occurrence of "Zoom" together with
the word immediately before it, provided that word
contains digits (not necessarily 3 or 5) and the
character "x".
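One way to spell out this rule literally is to match every "<word> Zoom" pair and then decide, word by word, whether to drop it. This sketch rests on two assumptions the question does not state. First, a qualifying word is one or more digits followed by a single "x" (as in 3x or 5x). Second, words are separated by whitespace. The helper name strip_zoom and the second sample string are made up for illustration:

```perl
use strict;
use warnings;

# Remove each "<word> Zoom" pair when <word> is digits followed by "x"
# (e.g. "3x", "10x"); any other "<word> Zoom" pair is put back unchanged.
# Assumption: words are whitespace-separated.
sub strip_zoom {
    my ($text) = @_;
    # /e evaluates the replacement as code. The captures are copied first
    # because a successful inner match would overwrite $1, $2, $3.
    $text =~ s{(\s*)(\S+)(\s+Zoom\b)}{
        my ($lead, $word, $zoom) = ($1, $2, $3);
        $word =~ /^\d+x\z/ ? '' : "$lead$word$zoom";
    }ge;
    return $text;
}

print strip_zoom("Olympus yModel 3x Zoom Optical 5x Zoom Digital"), "\n";
# Olympus yModel Optical Digital
print strip_zoom("Brand 2x Zoom Optical big Zoom 4x Zoom Digital"), "\n";
# Brand Optical big Zoom Digital
```

The /e modifier keeps the per-word test readable. When no per-word logic is needed, a single pattern can express the same assumption more compactly.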


Thanks a lot.

Regards,
Habib

Re: REGEX Help
by GrandFather (Saint) on Jun 24, 2006 at 07:59 UTC

    In the first case it is probably sufficient to ensure that the preceding character is a digit and that the following character is a non-word character (or the end of the string), so s/(?<=\d)mp\b//g should handle that one.
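Here is a quick check of the lookbehind version. It is wrapped in a hypothetical helper, strip_mp, for illustration, and it also covers the case where the model string has no "mp" suffix:

```perl
use strict;
use warnings;

# Remove "mp" only when a digit comes directly before it (zero-width
# lookbehind, so the digit itself is kept) and a word boundary follows it.
sub strip_mp {
    my ($text) = @_;
    $text =~ s/(?<=\d)mp\b//g;
    return $text;
}

print strip_mp("Olympus xModel 3.1mp"), "\n";   # Olympus xModel 3.1
print strip_mp("Olympus xModel 3.1"), "\n";     # Olympus xModel 3.1
```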

    In the second case it is actually even simpler; no zero-width lookbehind assertion is required: s/\s+\d+x Zoom//g.
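Here is a small sketch of the second substitution on a made-up string with three zoom specifications. The trailing "2x Zoom Macro" part is invented for the example. Because \s+ consumes the space before each zoom factor, no double spaces are left behind:

```perl
use strict;
use warnings;

# Made-up sample with three zoom specifications.
my $str = "Olympus yModel 3x Zoom Optical 5x Zoom Digital 2x Zoom Macro";

# \s+ consumes the whitespace before each "<digits>x Zoom", so no
# double spaces remain; /g removes every occurrence.
(my $clean = $str) =~ s/\s+\d+x Zoom//g;
print "$clean\n";   # Olympus yModel Optical Digital Macro
```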

    Update: fixed broken regex. Was s/(><=\d)mp\b//g


    DWIM is Perl's answer to Gödel
      Thanks a lot.
      The second problem got resolved, but the first issue still
      persists. What could be missing in this solution?


      $original_text = "Olympus xModel 3.1mp";
      $original_text =~ s/(><=\d)mp\b//g;
      print "\n$original_text";

      Regards, Habib

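        The absence of a stupid typo! Sorry, the regex should have been $original_text=~ s/(?<=\d)mp\b//g;. Written as (><=\d), the parentheses form an ordinary capturing group that looks for the literal characters "><=" followed by a digit. That never occurs in your string, so nothing is replaced. (?<=\d) is the zero-width lookbehind that was intended.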
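To illustrate the difference, here is a sketch with illustrative variable names. The typo version performs no substitution at all, while the lookbehind version removes the suffix. In scalar context s///g returns the number of substitutions made, or a false value when there are none:

```perl
use strict;
use warnings;

my $text = "Olympus xModel 3.1mp";

# Typo version: (><=\d) is an ordinary capturing group that needs the
# literal characters "><=" followed by a digit, so it never matches here.
my $typo_count = (my $typo = $text) =~ s/(><=\d)mp\b//g;

# Intended version: (?<=\d) is a zero-width lookbehind for a digit.
my $fixed_count = (my $fixed = $text) =~ s/(?<=\d)mp\b//g;

print "$typo\n";    # Olympus xModel 3.1mp
print "$fixed\n";   # Olympus xModel 3.1
print $typo_count ? "typo matched\n" : "typo made no substitution\n";
```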

        By the way, I strongly recommend that you add use strict; use warnings; to every script you write. You then need to declare variables with my, but many problems get caught before they bite hard.
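Applied to Habib's script, that advice might look like this. It is a minimal sketch that uses the corrected lookbehind:

```perl
use strict;    # variables must be declared; no symbolic refs or barewords
use warnings;  # warn about questionable constructs

my $original_text = "Olympus xModel 3.1mp";
$original_text =~ s/(?<=\d)mp\b//g;
print "$original_text\n";   # Olympus xModel 3.1
```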

        You should also browse the following documentation, probably in the order given: perlretut, perlre and perlreref.



        Hi, your regex $original_text=~ s/(><=\d)mp\b//g; is not complete. You do not have a replacement string. Try this instead: $original_text=~ s/(.+)mp/$1/g;

        Sriram

Re: REGEX Help
by rsriram (Hermit) on Jun 24, 2006 at 09:37 UTC

    Hi, try this:

    my $str = "Olympus xModel 3.1mp";
    $str =~ s/(.+)mp/$1/g;
    print "$str\n";

    $str = "Olympus yModel 3x Zoom Optical 5x Zoom Digital";
    $str =~ s/([0-9]+)x Zoom//g;
    print "$str\n";

    Sriram

      Your first regex only works in this case because of a quirk in the data: "3.1mp" happens to be at the end of the string. The greediness of (.+) causes it to remove the last "mp" in the string, regardless of its context. This also means that if the model is just "3.1" instead of "3.1mp", it will remove the "mp" in "Olympus".
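The greedy behaviour can be reproduced with a short sketch. greedy_strip is a hypothetical wrapper around the regex from the reply:

```perl
use strict;
use warnings;

# (.+) is greedy: it first takes as much as possible and then backs off
# only until "mp" can match, so this removes the LAST "mp" in the string.
sub greedy_strip {
    my ($text) = @_;
    $text =~ s/(.+)mp/$1/g;
    return $text;
}

print greedy_strip("Olympus xModel 3.1mp"), "\n";   # Olympus xModel 3.1
print greedy_strip("Olympus xModel 3.1"), "\n";     # Olyus xModel 3.1
```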

      The second regex is more reliable, but would get false positives if they released a line of cameras with "Super Duper T101x Zoominess(tm)", turning it into "Super Duper Tiness(tm)". It also leaves two consecutive spaces when removing a zoom specification. (Both of these can be fixed by adding a space at the beginning of the regex.)
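Here is a sketch comparing the original pattern with the leading-space variant suggested in this reply. The helper names zoom_original and zoom_spaced are hypothetical:

```perl
use strict;
use warnings;

# The original pattern, and the same pattern with a leading space.
sub zoom_original { my ($t) = @_; $t =~ s/([0-9]+)x Zoom//g;  return $t }
sub zoom_spaced   { my ($t) = @_; $t =~ s/ ([0-9]+)x Zoom//g; return $t }

my $camera = "Olympus yModel 3x Zoom Optical 5x Zoom Digital";
my $tricky = "Super Duper T101x Zoominess(tm)";   # hypothetical product name

print zoom_original($camera), "\n";   # Olympus yModel  Optical  Digital
print zoom_spaced($camera), "\n";     # Olympus yModel Optical Digital
print zoom_original($tricky), "\n";   # Super Duper Tiness(tm)
print zoom_spaced($tricky), "\n";     # Super Duper T101x Zoominess(tm)
```

The leading space alone does not reject a name like "Super 10x Zoominess". Adding \b after Zoom would also guard against that case.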