Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I've created several regular expressions that work well with the exception of the two below:
    expression: (?<!hw\_)(?=\s*Rev\.?(?:ision)?:?\s*([^\s;,\n\r]+))
    returns: 'ersion'
    from: Company: Nuera Communications, Inc., ProductFamily: ORCA Series, Product: RDT-8, SoftwareVersion: rdtg7.0.4.7, HardwareRevision: A

    expression: (?:s\/w)?Ver:?\s*([^\s;,\n\r>]+)
    returns: 'tical '
    from: Vertical Horizon VH-2402-L3

    # I need to ignore the string if it contains 'Vertical'
    # I tried (?<!Vertical)(?:s\/w)?Ver:?\s*([^\s;,\n\r>]+)
    # but it still returns tical, and I've tried setting the
    # space to one or more but that breaks other comparisons so
    # I can't do that.
I have thousands of lines I'm parsing (all different). Any suggestions?

Replies are listed 'Best First'.
Re: RegEx revision needed
by matija (Priest) on Feb 23, 2004 at 14:37 UTC
    To answer your second question first: you're doing a negative lookbehind ("Ver" NOT preceded by "Vertical").

    perldoc perlre has this to say about "not preceded by" matches (in its entry on negative lookahead):

                     If you are looking for a "bar" that isn't preceded by a
                     "foo", "/(?!foo)bar/" will not do what you want.  That's
                     because the "(?!foo)" is just saying that the next thing cannot
                     be "foo"--and it's not, it's a "bar", so "foobar" will
                     match.  You would have to do something like "/(?!foo)...bar/"
                     for that.   We say "like" because there's the case of your
                     "bar" not having three characters before it.  You could cover
                     that this way: "/(?:(?!foo)...|^.{0,2})bar/".  Sometimes it's
                     still easier just to say:
    
                         if (/bar/ && $` !~ /foo$/)
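
    As a minimal sketch of why the lookbehind attempt from the question still returns 'tical': the lookbehind checks what comes before "Ver", and at the start of "Vertical" nothing does, so it succeeds. One possible alternative, shown here as an assumption rather than the poster's eventual fix, is a negative lookahead after "Ver". The helper names extract_ver and extract_ver_lookbehind are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the token after "Ver",
# or undef when "Ver" is only the start of "Vertical".
sub extract_ver {
    my ($line) = @_;
    # (?!tical) is a negative lookahead: it inspects what FOLLOWS "Ver".
    return $line =~ /(?:s\/w)?Ver(?!tical):?\s*([^\s;,\n\r>]+)/ ? $1 : undef;
}

# The attempt from the question, kept for comparison. The lookbehind inspects
# what PRECEDES "Ver"; at the start of the line nothing does, so it succeeds.
sub extract_ver_lookbehind {
    my ($line) = @_;
    return $line =~ /(?<!Vertical)(?:s\/w)?Ver:?\s*([^\s;,\n\r>]+)/ ? $1 : undef;
}

for my $line ('Vertical Horizon VH-2402-L3', 's/w Ver: 1.2.3') {
    my $v = extract_ver($line);
    printf "%-30s => %s\n", $line, (defined $v ? $v : '(no match)');
}
```

    The second sample line is made up for illustration. Whether (?!tical) is safe for the other thousands of lines is something only the real data can tell.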
    
    As for your first regexp, first it appears to me that you are running it with /i, because I can't see how it could match otherwise. Also, you have too many fields that can appear once or never (the ? quantifier). That says to me that you are trying to fit too many different possibilities into one regular expression.

    You should either break that regular expression into two or more, or find some text that is always there to 'anchor' the regexp engine.

Re: RegEx revision needed
by ysth (Canon) on Feb 23, 2004 at 18:40 UTC
    Looks like you're using //i so the "Rev" is matching case insensitively. You can override that for part of your match by saying e.g. (?-i:R)ev to require R to be capitalized. Or you can figure out more precisely what you are expecting between the Rev(?:ision)? and the part you want to capture. Changing ([^\s;,\n\r]+) to (\b[^\s;,\n\r]+) (or using some negative or positive lookbehind instead of \b) may do what you need.
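
    A small sketch of the (?-i:R)ev suggestion, assuming the original pattern really is run with /i. It drops the lookahead wrapper and the hw_ lookbehind from the question to keep the comparison short, and first_capture is a hypothetical helper name.

```perl
use strict;
use warnings;

my $line = 'Company: Nuera Communications, Inc., ProductFamily: ORCA Series, '
         . 'Product: RDT-8, SoftwareVersion: rdtg7.0.4.7, HardwareRevision: A';

# Both patterns are compiled with /i, as the replies assume the original is.
my $loose  = qr/Rev\.?(?:ision)?:?\s*([^\s;,\n\r]+)/i;
# (?-i:R) switches case-insensitivity off for the "R" only.
my $strict = qr/(?-i:R)ev\.?(?:ision)?:?\s*([^\s;,\n\r]+)/i;

# Hypothetical helper: first capture of $re in $text, or undef.
sub first_capture {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $pair ([loose => $loose], [strict => $strict]) {
    my ($name, $re) = @$pair;
    my $got = first_capture($line, $re);
    print "$name: ", (defined $got ? $got : '(no match)'), "\n";
}
```

    The loose pattern latches onto the "reV" that straddles "SoftwareVersion". The strict one skips that, but then finds "HardwareRevision" instead, so on this line the capital R alone does not produce the firmware version.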

    It looks like your regex has grown in complexity over time, adding more and more cases. This can create a maintenance nightmare; it's often better just to write out:

    if (/case1/) { $capture = $1 } elsif (/case2/) { $capture = $7 } ...
    or if you need to force leftmost match, like this
    ($capture) = grep defined, / (?:case1) | (?:case2) | .../x;
    (assuming each case does a separate capture of the part you want).
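
    To make the grep defined form concrete, here is a sketch with two made-up cases. The two alternatives and the name firmware_version are assumptions for illustration, not the poster's real patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: tries each case in one alternation and returns
# whichever capture group took part in the match, or undef if none matched.
sub firmware_version {
    my ($line) = @_;
    # In list context the match returns ($1, $2); the group belonging to the
    # alternative that did not match is undef, so grep keeps the other one.
    my ($capture) = grep { defined $_ } ($line =~ /
          SoftwareVersion: \s* ([^\s;,]+)      # case 1: labelled field
        | s\/w \s* Ver:?   \s* ([^\s;,>]+)     # case 2: "s\/w Ver" prefix
    /x);
    return $capture;
}

for my $line (
    'Product: RDT-8, SoftwareVersion: rdtg7.0.4.7, HardwareRevision: A',
    'Vertical Horizon VH-2402-L3',
) {
    my $v = firmware_version($line);
    print defined $v ? "$v\n" : "(nothing)\n";
}
```

    Because each case is written out separately, adding or removing a case does not disturb the others, which is the maintenance point being made above.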
Re: RegEx revision needed
by ambrus (Abbot) on Feb 23, 2004 at 14:34 UTC

    I don't understand what the role of the lookahead in the first regex is. It seems overcomplicated, IMHO.

      The lookahead actually applies to other strings I'm parsing and works well. I've solved my second portion of the question and I'm close to solving the first problem.

      Thanks anyway!

Re: RegEx revision needed
by Anonymous Monk on Feb 23, 2004 at 14:16 UTC
    Sorry, I forgot to mention that I'm parsing out the firmware revision, i.e.:

        Company: Nuera Communications, Inc., ProductFamily: ORCA Series, Product: RDT-8, SoftwareVersion: rdtg7.0.4.7, HardwareRevision: A

    should return: rdtg7.0.4.7, and

        Vertical Horizon VH-2402-L3

    should return nothing.

      How about something as simple as:

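      #!/usr/bin/perl
      use strict;
      use warnings;

      my $text = "Nuera Communications, Inc., ProductFamily: ORCA Series, Product: RDT-8, SoftwareVersion: rdtg7.0.4.7, HardwareRevision:";

      # Capture everything up to the next comma; print only on a successful match.
      if ($text =~ /SoftwareVersion:\s*([^,]+),/i) {
          print $1, "\n";
      }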

      This assumes that what you are looking for always starts with the same text. In this case, replacing the Nuera Communications... with Vertical Horizon... does return nothing. Maybe I'm not fully understanding your problem, but if it starts with an identifier (like SoftwareVersion) and is then followed by what you want, I think this is the easiest way to get it.

Re: RegEx revision needed
by TomDLux (Vicar) on Feb 25, 2004 at 04:55 UTC

    How about:

    next if /Vertical/;

    if ( /Ver               # Find string beginning with 'Ver'
          [^\s]*            # and continuing to next space char.
          \s*               # Skip over the space
          ([^\s;,\n\r>]+)   # Capture next 'word': any characters
                            # other than space, newline, punct.
         /x ) {
        $version = $1;      # Save captured text, if found.
    }

    --
    TTTATCGGTCGTTATATAGATGTTTGCA