in reply to TNSNAMES.ORA and Recdescent

I'm not intimately familiar with the Parse::RecDescent module, but here is one thing I noticed that could relate to your IP address and hostname problem:

You said that "It will mistake an IP address like 199.92.100.35.100 as a valid IP." Here is the rule you wrote:

IP : /[0-9]+.[0-9]+.[0-9]+.[0-9]/

I have to assume that the part between the slashes is a Perl regular expression (as far as I know, that is how Parse::RecDescent treats /.../ terminals). That being the case, your first problem is that "." has a special meaning in an RE: it matches any character except newline. So [0-9]+. means "one or more digits followed by any character," which is probably not what you want. At minimum, escape those dots: [0-9]+\. Note also that the last [0-9] has no +, so it matches only a single digit.
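To see the difference, here is a small Perl sketch (not from your script; the sample strings are made up) that runs the original pattern and a version with escaped dots against two inputs. The matches are anchored with ^ on the assumption that the parser tries each terminal at its current position in the input.

```perl
use strict;
use warnings;

# The original terminal: each unescaped "." matches ANY character except newline.
my $loose = qr/[0-9]+.[0-9]+.[0-9]+.[0-9]/;

# Dots escaped so they match only a literal "."; "+" added to the last group
# so the final part may have more than one digit.
my $escaped = qr/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/;

for my $input ('199x92y100z35', '199.92.100.35.100') {
    my ($l) = $input =~ /^($loose)/;
    my ($e) = $input =~ /^($escaped)/;
    printf "%-20s loose=%-15s escaped=%s\n",
        $input, $l // '(no match)', $e // '(no match)';
}
```

With the original pattern, 199x92y100z35 matches as 199x92y100z3. The escaped version rejects it, but it still accepts the 199.92.100.35 prefix of 199.92.100.35.100, which is the next problem.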

The next thing is that your RE doesn't reject strings that contain MORE than what you're trying to match. The Owl book (Jeffrey Friedl's Mastering Regular Expressions, O'Reilly) presents the following regexp for matching IP addresses:

/^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/

Note: I added the $ to the end of the RE so that it rejects strings with anything after the IP address. Whether you want that depends on where the rule sits in your grammar: if more input follows the address, the $ will make the rule fail. Hopefully that's robust enough to help.
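Here is a sketch of the same pattern assembled from a reusable octet sub-pattern, which makes the fourfold repetition easier to check. The variable names are illustrative, and \z is used in place of $ because $ also matches just before a trailing newline.

```perl
use strict;
use warnings;

# One octet, 0..255, using the same alternatives as the regex above.
my $octet = qr/[01]?\d\d?|2[0-4]\d|25[0-5]/;

# Four octets joined by literal dots, anchored at both ends.
# \z (not $) so that "10.0.0.1\n" is rejected as well.
my $dotted_quad = qr/\A(?:$octet)\.(?:$octet)\.(?:$octet)\.(?:$octet)\z/;

for my $candidate (qw(199.92.100.35 199.92.100.35.100 256.1.1.1 10.0.0.255)) {
    printf "%-18s %s\n", $candidate,
        $candidate =~ $dotted_quad ? 'valid' : 'rejected';
}
```

This reports 199.92.100.35 and 10.0.0.255 as valid and the other two as rejected.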


Dave

Re: Re: TNSNAMES.ORA and Recdescent
by paulbort (Hermit) on Feb 19, 2004 at 23:12 UTC
    That's a cool regex, but IP addresses can be just a little more complicated than that. There's a bad old standard you might need to take into account:

    Just like in IPv6, you can omit octets in the middle if they are zero, for example:

    127.1 means 127.0.0.1
    10.40.30 means 10.40.0.30

    And it gets worse. 192.168.288 is a perfectly legal IP address; most people would write it as 192.168.1.32. Likewise, 10.258 gets you to 10.0.1.2. The rule seems to be that the last number fills all of the remaining low-order octets (288 = 1*256 + 32), and the numbers before it fill octets from the left. This is the source of the classic trick of getting around IP- and name-based web filters by entering the IP address as one huge decimal number in a browser (I don't think it works on modern browsers that try to be smart about DNS lookups).
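The expansion rule can be written out in Perl. This is a sketch of the classic inet_aton behaviour for decimal parts only; the real function also accepts octal (leading 0) and hex (0x) parts, which this hypothetical expand_short_ip helper deliberately ignores.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a decimal short-form IPv4 address the way the
# classic BSD inet_aton() does. Octal/hex parts are NOT handled here.
# Returns a dotted quad, or undef if the input is not a valid short form.
sub expand_short_ip {
    my ($text) = @_;
    return undef unless $text =~ /\A\d+(?:\.\d+){0,3}\z/;

    my @parts = split /\./, $text;
    my $last  = pop @parts;      # the final number fills all remaining bytes
    my $free  = 4 - @parts;      # how many bytes that final number covers

    return undef if grep { $_ > 255 } @parts;    # leading parts are single bytes
    return undef if $last >= 256 ** $free;       # final number must fit its bytes

    # Split the final number into $free bytes, most significant first.
    my @tail;
    for (1 .. $free) {
        unshift @tail, $last % 256;
        $last = int($last / 256);
    }
    return join '.', @parts, @tail;
}

for my $short (qw(127.1 10.40.30 192.168.288 10.258 3232235777)) {
    printf "%-12s => %s\n", $short, expand_short_ip($short) // 'invalid';
}
```

With it, 127.1, 10.40.30, 192.168.288 and 10.258 expand to the dotted quads given above, and the single number 3232235777 expands to 192.168.1.1.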

    Deranged? Yes. But they're out there, and they work on all NT-based OSs and every *nix I've been able to try them on.

    If the underlying system (Oracle, I presume) only supports sane IP addresses, you're cool. I don't know if there would be a way to write a regex for these odd cases. Monks?

    --
    Spring: Forces, Coiled Again!
      The question is which standard allows this. Some Unix implementations accept the extra formats in inet_aton, and this shows through in Perl and in web browsers. Most people would consider the short-form addresses to be errors and only four decimal components to be a valid address. It makes sense to reject or ignore these alternative forms to avoid confusing people or other software.

      It is possible to restrict each number to the range 0-255 with a more detailed regular expression:

      /^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/
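Range slips in octet patterns are easy to catch by brute force. The sketch below (octet_mismatches is a hypothetical helper) tests an anchored octet pattern against every number from 0 to 999 and lists the ones where the regex and a plain numeric check disagree. A variant ending in 2[0-5][0-9] lets 256 through 259 through; splitting off 25[0-5] fixes that.

```perl
use strict;
use warnings;

# Return every number in 0..999 where the regex and the numeric test <= 255 disagree.
sub octet_mismatches {
    my ($re) = @_;
    return grep { (/$re/ ? 1 : 0) != ($_ <= 255 ? 1 : 0) } 0 .. 999;
}

my $loose  = qr/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])$/;
my $strict = qr/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/;

for my $pair ([ '2[0-5][0-9]' => $loose ], [ '2[0-4][0-9]|25[0-5]' => $strict ]) {
    my ($name, $re) = @$pair;
    my @bad = octet_mismatches($re);
    printf "%-22s %s\n", $name,
        @bad ? "disagrees on: @bad" : 'matches exactly 0..255';
}
```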
      Also keep in mind that PRD (Parse::RecDescent) rules are more than just a regex. There's nothing wrong with something like this:
      IP: /([\d.]+)/ { use Socket; inet_aton($1) and $1 }
      This takes any digit/dot string and calls inet_aton on it. If that succeeds, the action returns the string and the value is accepted; otherwise the action returns undef, which makes the production fail. PRD rocks.
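Outside the parser, the same accept-or-reject idea looks roughly like this. accept_ip is a hypothetical helper; unlike the action above, it returns the normalised dotted quad from inet_ntoa rather than the original text. Two caveats: which short forms inet_aton accepts depends on the platform's C library, and Socket's inet_aton also tries to resolve its argument as a hostname, so the digits-and-dots guard keeps ordinary names out of that path.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper mirroring the PRD action: accept a digits-and-dots
# candidate only if inet_aton() can turn it into an address.
# Returns the normalised dotted quad, or undef to signal rejection.
sub accept_ip {
    my ($text) = @_;
    return undef unless $text =~ /\A[\d.]+\z/;   # keep hostnames out of inet_aton
    my $packed = inet_aton($text);
    return defined $packed ? inet_ntoa($packed) : undef;
}

for my $candidate (qw(10.0.0.1 127.1 empire.com)) {
    printf "%-12s => %s\n", $candidate, accept_ip($candidate) // 'rejected';
}
```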

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

      Thanks for the responses on the regex for IP addresses. I had two motivations for writing this script. Using it for work was/is my primary concern, but I would like it to be as robust as possible so that others can use it. Fortunately, at work we use standard IPv4 with no omitted octets. Still, I am interested in getting it to work with IPv6 and the shorthand forms of IP addresses.

      I plan to implement the regex supplied earlier right away, but the problem I actually saw was with my hostname regular expression.

      Couldn't you have a hostname like 123.empire.com? Is 123.224.empire.com possible? I am guessing that what I will need to do is make sure that the last dot-separated part contains [a-zA-Z].
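Names like 123.empire.com are legal: RFC 1123 relaxed the older rule that a hostname label must start with a letter. Here is a sketch of the "last part must contain a letter" idea as a Perl regex. The label rules (1 to 63 letters, digits or hyphens, no leading or trailing hyphen) are the usual DNS ones; the 253-character overall limit is not checked, and a trailing root dot is rejected.

```perl
use strict;
use warnings;

# One DNS label: 1 to 63 letters, digits or hyphens, not starting or ending
# with a hyphen. Labels may start with a digit (RFC 1123).
my $label = qr/[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?/;

# Labels joined by dots; the lookahead requires the LAST label to contain
# at least one letter, so an all-numeric dotted string never qualifies.
# The 253-character overall limit is not checked here.
my $hostname = qr/\A(?:$label\.)*(?=[A-Za-z0-9-]*[A-Za-z])$label\z/;

for my $name (qw(123.empire.com 123.224.empire.com 199.92.100.35 -bad.empire.com)) {
    printf "%-20s %s\n", $name, $name =~ $hostname ? 'hostname' : 'rejected';
}
```

With this, 123.empire.com and 123.224.empire.com pass, while 199.92.100.35 is rejected because its last part has no letter, so an all-numeric string can never be taken for a hostname.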