jmr4096 has asked for the wisdom of the Perl Monks concerning the following question:

I have written a TNSNAMES.ORA parser using Parse::RecDescent. It works mostly, but I have encountered a couple of problems with it. I am hoping that once it is complete, others will find the script useful.

Here is the source code

#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

my $debug = 0;
$::RD_HINT  = 1;
$::RD_WARN  = 1;
$::RD_TRACE = 1 if $debug;

my $tnsGrammar = q{
    startrule : Comment | SkipLine | TNSEntry

    Comment  : /\#.*$/
    SkipLine : /^(\s*)$/

    TNSEntry : NetServiceName '=' Description
    TNSEntry : NetServiceName '=' DescriptionList

    DescriptionList : '(' /description_list/i '=' Description(s)
                      Failover(?) LoadBalance(?) SourceRoute(?) ')'

    Description : '(' /description/i '=' AddressList ConnectData(?)
                  Failover(?) LoadBalance(?) SourceRoute(?) TypeService(?) HS(?) ')'
    Description : '(' /description/i '=' Address ConnectData(?)
                  Failover(?) LoadBalance(?) SessionData(?) TransData(?)
                  TypeService(?) HS(?) ')'

    # Various Types of Addresses
    AddressList : '(' /address_list/i '=' Address(s)
                  Failover(?) LoadBalance(?) SourceRoute(?) ')'
    Address : '(' /address/i '=' AddressTCP ')'
    Address : '(' /address/i '=' AddressIPC ')'
    Address : '(' /address/i '=' AddressSPX ')'
    Address : '(' /address/i '=' AddressPipe ')'

    # TCP Protocol
    AddressTCP : Community(?) ProtocolTCP Host Port
    Community  : '(' /community/i '=' DomainName(?) ')'
    Host       : '(' /host/i '=' Hostname ')'
    Hostname   : DomainName | IP
    IP         : /[0-9]+.[0-9]+.[0-9]+.[0-9]/
    Port       : '(' /port/i '=' /\d+/ ')'

    # IPC Protocol
    AddressIPC : ProtocolIPC Key
    Key        : '(' /key/i '=' /\w+/ ')'

    # SPX Protocol
    AddressSPX : ProtocolSPX Service
    Service    : '(' /service/i '=' /\w+/ ')'

    # Pipe Protocol
    AddressPipe : ProtocolPipe Server Pipe
    Server      : '(' /service/i '=' /\w+/ ')'
    Pipe        : '(' /pipe/i '=' /\w+/ ')'

    ProtocolTCP  : '(' /protocol/i '=' /tcp/i ')'
    ProtocolIPC  : '(' /protocol/i '=' /ipc/i ')'
    ProtocolSPX  : '(' /protocol/i '=' /spx/i ')'
    ProtocolPipe : '(' /protocol/i '=' /pipe/i ')'

    # Connect Data Information
    ConnectData : '(' /connect_data/i '=' SID(?) Serve(?) FailoverMode(?)
                  GlobalDBName(?) HS(?) InstanceName(?) RDBDatabase(?)
                  ServiceName(?) Presentation(?) ')'
    ConnectData : '(' /connect_data/i '=' SID(?) ServiceName(?) Serve(?)
                  FailoverMode(?) GlobalDBName(?) HS(?) InstanceName(?)
                  RDBDatabase(?) Presentation(?) ')'
    ConnectData : '(' /connect_data/i '=' FailoverMode(?) GlobalDBName(?)
                  HS(?) InstanceName(?) RDBDatabase(?) Serve(?) ServiceName(?)
                  SID(?) Presentation(?) ')'

    FailoverMode : Backup(?) Type Method Retries(?) Delay(?)
    Backup       : '(' /backup/i '=' /\w+/ ')'
    Type         : '(' /type/i '=' /session|select|none/i ')'
    Method       : '(' /method/i '=' /basic|preconnect/i ')'
    Retries      : '(' /retries/i '=' /\d+/ ')'
    Delay        : '(' /delay/i '=' /\d+/ ')'

    GlobalDBName : '(' /global_name/i '=' Hostname ')'
    HS           : '(' /hs/i '=' /(ok)*/i ')'
    InstanceName : '(' /instance_name/i '=' /\w+/ ')'
    RDBDatabase  : '(' /rdb_database/i '=' /\w+/ ')'
    Serve        : '(' /server/i '=' /dedicated|shared/i ')'
    ServiceName  : '(' /service_name/i '=' Hostname ')'
    SID          : '(' /sid/i '=' /[a-z0-9\-\_]+/i ')'
    Presentation : '(' /presentation/i '=' /\w+/ ')'

    Failover    : '(' /failover/i '=' /on|off|yes|no|true|false/i ')'
    LoadBalance : '(' /load_balance/i '=' /on|off|yes|no|true|false/i ')'
    SourceRoute : '(' /source_route/i '=' /on|off|yes|no/i ')'
    SessionData : '(' /sdu/i '=' /\d+/ ')'
    TransData   : '(' /tdu/i '=' /\d+/ ')'
    TypeService : '(' /type_of_service/i '=' /rdb_database|oracle8_database/i ')'

    NetServiceName : DomainName
    DomainName     : /([a-z0-9\_\-.]+)/i
};

my $tnsParser = Parse::RecDescent->new( $tnsGrammar );

my $entry     = "";
my $lineCount = 0;
my $startLine = 1;

# Collect lines into one entry; a line starting in column 1 with a letter
# or digit begins the next entry, so the previous one is checked first.
while (my $inputLine = <>) {
    $lineCount++;
    next if ($inputLine =~ /^#.*$/);    # skip comment lines
    next if ($inputLine =~ /^\s*$/);    # skip blank lines
    if (($inputLine =~ /^[a-z0-9]/i) and ($entry ne "")) {
        testEntry( $entry );
        $entry = "";
        $startLine = $lineCount;
    }
    $entry .= $inputLine;
}
testEntry( $entry ) if ($entry ne "");

sub testEntry {
    my $entry = shift;
    if (! $tnsParser->startrule( $entry )) {
        print "Invalid Entry on Lines [ $startLine - $lineCount ]\n" if ! $debug;
        print "$entry\n";
    }
}

I have found two problems which I am not sure how to handle.

Thanks for any help.

Re: TNSNAMES.ORA and Recdescent
by davido (Cardinal) on Feb 19, 2004 at 21:03 UTC
    I'm not intimately familiar with the Parse::RecDescent module, but here is one thing I noticed which could relate to your IP and hostname problem:

    You said that "It will mistake an IP address like 199.92.100.35.100 as a valid IP." The code I'm looking at that you wrote is like this:

    IP : /[0-9]+.[0-9]+.[0-9]+.[0-9]/

    I have to assume that the part there which looks like an RE is, indeed, a Perl-like regular expression. That being the case, your first problem is that "." has special meaning in an RE: It matches anything except the newline character. So [0-9]+. means match one or more numeric digits followed by any character. Well, that's probably not what you really want. So at minimum, escape those dot characters within the RE: [0-9]+\.
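A quick way to see the effect is to try both versions of the pattern on a few strings outside the grammar. This is a plain-Perl sketch; the test strings and variable names are made up for illustration. Note also that the original pattern's last component, [0-9], has no +, so it only ever requires a single trailing digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the original grammar: unescaped dots match ANY character,
# and the final [0-9] has no '+', so one trailing digit is enough.
my $loose = qr/[0-9]+.[0-9]+.[0-9]+.[0-9]/;

# Dots escaped, and the last component given a '+' like the others.
my $escaped = qr/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/;

for my $s ('10.1.2.3', '10x1y2z3', '199.92.100.35.100') {
    printf "%-18s loose=%-3s escaped=%s\n", $s,
        ($s =~ $loose   ? 'yes' : 'no'),
        ($s =~ $escaped ? 'yes' : 'no');
}
```

The escaped pattern still reports a match for 199.92.100.35.100, because an unanchored pattern only has to match a prefix of the string. That is the other half of the problem, and it is what the anchoring advice that follows addresses.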

    The next thing is that your RE isn't rejecting items that contain MORE than what you're trying to match. The Owl book (O'Reilly: Mastering Regular Expressions, by Friedl) presents the following regexp for matching IP addresses:

    /^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/

    Note: I added the $ to the end of the RE so that you're not passing strings that contain anything after the IP address. That may or may not be necessary in your case. Hopefully that will be robust enough to help.
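Here Friedl's octet alternation is used two ways. The first is anchored with ^ and $, as suggested, which suits validating a standalone string. The second variant is not from the book: it rests on the assumption that Parse::RecDescent matches each terminal against the remaining input, where the IP is normally followed by ')', so a literal $ would not match there. A negative lookahead (?![\d.]) instead just refuses to stop partway through a longer run of digits and dots. Treat it as a sketch, not as something tested inside a full grammar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One octet: 0-199, 200-249 or 250-255 (Friedl's alternation).
my $octet = qr/(?:[01]?\d\d?|2[0-4]\d|25[0-5])/;

# Standalone validation: the whole string must be exactly one dotted quad.
my $ip_whole = qr/^$octet\.$octet\.$octet\.$octet$/;

# Grammar-friendly variant (assumption): match at the start of the remaining
# text, but refuse to stop in the middle of a longer digits-and-dots run, so
# trailing text such as ')' is left for the next token.
my $ip_prefix = qr/\A($octet\.$octet\.$octet\.$octet)(?![\d.])/;

for my $s ('192.168.1.32', '256.1.1.1', '199.92.100.35.100') {
    printf "%-18s %s\n", $s, ($s =~ $ip_whole ? 'valid' : 'invalid');
}

my $text = '10.40.0.30)(PORT = 1521)';
if ($text =~ $ip_prefix) {
    print "matched '$1', rest is '", substr($text, length $1), "'\n";
}
```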


    Dave

      That's a cool regex, but IP addresses can be just a little more complicated than that. There's a bad old standard you might need to take into account:

      Just like in IPv6, you can omit octets in the middle if they are zero, for example:

      127.1 means 127.0.0.1
      10.40.30 means 10.40.0.30

      And it gets worse. 192.168.288 is a perfectly legal IP address; most people would write it as 192.168.1.32. Or 10.258 will get you to 10.0.1.2. The rule (as in the classic BSD inet_aton) is that every part except the last fills one octet, and the last part fills all the remaining octets: in a.b.c, c is a 16-bit value; in a.b, b is a 24-bit value; and a lone number is the whole 32-bit address. This is the source of the classic trick of getting around IP- and name-based web filters by entering the 'one huge decimal number' version of the IP address in a browser, for example 3232235808 for 192.168.1.32. (I don't think it works on modern browsers that try to be smart about DNS lookups.)
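The expansion rule can be made concrete with a small decoder. This is a hypothetical helper (the name legacy_ipv4_to_dotted is made up) that handles decimal parts only; real inet_aton implementations also accept hex and octal parts, which this sketch rejects rather than decodes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper illustrating the classic inet_aton expansion rule:
# every part except the last fills one octet, and the last part fills all
# remaining octets. Decimal only: parts with leading zeros (which inet_aton
# would read as octal) and hex parts like 0x7f are rejected, not decoded.
sub legacy_ipv4_to_dotted {
    my ($addr) = @_;
    return unless $addr =~ /^(?:0|[1-9]\d*)(?:\.(?:0|[1-9]\d*)){0,3}$/;

    my @parts = split /\./, $addr;
    my $last  = pop @parts;

    # Each leading part must fit in a single octet.
    for my $p (@parts) {
        return if $p > 255;
    }

    # The last part may use every octet not taken by the leading parts.
    my $octets_left = 4 - @parts;
    return if $last >= 2 ** (8 * $octets_left);

    # Peel the last part apart into big-endian octets.
    my @tail;
    for (1 .. $octets_left) {
        unshift @tail, $last % 256;
        $last = int($last / 256);
    }
    return join '.', @parts, @tail;
}

for my $addr ('127.1', '10.40.30', '192.168.288', '10.258', '3232235808') {
    print "$addr => ", legacy_ipv4_to_dotted($addr), "\n";
}
```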

      Deranged? Yes. But they're out there, and they work on all NT-based OSs and every *nix I've been able to try it on.

      If the underlying system (Oracle, I presume) only supports sane IP addresses, you're cool. I don't know if there would be a way to write a regex for these odd cases. Monks?

      --
      Spring: Forces, Coiled Again!
        The question is what standard allows this. Some Unix implementations allow the extra formats in inet_aton. This is visible with Perl and web browsers. Most people would consider the short-form addresses to be errors and only four decimal components to be valid addresses. It makes sense to reject or ignore long-form addresses to keep from confusing people or other software.

        It is possible to restrict the range of each number with a more detailed regular expression:

        /^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/
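Here is a sketch of applying a per-number range check to each dotted component in plain Perl. It uses 2[0-4][0-9]|25[0-5] for the 200-255 range, since 2[0-5][0-9] on its own would also admit 256 through 259. The helper name is_dotted_quad is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One decimal component, 0-255, without leading zeros:
# 0-9 | 10-99 | 100-199 | 200-249 | 250-255
my $octet_re = qr/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/;

# A dotted quad is exactly four such components. The -1 limit keeps
# trailing empty fields, so a string like "1.2.3.4." is rejected too.
sub is_dotted_quad {
    my ($s) = @_;
    my @parts = split /\./, $s, -1;
    return @parts == 4 && !grep { !/$octet_re/ } @parts;
}

my @accepted = grep { /$octet_re/ } 0 .. 999;
print scalar(@accepted), " of 0..999 accepted\n";   # prints "256 of 0..999 accepted"
```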
        Also keep in mind that PRD rules are more than just regex. Nothing wrong with something like this:
        IP: /([\d.]+)/ { use Socket; inet_aton($1) and $1 }
        This takes any digit/dot string and calls inet_aton on it, and if it passes, accepts the value. Otherwise, rejects the value. PRD rocks.
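The same check can be tried outside a grammar. This sketch wraps it in an ordinary sub (the name canonical_ip is hypothetical) and returns the normalized dotted quad rather than the original string. Socket's inet_aton also resolves host names, so the digits-and-dots filter in front of it matters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Same idea as the action above, as an ordinary sub: only strings made of
# digits and dots are considered, and the system's inet_aton decides whether
# they form an address. Returns the canonical dotted quad, or undef.
sub canonical_ip {
    my ($s) = @_;
    return unless $s =~ /^[\d.]+$/;   # keep host names away from inet_aton
    my $packed = inet_aton($s);
    return unless defined $packed;
    return inet_ntoa($packed);
}

my $ip = canonical_ip('10.1.2.3');
print "$ip\n" if defined $ip;
```

On typical Unix-like systems inet_aton also accepts the short forms discussed earlier in the thread (127.1, for instance), so an action built this way would let them through; put a stricter pattern in front if only four-part addresses should pass.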

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

        Thanks for the responses on the regex for IP addresses. I had two motivations for writing this script. Using it for work was/is my primary concern, BUT I would like it to be as robust as possible so that others could use it. Fortunately, at work we use standard IPv4 addresses with no omitted octets. BUT I am interested in trying to get it to work with IPv6 and the shorthand form of the IP address.

        I plan to immediately implement the regex that was supplied earlier BUT the problem I saw was with my hostname regular expression.

        Couldn't you have a hostname like 123.empire.com? Is it possible to have 123.224.empire.com? I am guessing that what I will need to do is make sure that the last label contains [a-zA-Z].
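That guess is workable. Here is a sketch of such a check, assuming the usual DNS host-name rules (labels of letters, digits and hyphens, not starting or ending with a hyphen) plus the extra requirement that the last label contain a letter. The sub name looks_like_hostname is hypothetical, and note that the original DomainName rule also allows underscores, which this stricter sketch would reject. It matters because DomainName is tried before IP in "Hostname : DomainName | IP", and its pattern /([a-z0-9\_\-.]+)/i happily matches 199.92.100.35.100 as a whole; calling a check like this from a DomainName action that returns undef on failure is one way to push all-numeric strings on to the IP rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check: every label is 1-63 letters/digits/hyphens with no
# leading or trailing hyphen, and the last label contains at least one
# letter, so purely numeric strings are not treated as host names.
sub looks_like_hostname {
    my ($name) = @_;
    return 0 if length($name) > 253;
    my @labels = split /\./, $name, -1;
    return 0 unless @labels;
    for my $label (@labels) {
        return 0 unless $label =~ /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;
    }
    return $labels[-1] =~ /[A-Za-z]/ ? 1 : 0;
}

for my $name ('123.empire.com', '123.224.empire.com', '199.92.100.35.100') {
    printf "%-20s %s\n", $name,
        looks_like_hostname($name) ? 'hostname' : 'not a hostname';
}
```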

Re: TNSNAMES.ORA and Recdescent
by linux454 (Pilgrim) on Feb 20, 2004 at 16:54 UTC
    First, some general advice. I personally prefer alternations instead of multiple rule definitions, if only because it makes the grammar look more BNF-ish (which is easy for me to read). Thus:

    Address : '(' /address/i '=' AddressTCP ')'
    Address : '(' /address/i '=' AddressIPC ')'
    Address : '(' /address/i '=' AddressSPX ')'
    Address : '(' /address/i '=' AddressPipe ')'

    Would become:

    Address: '(' /address/i '=' (AddressTCP | AddressIPC | AddressSPX | AddressPipe) ')'

    Or better yet:

    Address: '(' /address/i '=' AddressProtocol ')'
    AddressProtocol: AddressTCP | AddressIPC | AddressSPX | AddressPipe
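To check that the factored-out version behaves as intended, here is a cut-down, runnable grammar in that style. Only the TCP and IPC alternatives are included, the value patterns are simplified, and the small actions returning 'tcp' or 'ipc' are added purely so the result shows which alternative matched; none of that is from the original grammar.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Cut-down version of the factored Address rule: one production whose body
# delegates to an alternation of protocol-specific subrules. The { 'tcp' }
# and { 'ipc' } actions only report which alternative matched.
my $grammar = q{
    Address         : '(' /address/i '=' AddressProtocol ')' { $item{AddressProtocol} }
    AddressProtocol : AddressTCP | AddressIPC

    AddressTCP      : ProtocolTCP Host Port { 'tcp' }
    AddressIPC      : ProtocolIPC Key       { 'ipc' }

    ProtocolTCP     : '(' /protocol/i '=' /tcp/i ')'
    ProtocolIPC     : '(' /protocol/i '=' /ipc/i ')'
    Host            : '(' /host/i '=' /[\w.\-]+/ ')'
    Port            : '(' /port/i '=' /\d+/ ')'
    Key             : '(' /key/i '=' /\w+/ ')'
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $kind = $parser->Address(
    '(ADDRESS = (PROTOCOL = TCP)(HOST = mercury.empire.com)(PORT = 1521))');
print "matched: $kind\n" if defined $kind;
```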

    Also:

    TNSEntry : NetServiceName '=' Description
    TNSEntry : NetServiceName '=' DescriptionList

    Would become:

    TNSEntry: NetServiceName '=' (Description | DescriptionList)

    Note: In this last example, order can matter in some situations; you'll just have to play with it.
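One concrete way order matters: in Parse::RecDescent the alternatives of a rule are tried in the order written, and the first one that matches wins; the parser does not go back and try a later alternative if something after the subrule then fails. These two tiny grammars (made up for illustration) differ only in the order of the keyword alternatives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Identical grammars except for the order of the keyword alternatives.
my $short_first = Parse::RecDescent->new(q{
    entry   : keyword '=' /\w+/ /^\Z/ { 1 }
    keyword : /desc/i | /description/i
}) or die "Bad grammar\n";

my $long_first = Parse::RecDescent->new(q{
    entry   : keyword '=' /\w+/ /^\Z/ { 1 }
    keyword : /description/i | /desc/i
}) or die "Bad grammar\n";

for my $p ([short_first => $short_first], [long_first => $long_first]) {
    my $ok = defined $p->[1]->entry('DESCRIPTION = x');
    print "$p->[0]: ", ($ok ? 'parsed' : 'failed'), "\n";
}
```

With the short keyword first, /desc/i consumes the start of DESCRIPTION, the '=' then fails against the leftover text, and the whole entry is rejected. Listing longer keywords before their prefixes avoids this.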

    Now as far as:
    The second problem is tied to the order of the clauses to be matched. If I take the following line:
    AddressTCP : Community(?) ProtocolTCP Host Port
    It will match the code only if they come in that exact order. There are no rules stating that Community needs to be first and Protocol needs to be second ... The obvious solution is to create every permutation of this line. My question is "Is there an easier way?" What could be a relatively small program could otherwise become quite unwieldy.

    I guess I don't understand what you are talking about here. The rule that you have states: AddressTCP by definition is: An optional Community followed by required Protocol, Host, and Port. So yes order does matter here. The rule does require that if a community appears, it must come before the protocol. I don't know enough about the TNSNAMES.ORA file to divine the intended meaning. A clearer description of the exact syntax of this line may help.

    Also, are you parsing each line individually just so you can report which lines contain errors? If so, you can accomplish the same thing by letting Parse::RecDescent parse the entire input for you. There's no need to parse the document before you "parse" the document. Anyway, that's just my $.02; I could be wrong.
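A minimal sketch of letting Parse::RecDescent take the whole file at once, under some simplifying assumptions: a generic (KEY = value) grammar that accepts any nesting of clauses, whole-line # comments stripped beforehand, and all names (parse_tnsnames, the rule names, the TESTDB entry) invented for the example. It only checks structure; which keys are allowed where would then be checked by walking the returned tree in Perl. It also gives up the per-entry line numbers that the original loop reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Generic structure: NAME = (KEY = atom) or NAME = (KEY = (..)(..)...).
my $grammar = q{
    tnsfile : entry(s) /^\Z/         { $item[1] }
    entry   : name '=' clause        { [ $item{name}, $item{clause} ] }
    clause  : '(' key '=' value ')'  { [ lc $item{key}, $item{value} ] }
    value   : clause(s) | atom
    name    : /[A-Za-z0-9_.\-]+/
    key     : /[A-Za-z_]+/
    atom    : /[^()=\s]+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

sub parse_tnsnames {
    my ($text) = @_;
    $text =~ s/^\s*#.*$//mg;          # drop whole-line comments first
    return $parser->tnsfile($text);   # undef if any part fails to parse
}

my $sample = <<'END';
# production SAP database
PRDSAP01 = (DESCRIPTION =
  (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = mercury.empire.com)(PORT = 1521)))
  (CONNECT_DATA = (SERVICE_NAME = PRDSAP01)))
TESTDB = (DESCRIPTION = (ADDRESS = (PROTOCOL = IPC)(KEY = TESTDB)))
END

my $entries = parse_tnsnames($sample) or die "Parse failed\n";
print scalar(@$entries), " entries: ", join(', ', map { $_->[0] } @$entries), "\n";
```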

      Sorry, I didn't realize the ambiguity in my question. What I am trying to do is check the syntax of tnsnames.ora entries. This file contains information about how to connect to an Oracle database.

      The code: AddressTCP : Community(?) ProtocolTCP Host Port defines the particular order which the entry must follow for it to match. BUT that is not the action I am trying to achieve. The entry could come in any order so I would need to include the following to match every possible order.

      AddressTCP : Community(?) ProtocolTCP Host Port
      AddressTCP : Community(?) ProtocolTCP Port Host
      AddressTCP : Community(?) Host ProtocolTCP Port
      AddressTCP : Community(?) Host Port ProtocolTCP
      AddressTCP : Community(?) Port ProtocolTCP Host
      AddressTCP : Community(?) Port Host ProtocolTCP
      ....
      This would have to include every ordering of all four fields.

      I picked AddressTCP as an example for the question, but most of the fields in the parser could have this problem. All of a sudden this problem becomes unmanageable. Thanks for the help so far.
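One way around the permutation explosion in Parse::RecDescent is to match a repetition of alternatives and then check the counts in an action. The sketch below applies it to AddressTCP only; the rule names, simplified value patterns and returned hash are assumptions for illustration, and the AddressOnly wrapper with /^\Z/ exists only so the snippet can be run against a bare address string. It relies on the Parse::RecDescent behavior that an action returning undef makes its production fail, the same behavior the inet_aton suggestion earlier in the thread relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Accept the clauses of a TCP address in any order: match one or more Field
# alternatives, then let the action check that protocol, host and port each
# appeared exactly once and community at most once. Returning undef from the
# action makes AddressTCP fail.
my $grammar = q{
    AddressOnly : AddressTCP /^\Z/ { $item[1] }

    AddressTCP  : Field(s)
        {
            my %seen;
            $seen{ $_->[0] }++ for @{ $item[1] };
            my $ok = ($seen{protocol}  || 0) == 1
                  && ($seen{host}      || 0) == 1
                  && ($seen{port}      || 0) == 1
                  && ($seen{community} || 0) <= 1;
            $ok ? +{ map { ($_->[0] => $_->[1]) } @{ $item[1] } } : undef;
        }

    Field       : Community | ProtocolTCP | Host | Port

    Community   : '(' /community/i '=' /[\w.\-]+/ ')' { [ community => $item[4] ] }
    ProtocolTCP : '(' /protocol/i  '=' /tcp/i     ')' { [ protocol  => $item[4] ] }
    Host        : '(' /host/i      '=' /[\w.\-]+/ ')' { [ host      => $item[4] ] }
    Port        : '(' /port/i      '=' /\d+/      ')' { [ port      => $item[4] ] }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

for my $text ('(PROTOCOL = TCP)(HOST = mercury.empire.com)(PORT = 1521)',
              '(PORT = 1521)(HOST = mercury.empire.com)(PROTOCOL = TCP)',
              '(PROTOCOL = TCP)(PORT = 1521)') {
    my $r   = $parser->AddressOnly($text);
    my $msg = $r ? "ok:   host=$r->{host} port=$r->{port}\n" : "fail: $text\n";
    print $msg;
}
```

The same shape (a Field(s) repetition plus a counting action) could be reused for CONNECT_DATA and the other clauses. The trade-off is that a missing or duplicated clause simply makes the rule fail, so the action would have to report which clause was wrong if that detail matters.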

      Here is a sample TNSNAMES.ORA entry

      PRDSAP01 =
        (DESCRIPTION =
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = TCP)(HOST = mercury.empire.com)(PORT = 1521))
          )
          (CONNECT_DATA =
            (SERVICE_NAME = PRDSAP01)
          )
        )