TNSNAMES.ORA and Recdescent

jmr4096 has asked for the wisdom of the Perl Monks concerning the following question:

I have written a TNSNAMES.ORA parser using Parse::RecDescent. It mostly works, but I have run into a couple of problems with it. I hope that once it is complete, others will find the script useful.

Here is the source code

#!/usr/bin/perl

use Parse::RecDescent;
my $debug = 0;

$::RD_HINT = 1;
$::RD_WARN = 1;
$::RD_TRACE = 1 if $debug;

my $tnsGrammar = q{

startrule : Comment | SkipLine | TNSEntry

Comment  : /\#.*$/ 
SkipLine : /^(\s*)$/

TNSEntry : NetServiceName '=' Description
TNSEntry : NetServiceName '=' DescriptionList

DescriptionList : '(' /description_list/i '=' Description(s) Failover(?) LoadBalance(?) SourceRoute(?) ')'
Description :     '(' /description/i      '=' AddressList ConnectData(?) Failover(?) LoadBalance(?) SourceRoute(?) TypeService(?) HS(?) ')'
Description :     '(' /description/i      '=' Address     ConnectData(?) Failover(?) LoadBalance(?) SessionData(?) TransData(?) TypeService(?) HS(?) ')'


# Various Types of Addresses
AddressList : '(' /address_list/i '=' Address(s) Failover(?) LoadBalance(?) SourceRoute(?) ')'
Address : '(' /address/i '=' AddressTCP ')'
Address : '(' /address/i '=' AddressIPC ')'
Address : '(' /address/i '=' AddressSPX ')'
Address : '(' /address/i '=' AddressPipe ')'

# TCP Protocol
AddressTCP : Community(?) ProtocolTCP Host Port

Community  : '(' /community/i '=' DomainName(?) ')'
Host       : '(' /host/i      '=' Hostname ')'
Hostname   : DomainName | IP
IP         : /[0-9]+.[0-9]+.[0-9]+.[0-9]/
Port       : '(' /port/i      '=' /\d+/ ')'

# IPC Protocol
AddressIPC : ProtocolIPC Key
Key        : '(' /key/i '=' /\w+/ ')'

# SPX Protocol
AddressSPX : ProtocolSPX Service
Service    : '(' /service/i '=' /\w+/ ')'

# Pipe Protocol
AddressPipe : ProtocolPipe Server Pipe
Server      :  '(' /service/i  '=' /\w+/ ')'
Pipe        :  '(' /pipe/i     '=' /\w+/ ')'

ProtocolTCP  : '(' /protocol/i '=' /tcp/i ')'
ProtocolIPC  : '(' /protocol/i '=' /ipc/i ')'
ProtocolSPX  : '(' /protocol/i '=' /spx/i ')'
ProtocolPipe : '(' /protocol/i '=' /pipe/i ')'

# Connect Data Information
ConnectData:    '(' /connect_data/i '=' SID(?) Serve(?) FailoverMode(?) GlobalDBName(?) HS(?) InstanceName(?) RDBDatabase(?) ServiceName(?) Presentation(?) ')'
ConnectData:    '(' /connect_data/i '=' SID(?) ServiceName(?) Serve(?) FailoverMode(?) GlobalDBName(?) HS(?) InstanceName(?) RDBDatabase(?) Presentation(?) ')'
ConnectData:    '(' /connect_data/i '=' FailoverMode(?) GlobalDBName(?) HS(?) InstanceName(?) RDBDatabase(?) Serve(?) ServiceName(?) SID(?) Presentation(?) ')'

FailoverMode:   Backup(?) Type Method Retries(?) Delay(?)
Backup:         '(' /backup/i  '=' /\w+/ ')'
Type:           '(' /type/i    '=' /session|select|none/i ')'
Method:         '(' /method/i  '=' /basic|preconnect/i ')'
Retries:        '(' /retries/i '=' /\d+/ ')'
Delay:          '(' /delay/i   '=' /\d+/ ')'

GlobalDBName:   '(' /global_name/i   '=' Hostname ')'
HS:             '(' /hs/i            '=' /(ok)*/i ')'
InstanceName:   '(' /instance_name/i '=' /\w+/ ')'
RDBDatabase:    '(' /rdb_database/i  '=' /\w+/ ')'
Serve:          '(' /server/i        '=' /dedicated|shared/i ')'
ServiceName:    '(' /service_name/i  '=' Hostname ')'
SID:            '(' /sid/i           '=' /[a-z0-9\-\_]+/i ')'
Presentation:   '(' /presentation/i  '=' /\w+/ ')'

Failover:       '(' /failover/i     '=' /on|off|yes|no|true|false/i ')'
LoadBalance:    '(' /load_balance/i '=' /on|off|yes|no|true|false/i ')'
SourceRoute:    '(' /source_route/i '=' /on|off|yes|no/i ')'

SessionData:    '(' /sdu/i '=' /\d+/ ')'
TransData:      '(' /tdu/i '=' /\d+/ ')'
TypeService:    '(' /type_of_service/i '=' /rdb_database|oracle8_database/i ')'

NetServiceName: DomainName
DomainName:     /([a-z0-9\_\-.]+)/i

};

my $tnsParser = Parse::RecDescent->new( $tnsGrammar );

my $entry = "";
my $lineCount = 0;
my $startLine = 1;
while (my $inputLine = <>)
{
  $lineCount++;
  next if ($inputLine =~ /^#.*$/);
  next if ($inputLine =~ /^\s*$/);
  
  if (($inputLine =~ /^[a-z0-9]/i) and ($entry ne ""))
  {
    testEntry( $entry );
    $entry = "";    
    $startLine = $lineCount;
  }
  
  $entry .= $inputLine;
      
}
testEntry( $entry) if ($entry ne "");


sub testEntry
{
  my $entry = shift;
  
  if (! $tnsParser->startrule( $entry ))
  {
    print "Invalid Entry on Lines [ $startLine - $lineCount ]\n" if ! $debug;
    print "$entry\n";    
  }
}

I have found two problems that I am not sure how to handle.

IPs and hostnames: The code matches hostnames and IPs, but it also matches bad IP addresses. It will mistake an address like 199.92.100.35.100 for a valid IP. I know the problem is related to how the Hostname rule is written, but I am not sure how to fix it.
The second problem is tied to the order of the clauses to be matched. Take the following line:
AddressTCP : Community(?) ProtocolTCP Host Port
It will only match if the clauses come in that exact order. There is no rule stating that Community needs to be first and Protocol needs to be second ... The obvious solution is to create every permutation of this line. My question is: "Is there an easier way?" What could be a relatively small program could otherwise become quite unwieldy.

Thanks for any help.

Re: TNSNAMES.ORA and Recdescent by davido (Cardinal) on Feb 19, 2004 at 21:03 UTC
I'm not intimately familiar with the Parse::RecDescent module, but here is one thing I noticed that could relate to your IP and hostname problem. You said that "It will mistake an IP address like 199.92.100.35.100 as a valid IP." The code you wrote is:

    IP : /[0-9]+.[0-9]+.[0-9]+.[0-9]/

I have to assume that the part that looks like an RE is, indeed, a Perl-like regular expression. That being the case, your first problem is that "." has a special meaning in an RE: it matches any character except newline. So `[0-9]+.` means "one or more digits followed by any character". That's probably not what you want, so at minimum escape the dots within the RE: `[0-9]+\.`

The next thing is that your RE doesn't reject strings that contain MORE than what you're trying to match. The Owl book (O'Reilly's Mastering Regular Expressions, by Friedl) presents the following regexp for matching IP addresses:

    /^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/

Note: I added the $ to the end of the RE so that you're not passing strings that contain anything after the IP address. That may or may not be necessary in your case.

Hopefully that will be robust enough to help.

Dave
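A minimal sketch of the advice above in plain Perl, outside Parse::RecDescent: the octet pattern is the Friedl-style alternation quoted in the reply, wrapped in `qr//` and anchored with `\A`/`\z`. The helper name `is_dotted_quad` is hypothetical. One caveat, offered as an assumption rather than something tested against the grammar: inside a Parse::RecDescent terminal the text after the address (the closing `)`) is still unconsumed, so a `$` anchor would likely make the rule fail there; a lookahead such as `(?![\d.])` may fit that context better.

```perl
use strict;
use warnings;

# One octet, 0-255, as in the Friedl-style pattern ([01]?\d\d? also
# accepts forms with a leading zero such as "007").
my $octet = qr/(?:[01]?\d\d?|2[0-4]\d|25[0-5])/;

# Exactly four octets separated by literal (escaped) dots, anchored at
# both ends so a trailing ".100" makes the whole match fail.
my $dotted_quad = qr/\A$octet\.$octet\.$octet\.$octet\z/;

# Hypothetical helper: 1 for a plain dotted quad, 0 otherwise.
sub is_dotted_quad {
    my ($s) = @_;
    return $s =~ $dotted_quad ? 1 : 0;
}

for my $candidate ('199.92.100.35', '199.92.100.35.100', '199.92.100.256', '199a92b100c35') {
    printf "%-20s %s\n", $candidate, is_dotted_quad($candidate) ? 'valid' : 'invalid';
}
```

Of the four sample inputs, only `199.92.100.35` is reported as valid; `199a92b100c35` is the kind of string the unescaped-dot version would have let through.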
Re: Re: TNSNAMES.ORA and Recdescent by paulbort (Hermit) on Feb 19, 2004 at 23:12 UTC
That's a cool regex, but IP addresses can be just a little more complicated than that. There's a bad old standard you might need to take into account: just like in IPv6, you can omit octets in the middle if they are zero. For example:

    127.1      means 127.0.0.1
    10.40.30   means 10.40.0.30

And it gets worse. 192.168.288 is a perfectly legal IP address; most people would write it as 192.168.1.32. Or 10.258 will get you to 10.0.1.2. The rule seems to be that the last decimal number in the address is first spread out into octets, then the remaining octets are dropped in from left to right, starting leftmost. This is the source of the classic trick of getting around IP- and name-based web filters by entering the "one huge decimal number" version of the IP address in a browser (I don't think it works on modern browsers that try to be smart about DNS lookups).

Deranged? Yes. But they're out there, and they work on all NT-based OSs and every *nix I've been able to try it on. If the underlying system (Oracle, I presume) only supports sane IP addresses, you're cool. I don't know if there would be a way to write a regex for these odd cases. Monks?

-- Spring: Forces, Coiled Again!
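The shorthand rule described in this reply is easier to express as a few lines of code than as a regex. Below is a sketch of a hypothetical helper, `expand_ipv4_shorthand`, that handles decimal parts only; the real `inet_aton` also accepts octal and hexadecimal parts, which this sketch deliberately ignores. Every leading part must fit in one byte, and the last part is spread over all remaining bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the old inet_aton() shorthand forms
# ("127.1", "10.258", one huge decimal number) into a dotted quad.
# Decimal parts only; real inet_aton() also accepts octal/hex parts.
sub expand_ipv4_shorthand {
    my ($s) = @_;
    # One to four decimal parts separated by dots, nothing else.
    return unless $s =~ /\A[0-9]+(?:\.[0-9]+){0,3}\z/;

    my @parts      = split /\./, $s;
    my $last       = pop @parts;
    my $tail_bytes = 4 - @parts;    # bytes the last part must fill

    # Every leading part is exactly one byte.
    for my $p (@parts) {
        return if $p > 255;
    }
    # The last part must fit in the bytes that remain.
    return if $last >= 256 ** $tail_bytes;

    # Spread the last part over the remaining bytes, most significant first.
    my @tail;
    for (1 .. $tail_bytes) {
        unshift @tail, $last % 256;
        $last = int($last / 256);
    }
    return join '.', @parts, @tail;
}

for my $in ('127.1', '10.40.30', '192.168.288', '10.258', '199.92.100.35.100') {
    my $out = expand_ipv4_shorthand($in);
    printf "%-20s => %s\n", $in, defined $out ? $out : 'invalid';
}
```

With these inputs it reproduces the examples in the reply (`192.168.288` becomes `192.168.1.32`, `10.258` becomes `10.0.1.2`), while a five-part string such as `199.92.100.35.100` is rejected.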
Re: Re: Re: TNSNAMES.ORA and Recdescent by iburrell (Chaplain) on Feb 20, 2004 at 06:13 UTC
The question is what standard allows this. Some Unix implementations allow the extra formats in inet_aton, and this is visible from Perl and from web browsers. Most people would consider the short-form addresses to be errors and only addresses with four decimal components to be valid. It makes sense to reject or ignore the long forms to avoid confusing people or other software.

It is possible to restrict the range of each number with a more detailed regular expression:

    /^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$/
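A per-number range pattern like the one in this reply can be combined with a split on dots to accept only the plain four-part form. A sketch with a hypothetical `is_strict_ipv4` helper; the octet pattern here uses `2[0-4][0-9]|25[0-5]` so that 256 to 259 are excluded, and, unlike the Friedl pattern quoted earlier, it rejects leading zeros such as `01`.

```perl
use strict;
use warnings;

# One number in 0-255, no leading zeros, anchored at both ends.
my $octet_re = qr/\A(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\z/;

# Hypothetical helper: accept only the four-part dotted form, so short
# forms like "127.1" and five-part strings are both rejected.
sub is_strict_ipv4 {
    my ($s) = @_;
    my @parts = split /\./, $s, -1;    # limit -1 keeps trailing empty fields
    return 0 unless @parts == 4;
    for my $p (@parts) {
        return 0 unless $p =~ $octet_re;
    }
    return 1;
}

printf "%-20s %d\n", $_, is_strict_ipv4($_)
    for '199.92.100.35', '199.92.100.35.100', '127.1', '1.2.3.4.';
```

Splitting first keeps the range check readable, and the `-1` limit matters: without it, `split` would drop the empty field after a trailing dot and `1.2.3.4.` would look like four parts.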
Re: Re: Re: TNSNAMES.ORA and Recdescent by merlyn (Sage) on Feb 23, 2004 at 18:43 UTC
Also keep in mind that PRD rules are more than just regexes. Nothing wrong with something like this:

    IP: /([\d.]+)/ { use Socket; inet_aton($1) and $1 }

This takes any digit/dot string and calls `inet_aton` on it. If that succeeds, the action returns the value and the rule accepts it; otherwise the action returns undef and the rule fails. PRD rocks.

-- Randal L. Schwartz, Perl hacker
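The same "cheap regex first, then let `inet_aton` decide" idea also works outside a grammar. A sketch using the core `Socket` module; `canonical_ip` is a hypothetical name. Two caveats: which shorthand forms `inet_aton` accepts depends on the platform's C library, and a string that is not a numeric address may be handed to the resolver as a hostname, which can mean a DNS lookup.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: screen for digits and dots, then let inet_aton()
# decide. Returns the canonical dotted quad, or undef if rejected.
# Accepted shorthand forms depend on the platform's inet_aton().
sub canonical_ip {
    my ($s) = @_;
    return unless defined $s && $s =~ /\A[\d.]+\z/;
    my $packed = inet_aton($s);    # undef if the string is not usable
    return defined $packed ? inet_ntoa($packed) : undef;
}

my $ip = canonical_ip('10.0.0.1');
print defined $ip ? "$ip\n" : "rejected\n";
```

Returning the `inet_ntoa` form rather than the input also normalizes whatever shorthand the platform accepted into an ordinary dotted quad.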
Re: Re: Re: TNSNAMES.ORA and Recdescent by jmr4096 (Acolyte) on Feb 23, 2004 at 19:39 UTC
Thanks for the responses on the regex for IP addresses. I had two motivations for writing this script. Using it for work was/is my primary concern, BUT I would like it to be as robust as possible so that others could use it. Fortunately, at work we use standard IPv4 addresses with no omitted octets. BUT I am interested in trying to get it to work with IPv6 and the shorthand form of the IP address. I plan to implement the regex that was supplied earlier right away, BUT the problem I saw was with my hostname regular expression. Couldn't you have a hostname like 123.empire.com? Is it possible to have 123.224.empire.com? I am guessing that what I will need to do is make sure that the last part of the name (after the final dot) contains [a-zA-Z].
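Part of what lets `199.92.100.35.100` through is the `Hostname` rule itself: it tries `DomainName` first, and `/([a-z0-9\_\-.]+)/i` accepts any run of digits and dots, so the `IP` alternative never gets to decide. Below is a sketch of the check proposed in this reply, done outside the grammar with a hypothetical `classify_host` helper: a strict dotted quad counts as an IP, and anything else counts as a hostname only if every label is letters, digits and inner hyphens and the last label contains a letter. The label rules are an assumption (they follow common DNS hostname conventions and, unlike the original `DomainName`, reject underscores).

```perl
use strict;
use warnings;

# 0-255 without leading zeros.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;
# Assumed label syntax: letters/digits, hyphens only inside the label.
my $label = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# Hypothetical helper: returns 'ip', 'hostname' or 'invalid'.
sub classify_host {
    my ($s) = @_;
    return 'ip' if $s =~ /\A$octet(?:\.$octet){3}\z/;
    if ($s =~ /\A(?:$label\.)*$label\z/) {
        my ($last) = $s =~ /([^.]+)\z/;    # text after the final dot
        return 'hostname' if $last =~ /[a-z]/i;
    }
    return 'invalid';
}

printf "%-22s %s\n", $_, classify_host($_)
    for '199.92.100.35', '123.empire.com', '123.224.empire.com', '199.92.100.35.100';
```

Both `123.empire.com` and `123.224.empire.com` come out as hostnames, while the all-numeric five-part string is rejected because it is neither a valid quad nor a name ending in a label with a letter.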
Re: TNSNAMES.ORA and Recdescent by linux454 (Pilgrim) on Feb 20, 2004 at 16:54 UTC
First some general advice. I personally prefer alternations to multiple rule definitions, if only because it makes the grammar look more BNF-ish (which is easier for me to read). Thus:

    Address : '(' /address/i '=' AddressTCP ')'
    Address : '(' /address/i '=' AddressIPC ')'
    Address : '(' /address/i '=' AddressSPX ')'
    Address : '(' /address/i '=' AddressPipe ')'

would become:

    Address: '(' /address/i '=' (AddressTCP | AddressIPC | AddressSPX | AddressPipe) ')'

Or better yet:

    Address: '(' /address/i '=' AddressProtocol ')'
    AddressProtocol: AddressTCP | AddressIPC | AddressSPX | AddressPipe

Also:

    TNSEntry : NetServiceName '=' Description
    TNSEntry : NetServiceName '=' DescriptionList

would become:

    TNSEntry: NetServiceName '=' (Description | DescriptionList)

Note: in this last example order can matter in some situations; you'll just have to play with it.

Now as far as this goes:

"The second problem is tied to the order of the clauses to be matched. Take the following line: AddressTCP : Community(?) ProtocolTCP Host Port. It will only match if the clauses come in that exact order. There is no rule stating that Community needs to be first and Protocol needs to be second ... The obvious solution is to create every permutation of this line. My question is: 'Is there an easier way?' What could be a relatively small program could otherwise become quite unwieldy."

I guess I don't understand what you are talking about here. The rule that you have states that AddressTCP is, by definition, an optional Community followed by a required Protocol, Host, and Port. So yes, order does matter here: the rule requires that if a Community appears, it must come before the Protocol. I don't know enough about the TNSNAMES.ORA file to divine the intended meaning. A clearer description of the exact syntax of this line may help.
Also, are you parsing each line individually just so you can report which lines contain errors? If so, you can accomplish the same thing by letting Parse::RecDescent parse the entire input for you. There's no need to parse the document before you "parse" the document. Anyway, that's just my $.02; I could be wrong.
Re: Re: TNSNAMES.ORA and Recdescent by jmr4096 (Acolyte) on Feb 23, 2004 at 18:36 UTC
Sorry, I didn't realize the ambiguity in my question. What I am trying to do is check the syntax of tnsnames.ora entries. This file contains information about how to connect to an Oracle database. The code:

    AddressTCP : Community(?) ProtocolTCP Host Port

defines the particular order that the entry must follow for it to match. BUT that is not the behavior I am trying to achieve. The clauses could come in any order, so I would need to include the following to match every possible order:

    AddressTCP : Community(?) ProtocolTCP Host Port
    AddressTCP : Community(?) ProtocolTCP Port Host
    AddressTCP : Community(?) Host ProtocolTCP Port
    AddressTCP : Community(?) Port ProtocolTCP Host
    AddressTCP : Community(?) Host Port ProtocolTCP
    AddressTCP : Community(?) Port Host ProtocolTCP
    ....

The full list would have to include every ordering of all four fields. I picked AddressTCP as an example for the question, but most of the rules in the parser could have this problem, and all of a sudden it becomes unmanageable. Thanks for the help so far.

Here is a sample TNSNAMES.ORA entry:

    PRDSAP01 =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = mercury.empire.com)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = PRDSAP01)
        )
      )
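One way around the permutation explosion, sketched in plain Perl rather than as a grammar: collect every `(KEY = value)` pair in an ADDRESS clause in whatever order it appears, then check the collected set (required keys present, no duplicates, no unknown keys). The key lists and the `check_tcp_address` name are assumptions drawn from the AddressTCP rule in the question, not from Oracle's documentation. Inside Parse::RecDescent, a comparable (untested) approach would be a repeating rule such as `TCPPart(s)` with `TCPPart: Community | ProtocolTCP | Host | Port`, followed by the same set checks.

```perl
use strict;
use warnings;

# Assumed key sets, taken from the AddressTCP rule in the question.
my %required = map { $_ => 1 } qw(PROTOCOL HOST PORT);
my %optional = map { $_ => 1 } qw(COMMUNITY);

# Hypothetical helper: returns 'ok' or a short error message.
sub check_tcp_address {
    my ($text) = @_;
    my %seen;
    # Innermost "(KEY = value)" pairs only; "(ADDRESS = (" never matches
    # because the value part may not contain parentheses.
    while ($text =~ /\(\s*(\w+)\s*=\s*([^()]*?)\s*\)/g) {
        my ($key, $value) = (uc $1, $2);
        return "unexpected key $key" unless $required{$key} || $optional{$key};
        return "duplicate key $key"  if exists $seen{$key};
        $seen{$key} = $value;
    }
    for my $key (sort keys %required) {
        return "missing key $key" unless exists $seen{$key};
    }
    return 'protocol is not TCP' unless $seen{PROTOCOL} =~ /\Atcp\z/i;
    return 'port is not numeric' unless $seen{PORT} =~ /\A\d+\z/;
    return 'ok';
}

print check_tcp_address(
    '(ADDRESS = (PORT = 1521)(HOST = mercury.empire.com)(PROTOCOL = TCP))'
), "\n";
```

Applied to the ADDRESS clause of the sample entry it returns `ok`, and it does the same for any reordering of those clauses, so the grammar no longer needs one production per permutation; the trade-off is that ordering errors are not reported, only missing, duplicate, or unknown keys.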