Zonefile parsing

lechucky has asked for the wisdom of the Perl Monks concerning the following question:

I am currently slowly learning Perl. Previous languages have included Fortran, C++, C, and VB, so it isn't that hard a move for me. The only problem I am having is that I cannot for the life of me figure out how to extract a certain piece of information from a line, depending on whether it passes certain criteria. The file is an irregular zone file, which I am trying to parse into a more acceptable format. An example of this file is:

EXAMPLE1                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE2                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE3                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE4                NS      ns0.test.com
                        NS      ns1.test.com

The only piece of information I need extracted from the file is the domain name, but as the domain does not end in a suffix in the file (not EXAMPLE1.COM, just EXAMPLE1), I'm finding it harder to extract just the domains and skip the lines that have a blank name field and only nameserver information in them. Any help that anyone can give me on this issue would be greatly appreciated. Thank you for your time, and Merry Xmas and a Happy New Year to all.

Adam.


Replies are listed 'Best First'.
Re: Zonefile parsing by tachyon (Chancellor) on Dec 21, 2001 at 19:41 UTC
What you need is a regular expression. See perlman:perlre for details. Regular expressions are tuned to do specific things, so if you want useful help we need real data. If you have some code that fails to work, that is also good. You need to specify exactly what you are trying to achieve. Ideally you would post something like:

A: this is my exact input data
B: this is exactly what I want to get out of it
C: this is my broken code
D: my code messes up on.....

When you post this, please use the <code> </code> tags to wrap the data and code so the formatting is retained. These tags are like supercharged pre tags where the HTML special chars are automatically escaped.

Update: Looking at your raw data, you can do this in one line:

/^(\S+)/ && print "$1\n" while <DATA>;

__DATA__
EXAMPLE1                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE2                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE3                NS      ns0.test.com
                        NS      ns1.test.com
EXAMPLE4                NS      ns0.test.com
                        NS      ns1.test.com

This prints:

EXAMPLE1
EXAMPLE2
EXAMPLE3
EXAMPLE4

It works because /^(\S+)/ only matches lines that start with a non-whitespace character, so the continuation lines that begin with spaces are skipped. You may find this easier to understand:

while (my $line = <DATA>) {
    if ( $line =~ /^(\S+)/ ) {
        print "$1\n";
    }
}

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Zonefile parsing by zenmaster (Beadle) on Dec 21, 2001 at 20:00 UTC
Assuming that your file is read through STDIN, this script:

while (<>) {
    # Print what was captured in the parens if the (current) line
    # begins with zero or more spaces/tabs, then one or more word
    # characters (saved by enclosing them in parens), then at least
    # one space, then the two letters NS, followed by at least one
    # space and at least one other character.
    print "$1\n" if /^\s*(\w+)\s+NS\s+./;
}

will do the job. But BEWARE, I've made (far too many) assumptions here:

NS is always uppercase (hint: see the i modifier otherwise)
The domain name is always followed by NS (it seems to be the case in your example...)
...

Hope this helps in your regex learning... You should maybe edit your node and add some <BR> tags to make it more readable...
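For what it's worth, the two ideas in the replies (anchor on a name in column one, and check for the NS record type) can be combined in one small sub. This is only a sketch under some assumptions: the sub name extract_domains and its $suffix parameter are made up for illustration, and the '.COM' suffix is just a guess based on the "not EXAMPLE1.COM, just EXAMPLE1" remark in the question. It also assumes that every name starts in the first column and that continuation lines start with whitespace, as in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the names found
# in column one of lines whose record type is NS, with $suffix appended.
sub extract_domains {
    my ($suffix, @lines) = @_;
    my @domains;
    for my $line (@lines) {
        # ^(\S+)   : a name starting in column one (continuation
        #            lines start with whitespace, so they fail here)
        # \s+NS\s+ : the NS record type, matched case-insensitively (/i)
        # \S       : at least the first character of a nameserver
        next unless $line =~ /^(\S+)\s+NS\s+\S/i;
        push @domains, $1 . $suffix;
    }
    return @domains;
}

# Sample lines copied from the question.
my @zone = (
    "EXAMPLE1                NS      ns0.test.com\n",
    "                        NS      ns1.test.com\n",
    "EXAMPLE2                NS      ns0.test.com\n",
    "                        NS      ns1.test.com\n",
);

# '.COM' is an assumed suffix; pass '' to get the bare names.
print "$_\n" for extract_domains('.COM', @zone);
```

With the sample lines this prints EXAMPLE1.COM and EXAMPLE2.COM. To read a real file instead, something like extract_domains('.COM', <>) should work, since <> in list context returns all input lines.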

Update