Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Our named.conf is well over 20 MB, so parsing it is quite fun. Can anyone suggest a better method than the one below, which seems to take forever to match towards the end of the file? The only requirement is that it be bulletproof, as named.conf prefers to be error-free (!)
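Dev code follows:

# Slurp the whole file into one scalar
local $/ = undef;
open(F, "named.conf") || die $!;
my $zonedata = <F>;
close F;
if ($zonedata =~ /(zone\s+"$domain"\s+\{\r?\n(.*?)\r?\n(.*?)\r?\n(\s+)?\};(\r?\n)+)/) {
    print "Got a match on $1\n";
    # \Q...\E so the matched text is removed literally, not reused as a pattern
    $zonedata =~ s/\Q$1\E//;
    # write out..
}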

My sole reason for loading the file into a scalar is simplicity, although I think this is where the problem may be. I avoided while <> so as not to keep the file open for longer than necessary.
FYI a typical entry in named.conf would be:

zone "foobar.com" {
        type master;
        file "named.foobar.com";
};

Thanks-in-advance Monks

Replies are listed 'Best First'.
Re: regex large named.conf parsing
by dragonchild (Archbishop) on Jul 21, 2003 at 13:40 UTC
    Forgive me if I'm wrong, but it sounds like you're attempting to manage your named.conf. Here's a thought: instead of managing it within itself, manage it from a database and have a script that generates it from the database. Think of it as source and a makefile to create your executable; you won't ever touch the actual named.conf.
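    A minimal sketch of the "generate, don't edit" idea. The @zones list is a hypothetical stand-in for rows fetched from a database, and the sub names render_zone and render_conf are made up for illustration; the stanza layout is the one shown in the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rows fetched from a database table of zones.
my @zones = (
    { name => 'foobar.com',  type => 'master', file => 'named.foobar.com'  },
    { name => 'example.org', type => 'master', file => 'named.example.org' },
);

# Render one zone stanza in the layout shown in the question.
sub render_zone {
    my ($zone) = @_;
    return sprintf qq(zone "%s" {\n    type %s;\n    file "%s";\n};\n),
        $zone->{name}, $zone->{type}, $zone->{file};
}

# Render all stanzas, sorted by name so regenerated files diff cleanly.
sub render_conf {
    my (@list) = @_;
    return join "\n",
        map  { render_zone($_) }
        sort { $a->{name} cmp $b->{name} } @list;
}

print render_conf(@zones);
```

    Adding or removing a zone then becomes a change to the data followed by a full regeneration, so no regex ever has to run against the 20 MB file.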

    As to your problem - there's an easy way to find out where the bottleneck is: add timing statements.

    print "Time: ", join(':', map { sprintf("%02d", $_) } (localtime(time))[2,1,0]), "\n";
    Put that line before and after every area you want to time and you'll get a rough idea, in seconds, of what's happening. It's no substitute for real profiling, but it sounds like you just want to know whether it's the first half or the second half that's causing your issues.
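    If whole seconds turn out to be too coarse, the same idea can be sketched with the core Time::HiRes module. The helper name timed and the sample data are assumptions made for illustration, not part of the original suggestion.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces time() with a sub-second version

# Run a code reference, print how long it took, and return the elapsed
# seconds followed by whatever the code returned.
sub timed {
    my ($label, $code) = @_;
    my $start   = time();
    my @result  = $code->();
    my $elapsed = time() - $start;
    printf "%-8s %.3f s\n", $label, $elapsed;
    return ($elapsed, @result);
}

# Hypothetical stand-in for the slurped named.conf.
my $text = join '', map { qq(zone "d$_.com" {\n    type master;\n};\n) } 1 .. 1000;

timed('match', sub { $text =~ /zone\s+"d999\.com"/ ? 1 : 0 });
```

    Wrapping the slurp and the match in separate timed calls would show which of the two dominates.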

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: regex large named.conf parsing
by Skeeve (Parson) on Jul 21, 2003 at 13:36 UTC
    If you don't want to keep the original file open, why don't you simply copy it and read the copy line by line with while (<>)? I think 20 MB on hard disk is less expensive than 20 MB of memory.
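    A rough sketch of the line-by-line approach. It assumes every stanza opens with a line like zone "name" { and closes with a line containing only };, as in the sample entry; nested multi-line blocks inside a zone would need more care. The sub name drop_zone is hypothetical.

```perl
use strict;
use warnings;

# Copy $in to $out, leaving out the stanza for $domain.
# Returns the number of stanzas dropped.
sub drop_zone {
    my ($in, $out, $domain) = @_;
    my $open     = qr/^\s*zone\s+"\Q$domain\E"\s*\{/;
    my $skipping = 0;
    my $removed  = 0;
    while (my $line = <$in>) {
        if (!$skipping && $line =~ $open) {
            $removed++;
            # A stanza written on a single line also ends here.
            $skipping = 1 unless $line =~ /\};\s*$/;
            next;
        }
        if ($skipping) {
            $skipping = 0 if $line =~ /^\s*\};\s*$/;
            next;
        }
        print {$out} $line;
    }
    return $removed;
}

# Demo on in-memory filehandles; real use would open the copy and a new file.
my $conf = qq(zone "a.com" {\n    type master;\n};\n)
         . qq(zone "foobar.com" {\n    type master;\n    file "named.foobar.com";\n};\n);
open my $in,  '<', \$conf      or die $!;
open my $out, '>', \my $result or die $!;
drop_zone($in, $out, 'foobar.com');
close $out;
print $result;
```

    Only one line is held in memory at a time, and writing to a new file means the original stays intact until the result has been checked.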

    However, I think you can shorten your regex:

    You have:
    /(zone\s+"$domain"\s+\{\r?\n(.*?)\r?\n(.*?)\r?\n(\s+)?\};(\r?\n)+)/
    Won't this do:
    /(zone\s+"$domain"\s+\{[^}]+\};[\r\n]+)/o
    Since you don't need all the elements found in $2, $3..., why store them?

    Should you need them, you can extract them later, once you have found a match for the domain.

    You should also add the o modifier to your regexp so that $domain's content is interpolated into the pattern just once.
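    One way the shortened pattern might be used to remove a stanza in a single step. This sketch compiles the pattern once with qr// rather than /o, and adds \Q...\E so the dots in the domain match literally; both are my additions. Like the pattern above, it assumes the stanza contains no nested braces. The sub name remove_zone is hypothetical.

```perl
use strict;
use warnings;

# Remove the stanza for $domain from the string behind $conf_ref.
# Returns the removed text, or undef if the zone was not found.
sub remove_zone {
    my ($conf_ref, $domain) = @_;
    my $re = qr/zone\s+"\Q$domain\E"\s+\{[^}]+\};[\r\n]+/;
    return $$conf_ref =~ s/($re)// ? $1 : undef;
}

my $conf = qq(zone "foobar.com" {\n    type master;\n    file "named.foobar.com";\n};\n)
         . qq(zone "other.com" {\n    type master;\n};\n);
my $gone = remove_zone(\$conf, 'foobar.com');
print "Removed:\n$gone" if defined $gone;
```

    Matching and deleting in the same s/// avoids feeding the matched text back in as a second pattern.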

Re: regex large named.conf parsing
by dws (Chancellor) on Jul 21, 2003 at 21:13 UTC
    Can anyone suggest a better method than the one below, which seems to take forever to match towards the end of the file.

    See Matching in huge files for a sliding buffer technique that lets you run a regex that might match multiple lines on a huge file, without needing to pull the entire file into memory. What the technique doesn't support, at least not directly, is doing substitutions. But you might be able to adapt it to your purposes.
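    The code from that node is not reproduced here; what follows is only a generic sketch of a sliding buffer, under the assumption that no single match is longer than $max_len bytes. The sub name find_in_stream and its parameters are hypothetical.

```perl
use strict;
use warnings;

# Scan $fh in chunks of $chunk_size bytes and return the first match of
# $re, or undef. The buffer always keeps its last $max_len bytes, so a
# match that straddles a chunk boundary is still seen whole.
sub find_in_stream {
    my ($fh, $re, $chunk_size, $max_len) = @_;
    my $buf = '';
    while (1) {
        # Append the next chunk to whatever tail was kept.
        my $got = read($fh, $buf, $chunk_size, length($buf));
        die "read failed: $!" unless defined $got;
        return $1 if $buf =~ /($re)/;
        return undef if $got == 0;   # end of file, nothing found
        # Slide: discard everything except the last $max_len bytes.
        substr($buf, 0, length($buf) - $max_len, '')
            if length($buf) > $max_len;
    }
}

# Demo on an in-memory filehandle with a deliberately small chunk size.
my $data = ('x' x 50) . qq(zone "foobar.com" {\n    type master;\n};\n) . ('y' x 50);
open my $fh, '<', \$data or die $!;
my $hit = find_in_stream($fh, qr/zone\s+"foobar\.com"\s+\{[^}]+\};/, 16, 64);
print defined $hit ? "Found:\n$hit\n" : "No match\n";
```

    Memory use stays bounded by roughly $max_len plus $chunk_size, but as noted above this only finds text; rewriting the file would need the unmatched parts copied out as the buffer slides.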

Re: regex large named.conf parsing
by TVSET (Chaplain) on Jul 21, 2003 at 22:01 UTC