pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing a config file (dhcpd.conf, to be precise), and I'm not sure of a good way to do it. Basically, there is no reliable line separator: several statements can share one line, and a single element can span several lines. Like this:

```
option foo bar, baz; option foo bar, baz; subnet 192.68.0.0 netmask 255.255.0.0 { option foo blat, boff; } subnet 192.68.0.0 netmask 255.255.0.0 { option foo blat,boff; }
```

I need to know that the global option 'foo' equals 'bar, baz' and that the option 'foo' associated with subnet 192.68.0.0 is 'blat, boff'.

It looks like I can split my problem into key-value pairs à la option foo bar; and containers à la subnet ... {  ... }.

I figure I'm going to have to slurp the whole file into a scalar and then walk through it with some regex magic, but I don't know the regex kung-fu.
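A minimal sketch of that slurp-and-walk idea in core Perl, assuming only "option NAME VALUE;" statements and non-nested "subnet IP netmask IP { ... }" blocks (real dhcpd.conf has many more statement types, quoted strings and nested groups, none of which this handles). Each loop pass tries a few \G-anchored patterns with /gc, so a failed attempt leaves pos() where it was, and entering a subnet block simply switches which hash receives the options. The names parse_config, options and subnets are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "option NAME VALUE;" statements and flat
# "subnet IP netmask IP { ... }" blocks from a slurped string.
sub parse_config {
    my ($text) = @_;
    my %config = (options => {}, subnets => []);
    my $scope  = $config{options};    # hash that receives option values right now

    pos($text) = 0;
    while (1) {
        last if pos($text) >= length $text;    # consumed everything

        if ($text =~ /\G\s+/gc) {
            # whitespace, including newlines: ignore
        }
        elsif ($text =~ /\G#[^\n]*/gc) {
            # comment to end of line: ignore
        }
        elsif ($text =~ /\Goption\s+(\w[\w-]*)\s+([^;]*?)\s*;/gc) {
            $scope->{$1} = $2;
        }
        elsif ($text =~ /\Gsubnet\s+(\S+)\s+netmask\s+(\S+)\s*\{/gc) {
            my %subnet = (address => $1, netmask => $2, options => {});
            push @{ $config{subnets} }, \%subnet;
            $scope = $subnet{options};    # options now belong to this subnet
        }
        elsif ($text =~ /\G\}/gc) {
            $scope = $config{options};    # block closed: back to global scope
        }
        else {
            die "parse error at offset " . pos($text) . "\n";
        }
    }
    return \%config;
}

# Usage: slurp the file first, e.g.
#   my $text = do { local $/; open my $fh, '<', $path or die $!; <$fh> };
my $conf = parse_config(
    'option foo bar, baz; subnet 192.68.0.0 netmask 255.255.0.0 { option foo blat, boff; }'
);
print "global foo: $conf->{options}{foo}\n";
print "subnet $_->{address} foo: $_->{options}{foo}\n" for @{ $conf->{subnets} };
```

Because a closing } always drops back to global scope, nested blocks (a pool or host inside a subnet) would need a stack of scopes or a real grammar.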

Any suggestions?

Thanks -Pileofrogs

Re: Parsing Config File with Multi-line Elements
by saintmike (Vicar) on Apr 10, 2006 at 23:33 UTC
Re: Parsing Config File with Multi-line Elements
by ikegami (Patriarch) on Apr 11, 2006 at 04:12 UTC

    I actually worked with dhcpd.conf back in '98 or so. We were converting extensive BOOTP tables to a dhcpd.conf. I don't remember the spec at all, though. OK, enough reminiscing. :)

    You could use Parse::RecDescent. A start would be:

```perl
my $grammar = <<'__END_OF_GRAMMAR__';
   { use strict; use warnings; }

   parse       : item(s?) /\Z/      { $item[1] }

   item        : subnet
               | option

   subnet      : SUBNET IP_ADDR NETMASK IP_ADDR subnet_blk
                                    { [ @item[0,2,4,5] ] }

   subnet_blk  : '{' option(s?) '}' { $item[2] }

   option      : OPTION option_list
                                    { [ $item[0], @{$item[2]} ] }

   # Still to be written; must return an array ref.
   option_list : ...

   # Reserved Words
   # ==============

   #SUBNET  : IDENT { $item[1] eq 'subnet'  ? $item[1] : undef }
   #NETMASK : IDENT { $item[1] eq 'netmask' ? $item[1] : undef }
   #OPTION  : IDENT { $item[1] eq 'option'  ? $item[1] : undef }

   SUBNET      : /subnet(?![a-zA-Z0-9_])/
   NETMASK     : /netmask(?![a-zA-Z0-9_])/
   OPTION      : /option(?![a-zA-Z0-9_])/

   # Tokens
   # ======

   IDENT       : /[a-zA-Z][a-zA-Z0-9_]*/
   IP_ADDR     : /\d+\.\d+\.\d+\.\d+/
__END_OF_GRAMMAR__
```
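    If Parse::RecDescent isn't available, the same grammar can be hand-coded with one Perl sub per rule. The sketch below is not part of the original reply: it uses core Perl only, walks the string with \G and /gc, and returns the same shapes the grammar's actions build (['subnet', ip, mask, [options]] and ['option', ...]). Since option_list is left undefined in the grammar, the sketch assumes, hypothetically, that it is an option name followed by comma-separated words and a terminating ';'. The sub names are likewise made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-written recursive descent mirroring the grammar: one sub per rule,
# all consuming the shared string $text via \G ... /gc.
# ASSUMPTION (hypothetical): option_list = option name, then one or more
# comma-separated words, then ';'. The original grammar leaves it undefined.

my $text;
my $NOT_WORD = qr/(?![a-zA-Z0-9_])/;    # same lookahead as SUBNET/NETMASK/OPTION
my $IP_ADDR  = qr/\d+\.\d+\.\d+\.\d+/;

sub skip_ws { $text =~ /\G\s+/gc }      # consume whitespace if present

# Consume $re at the current position (after whitespace) or die.
sub expect {
    my ($re, $what) = @_;
    skip_ws();
    return $1 if $text =~ /\G($re)/gc;
    die "expected $what at offset " . pos($text) . "\n";
}

# parse : item(s?) /\Z/  =>  [ items ]
sub parse_conf {
    ($text) = @_;
    pos($text) = 0;
    my @items;
    while (1) {
        skip_ws();
        last if pos($text) >= length $text;
        push @items, parse_item();
    }
    return \@items;
}

# item : subnet | option
sub parse_item {
    my $keyword = expect(qr/(?:subnet|option)$NOT_WORD/, 'subnet or option');
    return $keyword eq 'subnet' ? parse_subnet() : parse_option();
}

# subnet : SUBNET IP_ADDR NETMASK IP_ADDR subnet_blk  =>  [ 'subnet', ip, mask, blk ]
sub parse_subnet {
    my $ip = expect($IP_ADDR, 'IP address');
    expect(qr/netmask$NOT_WORD/, 'netmask');
    my $mask = expect($IP_ADDR, 'netmask address');
    return [ 'subnet', $ip, $mask, parse_subnet_blk() ];
}

# subnet_blk : '{' option(s?) '}'  =>  [ options ]
sub parse_subnet_blk {
    expect(qr/\{/, "'{'");
    my @options;
    while (1) {
        skip_ws();
        last if $text =~ /\G\}/gc;
        expect(qr/option$NOT_WORD/, 'option or }');
        push @options, parse_option();
    }
    return \@options;
}

# option : OPTION option_list  =>  [ 'option', name, values... ]
sub parse_option {
    my @list = expect(qr/[a-zA-Z][a-zA-Z0-9_-]*/, 'option name');
    do {
        push @list, expect(qr/[^,;\s]+/, 'option value');
        skip_ws();
    } while ($text =~ /\G,/gc);
    expect(qr/;/, "';'");
    return [ 'option', @list ];
}

# Builds [ ['option', 'foo', 'bar', 'baz'],
#          ['subnet', '192.68.0.0', '255.255.0.0', [ ['option', 'foo', 'blat', 'boff'] ]] ]
my $tree = parse_conf(
    'option foo bar, baz; subnet 192.68.0.0 netmask 255.255.0.0 { option foo blat, boff; }'
);
$Data::Dumper::Indent = 1;
print Dumper($tree);
```

    Writing it by hand trades the grammar's readability for having no non-core dependency; the rule comments above each sub keep the two versions easy to compare.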

    Update: Optimized reserved words.
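    The optimized reserved-word rules let the token regex itself reject identifiers that merely start with a keyword, instead of matching a full IDENT and comparing it in an action as the commented-out versions do (presumably cheaper, since no extra subrule or action runs). A quick core-Perl check of the OPTION pattern, with made-up test strings:

```perl
use strict;
use warnings;

# The OPTION token from the grammar: match the keyword only when it is
# not immediately followed by another identifier character.
my $OPTION = qr/option(?![a-zA-Z0-9_])/;

for my $input ('option foo bar;', 'option;', 'optional', 'option_foo') {
    my $result = $input =~ /\A$OPTION/ ? 'keyword' : 'not a keyword';
    print "$input => $result\n";
}
```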