Tricky parsing.

lostriver has asked for the wisdom of the Perl Monks concerning the following question:

Wise ones, I need to parse many of these:

("TELNET:OVERFLOW:OPTIONS-REPLY"
        :info (
                :english (
                        :name ("TELNET:OVERFLOW:OPTIONS-REPLY")
                        :long_name ("TELNET Options Overflow (Response)")
                        :description ("This signature detects attempts to
exploit a known vulnerability against the BSD-based TELNET daemon. The
option processing function (telrcv) in the daemon produces responses with
a fixed size buffer, but does not perform bounds checking. Attackers can
send a combination of TELNET protocol options to the daemon to overflow
the buffer and execute arbitrary commands.")
                )
        )
        :color (red)
        :severity (5)
        :category (TELNET)
        :keywords ("telnet dangerous command bin")
)

Regexes will not cut it... Will Parse::RecDescent help? Any other way? Thanks.

Re: Tricky parsing. by ikegami (Patriarch) on Mar 17, 2006 at 00:15 UTC
Quite easily:

my $grammar = <<'__END_OF_GRAMMAR__';

   {
      use strict;
      use warnings;

      # Strip the surrounding double quotes and undo backslash escapes.
      sub dequote {
         local $_ = @_ ? $_[0] : $_;
         s/^"//;
         s/"$//;
         s/\\(.)/$1/sg;
         return $_;
      }
   }

   parse      : '(' item ')' /\z/ { $item[2] }

   item       : value child(s?) { [ $item[1], $item[2] ] }
              | child(s?) { [ undef, $item[1] ] }

   child      : CHILD_NAME '(' item ')' { [ @item[1,3] ] }

   value      : NUMBER
              | IDENT
              | QSTRING

   NUMBER     : /[1-9][0-9]*/
   IDENT      : /[a-zA-Z][a-zA-Z0-9_]*/
   QSTRING    : /"(?:[^"\\]|\\.)*"/s { dequote($item[1]) }
   CHILD_NAME : /:[a-zA-Z][a-zA-Z0-9_]*/ { substr($item[1], 1) }

__END_OF_GRAMMAR__

Untested.

Update: I just noticed the "value" is optional. Fixed.

Update: Tested. Found a few bugs. Fixed them.

use Data::Dumper;
use Parse::RecDescent ();

my $parser = Parse::RecDescent->new($grammar)
   or die("Bad grammar\n");

my $parse_tree = $parser->parse($text)
   or die("Bad text\n");

print Dumper $parse_tree;
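Going by the grammar's actions, each `item` comes back as `[ $value, \@children ]` and each `child` as `[ $name, $item ]`. A rough sketch of pulling one field out of such a tree follows. `find_field` is a made-up helper, not part of Parse::RecDescent, and the sample tree is built by hand in the shape the actions suggest rather than taken from a real parser run.

```perl
use strict;
use warnings;

# Hypothetical helper: depth-first search for the first child named $wanted.
# Assumes every node is [ $value, \@children ] and every child is [ $name, $node ].
sub find_field {
    my ($node, $wanted) = @_;
    my ($value, $children) = @$node;
    for my $child (@{ $children || [] }) {
        my ($name, $sub) = @$child;
        return $sub->[0] if $name eq $wanted;
        my $found = find_field($sub, $wanted);
        return $found if defined $found;
    }
    return;
}

# Hand-built sample tree (an assumption about the parser's output shape).
my $tree = [
    'TELNET:OVERFLOW:OPTIONS-REPLY',
    [
        [ info => [ undef, [
            [ english => [ undef, [
                [ name      => [ 'TELNET:OVERFLOW:OPTIONS-REPLY', [] ] ],
                [ long_name => [ 'TELNET Options Overflow (Response)', [] ] ],
            ] ] ],
        ] ] ],
        [ color    => [ 'red', [] ] ],
        [ severity => [ 5, [] ] ],
    ],
];

print scalar(find_field($tree, 'long_name')), "\n";
print scalar(find_field($tree, 'severity')), "\n";
```

The search stops at the first match, so if the same tag name can appear at several depths you would want to pass a path instead of a single name.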
Re: Tricky parsing. by saintmike (Vicar) on Mar 17, 2006 at 00:18 UTC
Text::Balanced will take care of the nested parentheses.

Update: Here's a script to parse your data into a hash of hashes. To get access to `long_name`, for example, just use `$href->{info}->{english}->{long_name}`:

use Text::Balanced qw(extract_bracketed);

# A ":tag", one space, then a parenthesized group and everything after it
my $rex = qr/(?s):(\w+) (\(.*)/;

my $href = {};
my $data = join '', <DATA>;

extract($data, $href);
print $href->{info}->{english}->{long_name}, "\n";

###########################################
sub extract {
###########################################
    my($input, $href) = @_;

    while($input =~ /$rex/) {
        my($tok, $str) = ($1, $2);
        my ($extr, $rest) = extract_bracketed($str, '()');
        if($extr =~ /$rex/) {
            # The group contains more tags: recurse into it
            $href->{$tok} = {};
            extract($extr, $href->{$tok});
        } else {
            # Leaf value: drop the enclosing parentheses
            $extr =~ s/^\(|\)$//g;
            $href->{$tok} = $extr;
        }
        $input = $rest;
    };

    return $href;
}

__DATA__
("TELNET:OVERFLOW:OPTIONS-REPLY"
        :info (
                :english (
                        :name ("TELNET:OVERFLOW:OPTIONS-REPLY")
                        :long_name ("TELNET Options Overflow (Response)")
                        :description ("This signature detects attempts to
exploit a known vulnerability against the BSD-based TELNET daemon. The
option processing function (telrcv) in the daemon produces responses with
a fixed size buffer, but does not perform bounds checking. Attackers can
send a combination of TELNET protocol options to the daemon to overflow
the buffer and execute arbitrary commands.")
                )
        )
        :color (red)
        :severity (5)
        :category (TELNET)
        :keywords ("telnet dangerous command bin")
)
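The script above leans on `extract_bracketed` balancing the parentheses, including the ones inside the quoted strings such as "(Response)". A small sketch of what the call returns, and of what might happen if a quoted string held a lone parenthesis, is below. The sample strings are invented for illustration; the `'()"'` delimiter form relies on Text::Balanced treating quote characters in the delimiter string as "skip quoted text", which is worth checking against the module's documentation for your version.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# In list context the call returns (extracted, remainder, ...) and
# leaves the argument itself unchanged.
my $text = '(a (b) c) tail';
my ($group, $rest) = extract_bracketed($text, '()');
print "group=[$group] rest=[$rest]\n";

# Made-up input with an unbalanced parenthesis inside a quoted string.
my $tricky = '("a ) b") tail';

# With '()' alone, quotes are not special, so the first ')' ends the group.
my ($naive) = extract_bracketed($tricky, '()');

# Adding '"' to the delimiters asks for double-quoted strings to be skipped.
my ($quoted) = extract_bracketed($tricky, '()"');

print "naive=[$naive] quote-aware=[$quoted]\n";
```

For the data in the question the plain `'()'` form is enough, since every parenthesis inside the quoted strings is balanced.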
Re: Tricky parsing. by zer (Deacon) on Mar 17, 2006 at 00:48 UTC
Parse::RecDescent is also a good option. The book Advanced Perl goes in depth on this type of parsing.
Re: Tricky parsing. by zer (Deacon) on Mar 16, 2006 at 23:58 UTC
How does this need to be parsed?
Re^2: Tricky parsing. by lostriver (Initiate) on Mar 17, 2006 at 00:06 UTC
Extract certain fields (this is just a snippet; there are a lot more tags and more depth). Say, 'description' => value...
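If the goal is a flat 'description' => value style lookup regardless of depth, one possible sketch is to flatten a hash of hashes like the one saintmike's script builds. `leaf_fields` is a hypothetical helper and the sample hash is hand-written; it assumes tag names are unique across the record, since a repeated name would overwrite the earlier value.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every leaf of a nested hash as name => value,
# ignoring how deep it sits. Repeated names overwrite earlier ones.
sub leaf_fields {
    my ($href, $out) = @_;
    $out ||= {};
    for my $key (sort keys %$href) {
        my $val = $href->{$key};
        if (ref $val eq 'HASH') {
            leaf_fields($val, $out);
        }
        else {
            $val =~ s/^"//;   # drop a leading double quote, if any
            $val =~ s/"$//;   # drop a trailing double quote, if any
            $out->{$key} = $val;
        }
    }
    return $out;
}

# Hand-written sample in the shape of a parsed record (an assumption).
my $href = {
    info => { english => {
        name      => '"TELNET:OVERFLOW:OPTIONS-REPLY"',
        long_name => '"TELNET Options Overflow (Response)"',
    } },
    color    => 'red',
    severity => '5',
};

my $fields = leaf_fields($href);
print "$_ => $fields->{$_}\n" for sort keys %$fields;
```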