spstansbury has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!

I have a script that processes the output of a vulnerability scanner. When I have a CVE identifier, I look up the CVSS base metrics from the NVD files (nvd.nist.gov).

Everything works just fine, but as I process thousands of records I run out of memory; top shows an ever-incrementing VSIZE...

The issue is in this block of code, as I can set the $cve_id to skip this else clause and the script trundles along happily.

I know that I am declaring a new "my $cve_parser = ..." every time; how can I make sure the data structures are deleted/torn down?

    else {
        # Take the CVE identifier and get the CVSS vectors.
        # Parse the CVE identifier to determine what data file to search:
        my @record_fields = split( /:/, $cve_id );
        $cve_id = $record_fields[1];
        $cve_id =~ s/^\s+//;
        $cve_id =~ s/\s+$//;
        my @id_fields = split( /-/, $cve_id );
        my $year = $id_fields[1];
        if ($year < 2003) {
            $data_file = "$nvd_files/nvdcve-2.0-2002.xml";
        }
        else {
            $data_file = "$nvd_files/nvdcve-2.0-" . $year . ".xml";
        }

        # Parse the data file:
        my $cve_parser = XML::LibXML->new();
        my $cve_doc = $cve_parser->parse_file( $data_file );
        my $cve_xc = XML::LibXML::XPathContext->new( $cve_doc->documentElement() );

        # Register the namespaces:
        $cve_xc->registerNs( def  => 'http://scap.nist.gov/schema/feed/vulnerability/2.0' );
        $cve_xc->registerNs( vuln => 'http://scap.nist.gov/schema/vulnerability/0.4' );
        $cve_xc->registerNs( cvss => 'http://scap.nist.gov/schema/cvss-v2/0.2' );

        # Find the appropriate CVE entry in the data source:
        for my $entry ($cve_xc->findnodes("/def:nvd/def:entry[\@id = '$cve_id']")) {
            if (my ($metrics) = $cve_xc->findnodes('vuln:cvss/cvss:base_metrics', $entry)) {
                $av = $cve_xc->find('cvss:access-vector', $metrics);
                $ac = $cve_xc->find('cvss:access-complexity', $metrics);
                $au = $cve_xc->find('cvss:authentication', $metrics);
                $ci = $cve_xc->find('cvss:confidentiality-impact', $metrics);
                $ii = $cve_xc->find('cvss:integrity-impact', $metrics);
                $ai = $cve_xc->find('cvss:availability-impact', $metrics);
            }
            else {
                $av = "";
                $ac = "";
                $au = "";
                $ci = "";
                $ii = "";
                $ai = "";
            }
        }
    }
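Aside from the leak itself, the block above re-parses an entire yearly NVD file for every record. A hypothetical sketch (the `xc_for_file` helper name and cache are not from the original post) that parses each data file at most once and reuses the XPath context across lookups:

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative cache: one parsed document + XPath context per data file.
my %xc_cache;

sub xc_for_file {
    my ($data_file) = @_;
    return $xc_cache{$data_file} //= do {
        my $doc = XML::LibXML->new->parse_file($data_file);
        my $xc  = XML::LibXML::XPathContext->new($doc->documentElement);
        $xc->registerNs( def  => 'http://scap.nist.gov/schema/feed/vulnerability/2.0' );
        $xc->registerNs( vuln => 'http://scap.nist.gov/schema/vulnerability/0.4' );
        $xc->registerNs( cvss => 'http://scap.nist.gov/schema/cvss-v2/0.2' );
        $xc;    # the do-block's value is stored in the cache
    };
}
```

With at most one tree held per year, memory is bounded by the handful of yearly files rather than growing with the number of records processed.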

As always, thanks for any and all help!

Scott

Replies are listed 'Best First'.
Re: XML::LibXML memory leak
by ikegami (Patriarch) on Dec 07, 2010 at 22:22 UTC
    The tree can't be freed while you're still using it with $ac, $au, $ci, $ii and $ai.
      It probably could if you detach $ac/$au/$ci/$ii/$ai, perhaps with
      sub XML::LibXML::Node::detach {
          my ($self) = @_;
          $self->parentNode->removeChild($self);
      }

        There's no reason to go messing with someone else's namespace.

        sub detach {
            my ($node) = @_;
            $node->parentNode->removeChild($node);
        }

        detach($node);

        would work just as well.

        Except it's not enough. It won't separate it from the document.

        $ perl -MXML::LibXML -E'
            my $node;
            {
                my $xml = "<root><foo><bar/></foo></root>";
                my $doc = XML::LibXML->new->parse_string($xml);
                ($node) = $doc->findnodes("//bar");
                $node->parentNode->removeChild($node);
            }
            {
                my $doc = $node->ownerDocument;
                say "owner=", $doc;
                if ($doc) {
                    say $_->nodeName for $doc->findnodes("//*");
                }
            }
        '
        owner=XML::LibXML::Document=SCALAR(0x817bcb8)
        root
        foo

        You need to give the node a new document.

        $ perl -MXML::LibXML -E'
            my $foster_home = XML::LibXML::Document->new("1.0", "UTF-8");
            my $node;
            {
                my $xml = "<root><foo><bar/></foo></root>";
                my $doc = XML::LibXML->new->parse_string($xml);
                ($node) = $doc->findnodes("//bar");
                $node->setOwnerDocument($foster_home);
            }
            {
                my $doc = $node->ownerDocument;
                say "owner=", $doc;
                if ($doc) {
                    say $_->nodeName for $doc->findnodes("//*");
                }
            }
        '
        owner=XML::LibXML::Document=SCALAR(0x832af38)

        Note that transfers the node's children too.
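        A simpler way to avoid pinning the tree in the first place, per the point about $ac/$au/etc. above, is to copy plain strings out with findvalue() instead of keeping node or value objects. A minimal sketch (the sample XML is illustrative, not NVD data):

        ```perl
        use strict;
        use warnings;
        use XML::LibXML;

        my $xml = '<base_metrics xmlns="http://scap.nist.gov/schema/cvss-v2/0.2">
          <access-vector>NETWORK</access-vector>
          <access-complexity>LOW</access-complexity>
        </base_metrics>';

        my ($av, $ac);
        {
            my $doc = XML::LibXML->new->parse_string($xml);
            my $xc  = XML::LibXML::XPathContext->new($doc->documentElement);
            $xc->registerNs(cvss => 'http://scap.nist.gov/schema/cvss-v2/0.2');

            # findvalue() returns a plain Perl string, not an XML::LibXML
            # object, so nothing here keeps a reference into the tree.
            $av = $xc->findvalue('cvss:access-vector');
            $ac = $xc->findvalue('cvss:access-complexity');
        }   # $doc and $xc go out of scope; libxml2 can free the tree

        print "$av $ac\n";   # NETWORK LOW
        ```
        
        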

Re: XML::LibXML memory leak
by Jenda (Abbot) on Dec 12, 2010 at 12:02 UTC

    Use XML::Twig or XML::Rules and process the file in chunks. By the time your loop runs it's already too late and the memory has already been wasted.
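    A minimal XML::Twig sketch of that chunked approach (not the OP's code; the sample XML and the %cvss hash are illustrative, and tag prefixes are matched as they appear in the document):

    ```perl
    use strict;
    use warnings;
    use XML::Twig;

    my %cvss;
    my $twig = XML::Twig->new(
        twig_handlers => {
            'entry' => sub {
                my ($t, $entry) = @_;
                if (my $m = $entry->first_descendant('cvss:base_metrics')) {
                    $cvss{ $entry->att('id') } = {
                        av => $m->field('cvss:access-vector'),
                        ac => $m->field('cvss:access-complexity'),
                    };
                }
                $t->purge;    # discard everything parsed so far
            },
        },
    );

    # For a real feed: $twig->parsefile("$nvd_files/nvdcve-2.0-2010.xml");
    $twig->parse(<<'XML');
    <nvd>
      <entry id="CVE-2010-0001">
        <vuln:cvss><cvss:base_metrics>
          <cvss:access-vector>NETWORK</cvss:access-vector>
          <cvss:access-complexity>LOW</cvss:access-complexity>
        </cvss:base_metrics></vuln:cvss>
      </entry>
    </nvd>
    XML
    ```

    Because each entry is purged as soon as its handler returns, memory use stays flat no matter how large the feed is.
    
    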

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.