npai has asked for the wisdom of the Perl Monks concerning the following question:

I would greatly appreciate some help with DBM::Deep complex hashes.

I have one DBM::Deep object, errorsdb, that holds a hash which looks like

Errorsdb -> BranchName -> ErrorType -> count (scalar)
Errorsdb -> BranchName -> ErrorType -> url (array of URLs)

This basically parses through the entire website and documents the errors. I have about 30 branch names and 15 error types, and the url arrays can grow up to about 1000 entries each.
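To make the structure concrete, here is a minimal sketch of it using a plain hash (DBM::Deep exposes its file through the same nested-hash interface, so a DBM::Deep object can be swapped in for persistence; the branch and error-type names here are made up):

```perl
use strict;
use warnings;

# Plain-hash sketch of the errorsdb structure; branch/type names hypothetical
my %errorsdb;

push @{ $errorsdb{'Sales'}{'MissingTitle'}{url} }, 'http://example.com/a.html';
push @{ $errorsdb{'Sales'}{'MissingTitle'}{url} }, 'http://example.com/b.html';
$errorsdb{'Sales'}{'MissingTitle'}{count} += 2;

print $errorsdb{'Sales'}{'MissingTitle'}{count}, "\n";                 # 2
print scalar @{ $errorsdb{'Sales'}{'MissingTitle'}{url} }, "\n";       # 2
```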

This produces a very large DBM::Deep data file. By the time it has finished parsing about 3 branch names, the process almost stops.

Can anyone throw some light on the situation and suggest changes, if needed, to make the process run through completely without breaking?

I moved from an in-memory hash to a DBM::Deep object because of this same problem: I could not run more than 4 branches at a time before memory filled up.

Thanks in advance. Namitha

Replies are listed 'Best First'.
Re: DBM Deep Hash of Hash of Hash of array
by dragonchild (Archbishop) on Apr 16, 2008 at 16:28 UTC
    I would have to see how you are using DBM::Deep in order to give you some direction as to how to improve things. I see you've fixed or bypassed your FileHandle::Fmode problems; that's good.

    Without seeing anything, I would suspect that you're doing a lot of iterating over all the keys of some hash. That's documented at DBM::Deep as being slow at the moment.
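To illustrate the point: calling keys() on a tied DBM::Deep hash has to walk the file on disk, so one way around it is to keep the list of known branch names in ordinary memory and do direct lookups only. A sketch with a plain hash standing in for the DBM::Deep object (names hypothetical):

```perl
use strict;
use warnings;

# Stand-in for the DBM::Deep object (same nested-hash interface)
my %errordb = (
    Sales   => { MissingTitle => { count => 3 } },
    Support => { MissingTitle => { count => 1 } },
);

# Instead of  for my $branch (keys %errordb)  -- which, on a tied
# DBM::Deep hash, walks the file -- keep the branch list in a plain
# array maintained by the spider and index directly:
my @branches = ('Sales', 'Support');    # hypothetical names

my $total = 0;
$total += $errordb{$_}{MissingTitle}{count} for @branches;
print "$total\n";    # 4
```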


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      Hi,

      Good to see that someone remembers that I have been through some other problem :-)

      I have bypassed Fmode by removing the reference. So far that seems to be OK, because the file is not being used by any other program. The other problem I had earlier was that I was not able to access the DBM::Deep object in a method to push data into it or read from it.

      By chance I found the weirdest way to fix it. My method/function has several input variables. If my DBM::Deep object is the last variable to be passed, the method fails to recognize it, but the moment I made it the first of the 7 input variables, I could access it just fine.
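For what it's worth, a sketch of the call with the object passed first and the arguments unpacked explicitly (function and variable names here are hypothetical). An explicit unpack like this should not care about position at all, so position sensitivity usually points at a mismatch between the call and the unpack:

```perl
use strict;
use warnings;

# Hypothetical helper: the DBM::Deep object first, then the rest
sub record_error {
    my ($errordb, $branch, $type, $url) = @_;
    push @{ $errordb->{$branch}->{$type}->{url} }, $url;
    $errordb->{$branch}->{$type}->{count} += 1;
}

my %db;    # plain-hash stand-in for the DBM::Deep object
record_error(\%db, 'Sales', 'MissingTitle', 'http://example.com/x.html');
print $db{Sales}{MissingTitle}{count}, "\n";    # 1
```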

      As for the code

      # Making an entry in the errordb
      push @{ $errordb->{$_BranchName}->{MissingTitle}->{url} }, $_URL;

      # Increasing the counter
      $errordb->{$_BranchName}->{MissingTitle}->{count} += 1;

      This adds the URL into the hash.

      Later on I access this in another method to output the values on an HTML page

      foreach my $testurl (@{ $errordbref->{$_BranchName}->{$ThingsToPrint}->{url} }) {
          print $testurl;
      }

      The code is fairly straightforward. I think the problem lies in the size of the object.

      A database is not available, and hence all these attempts. Basically the program is a spider for my website (over 40000 pages) to find errors and display them on an HTML page.

        Try the following to replace your foreach:
        my $size = $#{ $errordbref->{$_BranchName}->{$ThingsToPrint}->{url} };
        foreach my $idx ( 0 .. $size ) {
            my $testurl = $errordbref->{$_BranchName}->{$ThingsToPrint}->{url}->[$idx];
            print $testurl;
        }
        That may reduce a lot of the RAM and disk usage you're seeing.

        A database is not available and hence all these attempts. Basically the program is a spider for my website (over 40000 pages) to find errors and display that on a html page.
        Databases are easy to set up and don't require any sort of administrator privileges. They are also built to handle large data sets. mysql, for instance, has no problem handling data sets with millions of records. Moreover, using a relational database makes your persistent data much more transparent, and I think you'll find it'll be easier to debug, maintain and extend your code because of that. Anyway, just something to consider...
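As a sketch of what that might look like with a file-based database (DBD::SQLite needs no server or admin rights at all; the table and column names here are made up), a single table of raw errors replaces the whole nested structure, and the per-branch, per-type counts fall out of a GROUP BY instead of a hand-maintained counter:

```perl
use strict;
use warnings;
use DBI;

# SQLite keeps the whole database in one file -- no server required
my $dbh = DBI->connect("dbi:SQLite:dbname=errors.sqlite", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{
    CREATE TABLE IF NOT EXISTS errors (
        branch    TEXT NOT NULL,
        errortype TEXT NOT NULL,
        url       TEXT NOT NULL
    )
});

# One row per error found by the spider
my $ins = $dbh->prepare(
    "INSERT INTO errors (branch, errortype, url) VALUES (?, ?, ?)");
$ins->execute('Sales', 'MissingTitle', 'http://example.com/x.html');

# Counts come from a GROUP BY rather than a stored count field
my $counts = $dbh->selectall_arrayref(
    "SELECT branch, errortype, COUNT(*) FROM errors GROUP BY branch, errortype");

$dbh->disconnect;
```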
Re: DBM Deep Hash of Hash of Hash of array
by pc88mxer (Vicar) on Apr 16, 2008 at 16:21 UTC
    When you say that the process "almost stops", do you mean that it slows down? At that point what does ps say about its size, cpu time, page faults, etc.?

    My guess is that you are running into some limitation of DBM. Instead of using DBM I would suggest that you use a relational database. I think the schema would be very simple, and then you could more easily slice n' dice your data depending on what kind of reports you needed. With DBM you are choosing a preferred storage format which may not work so well for collecting other kinds of statistics. Also, putting your data into a relational database could allow you to hand off the analysis of the data to someone else. The data becomes more useful in the sense that more people can use it.

      DBM::Deep is not DBM. Please read up on the differences between the two.
