Re: Re: iteration through tied hash eat my memory

by ph0enix (Friar)
on Dec 09, 2002 at 15:42 UTC ( [id://218551] )


in reply to Re: iteration through tied hash eat my memory
in thread iteration through tied hash eat my memory

Originally I started with BerkeleyDB but switched to PostgreSQL because of a file size limitation (I couldn't create a table larger than 2GB) and a concurrent access requirement.

I want to continue using PostgreSQL, and I also need a tied solution that supports nested data structures. So I slightly modified Tie::RDBM to support binary data storage...
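
Roughly, the storage side looks like this (a minimal sketch rather than my exact code; the DSN, table name, and tie options are illustrative): nested structures get flattened to a byte string with Storable before they go into the column.

    use strict;
    use warnings;
    use Storable qw(nfreeze thaw);
    use Tie::RDBM;

    # Tie a hash to a PostgreSQL table (DSN and options illustrative).
    tie my %h, 'Tie::RDBM', 'dbi:Pg:dbname=test',
        { table => 'cache', create => 1 };

    # Nested structures are serialized to a byte string by hand,
    # since the plain tie only stores scalars.
    $h{config} = nfreeze({ colors => [ 'red', 'green' ], depth => 2 });

    my $config = thaw($h{config});    # back to a hash reference
    print $config->{depth}, "\n";     # prints 2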

Is there a better way to use PostgreSQL through a tie interface? I'd prefer to use an existing module instead of writing a new one.


Replies are listed 'Best First'.
Re^3: iteration through tied hash eat my memory
by diotalevi (Canon) on Dec 09, 2002 at 15:50 UTC

    Excuse me!? BerkeleyDB does databases up to 4 terabytes, transactions, concurrent access and cursors. Perhaps your operating system can't handle files that large, but BerkeleyDB is perfectly fine with them. PostgreSQL avoids that OS limit by splitting database files at one gig. Which operating system are you using?

    If you want to continue to use PostgreSQL then you'll need to start accessing it more intelligently (cursors or asynchronous queries) or at least waste as little memory as possible. In general a tied PostgreSQL interface really isn't the right solution for this (again, unless you use cursors or asynchronous queries). Really, do this The Right Way.


    Update: It also occurs to me that if your dataset is that large, it's already mandatory that you do things the smart way. Using cute gimmicks is nice, but you need to be intentional in how you approach your disk access and memory usage. You really can't afford not to.

    Update again: If you haven't already, then you really need to read the BerkeleyDB documentation from Sleepycat. The POD documentation in the CPAN module is really just a gloss on how to translate BerkeleyDB idioms into Perl code. You have to read the actual documentation to get at the right code. For instance, you only get concurrent access if you initialize that subsystem. The POD documentation barely mentions it; it's fully covered in the library docs. So go read that. It's online at http://www.sleepycat.com/docs/index.html. You probably want to read the bit on the C API since that's where the CPAN module links in.
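
    With the BerkeleyDB CPAN module that means creating an environment with the right subsystem flags. A sketch (the environment path is illustrative; the flags are the point): the Concurrent Data Store needs DB_INIT_CDB plus the shared memory pool.

        use strict;
        use warnings;
        use BerkeleyDB;

        # Concurrent access requires initializing the subsystem in an
        # environment: the Concurrent Data Store (DB_INIT_CDB) plus the
        # shared memory pool (DB_INIT_MPOOL) it depends on.
        my $env = BerkeleyDB::Env->new(
            -Home  => '/var/tmp/bdb-env',    # illustrative path
            -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
        ) or die "cannot open environment: $BerkeleyDB::Error";

        my $db = BerkeleyDB::Hash->new(
            -Filename => 'data.db',
            -Env      => $env,
            -Flags    => DB_CREATE,
        ) or die "cannot open database: $BerkeleyDB::Error";

        $db->db_put('key', 'value');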


      I did more checking, this time on doing cursors in Perl. The key here is to go to the PostgreSQL web site and search the archives for 'cursors perl'. That yields this example (from http://archives.postgresql.org/pgsql-general/2001-01/msg01569.php):

      use Pg;

      # Open a connection and walk the result through a server-side
      # cursor, so rows come over one fetch at a time instead of the
      # whole result set landing on the client at once.
      my $conn = Pg::connectdb("dbname = test");
      my $result = $conn->exec("begin work");
      $result = $conn->exec("declare c1 cursor for select foo from bar");
      $result = $conn->exec("fetch forward 1 in c1;");
      print "Hurray, I fetched a row: ", $result->fetchrow, $/;
      $result = $conn->exec("end work;");
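
      The same idea scales to the whole result set: loop on the cursor and fetch a bounded batch each time, so the client never holds more than one batch in memory. A sketch continuing from the $conn handle above:

      # Fetch 100 rows per round trip until the cursor is exhausted.
      while (1) {
          my $res = $conn->exec("fetch forward 100 in c1;");
          last unless $res->ntuples;    # no rows left
          while (my @row = $res->fetchrow) {
              print "@row\n";
          }
      }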

      I'm using SuSE Linux 7.3 (Intel, 32-bit), which contains the Berkeley DB database library in version 3.1.17. When the database file size reached 2GB I got a message like 'File size limit exceeded (SIGXFSZ)' (not exactly that; it's translated from a localized message). I'm able to create larger files on the filesystem (tested up to 6GB). Does that mean the db package in my distribution was miscompiled?

        I did a bit of checking for you, and it looks like Linux support for large files was added in version 3.2.9 (see the changelog at http://www.sleepycat.com/update/3.2.9/if.3.2.9.html). Your signal was SIGXFSZ (file size limit exceeded), and some further checking brought up http://oss.sgi.com/projects/xfs/faq.html#largefilesupport, which indicates that your large file support may be conditional on your glibc library. My recommendation is to get the current version of BerkeleyDB and install it into /usr/local. Be very careful not to disturb your existing library, since various parts of your OS probably depend on 3.1.17 staying 3.1.17.

        Google is your friend: search for suse xfs 2gb. And just read the changelog on http://www.sleepycat.com's web site for the scoop on BerkeleyDB.

Re: Re: Re: iteration through tied hash eat my memory
by iburrell (Chaplain) on Dec 09, 2002 at 20:05 UTC
    You would need to modify the modules to be more efficient with memory usage on huge tables. You could either modify Tie::RDBM to use the Pg module for direct PostgreSQL access with cursors, or modify DBD::Pg to use cursors. Otherwise, the SELECT statement that fetches the keys stores all the data on the client. You might want to investigate MySQL and see whether DBD::mysql uses cursors. It also looks like you will have to modify Tie::RDBM not to cache the values that it fetches.
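
    One way to get cursor behavior without patching either module is to declare the cursor in SQL through plain DBI, so only each FETCH batch is buffered on the client. A sketch (the table and column names are illustrative):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '',
            { AutoCommit => 0, RaiseError => 1 });

        # Cursors only live inside a transaction, hence AutoCommit => 0.
        $dbh->do('DECLARE c1 CURSOR FOR SELECT k FROM cache');
        while (1) {
            my $rows = $dbh->selectall_arrayref('FETCH FORWARD 100 FROM c1');
            last unless @$rows;              # cursor exhausted
            print $_->[0], "\n" for @$rows;  # one key per fetched row
        }
        $dbh->do('CLOSE c1');
        $dbh->commit;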

    The important question to ask is whether you really need a tied hash. If you don't need one, then you can do operations more efficiently with custom database accesses. Since DBD::Pg doesn't use cursors on its own, you will have to avoid any queries that return all the keys.

    For example, counting the number of keys is doable with a SELECT COUNT(*) FROM table statement. You might want to normalize the database to store each value in a row of a second table. Then you won't need Perl serialization to store the data structures. You can get the total number of values with SELECT COUNT(*) FROM values, and individual counts for each key with SELECT COUNT(*) FROM values WHERE key = $key.
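
    A sketch of that normalized layout (all names illustrative), with the counts done in SQL instead of by deserializing in Perl:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:Pg:dbname=test', '', '',
            { RaiseError => 1 });

        # One row per value instead of one serialized blob per key.
        $dbh->do('CREATE TABLE vals (k TEXT NOT NULL, v TEXT NOT NULL)');

        my ($total) = $dbh->selectrow_array('SELECT COUNT(*) FROM vals');
        my ($for_key) = $dbh->selectrow_array(
            'SELECT COUNT(*) FROM vals WHERE k = ?', undef, 'some_key');
        print "$total values total, $for_key for some_key\n";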
