Re: DBM::Deep overhead
by davido (Cardinal) on Apr 20, 2011 at 15:05 UTC
DBM::Deep is essentially creating a full-service database: one that you can query, edit entry by entry, and otherwise treat much like a database while still working in the data-structure motif. That carries overhead in both file size and time. Storable creates a freeze-dried data dump that can be reconstituted quickly and easily; but while it sits in storage it's not useful, so there's no need for any elaborate framework to make the data accessible the way a database does.
In your test of approximately 37K of data, Storable is the hands-down winner. If your objective is simply to freeze your data in time and thaw it later, Storable has the performance advantage. If you need to interact with the stored data, the database route can come out ahead, since it allows reads and edits of individual elements without rewriting the entire data structure each time. It's all in how you plan to use it.
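As a minimal sketch of the two usage patterns (the hash contents and the file names snapshot.sto / data.db are made up purely for illustration):

    use strict;
    use warnings;
    use Storable qw(store retrieve);
    use DBM::Deep;

    my %data = ( alpha => [ 1, 2, 3 ], beta => { nested => 'value' } );

    # Storable: freeze the whole structure, thaw the whole structure.
    store \%data, 'snapshot.sto';
    my $thawed = retrieve 'snapshot.sto';   # everything comes back into RAM at once
    print $thawed->{beta}{nested}, "\n";

    # DBM::Deep: the structure lives on disk and is read/edited in place.
    my $db = DBM::Deep->new('data.db');
    $db->{alpha} = [ 1, 2, 3 ];
    $db->{beta}  = { nested => 'value' };
    print $db->{beta}{nested}, "\n";        # reads one element without loading it all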
By the way: sorting your hash, as in my %hash_sorted = sort %hash; (line two of your posted snippet), is not helpful. First, hashes have no inherent order, so sorting one is pointless. Second, handing a hash to a list function such as sort sends it a flat list of key, value, key, value, key, value. The sorted output is a flat list which may well come out as key, key, value, key, value, value - in other words, your keys and values get jumbled together. That sorted list then gets assigned back into a hash, so any values that suddenly became keys must now be unique; any that aren't will cause some elements to be silently dropped. Sorting a hash just makes a mess of it.
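If the intent was simply to walk the hash in key order, the usual idiom is to sort the keys at the point of iteration and leave the hash itself alone, e.g.:

    # Iterate in key order; the hash itself stays unordered.
    for my $key ( sort keys %hash ) {
        print "$key => $hash{$key}\n";
    }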
Re: DBM::Deep overhead
by sundialsvc4 (Abbot) on Apr 20, 2011 at 14:35 UTC
Which approach is more appropriate for what you want to do? Purpose-wise, I submit that these two packages are apples and oranges.
Storable is a package that is designed for use as a persistence-framework. You take a moderate amount of data in RAM and “freeze” it into a representation that you can, say, stuff into an HTTP session-store.
DBM::Deep, on the other hand, describes itself as “a unique flat-file database module, written in pure Perl.” DBM::Deep::Cookbook, in the Performance section, makes the following additional caution:
Because DBM::Deep is a concurrent datastore, every change is flushed to disk immediately and every read goes to disk. This means that DBM::Deep functions at the speed of disk (generally 10-20ms) vs. the speed of RAM (generally 50-70ns), or at least 150-200x slower than the comparable in-memory datastructure in Perl.
Okay, nothing at all wrong with that. “This is what you get, and this is the price you will pay to get it.”
Consider all of your options, not just DBM::Deep. For example, SQLite is a rock-solid, single-file, on-disk data store that is used by everything on the planet, including the cell phone in your pocket. (It, too, normally commits every write to disk immediately, so you must use transactions effectively to get decent performance out of it.)
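For example, here is a rough sketch of batching SQLite writes inside a single transaction with DBI (the kv table and the file name are invented for illustration):

    use strict;
    use warnings;
    use DBI;

    # A single-file SQLite database; RaiseError turns DBI failures into exceptions.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=example.db', '', '',
                            { RaiseError => 1, AutoCommit => 1 } );

    $dbh->do('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)');

    # Each INSERT outside a transaction is its own disk commit; wrapping the loop
    # in begin_work/commit collapses thousands of commits into one.
    my $sth = $dbh->prepare('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)');
    $dbh->begin_work;
    $sth->execute( "key$_", "value$_" ) for 1 .. 10_000;
    $dbh->commit;

    $dbh->disconnect;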
Thanks for the distinction between Storable and DBM::Deep. I guess I was thinking more in terms of what Storable does than an actual database, or specifically one that is accessed by multiple programs/users concurrently. I was looking at it as an easy way of storing complex data so it could be retrieved later, one piece at a time.
My observation about overhead wasn't just in terms of CPU but also the significant difference in file size. Will increasing the input data from 36 K to 360 K increase the database from 4 MB to 40 MB? But I assume there is an initial overhead that won't change much with an increase in the number of records or in record size.
Thanks for the information on Perl databases. It's something I am going to have to dig deeper into.
Re: DBM::Deep overhead
by CountZero (Bishop) on Apr 20, 2011 at 16:18 UTC
If you want to learn about handling persistent data in Perl, a much better way is to invest some time in reading about the DBI framework. That is more or less the standard way of dealing with all kinds of databases in Perl and for sure will come in handy later. Once you see how easy Perl can handle all kinds of databases, you will discover more and more ways to apply that knowledge.
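A bare-bones example of the DBI pattern, again against SQLite so nothing beyond a file is needed (the kv table and column names are only illustrative, matching the sketch earlier in the thread):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=example.db', '', '', { RaiseError => 1 } );

    # Placeholders (?) keep the SQL separate from the data; DBI handles the quoting.
    my $sth = $dbh->prepare('SELECT k, v FROM kv WHERE k LIKE ?');
    $sth->execute('key1%');

    while ( my ( $k, $v ) = $sth->fetchrow_array ) {
        print "$k => $v\n";
    }
    $dbh->disconnect;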
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
There's a big difference between DBI and DBM::Deep, though. DBI is for talking to a relational database; DBM::Deep is for storing and accessing a Perl data structure on disk. Both have situations where they are clear winners.
The difference between DBM::Deep and Storable is a bit more subtle. Both are for storing and accessing Perl data structures on disk. DBM::Deep gives you random access to that structure without having to pull it all into memory, at the expense of being quite slow for small data sets. For small structures of only 4000-ish elements, like the OP's, the overhead of DBM::Deep appears to be very large; but when you have millions of elements, you'll find that DBM::Deep is faster.
To read, change, and write an element in a Storable file, you need to read the entire file, update that one element, and write the entire file back out. Reading and writing tens of megabytes is slow. To read, change, and write an element in a DBM::Deep file, the size of the file is largely irrelevant: you do a handful of small seeks and reads to find the right place, then read and write just a few bytes. To a good approximation, you need to read about ten bytes, and do one seek, for each level of nesting between the root of the data structure and the point you want to edit.
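To make that concrete, a rough sketch of "change one element" under each model (the file names and keys are invented):

    use strict;
    use warnings;
    use Storable qw(retrieve store);
    use DBM::Deep;

    # Storable: the whole structure comes in and goes back out for one edit.
    my $data = retrieve 'big_structure.sto';   # read the entire file into RAM
    $data->{users}{42}{last_seen} = time;      # touch one element
    store $data, 'big_structure.sto';          # rewrite the entire file

    # DBM::Deep: the tie walks the on-disk structure and writes only what changed.
    my $db = DBM::Deep->new('big_structure.db');
    $db->{users}{42}{last_seen} = time;        # a few small seeks and reads, one small write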
Re: DBM::Deep overhead
by anonymized user 468275 (Curate) on Apr 21, 2011 at 13:46 UTC
Most modern DBMSes are built for size rather than speed. DBM::Deep is intended simply to bring a pure-Perl implementation of a DBMS into play. Storable files are going to slow down faster than DBM::Deep as table sizes increase.
However, one approach might be to implement a module that transparently uses multiple Storable files per table, with a hashing algorithm that picks the file in which to store each value based on a primary-key concept. On top of that it would need a virtual-memory architecture: an array of Storable file names, each file limited in size but unlimited in number per "table", used to keep track of which thawed structures are currently being kept alive in memory, plus a policy of forcibly freezing references to keep the number of in-memory Storables down - picking a victim to drop from memory every time something not currently active has to be thawed. Such a pseudo-DBMS module should, of course, keep its interaction with Storable entirely under the hood.
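As a very rough sketch of that idea (no such module exists as far as I know; the package name PseudoDBM, the shard-file layout, and the method names are all invented), the hashing-to-a-shard part might look something like this, with the eviction policy left as a comment:

    package PseudoDBM;    # hypothetical name
    use strict;
    use warnings;
    use Storable qw(retrieve nstore);
    use Digest::MD5 qw(md5_hex);

    sub new {
        my ( $class, %args ) = @_;
        return bless {
            dir    => $args{dir},
            shards => $args{shards} || 16,   # number of Storable files per "table"
            cache  => {},                    # shard number => thawed hashref
        }, $class;
    }

    # Map a primary key to one of N Storable files.
    sub _shard_for {
        my ( $self, $key ) = @_;
        return hex( substr( md5_hex($key), 0, 8 ) ) % $self->{shards};
    }

    sub _load {
        my ( $self, $n ) = @_;
        my $file = "$self->{dir}/shard_$n.sto";
        # A real module would pick a victim shard to freeze and drop here
        # whenever the cache grows past its memory budget.
        $self->{cache}{$n} ||= -e $file ? retrieve($file) : {};
        return $self->{cache}{$n};
    }

    sub get {
        my ( $self, $key ) = @_;
        return $self->_load( $self->_shard_for($key) )->{$key};
    }

    sub set {
        my ( $self, $key, $value ) = @_;
        my $n     = $self->_shard_for($key);
        my $shard = $self->_load($n);
        $shard->{$key} = $value;
        nstore $shard, "$self->{dir}/shard_$n.sto";   # only this one shard is rewritten
    }

    1;

Usage would then be along the lines of my $db = PseudoDBM->new( dir => 'data', shards => 32 ); $db->set( user42 => { name => 'Bob' } );, keeping the Storable plumbing entirely under the hood.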