•Re: Sharing data structures in mod_perl
by merlyn (Sage) on Mar 28, 2002 at 16:07 UTC
The anecdotes I've heard are that Storable gets more and more expensive as the data structures you share get bulkier, and that shared memory really bites, while disk storage is much preferred.
If your structures are complex enough, I've heard it was actually better to use a real database to share things! That is, for some amazingly small number of elements in a hash, it was faster to do a data query to PostgreSQL to fetch the interesting parts through a DBI query than to simply use Storable, write it to disk, read it from disk, and then look at it in Perl. Interesting scaling point. That stuck in my brain because it was counterintuitive at the time, both to the observer, and to me.
-- Randal L. Schwartz, Perl hacker
That's why I went to this whole serializing system with Storable in the first place: I figured that saving DBI queries would always be a good thing. That is apparently NOT the case when your structures get large. I have a database (MySQL) that acts as my data repository, and I figured I'd use data structures serialized on disk to save time, avoiding DBI hits. This is all working, just not at the performance level I expected.
It'd be interesting to find out where the break-even point is: how much does serialization performance degrade as your structures get larger and larger? Does it degrade linearly or exponentially?
I'm going to move back to DBI queries, or perhaps to a system that splits up my serialized structures into MUCH smaller chunks.
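A minimal sketch of the "much smaller chunks" approach using core Storable (the directory and keys are hypothetical): store each top-level key of the big hash in its own file, so a reader only deserializes the chunk it actually needs.

```perl
# Hypothetical sketch: split one big serialized hash into per-key
# files so each request thaws only the chunk it needs.
use strict;
use warnings;
use Storable qw(store retrieve);

my $dir = "/tmp/cache_chunks";   # hypothetical cache directory
mkdir $dir unless -d $dir;

my %big_hash = (
    users    => { alice => 1, bob => 2 },
    sessions => { abc123 => 'alice' },
);

# Instead of store(\%big_hash, "$dir/everything.sto"),
# store each top-level key in its own file:
store($big_hash{$_}, "$dir/$_.sto") for keys %big_hash;

# A reader deserializes only the chunk it needs:
my $users = retrieve("$dir/users.sto");
print $users->{alice}, "\n";   # 1
```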
-Any sufficiently advanced technology is indistinguishable from doubletalk.
I've got Apache::DBI working, and I suppose I need to investigate prepare_cached() further. If you're familiar with prepare_cached, do you know if there are easy ways to deal with expiring stale statements, or do you have to roll-your-own staleness checker? A quick perusal of the docs didn't turn up a whole lot. . .
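For what it's worth, a minimal prepare_cached() sketch (the DSN, credentials, and table are hypothetical). DBI keys the cached statement handle on the SQL string, so within one httpd child the same handle is reused across requests; cached handles live as long as the connection, so there is no separate staleness to manage for the statements themselves, only for the data they return.

```perl
# Hypothetical sketch of prepare_cached() under mod_perl/Apache::DBI.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:mysql:mydb", "user", "pass",
                       { RaiseError => 1 });

sub lookup_user {
    my ($id) = @_;
    # Same SQL string => same cached handle on later calls,
    # skipping the prepare step on every request after the first.
    my $sth = $dbh->prepare_cached(
        "SELECT name FROM users WHERE id = ?");
    $sth->execute($id);
    my ($name) = $sth->fetchrow_array;
    $sth->finish;   # release the cached handle for reuse
    return $name;
}
```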
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Re: Sharing data structures in mod_perl
by perrin (Chancellor) on Mar 28, 2002 at 17:03 UTC
Sharing Perl data between processes without serialization is not possible. Even putting a simple scalar into shared memory involves serialization, since the scalar has to be converted from a Perl data structure to a simple string of bytes and back again.
What makes you so sure that serialization is the problem? Storable is very fast. It sounds to me like the problem is that you are serializing the entire data structure every time, instead of just the one tiny chunk of it that you need to look at.
Your problems with the staleness check sound like some kind of bug in your code. There is no problem with globals in mod_perl, or references, or file mod times. I couldn't say more without seeing the code. Anyway, as I said, storing and loading the entire hash is not an efficient way to do this.
I'm presenting a paper on the most efficient data sharing modules at the Perl Conference this year, but I'm not done with my benchmarking yet so I can't tell you the winner. I do recommend that you try MLDBM::Sync or Cache::Cache. Both of them give you a hash-like interface (MLDBM::Sync is a tied hash module, while Cache::Cache provides get/set methods for key/value pairs), and each element of the hash can contain arbitrary data structures. Don't stuff all your data into one element of the hash, or you'll defeat the purpose. The idea is to only de-serialize the small piece of data your program needs at any given moment.
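As a concrete sketch of the per-key idea with Cache::Cache's file backend (the namespace and keys are made up): each key holds only a small structure, so get() deserializes a small chunk rather than the whole data set.

```perl
# Hypothetical sketch: per-key storage with Cache::FileCache.
use strict;
use warnings;
use Cache::FileCache;

my $cache = Cache::FileCache->new({
    namespace          => 'my_app',
    default_expires_in => 600,     # seconds until entries expire
});

# Store small, per-key chunks, not one giant structure:
$cache->set('user:42', { name => 'alice', role => 'admin' });

# A different httpd process can later fetch just this chunk:
my $user = $cache->get('user:42');
print $user->{name}, "\n" if $user;
```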
Hope that helps. When I have more information about which data sharing modules are fastest, I'll post some data about it on perlmonks.
With the staleness check above, I was trying NOT to serialize the data, instead storing a reference in a global shared among httpd processes. This wasn't working, obviously, and for good reason.
I agree that serializing is fast, except when you are trying to serialize too large a structure (we're talking foolishly large here, my bad: I was serializing and deserializing a structure that was MUCH larger than it needed to be. I wouldn't think to do this with a DBI query; I don't know why I thought it'd be OK with Storable). I'm probably going to move toward splitting my structure into MUCH smaller chunks and still serializing it, or back to straight DBI. I haven't decided; it will depend on the results of some benchmarking I need to whip up, and the opportunity cost of switching my code.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
With the staleness check above, I was trying NOT to serialize the data, instead storing a reference in a global shared among httpd processes.
Yeah, you can't do that. Globals are not shared between processes. That's not a mod_perl thing; it's just how processes work.
One more tip: others have seen great results from Cache::Mmap. You may want to look at that.
Re: Sharing data structures in mod_perl
by VSarkiss (Monsignor) on Mar 28, 2002 at 16:21 UTC
Hmm, merlyn's node above made me think of this: what about using Matts' DBD::SQLite in-memory database driver? It's still pre-1.0, but if it works for you, it could be a big win. You could structure your data to reduce memory hits, it supports transactions so you can control access, and it appears to have decent performance under load. There is a benchmark at SQLite vs CDB_File vs BerkeleyDB. Might be worth a try.
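A minimal DBD::SQLite sketch (the table and data are made up); note that a ":memory:" database lives inside a single process, so for cross-process sharing you'd point the DSN at a file instead.

```perl
# Hypothetical sketch: an in-memory SQLite database through DBI.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

$dbh->do("CREATE TABLE cache (k TEXT PRIMARY KEY, v TEXT)");

my $sth = $dbh->prepare("INSERT INTO cache (k, v) VALUES (?, ?)");
$sth->execute('colour', 'blue');

# Fetch one value back with a bound parameter:
my ($v) = $dbh->selectrow_array(
    "SELECT v FROM cache WHERE k = ?", undef, 'colour');
print "$v\n";   # blue
```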
SQLite is really not for multi-process use. It doesn't support much in the way of locking: not even to the level of the original MySQL table handlers, and certainly nothing as good as the newer MySQL table types or PostgreSQL.
I have checked it out a little. Pretty cool idea, I'll probably use it on a smaller project somewhere. I guess my thoughts are that if I'm going back to DBI queries anyway, I might as well run them against my MySQL server that's already storing the data.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Re: Sharing data structures in mod_perl
by Anonymous Monk on Mar 28, 2002 at 20:20 UTC
Are there any tutorials or hints about serializing data with Perl? It seems much more widespread with the current trend of app servers in the Java land; I was wondering how Perl developers have been dealing with these issues...
This is easier than it might appear: read the Storable and Data::Dumper docs. A number of cache-type modules use Storable for data serialization. If you haven't used Data::Dumper yet, it's cool as heck, and an EXCELLENT debugging tool.
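A minimal round-trip with the two core modules mentioned (the data is made up): Storable's freeze/thaw for compact binary serialization, and Data::Dumper for eyeballing the structure.

```perl
# Sketch: serialize a structure to a byte string and back, then
# dump it in human-readable form. Both modules ship with Perl.
use strict;
use warnings;
use Storable qw(freeze thaw);
use Data::Dumper;

my $data = { fruit => [qw(apple pear)], count => 2 };

# freeze() serializes to an in-memory byte string; thaw() rebuilds
# an equivalent (deep-copied) structure from it:
my $frozen = freeze($data);
my $copy   = thaw($frozen);
print $copy->{count}, "\n";   # 2

# Dumper() produces a readable (and eval-able) representation:
print Dumper($copy);
```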
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Re: Sharing data structures in mod_perl
by mattr (Curate) on Mar 30, 2002 at 06:14 UTC
I'd imagine Storable is faster but here's something fun to look at.
You might like to check Lincoln Stein's page on Boulder.
"Boulder IO is a simple TAG=VALUE data format designed for sharing data between programs connected via a pipe. It is also simple enough to use as a common data exchange format between databases, Web pages, and other data representations."
I have not benchmarked the serialization, but his Wormbase handles 2 gigabytes of genetic data and seems pretty quick.
With Boulder IO you can make nested data structures, and you can pull just the tags you want out of a boulder stream, letting other tags pass through to other programs in the pipeline. A boulder is made of stones which can hold smaller stones; stones are stored in a Berkeley DB file. I don't know how tough it would be to roll support for another db. From the docs it seems that with the tags() method you can glean some information from the ASCII tag of a deeply nested stone without unpacking its data.
Thought this might be a conceivable way to share data, but do note that an Fcntl lock keeps users from reading while a write is going on.
If anybody else has experience using Boulder IO, I'd love to hear about it.
Re: Sharing data structures in mod_perl
by Ryszard (Priest) on Mar 31, 2002 at 00:01 UTC
I've successfully used Postgres and Storable to share data. Although I can't quote any performance figures, it works quite well, and is *very* easy to do.