Re: mySQL hits or Storable retrieves?
by perrin (Chancellor) on Oct 02, 2001 at 18:41 UTC
Let me re-state your question: "What's a good way to efficiently load a complex but infrequently changing data structure that I want to display in my mod_perl application?"
MySQL is fast enough for most uses like this, but if you have an unusual amount of traffic or a poorly designed schema and your data does not need to be 100% up-to-date, caching is tremendously helpful. The approach you described will work just fine. I would suggest you use Cache::Cache or MLDBM::Sync instead of rolling your own filesystem database though. The biggest problem with this approach is that your data set may become too large to efficiently regenerate all of it every 5 minutes.
I used a load-on-demand approach with caching for the eToys.com product data. This avoids problems stemming from a large data set, but does result in a slower load time for the first user. Here's a description:
- Use Perl objects to represent your data. Give them a load() method that takes a unique ID.
- Inside the load method, check the local cache (I used BerkeleyDB, but you can use one of the modules I suggested above) and retrieve this object if it's there.
- If it isn't in the cache, load it from the database. Put the loaded data in the cache.
You can vary how much your cached data gets out of sync by adjusting the time-to-live for objects in the cache.
Caching is a big subject, and there are lots of tricks you can use (lazy reloading, pre-filling the cache with hot objects, etc.).
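The load-on-demand pattern above can be sketched in a few lines of core Perl. This is only an illustration of the check-cache/load/store flow with a time-to-live; in a real mod_perl app you'd swap the plain hash for Cache::Cache, MLDBM::Sync, or BerkeleyDB as suggested, and `$loader` stands in for whatever DBI code actually fetches the object (names here are made up):

```perl
# Minimal sketch of load-on-demand caching with a TTL.  A plain
# in-process hash stands in for Cache::Cache / MLDBM::Sync.
package ProductCache;
use strict;

my %cache;        # id => { data => ..., stored_at => epoch seconds }
my $TTL = 300;    # five minutes; tune this to control staleness

# load() takes a unique ID plus a coderef that hits the database
# on a cache miss (supply your own DBI code there).
sub load {
    my ($id, $loader) = @_;
    my $entry = $cache{$id};
    if ($entry && time() - $entry->{stored_at} < $TTL) {
        return $entry->{data};                       # cache hit
    }
    my $data = $loader->($id);                       # miss: hit the DB
    $cache{$id} = { data => $data, stored_at => time() };
    return $data;
}

1;
```

Raising `$TTL` trades freshness for fewer database hits, which is exactly the knob described above.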
Wow. Thanks for the very helpful response. Part of my problem here is that I think I've normalized my data too much (if that's possible); it's split across so many tables that writing the SQL properly is getting beyond me.
Basically, I'm at the point where I need to run the entire universe of records through a set of filters, akin to:
- What documents are in this category/subcategory?
- Is the user public, staff, probono attorney, admin, etc.?
- What organization is this user from, and what docs are specifically excluded from this organization at this access level?
So it would be so much easier if I didn't have all this conditional crap. I'd probably do it with one SQL join if I could, but I can't here: I need more than one statement (at least in my head), and I think caching the results is the only way I'm going to get a usable, fast suite of sites.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
CREATE VIEW view_red_cross_docs
AS SELECT <code here>
CREATE VIEW view_pro_bono_docs
AS SELECT <code here>
SELECT <stuff> FROM <joined views>
email: mandog
Object-level security? It sounds like you want SPOPS, or even the app server built on top of it, OpenInteract. It has hooks to add caching as well.
Re: mySQL hits or Storable retrieves?
by pjf (Curate) on Oct 02, 2001 at 18:15 UTC
It's past midnight and my mental capacity is rapidly approaching that of a pumpkin, so take all this with a sprinkle of salt.
The answer to question 4 (are you making it too complicated?), is probably yes. Grab your HTML::Template man-page, and do a search for "file_cache". Here's one I prepared earlier:
file_cache - if set to 1 the module will store its
cache in a file using the Storable module. It uses no
additional memory, and my simplistic testing shows
that it yields a 50% performance advantage. Like
shared_cache, it will work in a CGI environment.
Default is 0.
Now isn't that neat? HTML::Template already knows about the Storable module, and the comments underneath note that it uses flock() to make sure everything behaves. (The regular caveat about file locking being broken on NFS applies.)
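For the record, turning this on is a one-line change to the constructor. A hedged sketch (the template filename and cache directory are made up; `file_cache` and `file_cache_dir` are the documented HTML::Template options):

```perl
use strict;
use HTML::Template;

# file_cache => 1 makes HTML::Template keep its parsed-template
# cache on disk via Storable; file_cache_dir says where.
my $template = HTML::Template->new(
    filename       => 'results.tmpl',      # hypothetical template
    file_cache     => 1,
    file_cache_dir => '/tmp/tmpl_cache',   # hypothetical path
);
$template->param(title => 'Document list');
print $template->output;
```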
Hope that you find the above useful.
Cheers,
Paul
One other clarification: I'm already using HTML::Template's cache features to store parsed templates in memory. As far as I can tell, though, HTML::Template's caching stores only the parsed templates, not the data. That doesn't help me; I'm talking about storing the data structures themselves.
I wouldn't want HTML::Template to store my data in memory anyway. I'd rather manage that, and when it needs to change, myself.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
That stuff in the HTML::Template docs is only referring to the cache used for the compiled templates. That cache is not available through a public interface in HTML::Template. However, there's no shortage of good caching modules on CPAN.
Re: mySQL hits or Storable retrieves?
by George_Sherston (Vicar) on Oct 02, 2001 at 18:13 UTC
Just a thought: I'm working on a MySQL-based CGI that gets lots of hits, and in retrospect I slightly wish I'd worked with PostgreSQL, or at least used DBD instead of DBI, because that would have let me use transactions to manage conflicts between different users at the same time. The alternative I settled on, file locking, feels a bit clunky. This thread has more if you want it.
§ George Sherston
Sorry not to be more clear. I intended to contrast what I understand to be two separate modules, namely DBI and DBD::mysql. At the moment I use the former, which is very straightforward and easy to learn; but (so far as I can make out) unlike DBD::mysql it doesn't support MySQL transactions. If DBI does support MySQL transactions I'd be glad to hear of it! But as you can see from the thread I linked to, my interim conclusion is that it doesn't, whereas DBD::mysql does. My main point is that if I start out on a project like this in future, I'll make sure I use a DBMS / driver combo that lets me do transactions.
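For what it's worth, transaction control is exposed through DBI's own interface (`AutoCommit`, `commit`, `rollback`); whether it actually works depends on the driver and, for MySQL, on using a transaction-capable table type. A hedged sketch of the usual pattern (connection details and table names are invented):

```perl
use strict;
use DBI;

# DSN, credentials, and the accounts table are hypothetical.
my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass',
    { RaiseError => 1, AutoCommit => 0 });    # transactions on

eval {
    $dbh->do('UPDATE accounts SET balance = balance - 10 WHERE id = 1');
    $dbh->do('UPDATE accounts SET balance = balance + 10 WHERE id = 2');
    $dbh->commit;          # both updates succeed, or neither does
};
if ($@) {
    warn "Transaction aborted: $@";
    $dbh->rollback;        # undo the partial work
}
$dbh->disconnect;
```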
§ George Sherston
Re: mySQL hits or Storable retrieves?
by toma (Vicar) on Oct 02, 2001 at 19:08 UTC
We used the approach that you describe and it worked fine. In our case, we had a fully-normalized schema which required a six-table join to get to useful data. The queries weren't terribly slow, but we were expecting lots of them.
If you want to be buzzword compliant, call your system a 'data warehouse.' Commercial database companies will sell you this same type of solution starting at six-figure USD prices. I sat through a long sales pitch, and as far as I could tell, your idea is the only database improvement involved in data warehousing.
I was able to get the mod_perl processes to share the memory used by the Storable data structure, so the many httpd daemons didn't consume too much RAM. We had three data structures using a total of about 100 MB of RAM. You have to set the operating system process size limit large enough that you don't run out of memory for the data structure.
One very minor gotcha we ran into was that the Storable data was OS-dependent. It wasn't a network-order problem; we just upgraded our web server OS and somehow it wouldn't read Storable data from the old OS any more. It shouldn't have had that problem, so there was probably a bug somewhere.
It should work perfectly the first time! - toma
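One way to sidestep portability gotchas like the one above is to write with Storable's `nstore()`, which uses network byte order, instead of the native-format `store()`. A small self-contained sketch (the document data is invented for illustration):

```perl
use strict;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Build the big structure once; contents here are made up.
my %docs = (
    42 => { title => 'Flood relief FAQ', category => 'disaster' },
    43 => { title => 'Pro bono intake',  category => 'legal'    },
);

# nstore() writes in network byte order, so the file stays
# readable across OS upgrades and architectures; plain store()
# uses the machine's native format.
my ($fh, $file) = tempfile();
nstore(\%docs, $file);

my $thawed = retrieve($file);    # comes back as a hashref
print $thawed->{42}{title}, "\n";
```

The network-order format costs a little speed on write, but for a cache regenerated every few minutes that's usually a fine trade for not being bitten by an OS upgrade.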
OH MAN IS THAT SWEET!
I just got the "stored array of arrays" thing working and it's so damn fast! My god!
This is so cool! Basically, I'm going to pre-create each of my accesslevel/category combinations and then store them on disk with Storable. Then on each invocation, I just need to retrieve the structure, do a little massaging, and VOILA! I've got my complex data structure, created with multiple DBI queries, off disk.
My preliminary testing shows that it's at least 50% faster, and that's with just one DBI query (the whole reason I need to do this is that I'm going to need more than one) and with the structure stored on disk, not in memory.
I'll still use DBI for selecting individual documents and lookups, but for the complicated, multi-query stuff I'm going Storable!
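The precompute-and-store scheme described above can be sketched as one cache file per accesslevel/category combination. This is only an outline of the plumbing; the file naming, the sample rows, and the `rebuild`/`fetch` names are all invented, and in the real thing `rebuild()` would run the multiple DBI queries (probably from cron):

```perl
use strict;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# One Storable file per accesslevel/category combination.
my $cache_dir = tempdir(CLEANUP => 1);    # stand-in for a real path

sub cache_file {
    my ($level, $category) = @_;
    return "$cache_dir/$level-$category.sto";
}

# Cron side: run the expensive multi-query build, then store it.
sub rebuild {
    my ($level, $category, $rows) = @_;
    nstore($rows, cache_file($level, $category));
}

# Request side: one disk read instead of several DBI queries.
sub fetch {
    my ($level, $category) = @_;
    return retrieve(cache_file($level, $category));
}

rebuild('staff', 'housing',
        [ [ 1, 'Tenant rights' ], [ 2, 'Evictions' ] ]);
my $docs = fetch('staff', 'housing');
print $docs->[1][1], "\n";
```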
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Re: mySQL hits or Storable retrieves?
by Hero Zzyzzx (Curate) on Oct 02, 2001 at 18:07 UTC
Why can't you edit your post sometimes? Strange.
A couple of clarifications- I'm developing under mod_perl and using persistent DBI connections via Apache::DBI.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Re: mySQL hits or Storable retrieves?
by tstock (Curate) on Oct 02, 2001 at 21:59 UTC
Consider, if possible, actually storing the HTML results, and only calling the CGI if the HTML file isn't there. I do this on my site, where the data in the DB is only updated once a day, using an ErrorDocument 404 directive in .htaccess to call the CGI.
A cron job deletes all the stored html files when new data is available.
I also store a gzip'ed version of the HTML and serve it if the browser accepts gzip encoding (using mod_rewrite), which saves beaucoup de bandwidth... but that's neither here nor there, I guess :-)
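The two tricks described above might look something like this in .htaccess (a hedged sketch; the CGI path is invented, and the directives assume mod_rewrite is enabled):

```apache
# Missing pages fall through to the CGI, which generates the
# page and saves the HTML for next time (script path made up):
ErrorDocument 404 /cgi-bin/build_page.cgi

# Serve the precompressed copy when the browser accepts gzip:
RewriteEngine On
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [L]
AddEncoding x-gzip .gz
```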
Yeah, I thought about this option initially, but the system I'm creating uses a pretty complex templating scheme and conditional output of pages. I use cron-created HTML on another system I built and it's pretty sweet; I just regenerate the HTML when there's a change to a document.
This project is a little more intense. I'm making it so that I can have multiple non-profits with multiple domain names, with different HTML templates, all share the same database of docs. Each program also has the ability to exclude documents based on specific properties (document type, subject matter, etc.) and has multiple access levels.
This could quickly get into an absurd number of static HTML pages, and I only have a T1 (for now...) to host it on. As long as I keep my page generation rate (in bits per second) above the speed of the T1, I think I'll be OK.
If it turns out down the road that I'm processor bound (I don't think it will) I'll revisit this option again.
One thing I do use is HTML::Clean. At a slight cost in speed (about 2% in my testing), even at its lowest level it gives me a 20-30% savings in the total size of my HTML. It's worth it! I may look into the gzip option too; I have a post-processing sub I could easily plug it into.
-Any sufficiently advanced technology is indistinguishable from doubletalk.
Though it's not pretty, you may want to look into "componentizing" your pages using SSI. Since "virtual" includes via SSI can call a CGI or whatever else you want, you can cache HTML "chunks" of an overall page while other portions of the page are dynamically generated.
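A page assembled this way might look like the following (the include paths are hypothetical; the shell page would be served as .shtml or with SSI enabled for it):

```html
<!-- Static shell page.  mod_include fetches each chunk, so the
     nav bar can be cached HTML on disk while the search box
     stays fully dynamic. -->
<!--#include virtual="/cache/nav.html" -->
<!--#include virtual="/cgi-bin/search_box.cgi" -->
```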
Re: mySQL hits or Storable retrieves?
by cyberscribe (Acolyte) on Oct 03, 2001 at 08:05 UTC
Hi Hero Zzyzzx,
There are many variables that go into answering your questions, but the one that sprang to my mind is: does the flavor of SQL you're using support temp tables? I know MySQL does not, but Informix, for example, does.
Temp tables are a great way to store data between queries inside SQL itself, which can be an ideal out-of-memory storage solution for many applications. By using temp tables properly, you can split up complicated queries into several simple queries and even perform queries, functions, and sorts that would be impossible in a single SQL statement with a given table layout.
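As a sketch of the idea (Informix-style syntax; every table and column name here is invented), the filtering query gets staged into a temp table and the follow-up queries run against that:

```sql
-- First pass: stage the documents this user may see.
SELECT d.doc_id, d.cat_id, d.title
  FROM documents d, doc_access a
 WHERE a.doc_id = d.doc_id
   AND a.access_level = 'probono'
  INTO TEMP visible_docs;

-- Second, simpler pass runs over the small staged set.
SELECT v.title, c.category_name
  FROM visible_docs v, categories c
 WHERE c.cat_id = v.cat_id
 ORDER BY c.category_name;
```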
So, that's my advice. Any time I hear about in-memory storage that could take a performance hit, with cron-tabbed scripts to pick up the slack, I take a look at the given tool set for more elegant methods. In the case of SQL there are many, and formulating the right query can save you a lot of wrangling with other data structures on the front end.
You could say my answer is off-topic to Perl, but in a way I believe it is "perlish" in philosophy:
The elegant and the appropriate, sufficiently understood and applied, have the power to transform the compound and confusing into the concise and beautiful.
-cyberscribe
Maybe I'm confused (in fact I am confused), but maybe I'm specifically confused...
MySQL has TEMP tables, which are in-memory, SQL-accessible tables. I've used these for small data-warehouse-style problems in the past. They're very fast. Is this what you mean?
--
Anthony Staines
The problem is that each "view", if you will (a combination of the user's accesslevel and the category they're requesting, filtered down by the domain the request comes from), requires me to run multiple queries. My understanding was that MySQL had a memory-caching feature for regular tables, in addition to the RAM-only HEAP tables that you're alluding to. These views could also get large down the road. (I'm migrating a Lotus Domino (yuck!) NSF that currently has 2,700 documents in it.)
I guess I just thought it would be faster and more elegant to periodically run the queries that create the data structures I need, store them to disk with Storable, and then retrieve them when I need them, rather than keep two different sets of SQL: one for selecting from a memory-only HEAP table and one for updating the main database. It's actually far simpler for me to solve this problem using multiple SQL queries, too.
I'll post example code soon. It's surprisingly (at least to me) simple!
-Any sufficiently advanced technology is indistinguishable from doubletalk.