Re: Improving efficiency - repeated script hits at small interval
by kvale (Monsignor) on Jun 26, 2002 at 19:50 UTC
|
The simplest way to cache data is to use a persistent process such as mod_perl.
If the changes are minimal and always occur at the same place in a GET or POST string, do a quick substr operation on the string rather than parsing it.
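For example, if the query string always looks like "user=1234&count=00042" and only the trailing counter ever changes (a made-up layout, purely for illustration), something like this skips the full parse:

use strict;

my $query = $ENV{QUERY_STRING} || 'user=1234&count=00042';

# grab only the piece that changes, at its known fixed position
my $count  = substr($query, -5);     # the five counter digits
my $static = substr($query, 0, -5);  # everything that never changes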
-Mark
Re: Improving efficiency - repeated script hits at small interval
by Ryszard (Priest) on Jun 26, 2002 at 19:44 UTC
|
You could use a checksum to catch all the data that is the same. Nice and easy: you could use the checksum as a primary key, or as a hash key.
Not sure about small changes. You obviously have a method of retrieving the data, so perhaps that method could look for similarities or patterns, then only retrieve the delta.
It's kinda hard to give any concrete answer without any specifics. If you are able to provide an example of the data model, it would make it easier to answer.
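For example, something along these lines (just a sketch -- Digest::MD5 is one way to get the checksum, and %cache and build_data() are made-up names):

use strict;
use Digest::MD5 qw(md5_hex);

my %cache;   # checksum => previously built result

sub get_result {
    my ($raw_input) = @_;
    my $key = md5_hex($raw_input);                 # checksum of the incoming data
    return $cache{$key} if exists $cache{$key};    # identical data seen before
    return $cache{$key} = build_data($raw_input);  # only do the work for new data
}

sub build_data {   # stand-in for the real (expensive) processing
    my ($raw_input) = @_;
    return "result for: $raw_input";
}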
Re: Improving efficiency - repeated script hits at small interval
by perrin (Chancellor) on Jun 26, 2002 at 19:58 UTC
|
There are two ways to deal with cache consistency: time-to-live, and invalidation. In an invalidation system, the source of the data has to notify your cache when it makes changes. This can work well if you don't make changes very often. In a time-to-live system, you just return the cached data for a specified amount of time and ignore possible changes until then. If you need very up-to-date data and it changes frequently, you may not be able to use caching.
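A bare-bones time-to-live version might look like this (a sketch only; %cache, $TTL and fetch_from_db() are illustrative names):

use strict;

my %cache;       # key => [ expiry time, value ]
my $TTL = 300;   # serve cached data for up to 5 minutes

sub get_cached {
    my ($key) = @_;
    my $entry = $cache{$key};
    return $entry->[1] if $entry && $entry->[0] > time;   # still fresh enough
    my $value = fetch_from_db($key);                      # the expensive lookup
    $cache{$key} = [ time + $TTL, $value ];
    return $value;
}

sub fetch_from_db { "fresh value for $_[0]" }   # stand-in for the real query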
Re: Improving efficiency - repeated script hits at small interval
by shambright (Beadle) on Jun 26, 2002 at 22:12 UTC
|
I had a similar problem recently.
I wrote a startup script for mod_perl to (1) set up a persistent database connection (in my case, MySQL), (2) declare some global variables to use within the script, and (3) pre-load the script to be used by the outside world.
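A stripped-down startup.pl along those lines might look like this (just a sketch; Apache::DBI is one common way to get persistent handles, and %config and MyApp::Handler are placeholder names):

#!/usr/bin/perl
# startup.pl -- run once by mod_perl when the server starts
use strict;

# (1) persistent database connections: loading Apache::DBI before DBI
#     makes later DBI->connect calls cache and reuse the handle
use Apache::DBI;
use DBI;

# (2) globals that every request handled by this child can see
use vars qw(%config);
%config = ( refresh_interval => 20 );   # placeholder values

# (3) pre-load the code the outside world will call, so it is
#     compiled at server start instead of on the first hit
use MyApp::Handler;                     # placeholder for the real script

1;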
Changes that will be referenced "every now and then" can be updated quickly in MySQL without problems. Remember to keep your tables small and properly indexed. The fewer UPDATE statements, the better. My final solution used one UPDATE statement, which meant the table was locked for a minimum amount of time without a chance of data corruption from an ill-timed SELECT statement by another child process.
Changes that must be immediately available to all can be stored in the global variable(s). Any future children will instantly have the correct info.
Doing this, I was able to maintain correct information without problems with up to 30 script calls per minute. No storage of session data was necessary.
The script for the outside world becomes very small -- get the info, make UPDATEs if necessary, and spit out an answer...
Re: Improving efficiency - repeated script hits at small interval
by caedes (Pilgrim) on Jun 26, 2002 at 22:16 UTC
|
Re: Improving efficiency - repeated script hits at small interval
by bronto (Priest) on Jun 27, 2002 at 09:05 UTC
|
A persistent process could be a solution. Maybe Matts' pperl could do something for you.
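If it fits, using pperl is (roughly) just a matter of pointing the shebang line at pperl instead of perl, e.g. (the path here is just an example):

#!/usr/bin/pperl
# the script itself stays the same; pperl keeps it compiled and
# resident in memory, so repeated hits skip the startup cost
use strict;
print "Content-type: text/plain\n\n";
print "hit at ", scalar localtime, "\n";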
Ciao! --bronto
# Another Perl edition of a song:
# The End, by The Beatles
END {
$you->take($love) eq $you->made($love) ;
}
Re: Improving efficiency - repeated script hits at small interval
by BUU (Prior) on Jun 26, 2002 at 21:28 UTC
|
Ok, elaboration time.
This is also going to involve session-persistence type ideas. Basically, what's going to happen is that client A first hits the script. The script generates an object/data/etc. for him via a series of SQL queries, some processing, some files, etc. He has his data. That data is then stored in the session attached to him via a cookie. Then the script generates an output screen for him, made up of some data from his "object", from other people's objects, some environment data, etc. This screen would be sent back to him.

However, the client would be constantly (2-3 times a minute) refreshing the script. Since the output data depends on "outside" factors (i.e. not just data contained in the client's session object dealy), the output data that is sent back might change a little bit (a few numbers), or might not change at all. As I mentioned above, though, the little changes are critical, so whatever system I use must be able to keep sending back the new data.

The problem (that I see from here; I haven't tested/benched this at all) is that every time the client refreshes, it would have to generate the "output data" all over again, not to mention reinitialize the session data and so forth. Also note that this solution must be entirely "browser based", i.e. using nothing other than what a default browser has (and the server, of course), which means basically HTML/JavaScript (yes, it will be required. Deal.) is allowed, as well as whatever server-side Perl/DB magic you can conjure up.
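Roughly, every refresh ends up doing something like this (all the names below are made up, just to show the shape of the work):

use strict;
use CGI;

my $q          = CGI->new;
my $session_id = $q->cookie('session') || 'new';   # identify the client
my $session    = load_session($session_id);        # SQL queries, files, processing
my $screen     = render_screen($session);          # own data + other people's + env
print $q->header, $screen;

# stand-ins for the real (expensive) work
sub load_session  { return { id => $_[0], built_at => time } }
sub render_screen { return "<html><body>state for $_[0]{id}</body></html>" }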
Update: I guess what this mostly boils down to is: is there any way I can avoid having to constantly reinitialize 'session/objects' per user? It seems to me that every time the client hits the script, it would have to go through a (relatively) huge amount of work, reloading all the files, constructing the objects from MySQL queries, etc.
|
|
Is there any way you can have 100% up-to-date data without doing the work to look it up? Not really. If there was, Oracle would be out of business. The closest you can get is the invalidation technique that I discussed above: every time anyone changes any of the data that you generate your responses from, you have to clear any cached responses that depend on that data. This is only possible in some situations.
I'd suggest you build it with no caching first, and then if it's slow you can look for where you'll get the most bang for your caching buck. You may find that your users will gladly trade a 5-minute delay on certain kinds of data for significantly better page response.
Also, think twice about putting all of that data into "session" objects. Unless it is specific to a single user, it doesn't belong in a session. Things like query results are not usually user-specific, e.g. a search for all stocks that have gone up in the last 24 hours should be cached under "stocks, age < 24", not under some session ID. In most systems users frequently repeat each other's queries, so caching results in a user-independent way gives better hit rates in the cache.
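In code the difference is just what the cache is keyed on (a sketch; %cache and the stock search are illustrative):

use strict;

my %cache;

# keyed by what the data is, not by who asked for it
sub stocks_up_last {
    my ($hours) = @_;
    my $key = "stocks_up_${hours}h";             # same key for every user
    $cache{$key} ||= run_stock_query($hours);    # any user's hit warms the cache
    return $cache{$key};
}

sub run_stock_query { return [] }   # stand-in for the real search

In practice you would combine that with the time-to-live or invalidation logic above so stale results eventually get refreshed.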