Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm looking for a good way to solve the following problem:

Clients connect to a machine, log in and wait for messages. These messages (basically strings) are stored in a database with the appropriate client id. The main Perl process forks a child for each client connection, so each child process has to check the database for messages (e.g. 5 times per second).

When there are many clients connected, this causes a heavy load on the database. I am looking for a good solution for this.

The best I can come up with is this:
The main process which accepts clients forks another process which selects the unique client ids from the message database. This process signals the corresponding client process that there are messages waiting. After this, the corresponding process will also query the database to get the waiting messages.

This way, only one process actively queries the database. The difficulty in this approach is the process setup and fork management: the main process knows its children and thus which client each one serves, but the database poller process does not.
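
A rough sketch of the poller I have in mind (simplified; the table and column names are just examples, and %pid_for stands for the client-to-pid mapping that I don't yet know how to share between the processes):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(sleep);

    # %pid_for maps client ids to the pids of the child processes
    # serving them -- sharing this mapping is exactly the hard part.
    my %pid_for;

    my $dbh = DBI->connect('dbi:mysql:messages', 'user', 'pass',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT DISTINCT client_id FROM message');

    while (1) {
        $sth->execute;
        while ( my ($client_id) = $sth->fetchrow_array ) {
            # tell the child serving this client to fetch its messages
            kill 'USR1', $pid_for{$client_id} if $pid_for{$client_id};
        }
        sleep 0.2;    # poll 5 times per second
    }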

Any ideas/suggestions/improvements?

Marcello

Re: Database poller
by Zaxo (Archbishop) on Jun 15, 2003 at 01:00 UTC

    If your rdbms supports it, stored procedures would speed things up. A trigger on insertion would let you avoid polling.

    Do you remove messages as they are read? If not, a timestamp or 'seen' flag would let you reduce the size of responses.
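
    For example (just a sketch -- the 'message' table and 'seen' column are invented names, and $dbh is assumed to be an already connected DBI handle):

        # mark messages as seen instead of shrinking the table on every read
        my $fetch = $dbh->prepare(
            'SELECT id, body FROM message WHERE client_id = ? AND seen = 0');
        my $mark  = $dbh->prepare('UPDATE message SET seen = 1 WHERE id = ?');

        $fetch->execute($client_id);
        while ( my ($id, $body) = $fetch->fetchrow_array ) {
            deliver_to_client($body);   # hypothetical delivery routine
            $mark->execute($id);
        }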

    After Compline,
    Zaxo

      Hi Zaxo,

      I'm using MySQL, so stored procedures and triggers are unfortunately not yet supported.

      I am deleting records when they are processed, so the database table is nearly always empty.

      Maybe another intermediate database table could help? I was thinking of a table where the main process (which accepts the clients) stores the currently active client processes including their process id. So if the database process finds a message, it checks the process table to see if the client to which the message should be delivered is currently logged in. All it has to do after that is signal the process that there are messages waiting.
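
      Something along these lines, perhaps (sketch only; the client_process table and column names are made up, and $dbh is assumed to be connected already):

          # main process, right after forking a child for a client:
          $dbh->do('INSERT INTO client_process (client_id, pid) VALUES (?, ?)',
                   undef, $client_id, $child_pid);

          # poller process, for each client id that has messages waiting:
          my ($pid) = $dbh->selectrow_array(
              'SELECT pid FROM client_process WHERE client_id = ?',
              undef, $client_id);
          kill 'USR1', $pid if $pid;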

      Maybe this is a good approach?

      Regards,
      Marcel
Re: Database poller
by mobiGeek (Beadle) on Jun 15, 2003 at 15:07 UTC
    The question is: where is the bottleneck?

    There is no point trying to "throw" optimizations at the problem if you don't know where the problem is. Changing your forking model or your schema or your queries won't help if that isn't where the problem lies.

    So, my suggestion is to profile your code; find out where in your process large amounts of time are being taken. Then fix those areas or post back with the specifics looking for particular optimizations.

    It may seem like a lot of work (possibly not-fun work) trying to determine the bottlenecks. But it sure beats reworking a whack of code only to find that the particular problem is not solved.
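
    For a quick first pass, the stock Devel::DProf profiler is usually enough (just an illustration -- substitute your own script name, and note that a long-running forking daemon needs a clean exit before the profile is written):

        perl -d:DProf yourserver.pl    # run under the profiler, writes tmon.out
        dprofpp tmon.out               # report where the time went, by subroutine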

Re: Database poller
by EvdB (Deacon) on Jun 15, 2003 at 15:06 UTC
    Not an answer to your question but:

    You could look into getting rid of the database and instead setting up something similar to a mail folder, say Maildir. As this was designed for email, which shares many of the traits of your problem, it should work just fine. Since many of the files will be cached in RAM, there should be no real performance penalty from disk access either.

    Maildir does not work on Win32 due to the filenames used, but something similar should be possible.

    You could have a separate folder for each client, gaining you the security mentioned above. Backups would also become very simple. Of course, if the daemons are on a different machine to the files then...
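
    Anyway, the writing side could be as simple as this (a sketch; the directory layout is just an example and assumes each client's tmp/ and new/ directories already exist):

        use File::Spec;
        use Time::HiRes qw(time);

        # Maildir-style delivery: write to tmp/, then rename into new/.
        # rename is atomic on the same filesystem, so readers never see
        # a half-written message.
        sub queue_message {
            my ($spool, $client, $body) = @_;
            my $name = sprintf '%.6f.%d', time(), $$;
            my $tmp  = File::Spec->catfile($spool, $client, 'tmp', $name);
            my $new  = File::Spec->catfile($spool, $client, 'new', $name);
            open my $fh, '>', $tmp or die "open $tmp: $!";
            print {$fh} $body;
            close $fh           or die "close $tmp: $!";
            rename $tmp, $new   or die "rename $tmp -> $new: $!";
        }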

    --tidiness is the memory loss of environmental mnemonics

Re: Database poller
by aquarium (Curate) on Jun 15, 2003 at 11:22 UTC
    If indeed your bottleneck is the table holding the messages (i.e. the get/delete queue gets long), then you could split up the table so that each client has its own table. This also avoids having to store the client id and check for it in every message. It will definitely relieve contention issues and improve the security of the data. Also, check that your message lengths don't exceed the maximum MySQL packet length and modify accordingly, and even zip the data en route if CPU load is low.
      It is in general a bad idea to create one (identical) table for each client (or one table for each day, etc.)

      Yes, this probably provides a short-term performance gain, but it will lead to serious scalability and administrative headaches, and is really not the way a relational database should be used.

      If the database server is the problem then you should see what can be done to tune the server to perform better (indexes, TCP options, etc). And is it really necessary to query the database for each client 5 times per second?

      Michael

        Hi mpeppler,

        Messages should be delivered to clients ASAP, so the database should be checked at least twice per second. The problem is not the database at the moment, but I want a scalable solution: if every client process polls the database this way, it will eventually become a performance issue. With a single process checking the database, many clients can be connected at once.

        Regards, Marcel
        I'm sorry, but you are wrong... it produces a long-term performance gain. If you have 100 clients, yes, you will need to custom-build some admin interface -- but you will also want one if you're using a single table and have a jumble of messages in it. Also, with one possibly large and much sought-after table, if any client misbehaves and slows things down, all clients will suffer. You also need to be very careful when administering messages for one client so as not to affect others and corrupt their data too.

        Scalability? If your program is modular enough, you can farm out client tables to separate databases on separate boxes. One other thing that can be done to improve performance is to have an "already read" field that you check for when retrieving messages, and only every now and then delete all those "already read" messages in one query (sketched below), so the RDBMS is not shuffling data every time you take a message off the table.

        The data contention issue is the most likely bottleneck you will encounter with your current design: as the number of clients grows, performance will drop very quickly with each new client added. Sometimes you do need to de-normalize your data to suit real-world applications. In fact, relational theory calls for de-normalization after first normalizing your data, if that is necessary. In this case, it is.
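
        The "already read" idea, roughly (a sketch; the column name is invented and $dbh is assumed to be an open DBI handle):

            # on read: flag the message instead of deleting it straight away
            $dbh->do('UPDATE message SET already_read = 1 WHERE id = ?',
                     undef, $id);

            # housekeeping, run every few minutes rather than per message:
            $dbh->do('DELETE FROM message WHERE already_read = 1');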