oyse has asked for the wisdom of the Perl Monks concerning the following question:
I am implementing a web-application where more than one user can access the same row at the same time. This can lead to problems, since one user might update the row while the other users look at old data. They might make updates of the old data, thereby overwriting the changes of the first user. To prevent this problem I want to implement some kind of row locking at the application level.
The way I plan to implement the locking is by creating a table in the database that contains information about all the locked rows. When a user wants to view a row, I will check if it is locked by looking in the table. It the row is not locked the user will acquire a lock on the row.
The algorithm as I see it will contain a race condition since I will first check if the row is locked using a SELECT query, if it is not locked a row will be inserted using an INSERT query. The race conditon happens if two http requests manages to check the table before one of them inserts/acquires the lock.
To solve the race condition, I plan to use transactions in DBI like this:
1. turn of autocommit
2. check if the row is locked (check the lock table)
3. if it is not locked, lock the row (insert a row in the lock table)
4. commit
5. turn on autocommit (since the rest of the application uses it)
I think this algorithm will work, but it will take me some time to implement and test. So before I start implementing it, it would be nice to know if any CPAN modules will do this for me?
Secondly, do you think the algorithm will prevent the race condition?
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Implementing rowlocking
by themage (Friar) on Jan 09, 2007 at 09:38 UTC | |
Using autocommit or not is irrelevant in your present situation as it would not prevent the the lock row from being inserted in another transaction at the same time. What you really nead is to LOCK your lock table: Some notes:
| [reply] [d/l] [select] |
Re: Implementing rowlocking
by ikegami (Patriarch) on Jan 09, 2007 at 16:08 UTC | |
[ I see solutions. Hopefully, I can shed some light on the problem. ] First, it's important to realize that you want to lock a record for an extended period of time, one that can span longer than a connection. (arkturuz, herveus and myself in the chat last night didn't realize this.) There are two major problems with doing your own locking. The length of time for which you want to lock a record exacerbates both. First, the situation where you're trying to lock something that's already locked is hard to handle. If you were to use LOCK TABLE, it would block until the lock is obtained. If you were to use your own locking mechanism, then you'd have to keep trying to lock the data. That could put a heavy load on the DB if multiple clients are trying to lock locked records. Second, a crashed client can hold a record locked indefinitely. You could timeout locks, but that's dangerous. That's why people (davidrw, SheridanCat) are suggesting optimistic locking. It goes along the lines of | [reply] |
by smithers (Friar) on Jan 09, 2007 at 20:45 UTC | |
Regardless of Rdbms vendor locking or your custom locking, holding a lock for 5+ minutes is too long on a busy system. Seems that would lead to transaction blocking or even deadlocking. Good luck with your project. | [reply] |
Re: Implementing rowlocking
by arkturuz (Curate) on Jan 09, 2007 at 09:26 UTC | |
About the algorithm: if I understand correctly, it seems to me that it will not prevent race conditions (because, as you said "the race conditon happens if two http requests manages to check the table before one of them inserts/acquires the lock" but you don't check for it in your algorithm). | [reply] |
Re: Implementing rowlocking
by oyse (Monk) on Jan 09, 2007 at 15:05 UTC | |
Thanks for all the replies. I see that I was clearly to vague in my original post, so I shall try to give some more details. This is a web-application and I have not found any reliable way to unlock rows when a users leaves a page, closes the web-browser etc. So to prevent a user from holding a lock that is no longer used, I want the following behaviour: A lock is only valid for a short time (for instance 5 minutes). After that the lock is not counted as a lock anymore; it is invalid. If a user wants to keep the lock longer, she must ask for it explictly, in which case the lock is held for another 5 minutes (or some other time).
Regarding what herveus said:
Regarding what davidrw said:
Regarding what SheridanCat said: BTW I think I have a race condition free implementation in Perl now thanks to one of the people that helped my at the chatterbox. The person suggested letting the column that identifies a locked row have a UNIQUE constraint. When performing and insert the RDBMS will reject an insert that tries to lock a row (by inserting into the ROWLOCK table) that is already locked and DBI will cast an exception. By placing the DBI execute call in an eval{} it is easy to see if the user acquired the lock or not, and two users can not have the lock at the same time. I will now look more close at your suggestion and see if I should do something differently. Thanks again for all replies. | [reply] |
by SheridanCat (Pilgrim) on Jan 09, 2007 at 15:41 UTC | |
| [reply] |
by pajout (Curate) on Jan 09, 2007 at 15:43 UTC | |
1. I think that rowlocking (whatever it means) is not (probably) necessary during whole user session in described situation. I don't know real requirements, but consider the scenario when user sends both old (which he has received) and new values (which he has probably changed). The application logic can decide, if it is necessary to lock some rows for a while, check if the data in rows are equal to old values and change it into new values. And unlock, of course. (I suppose transaction isolation level = read commited and the transaction is started on the begin of every user request and ended on the every end of user request) 2. If you really don't have a mechanism for locking rows, you can mimize it using special column in the locked table, which is for pid of locking process. Consequently, you should have "WHERE pid = $$" in all updating/deleting commands. Just idea, probably leading to messy code, better way is to have PostgreSQL :>) | [reply] |
Re: Implementing rowlocking
by davidrw (Prior) on Jan 09, 2007 at 13:06 UTC | |
one user might update the row while the other users look at old data. They might make updates of the old data, thereby overwriting the changes of the first user.first, ditto on using a solid RDBMS--you didn't mention what you're using .. second, here's something on the app level you can do to help prevent this: | [reply] |
Re: Implementing rowlocking
by SheridanCat (Pilgrim) on Jan 09, 2007 at 13:56 UTC | |
You've gotten suggestions that can surely help, such as locking the lock table. That type of solution can have significant problems when some process dies unexpectedly and the lock isn't removed. Perhaps you'll have a cleanup script that removes dead locks or some such. How about considering implementing something like common version control systems use, Optimistic_concurrency_control (or "optimistic locking" which Wikipedia tells me is a misnomer). Time stamp or version updates to data; when updates conflict, ask the user creating the conflict to fix it. The idea being that conflicts are uncommon. Only you know your application, so I may be talking out of my hat. However, you may find you're going to extreme ends to account for a circumstance that rarely occurs. | [reply] |
by oyse (Monk) on Jan 09, 2007 at 14:34 UTC | |
| [reply] |
Re: Implementing rowlocking
by herveus (Prior) on Jan 09, 2007 at 12:08 UTC | |
It doesn't sound like a homework problem, or "personal research", where reinventing the wheel has pedagogic value. Based on that, I'm not going to try to answer either question. Your first line of attack should be to use an RDBMS that provides row level locking. You get the benefit of the considerable effort others have put into making it work right. You also get to focus your attention on your application instead of having to become a database engine writer as well as an application developer.
yours, Michael | [reply] |
Re: Implementing rowlocking
by nmerriweather (Friar) on Jan 09, 2007 at 18:18 UTC | |
| [reply] |
Re: Implementing rowlocking
by xiaoyafeng (Deacon) on Jan 10, 2007 at 02:54 UTC | |
I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction | [reply] |
Re: Implementing rowlocking
by oyse (Monk) on Jan 10, 2007 at 08:51 UTC | |
Thanks again for all your replies. In my opinion a lot of valid arguments have been presented and it has certainly made me think harder about the solution. I obviously chose a bad title for the post, since it seems to me that some think that I want to implement the part of a database that ensures that two users don't update a row at the same time, i.e. concurrently. This is already handled by any database worth using and that is not what I am trying to prevent. I want to prevent that two users request to view the same row, the first one then updates the row and at some later time (since the page has not been refreshed) the second user overwrites the changes from the first user without ever knowing that the first user made any changes. Sorry that I did not make this clear enough and hopefully this clears up any misunderstandings. If one of the administrators have a better suggestion for a title for this post, feel free to change it. Now that I have thought more about the problem, I agree that the optimistic locking strategy seems to be the best solution in most situations, including this one. Hopefully I will get my customer to agree. The reason I think it is the best solution is mainly that it is the least complex one. It only requires to check for conflicts when the user updates, while a locking strategy will require more extensive changes in the application. The locking strategy also requires some cleverness and as SheridanCat pointed out, that is not good when it comes to maintainance. Also the way I plan/planned to implement locking would require some AJAX tricky to work in a way that was transparent to the user and in my limited experience getting AJAX to work flawlessly takes time. That being said the optimistic locking strategy has some application specific problems that will make it a bit harder than what is already discussed above. I will not bore you with the details. I don't know which strategy will be used yet since it is not entirely up to me to decide what is an acceptable solution. So it is still interesting to shed more light on the locking solution. There are two problems that ikegami point out that I am not sure are valid for my solution. First, the situation where you're trying to lock something that's already locked is hard to handle. If you were to use LOCK TABLE, it would block until the lock is obtained. If you were to use your own locking mechanism, then you'd have to keep trying to lock the data. That could put a heavy load on the DB if multiple clients are trying to lock locked records. I don't think it is necessary to block or continuesly check if the lock is free. Just give the user a signal that the row is locked. Then open the row in readonly mode or something like that. It would work quite similar to opening a document on a shared disk in windows that another user have opened. The problem will be to get a message to users that have opened it in readonly mode when it is unlocked. This can probably be handled by refreshing the page regularly (for instane once every couple of minutes). Second, a crashed client can hold a record locked indefinitely. You could timeout locks, but that's dangerous. I don't see why it is dangerous to timeout locks. In my solution there are no process that run periodically to remove old locks. When you check if a row is locked, locks that are too old will be removed before any checking is performed (I don't think I explained this). Do you still think it is dangerous to use timeouts given these new details?. Puh! That was a long post, hope it made things a bit clearer. BTW I will post my final solution in this thread when I am done implementing the feature. | [reply] |
by Anonymous Monk on Jan 10, 2007 at 21:24 UTC | |
| [reply] |
by Anonymous Monk on Jan 11, 2007 at 01:42 UTC | |
KISS | [reply] |
Re: Implementing rowlocking
by oyse (Monk) on May 08, 2007 at 21:26 UTC | |
I sort of promised to post my final solution to this problem and then of course forgot all about it. Now I finally remembered it and I had the time to write the post. So if someone should stumble upon this post at a later time perhaps this will help you a bit. I finally ended with the optimistic locking solution; basically because it seemed simplest to implement. I thought quite hard about it and still believe that the row-locking solution could be implemented, but as you can see of the discussion above, not everyone agrees. The optimistic locking is implemented by adding a column containing the current version of a row. This number is incremented each time the row is updated. The version number is included as an hidden field in the HTML page and when the user submits his/her update the value of the hidden version field is compared with the version number in the database. If the version number in the database is larger than the one in the hidden field some other user have performed an update. This is quite easy to implement and this solutiuon was suggested by several of the posters (this was also the solution I used before starting the thread). The problem that remains to solve is what to do when the conflict is discovered. I many cases it is ok to just inform the user that the update could not be performed and ask them to refresh the page. In my case this was not satisfactory as the user could use a very long time retrieving the necessary information from different sources and filling out the form. By refreshing the page they could lose a lot of work. To give them a more acceptable solution the following was implemented. When the version numbers for a row conflict, each column in the row will be treated separately. This is so that only real conflicts at the column level need to be considered by the user. For each column there a four interesting possibilities: Case 1. is trivial and there is not conflict. There is no reason to bother the user with this case. Case 2. and 3 requires some thinking and what you want to do depends on the application. Often these conflicts can be handled automatically by the system by choosing the most recent value. For case 2 and 3 that means the value of the user or the value in the database, respectivly. In my application such a conflict could in rare cases lead to inconsistent data, so I choose to let the user decided, but with a default suggestion already made. Case 4 is a clear conflict and must be decided by the user. To implement this you need three lists (or hashes) of values. One list contains the value submitted by the user, one contains the values presented to the user originally (that is the values in the database at the time the user loaded the page) and the last list contains the current values in the database. For those of you that like code, this is the loop I used to check for the conflicts. The code is not complete.
Hopefully this was a useful explanation from someone with the same problem. BTW I will probably not keep checking this thread so don't expect any answers :) As an side note I can mention that if you use the SQL Server datatype ROWVERSION you need to convert it to a string before placing it on the HTML page and comparing it. The following code will convert it correctly on a standard Intel box given that version number is stored in the variable $version.
| [reply] [d/l] [select] |