http://qs1969.pair.com?node_id=593679

oyse has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am implementing a web application where more than one user can access the same row at the same time. This can lead to problems, since one user might update the row while the other users are still looking at old data. They might then make updates based on the old data, thereby overwriting the changes of the first user. To prevent this problem I want to implement some kind of row locking at the application level.

The way I plan to implement the locking is by creating a table in the database that contains information about all the locked rows. When a user wants to view a row, I will check if it is locked by looking in the table. If the row is not locked, the user will acquire a lock on the row.

The algorithm as I see it contains a race condition, since I will first check if the row is locked using a SELECT query and, if it is not locked, insert a row using an INSERT query. The race condition happens if two HTTP requests manage to check the table before one of them inserts/acquires the lock.

To solve the race condition, I plan to use transactions in DBI like this:

1. turn off autocommit
2. check if the row is locked (check the lock table)
3. if it is not locked, lock the row (insert a row in the lock table)
4. commit
5. turn on autocommit (since the rest of the application uses it)
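
For illustration, here is a minimal DBI sketch of those five steps. The table and column names (ROWLOCK, row_id, user_id) and the connection details are just placeholders for whatever the real schema ends up being.

use strict;
use warnings;
use DBI;

# Placeholder connection; ROWLOCK is the assumed lock table.
my $dbh = DBI->connect('dbi:ODBC:mydsn', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 1 });
my ($row_id, $user_id) = (42, 'alice');

$dbh->begin_work;                                     # 1. turn off autocommit
my ($locked) = $dbh->selectrow_array(                 # 2. is the row already locked?
    'SELECT COUNT(*) FROM ROWLOCK WHERE row_id = ?', undef, $row_id);
if (!$locked) {                                       # 3. not locked: acquire the lock
    $dbh->do('INSERT INTO ROWLOCK (row_id, user_id) VALUES (?, ?)',
             undef, $row_id, $user_id);
}
$dbh->commit;                                         # 4. commit; 5. AutoCommit is back on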

I think this algorithm will work, but it will take me some time to implement and test. So before I start implementing it, it would be nice to know whether any CPAN modules will do this for me.

Secondly, do you think the algorithm will prevent the race condition?

Replies are listed 'Best First'.
Re: Implementing rowlocking
by themage (Friar) on Jan 09, 2007 at 09:38 UTC
    Hi oyse,

    Whether you use autocommit or not is irrelevant in your present situation, as it would not prevent the lock row from being inserted in another transaction at the same time.

    What you really need is to LOCK your lock table:
    mysql> LOCK TABLE locks WRITE;
    mysql> SELECT * FROM locks where yourcond=true;
    mysql> INSERT locks (...) VALUES(...);
    mysql> UNLOCK TABLES;
    Some notes:
    • Don't forget to unlock the table, especially if your server is using persistent connections (like with Apache::DBI, or pconnect in PHP), or no other connection will ever be able to lock the table.
    • You can use LOCK TABLE locks READ if you just need to read the table, but you should use the WRITE lock for the select and the insert, or another thread can write after you read.
    • Any connection that doesn't LOCK the table before attempting to do anything with it will succeed independently of the locks (so you need to take the lock in every operation on this table).
    • You should not do any operation that takes long (0.01+ secs) while holding the LOCK, as only one connection can LOCK the table at any time.
    • All tables used after a lock and before an unlock must be locked.
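
    A rough DBI version of the same sequence might look like this; it's only a sketch, $dbh is an existing handle with RaiseError enabled, and the lock-table columns (row_id, user_id) are invented for the example:

    # Table locking around the check-then-insert; always UNLOCK, even if
    # something in between dies.
    $dbh->do('LOCK TABLES locks WRITE');
    eval {
        my ($already) = $dbh->selectrow_array(
            'SELECT COUNT(*) FROM locks WHERE row_id = ?', undef, $row_id);
        $dbh->do('INSERT INTO locks (row_id, user_id) VALUES (?, ?)',
                 undef, $row_id, $user_id) unless $already;
    };
    my $err = $@;
    $dbh->do('UNLOCK TABLES');
    die $err if $err;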


Re: Implementing rowlocking
by ikegami (Patriarch) on Jan 09, 2007 at 16:08 UTC

    [ I see solutions. Hopefully, I can shed some light on the problem. ]

    First, it's important to realize that you want to lock a record for an extended period of time, one that can span longer than a connection. (arkturuz, herveus and I didn't realize this in the chat last night.)

    There are two major problems with doing your own locking. The length of time for which you want to lock a record exacerbates both.

    First, the situation where you're trying to lock something that's already locked is hard to handle. If you were to use LOCK TABLE, it would block until the lock is obtained. If you were to use your own locking mechanism, then you'd have to keep trying to lock the data. That could put a heavy load on the DB if multiple clients are trying to lock locked records.

    Second, a crashed client can hold a record locked indefinitely. You could timeout locks, but that's dangerous.

    That's why people (davidrw, SheridanCat) are suggesting optimistic locking. It goes along the lines of

    • Don't lock the record while viewing/editing it.
    • When it's time to update the record in the database, first make sure it hasn't changed (see the sketch after this list).
    • Handle conflicts automatically and/or by having the user redo his update (in part or in whole).
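
    A common way to do the "make sure it hasn't changed" step is a guarded UPDATE whose WHERE clause includes the version (or timestamp) the user originally read. This is only a sketch; the table and column names are made up:

    # If the row changed underneath the user, zero rows match and nothing is
    # overwritten; DBI's do() then reports zero rows affected.
    my $rows = $dbh->do(
        'UPDATE widget SET name = ?, version = version + 1
          WHERE id = ? AND version = ?',
        undef, $new_name, $id, $version_the_user_read);
    if ($rows == 0) {
        # conflict: someone else updated the row; resolve or ask the user
    }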
      Optimistic locking is a solid way to approach this. I would not devise a custom/app-specific locking strategy inside the database as you described earlier. That's just prone to more issues, complexity and maintenance.

      Regardless of whether it's RDBMS vendor locking or your custom locking, holding a lock for 5+ minutes is too long on a busy system; it seems that would lead to transaction blocking or even deadlocking. Good luck with your project.

Re: Implementing rowlocking
by arkturuz (Curate) on Jan 09, 2007 at 09:26 UTC
    The PostgreSQL database server allows locking data per row. Before you do any coding, maybe you should consider installing it (if you have the permissions to do so).
    About the algorithm: if I understand correctly, it seems to me that it will not prevent race conditions (because, as you said, "the race condition happens if two HTTP requests manage to check the table before one of them inserts/acquires the lock", but you don't guard against that in your algorithm).
Re: Implementing rowlocking
by oyse (Monk) on Jan 09, 2007 at 15:05 UTC

    Thanks for all the replies. I see that I was clearly too vague in my original post, so I shall try to give some more details.

    This is a web application and I have not found any reliable way to unlock rows when a user leaves a page, closes the web browser, etc. So to prevent a user from holding a lock that is no longer used, I want the following behaviour: a lock is only valid for a short time (for instance 5 minutes). After that the lock is not counted as a lock anymore; it is invalid. If a user wants to keep the lock longer, she must ask for it explicitly, in which case the lock is held for another 5 minutes (or some other time).

    Regarding what herveus said:
    I agree that there generally is no reason to reinvent the wheel, but initially I could not find any way to get SQL Server to lock rows for me explicitly (changing the database is not an option at this time). All I found were descriptions of how SQL Server locks rows when performing INSERTs/UPDATEs etc. and how to optimize SQL Server's behaviour. I have now found a way to get SQL Server to lock rows, but I don't know if this way is database independent. Database independence is not critical, but it is desired. Does anyone know if it is possible to lock rows in a database-independent way?

    Regarding what davidrw said:
    This is what I have implemented already. Each row has a version number, and if the version number is lower than what is stored in the database when the user tries to update a row, the update is rejected and an error message is issued. This is not acceptable to my users, since they can spend a long time filling out a form before submitting. If they lose the data they spent a long time entering, they will not be happy users.

    Regarding what SheridanCat said:
    Letting the users fix any conflicts can be a good solution, but I am uncertain if it is the right solution here. Conflicts are most likely to happen on days when my users have a lot to do, and letting them fix conflicts at that time might be frustrating for them. I should clearly have checked this more thoroughly though.

    BTW I think I have a race-condition-free implementation in Perl now, thanks to one of the people that helped me in the chatterbox. The person suggested giving the column that identifies a locked row a UNIQUE constraint. The RDBMS will then reject an insert that tries to lock a row (by inserting into the ROWLOCK table) that is already locked, and DBI will throw an exception. By placing the DBI execute call in an eval{} it is easy to see whether the user acquired the lock or not, and two users cannot have the lock at the same time.
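
    Roughly, the acquisition step looks like this. It is only a sketch; ROWLOCK and its columns are my names for the schema, $now is whatever timestamp value is used, and RaiseError is assumed to be on so that a duplicate key makes the do() die:

    # row_id has a UNIQUE constraint, so the database itself rejects a second
    # lock on the same row and DBI throws; eval{} turns that into a yes/no.
    my $got_lock = eval {
        $dbh->do('INSERT INTO ROWLOCK (row_id, user_id, locked_at) VALUES (?, ?, ?)',
                 undef, $row_id, $user_id, $now);
        1;
    };
    if ($got_lock) {
        # we hold the lock; let the user edit
    }
    else {
        # duplicate key: someone else already holds the lock
    }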

    I will now look more closely at your suggestions and see if I should do something differently. Thanks again for all the replies.

      The unique constraint idea is clever. But using side effects like the constraint throwing an error to do significant stuff makes me uncomfortable. Document it well so the maintenance programmer who looks at the app in four years will understand how you're using that.
      Just 2 notes:

      1. I think that row locking (whatever it means here) is probably not necessary during the whole user session in the described situation. I don't know the real requirements, but consider the scenario where the user sends both the old values (which he received) and the new values (which he has probably changed). The application logic can then decide whether it is necessary to lock some rows for a while, check whether the data in the rows still equals the old values, and change them to the new values. And unlock, of course. (I assume transaction isolation level = read committed, and that a transaction is started at the beginning of every user request and ended at the end of every request.)

      2. If you really don't have a mechanism for locking rows, you can mimic one using a special column in the locked table that holds the pid of the locking process. Consequently, you would need "WHERE pid = $$" in all updating/deleting commands. Just an idea, and probably leading to messy code; the better way is to have PostgreSQL :>)

Re: Implementing rowlocking
by davidrw (Prior) on Jan 09, 2007 at 13:06 UTC
    one user might update the row while the other users are still looking at old data. They might then make updates based on the old data, thereby overwriting the changes of the first user.
    First, ditto on using a solid RDBMS--you didn't mention what you're using.

    second, here's something on the app level you can do to help prevent this:
    • put a last_updated (or similarly named) column in your table
    • either add a trigger to set last_updated=NOW() on any INSERT/UPDATE, or make sure that your code does so
    • when data is retrieved for the user, keep track (session, hidden var, etc) of not only the row's pk, but also the last_updated
    • when the user submits for saving, throw an error (or warning screen) if the stored last_updated value for this row is before the current value in the db, meaning someone updated it in the meantime (a sketch follows below)
    Bugzilla used to (may still?) do something similar to this and would throw a "mid-air collision" error message.
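
    A sketch of that save-time check, with illustrative table/column names; the timestamps are assumed to compare cleanly as strings (e.g. ISO format), and NOW() is database-specific:

    # $form_last_updated is the value kept in the session / hidden field.
    my ($current) = $dbh->selectrow_array(
        'SELECT last_updated FROM widget WHERE id = ?', undef, $id);
    if ($form_last_updated lt $current) {
        # someone updated the row in the meantime: show the warning screen
    }
    else {
        $dbh->do('UPDATE widget SET name = ?, last_updated = NOW() WHERE id = ?',
                 undef, $new_name, $id);
    }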
Re: Implementing rowlocking
by SheridanCat (Pilgrim) on Jan 09, 2007 at 13:56 UTC
    You're basically just moving the race condition, which often happens when trying to avoid races.

    You've gotten suggestions that can surely help, such as locking the lock table. That type of solution can have significant problems when some process dies unexpectedly and the lock isn't removed. Perhaps you'll have a cleanup script that removes dead locks or some such.

    How about implementing something like what common version control systems use: optimistic concurrency control (or "optimistic locking", which Wikipedia tells me is a misnomer)? Time stamp or version updates to data; when updates conflict, ask the user creating the conflict to fix it. The idea is that conflicts are uncommon.

    Only you know your application, so I may be talking out of my hat. However, you may find you're going to extreme ends to account for a circumstance that rarely occurs.

      Edit: Moved the post to the original thread.
Re: Implementing rowlocking
by herveus (Prior) on Jan 09, 2007 at 12:08 UTC
    Howdy!

    It doesn't sound like a homework problem, or "personal research", where reinventing the wheel has pedagogic value. Based on that, I'm not going to try to answer either question.

    Your first line of attack should be to use an RDBMS that provides row level locking. You get the benefit of the considerable effort others have put into making it work right. You also get to focus your attention on your application instead of having to become a database engine writer as well as an application developer.

    yours,
    Michael
Re: Implementing rowlocking
by nmerriweather (Friar) on Jan 09, 2007 at 18:18 UTC
    1. update your schema to have a 'record version'
    2. when people pull a record they get 2 variables: record version, md5(site_secret + record version)
    3. when people view a record again, or go to update it, you check
      1. is the record version and md5 hash valid? if not, display error
      2. is the record version the current one? if not, display error
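
    A sketch of the token handling with Digest::MD5; site_secret is whatever per-installation secret the server keeps, and the variable names are illustrative:

    use Digest::MD5 qw(md5_hex);

    my $site_secret = 'known-only-to-the-server';    # assumed config value

    # Step 2: when rendering the record, send the version plus its checksum.
    my $token = md5_hex($site_secret . $record_version);

    # Step 3.1: reject the submission if version and checksum don't match up.
    die "invalid version/token\n"
        if md5_hex($site_secret . $submitted_version) ne $submitted_token;

    # Step 3.2: reject it if the version is no longer the current one.
    die "record has changed since it was loaded\n"
        if $submitted_version != $current_db_version;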
Re: Implementing rowlocking
by xiaoyafeng (Deacon) on Jan 10, 2007 at 02:54 UTC
    I don't think implementing row locking at the application level is a good idea. Almost all databases provide a locking mechanism for read consistency. In Oracle, for example, you could simply add a "FOR UPDATE" clause to solve this question. Please refer to "Expert One on One" for more details (sorry, I forget the URL).
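
    For example, a sketch in DBI terms (Oracle/PostgreSQL/InnoDB-style syntax; the table and column names are invented). Note that such a lock only lasts until the transaction ends, so by itself it does not cover an editing session that spans several HTTP requests:

    $dbh->begin_work;
    # The selected row stays locked against other writers until commit/rollback.
    my $row = $dbh->selectrow_hashref(
        'SELECT * FROM widget WHERE id = ? FOR UPDATE', undef, $id);
    $dbh->do('UPDATE widget SET name = ? WHERE id = ?', undef, $new_name, $id);
    $dbh->commit;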


    I am trying to improve my English skills; if you see a mistake, please feel free to reply or /msg me a correction
Re: Implementing rowlocking
by oyse (Monk) on Jan 10, 2007 at 08:51 UTC

    Thanks again for all your replies. In my opinion a lot of valid arguments have been presented and it has certainly made me think harder about the solution.

    I obviously chose a bad title for the post, since it seems that some think I want to implement the part of a database that ensures that two users don't update a row at the same time, i.e. concurrently. This is already handled by any database worth using and is not what I am trying to prevent. What I want to prevent is this: two users request to view the same row, the first one then updates the row, and at some later time (since the page has not been refreshed) the second user overwrites the changes from the first user without ever knowing that the first user made any changes. Sorry that I did not make this clear enough; hopefully this clears up any misunderstandings.

    If one of the administrators has a better suggestion for a title for this post, feel free to change it.

    Now that I have thought more about the problem, I agree that the optimistic locking strategy seems to be the best solution in most situations, including this one. Hopefully I will get my customer to agree. The reason I think it is the best solution is mainly that it is the least complex one: it only requires checking for conflicts when the user updates, while a locking strategy would require more extensive changes in the application. The locking strategy also requires some cleverness, and as SheridanCat pointed out, that is not good when it comes to maintenance. Also, the way I plan/planned to implement locking would require some AJAX trickery to work in a way that is transparent to the user, and in my limited experience getting AJAX to work flawlessly takes time.

    That being said, the optimistic locking strategy has some application-specific problems that will make it a bit harder than what has already been discussed above. I will not bore you with the details.

    I don't know which strategy will be used yet since it is not entirely up to me to decide what is an acceptable solution. So it is still interesting to shed more light on the locking solution.

    There are two problems that ikegami points out that I am not sure are valid for my solution.

    First, the situation where you're trying to lock something that's already locked is hard to handle. If you were to use LOCK TABLE, it would block until the lock is obtained. If you were to use your own locking mechanism, then you'd have to keep trying to lock the data. That could put a heavy load on the DB if multiple clients are trying to lock locked records.

    I don't think it is necessary to block or continuously check whether the lock is free. Just give the user a signal that the row is locked, then open the row in read-only mode or something like that. It would work quite similarly to opening a document on a shared disk in Windows that another user has open. The problem will be getting a message to users who have opened it in read-only mode when it is unlocked. This can probably be handled by refreshing the page regularly (for instance once every couple of minutes).

    Second, a crashed client can hold a record locked indefinitely. You could timeout locks, but that's dangerous.

    I don't see why it is dangerous to time out locks. In my solution there is no process that runs periodically to remove old locks. When you check if a row is locked, locks that are too old are removed before any checking is performed (I don't think I explained this). Do you still think it is dangerous to use timeouts given these new details?
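
    Concretely, the check runs something like this first; ROWLOCK and locked_at are my names for it, five minutes is the example timeout, and $dbh is the usual handle:

    use POSIX qw(strftime);

    # Expire stale locks, then do the normal locked/unlocked check.
    my $cutoff = strftime('%Y-%m-%d %H:%M:%S', localtime(time - 5 * 60));
    $dbh->do('DELETE FROM ROWLOCK WHERE locked_at < ?', undef, $cutoff);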

    Phew! That was a long post; I hope it made things a bit clearer. BTW, I will post my final solution in this thread when I am done implementing the feature.

      Most commercial systems I've worked with implement this in a much simpler manner: by simply adding a datestamp to the row in question. When you render the edit form, include the datestamp as a hidden field. When the user submits the update, compare the datestamp with the one in the row. If the row contains a newer one, there's an overwrite issue. Ask the user to confirm they want to overwrite, maybe give them some idea what's changed. Bonus points for using AJAX to notify open clients of the row change while they're editing the records. I think this is orders of magnitude simpler than your attempt to reinvent client/server locking behavior in the browser (which might just be a first - which should scare you), and it's backed up by lots of prior art, so you know it will work.
        fwiw, nmerriweather and I are saying the same thing. I didn't see that post before I posted.

        KISS

Re: Implementing rowlocking
by oyse (Monk) on May 08, 2007 at 21:26 UTC

    I sort of promised to post my final solution to this problem and then of course forgot all about it. Now I have finally remembered, and I had the time to write the post. So if someone stumbles upon this post at a later time, perhaps it will help them a bit.

    I finally went with the optimistic locking solution, basically because it seemed simplest to implement. I thought quite hard about it and still believe that the row-locking solution could be implemented, but as you can see from the discussion above, not everyone agrees.

    The optimistic locking is implemented by adding a column containing the current version of a row. This number is incremented each time the row is updated. The version number is included as a hidden field in the HTML page, and when the user submits his/her update, the value of the hidden version field is compared with the version number in the database. If the version number in the database is larger than the one in the hidden field, some other user has performed an update.

    This is quite easy to implement, and this solution was suggested by several of the posters (it was also the solution I used before starting the thread). The problem that remains is what to do when a conflict is discovered.

    In many cases it is OK to just inform the user that the update could not be performed and ask them to refresh the page. In my case this was not satisfactory, as the user could spend a very long time retrieving the necessary information from different sources and filling out the form; by refreshing the page they could lose a lot of work. To give them a more acceptable solution, the following was implemented.

    When the version numbers for a row conflict, each column in the row is treated separately. This is so that only real conflicts at the column level need to be considered by the user. For each column there are four interesting possibilities:

    1. The value was not changed by the user and was not changed in the database.
    2. The value was changed by the user, but was not changed in the database.
    3. The value was not changed by the user, but was changed in the database
    4. The value was changed by the user and was also changed in the database.

    Case 1 is trivial and there is no conflict. There is no reason to bother the user with this case.

    Cases 2 and 3 require some thinking, and what you want to do depends on the application. Often these conflicts can be handled automatically by the system by choosing the most recent value; for cases 2 and 3 that means the user's value or the value in the database, respectively. In my application such a conflict could in rare cases lead to inconsistent data, so I chose to let the user decide, but with a default suggestion already made.

    Case 4 is a clear conflict and must be decided by the user.

    To implement this you need three lists (or hashes) of values. One list contains the values submitted by the user, one contains the values presented to the user originally (that is, the values in the database at the time the user loaded the page), and the last list contains the current values in the database. For those of you who like code, this is the loop I used to check for the conflicts. The code is not complete.

    # Mapping between columns with conflicts and their new values.
    my %conflicts = ();
    # Mapping between the columns with warnings and their new values.
    my %warnings  = ();

    while ( my ( $columnname, $new_value ) = each %$new_values ) {
        # We know there is a conflict on VERSION and it is not interesting.
        next if $columnname eq 'VERSION';

        my $old_value      = $old_values->{ $columnname };
        my $database_value = $database_values->{ $columnname };

        # The database value and the user's new value agree, so no conflict.
        next if $database_value eq $new_value;

        # Changed by the user and updated in the database: a sure conflict.
        if ( $new_value ne $old_value && $database_value ne $old_value ) {
            $conflicts{ $columnname } = $new_value;
            next;
        }

        # No other test was true, so the value was changed either by the user
        # or in the database. That constitutes a warning.
        $warnings{ $columnname } = $new_value;
    }

    Hopefully this was a useful explanation from someone with the same problem. BTW I will probably not keep checking this thread so don't expect any answers :)

    As a side note I can mention that if you use the SQL Server datatype ROWVERSION, you need to convert it to a string before placing it on the HTML page and comparing it. The following code will convert it correctly on a standard Intel box, given that the version number is stored in the variable $version.

    my $hex_string = unpack "H16", $version;