in reply to Auto-Increment and DBD Agnosticism

The obvious fix is simply to perform a SELECT right after the INSERT, retrieving the auto-increment value that way. This mostly works, though I am concerned that it will result in my code executing twice as many SQL statements, and there is the further danger that the saved row is not unique.

This method is not at all safe in multi-process environments (CGI, mod_perl, etc.), as the INSERT followed by the SELECT is not an atomic operation. I would stay away from this kludge, as it will only get you in trouble down the road.

On the other hand, I wonder what kind of overhead already exists as a result of mysql_insertid's availability. Is DBD::mysql already transparently doing something along these lines, or is it somehow optimized?

No, there is no extra SQL call by DBD::mysql (as best I know without looking at the source, that is). I believe that in the MySQL client library (which DBD::mysql uses), the last_insert_id is a function/variable/environment parameter of some kind, so it's actually optimized in MySQL itself.
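Right: the client library already has the new id when the INSERT returns, and DBD::mysql just exposes it as a handle attribute, so no second query is needed. A minimal sketch (the connection parameters and the users table are made up for illustration; assume id is an AUTO_INCREMENT primary key):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Made-up connection details; assumes a table like:
#   CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(64))
my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                       { RaiseError => 1 });

my $sth = $dbh->prepare('INSERT INTO users (name) VALUES (?)');
$sth->execute('skyknight');

# No second query: the MySQL client library received the new id in
# the INSERT's result packet, and DBD::mysql exposes it here.
my $id = $dbh->{mysql_insertid};   # also available as $sth->{mysql_insertid}
print "new row id: $id\n";
```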

*snip a bunch of stuff* This still leaves a bitter taste in my mouth, but at least it would avoid the superfluity of executing a second statement that wasn't going to be of any value.

That whole idea leaves a bitter taste in my mouth as well. Anything that assumes you don't have duplicate data (aside from the primary key) will get you in trouble.

So, what I would really like to know: does some better, less hackish way exist to accomplish what I have described, or am I stuck dealing with it in some way along the lines of what I have delineated here?

Subclassing, subclassing, subclassing. There is no such thing as truly "vanilla" SQL (at least none that I have seen which is usable in a real-world situation), so the idea of complete DBD (and therefore database) agnosticism is a fantasy. There really exists only one elegant solution, which is to subclass.
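To sketch what I mean (package and method names here are made up for illustration): have the constructor look at the active DBI driver name and re-bless into a driver-specific subclass, so only that subclass knows about things like mysql_insertid:

```perl
package Whatever::Object;
use strict;
use warnings;

# Map DBI driver names to the subclasses that know their quirks.
my %subclass_for = (
    mysql => 'Whatever::Object::MySQL',
    Pg    => 'Whatever::Object::Pg',
);

sub new {
    my ($class, $dbh) = @_;
    my $self = bless { dbh => $dbh }, $class;

    # Re-bless into the driver-specific subclass, if we have one.
    my $driver = $dbh->{Driver}{Name};
    bless $self, $subclass_for{$driver} if $subclass_for{$driver};
    return $self;
}

# The base class declares the interface; subclasses implement it.
sub save { die "save() must be implemented by a driver subclass" }

package Whatever::Object::MySQL;
our @ISA = ('Whatever::Object');

sub save {
    my ($self, $table, $values) = @_;
    # ... build and execute the INSERT here ...
    return $self->{dbh}{mysql_insertid};    # the MySQL-specific part
}

1;
```

The base class stays DBD-agnostic, and adding support for a new database is just a matter of writing one more small subclass.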

Also, keep in mind that not all DBDs are alike. Just because the DBI specification says a driver should do "A" doesn't mean the driver author has actually implemented it (or even had the time to implement it). You would be surprised how minimal a DBD driver can be and still "work" with DBI.

-stvn

Replies are listed 'Best First'.
Re^2: Auto-Increment and DBD Agnosticism
by skyknight (Hermit) on Jun 23, 2004 at 20:41 UTC

    Well, what I was envisioning would be safe in a multi-process environment. It would not select the max id of the table, but rather the id of the row that was just inserted, using the values of the saved object's fields as the conditions. I didn't like it as a fix without the imposition that some other key be specified as unique, and furthermore it worries me because of the overhead of the extra query, particularly if it means piping a lot of data back to the server in the form of the conditions.

    I upgraded to DBI 1.42 and tried using the last_insert_id() method, but alas it refused to do anything other than return undef. I don't know what is up with that... My version of DBD::mysql is one behind the latest, so I tried installing the new one on the suspicion that this would fix it, since the latest version of DBD::mysql was released right after the version of DBI that introduced the last_insert_id() method, but the compile barfed. :-(
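For reference, the DBI method takes four arguments (catalog, schema, table, and field), and drivers that haven't implemented it yet just return undef, which would match what I'm seeing. The call I was attempting looks something like this (the table and column names are made up):

```perl
# DBI >= 1.38 API; whether it works depends entirely on the driver.
# On DBD::mysql versions that implement it, this returns the same
# value as $dbh->{mysql_insertid}; on older ones it returns undef.
my $new_id = $dbh->last_insert_id(undef, undef, 'users', 'id');

# Non-portable fallback (MySQL only) while driver support is spotty:
$new_id = $dbh->{mysql_insertid} unless defined $new_id;
```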

      Well, what I was envisioning would be safe in a multi-process environment. It would not be selecting the max id of the table, but rather selecting the id of the row that was inserted by using as conditions the values of the fields of the object that was just saved

      Even that is not safe, since you are relying on all of your data (aside from the primary key) being unique. This is okay if you have other uniqueness constraints on the table, but otherwise you will eventually run into a problem with this approach. Most systems of this nature rely on the primary key as the natural unique identifier; I would really suggest you stick with that and subclass for different DBDs.

      -stvn

        On a related note, I'm wondering what is the most elegant way to organize the subclasses with respect to file location and namespacing. Under whatever namespace I eventually use, I have an Object class (we'll call it Whatever::Object) that lives in an Object.pm file. I also have classes Whatever::Object::Query and Whatever::Object::ResultSet, which live respectively in the files Query.pm and ResultSet.pm in a sub-directory called Object. So, the directory structure is something like...

        lib/
          Whatever/
            Object.pm
            Object/
              Query.pm
              ResultSet.pm  
        

        Now, relevant to our original thread, the Object class has a save method that either inserts the object into the database, or updates an existing record. The method, in a very non-DBD-agnostic way, currently grabs the mysql_insertid from the statement handle. What I want to do is change my Object constructor to bless the thingy into a subclass, e.g. Whatever::Object::MySQL, and that class will provide a save method that will deal with the assignment of surrogate primary key values in a DBD-specific fashion.

        From the perspective of a class hierarchy, it's clear in my mind. What is not clear to me, however, is where the subclass really ought to live. I suppose it could just live in the Object directory, e.g. in a file called MySQL.pm, along with Query and ResultSet, but that feels weird to me... It feels odd because in the case of Query and ResultSet the .pm files live where they do simply because of namespacing, whereas the Object subclass for MySQL would live there because Object is its base class. Is it right for me to feel that this would be weird, or is this the standard way to do it? The only alternative I can fathom would be to put the MySQL class definition inside the Object.pm file. Would that be better or worse, and is there any other alternative?
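        To be concrete, that first option would simply mean adding MySQL.pm alongside the others, even though it lives there for a different reason than they do:

        lib/
          Whatever/
            Object.pm
            Object/
              Query.pm
              ResultSet.pm
              MySQL.pm      # subclass of Whatever::Object
        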

        Most systems of this nature rely on the primary key as the natural unique identifier, I would really suggest you stick with that, and subclass for different DBDs.

        I'm not sure that I entirely agree with that. It's legitimate to have another unique identifier. You just don't want to be using it as a foreign key in another table. The argument for having surrogate primary keys is that they are meaningless as anything except a row identifier, having no semantic meaning as far as your data is concerned, and thus are non-volatile and consequently safe to use for linking objects/rows together.

        I agree with your assertion that subclassing is the right thing to do. I just don't think that the primary key has to be the only unique identifier. For example, think of a user table... The user id is going to be unique. It would be a bad idea to use it as a foreign key in link tables, but it would be unique all the same.

        Anyway, I'm not sure what I'm arguing anymore... I didn't get enough sleep last night and am not entirely coherent. :-)