impossiblerobot has asked for the wisdom of the Perl Monks concerning the following question:

Unlike many of our esteemed monks, who obviously either have Computer Science degrees or are currently in Computer Science program, my formal programming education is essentially nonexistent. Although I've worked professionally as a programmer, as well as hiring, managing, and mentoring other programmers, I've never had an opportunity to have my code reviewed by other programmers who were qualified to give constructive criticism.

I've been a little hesitant to subject my code (and my ego) to public scrutiny, but it is a necessary step if I am to become a better programmer. I'm also hoping that it will serve the public good. :-)

(If there is nothing that others can learn from my code, then there will almost certainly be something to learn from the responses!)

This is a fairly typical program for me, though I usually write in a more modular style:

#!/usr/bin/perl -w ##################################################################### # exh_import.plx # By Brad Smithart (AKA Impossible Robot) # # $Revision: 1.1 $ # # Description: # Imports Exhibitor information from tab-delimited text # file into exh_search.cgi database tables. # Import file contains header row and data rows consisting of # a 'company' column followed by various optional columns # and any number of 'booth' columns. # ##################################################################### # Load Modules use strict; use DBI; use Getopt::Std; my (%opts, @insert_fields); getopts('cwfh', \%opts); # Get arguments unless (@ARGV > 1) # Check for correct number of arguments { my $program; ($program = $0) =~ s!.*[/\\]!!; # Could use File::Basename print <<"EOP"; Usage: $program [options] [input file] [database name] Imports exhibitor data into exhibit search database from tab-delimited text file. Options: -c Force case conversion (to title-case) -w Import URL data ('showurl' and 'web' fields) -f Import data using column headers for fields -h Expanded help (-w and -f options cannot be used together.) EOP print <<"EOP" if $opts{'h'}; Sample Import File Structures: ============================== Without '-w': +-------+-----+-----+-----+ |company|booth|booth|booth| +-------+-----+-----+-----+ |ABC Co.|1001 |2001 | | +-------+-----+-----+-----+ |DEF Co.|1002 | | | +-------+-----+-----+-----+ With '-w': +-------+-------+-------------+-----+-----+-----+ |company|showurl|web |booth|booth|booth| +-------+-------+-------------+-----+-----+-----+ |ABC Co.|1 |www.abcco.com|1001 |2001 | | +-------+-------+-------------+-----+-----+-----+ |DEF Co.|1 |www.defco.com|1002 | | | +-------+-------+-------------+-----+-----+-----+ With '-f': Uses column header names for field names; 'booth' fields must be in final columns. EOP exit; } die "\n**Error: '-w' and '-f' options cannot be used together.\n" if ($opts{'w'} && $opts{'f'}); my ($input_file, $database) = @ARGV; # Assign arguments to variables # Define Variables my $company_count = 0; # Database Variables my $dsn = "DBI:mysql:$database:localhost"; my $username = 'username'; my $enc_passwd = q{(<&%S<W=O>F0`}; # Not encrypted, just obscured my $passwd = unpack("u", $enc_passwd); my $exhibit_table = "exhibitor"; my $booth_table = "booth"; # Connect to database my $dbh = DBI->connect($dsn, $username, $passwd, { RaiseError => 1 }); # Print program identifier my $opt_text = $opts{'c'} ? '(with case conversion)' : ''; $opt_text .= $opts{'w'} ? '(with URL import)' : ''; $opt_text .= $opts{'f'} ? '(using field names from header)' : ''; print "Exhibitor Import $opt_text\n"; # Open file open(INPUT, "$input_file") or die "Couldn't open import file: $input_file\n"; my $header = <INPUT>; if ($opts{'f'}) { # Validate header $header =~ /^company/ or die "Invalid data - first column must be 'company'\n"; $header =~ /\tbooth/ or die "Invalid data - must contain 'booth' column\n"; # Remove booth fields, then get insert fields my ($fields) = split(/\tbooth/, $header); @insert_fields = split(/\t/, $fields); # I should probably also validate insert fields against database } else { @insert_fields = qw( company ); push(@insert_fields, qw( showurl web )) if $opts{'w'}; } # Setup queries my $sth_LastID = $dbh->prepare('SELECT LAST_INSERT_ID()'); my $insert_fields = join(',', @insert_fields); my $placeholders = join(',', map {'?'} @insert_fields); my $query_InsExh = <<"EOQ"; INSERT INTO $exhibit_table ( $insert_fields ) VALUES ( $placeholders ) EOQ my $sth_InsExh = $dbh->prepare($query_InsExh); my $query_InsBth = <<"EOQ"; INSERT INTO $booth_table ( booth_number, exhibitor_id ) VALUES ( ?, ? ) EOQ my $sth_InsBth = $dbh->prepare($query_InsBth); while (<INPUT>) { my (%data, @booths, @values); s/\"//g; # Remove all quotes (@data{@insert_fields}, @booths) = split(/\t/, $_); if (exists $data{'web'} && $data{'web'}) { $data{'web'} = 'http://'.$data{'web'} unless $data{'web'} =~ /^http\:/; } # Convert each word to ucfirst case $data{'company'} =~ s/(\w+)/\u\L$1/g if $opts{'c'}; $sth_InsExh->execute( @data{@insert_fields} ) && $company_count++; $sth_LastID->execute; my ($exhibitor_id) = $sth_LastID->fetchrow_array; foreach my $booth (@booths) { if ($booth =~ /\w+/) { $sth_InsBth->execute($booth, $exhibitor_id); } } } # Disconnect from database $sth_LastID->finish; $dbh->disconnect; print "----------\n$company_count companies inserted\n"; exit;

Please feel free (as I know you will) to criticize style, efficiency, etc. I would also be curious to hear how this might be tranformed into an OOP style (which I think would be a good exercise for me). Thanks for any advice you can give me.

Impossible Robot

Replies are listed 'Best First'.
Re: For Review: DB Import
by fokat (Deacon) on Jan 25, 2002 at 06:24 UTC
    From your post, I don't think this is the kind of feedback you want, but here it goes. I don't think you have anything wrong in your coding style. In fact, I think you're trying to optimize for mantainability, which cannot be bad.

    You're achieving this goal with much more success than what I've seen from many professional programmers, with formal computer science education.

    This said, I would suggest you to look into POD for providing your documentation. POD can be stored side by side with your script, so it is very convenient. You can change the code and the POD at the same time. In fact, when I write modules, I tend to write the POD before writing the first line of code. This allows me to think thoroughly in the interface as well as the inner workings of what I need.

    If you do your documentation with POD, you can output a very simple usage message or the whole docs with Pod::Usage at your nearest CPAN mirror.

    In the efficiency side, I guess this is a very simple example of a data loader. The burden of the work is made by the DB backend and the loader's load is minimal. The only suggestion I could make with regard to this, which does not apply to this specific example, is to only ->prepare the SQL statements that you're actually going to use. It's a waste of resources to do otherwise.

    As for the OOP side, OOP is simply a methodology to design your code. Your example, IMHO, is not a nice candidate for generating an OOP solution, as I doubt a significant fraction of it is reusable. You should consider OOP when facing code that can potentially be reused (by you or by others). I find it easier to break down the code in abstract units, not necesarilly tied to the main task you're implementing.

      Thanks, fokat. I had considered Pod::Usage, but thought it was overkill for the size of this program and my expectations of how the docs would be used. Maybe I should take another look at it. (I always POD my modules, but seldom POD my apps.)

      I agree that this example is not natural candidate for OOP. That's why I thought it might help me to get inside the OOP mindset, by seeing an OOP approach in a case where a more straight-forward method seems more natural. :-)

      Impossible Robot
Re: For Review: DB Import
by mandog (Curate) on Jan 25, 2002 at 08:54 UTC

    Some very, very minor points.

    You might put the definition of  $username, $password, $dsn at the very top of the script. This will make it easier to make simple modifications quickly. Something like:

    use strict; use DBI; use Getopt::Std; my $dsn = "DBI:mysql:$database:localhost"; my $username = 'username'; my $passwd = unpack("u", $enc_passwd); ######### No User servicable parts below ############
    Also, I'm not sure what the gain is with the obscured password. It isn't too hard to say:
    print unpack("u", $enc_passwd);


    email: mandog
      Thanks, mandog. I originally put my config variables at the top of my script. I added my usage statement at the top to give a POD-like overview to anyone reading the code (which of course begs the question of why I didn't use POD itself). I agree that this tends to bury the config vars pretty deep, and I like the suggestion someone else made about using constants to correct it.

      The obscured password is my equivalent of the asterisks that appear when you are typing in a password field. (Sure, someone can pay close attention to what I'm typing, but it prevents people subconsciously picking it up while standing over my shoulder.) No one who should not have access to the password should have access to the script itself.

      Impossible Robot
Re: For Review: DB Import
by spaz (Pilgrim) on Jan 25, 2002 at 06:05 UTC
    Hmmm...I like it.

    Your comments just say what the code is doing. I always like to include why.

    Little comments (like at the end when you disconnect from the database) just seem superflous when there's no documentation for your while( <INPUT> ) loop at all. Dont get me wrong though! Comments are a Good Thing (tm)

    I think that s/\"//g; can be re-written (more efficently?) using tr

    -- Dave
      Thanks, spaz. I will look over the comments and see if I can emphasize the why.

      Many of the what comments are a result of my coding style. I usually tend to write the entire program as comments, then fill in the actual code. I probably knocked out some of the original comments in the process of restructuring.

      I agree that tr/// would be better than s///. (I'm not sure why I escaped the quote in the first place!)

      Impossible Robot
Re: For Review: DB Import
by Ryszard (Priest) on Jan 25, 2002 at 15:55 UTC
    This looks like a good bit of code.

    I would put in the OS respose at the open file line with $!, even tho' it may not be appropriate considering the user of the script, if there are any errors they can give back the exact details under $!.

    Using a rather personal style (if perf isnt an issue) I like to stick my SQL into an OS file that I suck in. I like to do this because I can edit the sql directly, meaning I cant accidently modify the supporting code.

    Another personal thing, I would perhaps issuse a challenge for the database authen, rather than store the pwd. It doesnt take a rocket scientist for a novice to use google, learn a few perly bits, and get back the pwd they could use in another app. If you must store the password, why not use a two way encryption engine (such as DES or 3DES) and store the key in a protected file somewhere. Not really what I consider brutely secure, but better than effectively keeping it in the clear.

    You dont do any decent checking of the header/booth vars. I am not familiar with MySQL (nor an expert on the SQL 92 std) , however i would think it would lend itself to the old subquery trick where the column to be inserted is the result of a (sub)query itself. (also if a user accidently enters a ' (single quote) it will bugger up the query (well, it will with postgres and oracle))

    If $booth_table doesnt change thru'out the code, why not:

  • use a constant
  • hardcode it?

    Personally, I like the constant, but it depends on your style.

    OK, so i'm being incredilby nit-picky, and anal retentive about it all. Above all, it seems like a good bit of code, easy to read and easy to maintain. Most of my suggestions are personal preference and could be just plainly be avoided.

    One thing I never avoid in my code that will be used by untrusted 3rd parties is logging. Grab an IP, or provide authen to use the script, but log every untrusted 3rd party that may use your script. It definately a CYA measure, but can solve disputes, uncover problems you may have with security/perf et al. IMHO very important.

    Another thing I would not avoid is checking the return of any DBI execute statements and handling the errors accordingly (ie to fit your context).

    Overall a good bit of code, perhaps an 7..8/10 by my standards, well done (++).

    Update:Thanks to Zaxo who pointed out that raiseerror is being called. I did notice that, however I meant some explicit error handling (altho it may not be needed to this type of app)

    Update2: In answer to impossiblerobot when i write apps like this and there is an OS error I usally dump some text letting the user know there is an error and ask them to cut and paste the output of $! into an email...
      Thanks, Ryszard. I agree at including $! could be more useful, even to an end-user. (I'm curious as to how most monks go about combining a user-friendly error message with the sometimes less friendly $! version.)

      Although I don't think this is a good application for putting the SQL in a separate file, it is definitely something I will think about for future projects.

      The obscured password is not for strict security; it's to prevent someone who looks over my shoulder, or to whom I show the code, from having the password burned into their brain accidentally.

      I like your idea of using a constant for variables like $booth_table that don't change (though I understand that perl optimizes for cases like this anyway, and it does make interpolation more complex). And I think logging would be perfect for this particular application, which often results in large changes in data.

      I think you're also dead-on about catching my DBI errors. Although I'm using RaiseError, it could use more user-friendly error-reporting/handling.

      Impossible Robot
Re: For Review: DB Import
by Anonymous Monk on Jan 25, 2002 at 20:58 UTC
    Without actually reading the last line,

    I would also be curious to hear how this might be tranformed into an OOP style

    I thought that "hmmm, perhaps it would be good to investigate some and perhaps use OO instead." :)

    I have no formal education either, but I still might be able to give some constructive and useful critique.

    Anyway, back to the issue.

    Initially I'd like to say that first I though "Oh no, another do-it-all-in-a-row." Of course things happen after each other, but that doesn't mean that the code has to be in the same order. I'd prefer to see some subroutines, even if they were just procedural. First of all, it enables encapsulation and reduces the noise so you can concentrate on what's happening right there. For example, the argument checking code. That would easily be abstracted to a subroutine. That would make it clearer what the program really does. (For maintainability.)

    If you don't like that style, or that style seems inappropriate every now and then (I haven't looked that closely at the code), don't be afraid of adding extra blocks to create scoping tricks. This way you know where a variable isn't used anymore, and you're free to manipulate code without fearing far-fetched side-effects.

    I've noticed that C programmers that program Perl often declare variables that contain "tmp" in them. And they're declared within a lexical scope that is much larger than the scope where the temporary variable is used. I use temporary variables too, of course, but I often add an extra block around them. Scoping is very important for maintainability, at least that's my belief. I've only had to maintain my own code, but that's enough. "Think about the guy that comes after you. After six month you'll be that guy." For instance, when I glanced through your code, I saw that @insert_fields was declared at the very top. Then I found code that uses that variable -- but that was 93 line below. So I started to look for other places where it was used that would justify the early declaration. (I failed.)

    So basically my main objection to your code is that it's too accumulative. (Or at seems it looks that way.)

    Except that, I think it's pretty decent code. And to modulize the code is probably not worth the effort, since it's an application, not a general thingy/programming interface.

    And oh, I found a little piece of actual code to comment on. Yay! ;)
    130: my $placeholders = join(',', map {'?'} @insert_fields); First I thought, "why use a block when a simple statement would do?". I.e. map '?', @insert_fields. Then I suddenly realized the misuse of map(). The x operator would be perfect here.
    my $placeholders = join(',', ('?') x @insert_fields); Hope this was helpful.
      Thanks. I agree that most of your comments don't really apply to this particular application. What little exposure to formal Computer Science methodology was in Structured Programming, but I have trouble convincing myself to write subroutines that will only be used once in a specific order. :-)

      (Though I'm sure there is some room for increased modularity. Any specific suggestions?)

      You're absolutely right that I should have used an expression rather than a block in my 'map' statement, and I would never have thought of constructing my list using the 'x' operator (though I had originally tried to use it to construct the $placeholders string). I will definitely incorporate that change.

      Impossible Robot