http://qs1969.pair.com?node_id=108286

Notice: my intention in this post is not to start a PHP versus Perl flame war. This post is about good coding practices being applicable to all languages, not about any particular language's weaknesses. Further, for those who wish to flame me about the "Perl problems" that I mention, this thread isn't the place. Heck, those "problems" could simply be personal prejudices. However, the point of this node is not whether or not Perl is perfect; it's about good coding practices.

The Problem

I've recently been learning PHP for a Web site that I need to maintain, and I discovered something curious about the language: you can't predeclare variables. In fact, anyone can create a global variable (with any data they want in it) in your code simply by inserting an appropriately named form element in the HTML document that has the data they want. There does not appear to be a PHP equivalent of 'use strict'.

My initial thought was to write a Perl script that validates my PHP code and warns me when I have misspelled a variable, used it only once, etc. It irritated me so much that an interesting tool like PHP has such a glaring violation of good coding practice that I started thinking about this a bit differently. I've programmed in quite a few languages and noticed that all of them have problems. Here's a brief sample:

COBOL
  • All variables are global.
  • No references.

VBScript
  • Horrible problem with silently mangling dates.
  • Case-insensitive (maybe that's just a personal gripe).

PHP
  • Can't predeclare variables to catch typos.
  • Variable variables seem to be encouraged in the documentation.

Perl (didn't think I'd leave it out, did ya?)
  • Excessive use of globals built in to the language.
  • OO is kludgy and slow.
  • Perl's prototypes have some significant issues.

Many have seen that newer programmers often fail to use strict, warnings, taint checking, or many other good programming practices that are suggested to them. I'm here to say you're not only hurting yourself; you're hurting anyone who has to maintain your code. The interesting thing about these programming practices is that they are not Perl-specific. In fact, there are few, if any, languages where these programming practices don't apply.

Why good programming practices are good

Predeclared variables

Let's start off with 'use strict'. Use strict affects variables, references, and subroutines. For the sake of brevity, I'll just cover variables.

Let's face it, when you have a 2000 line program and buried in that program, somewhere, is a variable mis-named %quarterly_reciepts, it's not an easy issue to figure out. Finding a misspelled variable name is a snap when you predeclare variables, but if you don't, you may have no idea that your code is spitting out bad output because of a misspelling. You might spend time figuring out if you're reading from your database correctly or wondering if you have a file buffering problem. Why wonder whether or not you've misspelled a variable when you can trap that issue in a couple of seconds and potentially save many, many hours of debugging? I guarantee that programmers coming behind you may not thank you for using strict, but they will curse you if you don't.

Perl has 'use strict' to protect against undeclared variables. VBScript has 'Option Explicit'. Even venerable COBOL has 'Working-Storage' to deal with these issues. If this feature is optional in your language of choice, turn that option on!
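
Here's a minimal sketch of what strict buys you. The hash name is borrowed from the example above; strict_error is a made-up helper that compiles a snippet in a string eval, so the failure can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile a snippet under strict and return the compiler's complaint,
# or an empty string if it compiled cleanly. (Hypothetical helper, only
# here so the error can be shown without killing the script.)
sub strict_error {
    my ($code) = @_;
    eval "use strict; $code; 1" and return '';
    return $@;
}

# Single quotes keep the variable names literal until the eval sees them.
my $declared   = 'my %quarterly_receipts; $quarterly_receipts{Q1} = 1200;';
my $misspelled = 'my %quarterly_receipts; $quarterly_reciepts{Q1} = 1200;';

print "declared:   ", ( strict_error($declared)   || "ok\n" );
print "misspelled: ", ( strict_error($misspelled) || "ok\n" );
```

The first snippet compiles; the second is rejected at compile time with a message that names the misspelled hash. Without strict, the second line would quietly create a brand-new package variable.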

Global variables

So, you've written your first module. In fact, you've written an entire suite of modules that share data amongst themselves and the programs that use them. Knowing that laziness is a virtue (a false virtue, in this case), you decide to use global variables for some data that everything uses. Here are potential problems with this (some of these are general issues, others are Perl-specific):

  • Months later, when you or someone else comes back to maintain the code, the first question that gets asked is "where the heck is $main::incr set to 5?"
  • In your suite of programs, you have a little bug that munges that global variable and the rest of the code breaks. Hmm... wonder what changed it. Good luck finding out.
  • You want to port the code to mod_perl. Too bad. You no longer use the %main:: namespace.
  • 'use strict' doesn't catch problems with misspelled globals unless you declare them with "our" or "use vars". Many programmers don't understand how those work.
  • Later, a maintenance programmer who works on your code is going to have to try and remember what all of the globals are for. With lexically scoped variables, this is much easier to do.
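
One way to avoid a global like $main::incr, sketched under the assumption that it's just a shared step value. The names get_incr and set_incr are invented for illustration.

```perl
use strict;
use warnings;

# The value lives in a lexical that only the two subs below can see,
# so every change has to go through set_incr.
{
    my $incr = 5;    # private to this block

    sub get_incr { return $incr }

    sub set_incr {
        my ($new) = @_;
        die "incr must be a positive integer\n"
            unless defined $new && $new =~ /^[1-9]\d*$/;
        $incr = $new;
        return $incr;
    }
}

print "incr starts at ", get_incr(), "\n";
set_incr(10);
print "incr is now ", get_incr(), "\n";
```

Now "where is it set?" has exactly one answer, and that one place is where a check or a debugging statement goes. Note that the block has to run before the subs are called, which it does here because it sits at the top of the file.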

Modular/orthogonal code

Each piece of code should do one thing and do it well. I think one of the most famous Perl examples of violating this principle is the following misguided attempt to parse form variables.

    foreach $pair (@pairs) {
        ($key, $value) = split (/=/, $pair);

        # Convert plusses to spaces
        $key =~ tr/+/ /;

        # Convert Hex values to ASCII
        $key =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        # Eliminiate SSI's
        $value =~s/<!--(.|\n)*-->//g;

        # If we already have a key with this name, allow for
        # multiple values!!!
        if ($formdata{$key}) {
            $formdata{$key} .= ", $value";
        } else {
            $formdata{$key} = $value;
        }
    }

See the line that tries to eliminate server side includes ($value =~s/<!--(.|\n)*-->//g;)? Aside from the fact that it's a terribly written regular expression, it also will cut out a lot of HTML comments (in fact, it will pretty much destroy an HTML document if it has more than one comment in it). What happens when you want to include HTML? You have to rewrite this routine, which could cause problems if other code relies on it. A form-parsing routine should parse the form data, that's all. If you want to strip anything out of that data, do it elsewhere.

Code that doesn't have side effects is known as 'orthogonal' code. For example, if you step on the brakes in your car, you don't want it to veer to the left. If you turn on your headlights, you don't want that to automatically trigger your windshield wipers. If you are validating a username and password, don't go out and grab the CNN headlines in the same routine.
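
A rough sketch of how the form-parsing example above might be split so that each routine does one thing. All three sub names are made up, and for real work a well-tested module such as CGI.pm is a safer bet than hand-rolled parsing; this only illustrates the separation.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: plus signs to spaces, %XX to characters.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/eg;
    return $text;
}

# Parse a query string into a hash of array refs, one entry per value.
# Nothing is filtered or stripped here.
sub parse_form {
    my ($query) = @_;
    my %form;
    for my $pair ( split /[&;]/, $query ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        push @{ $form{ url_decode($key) } }, url_decode($value);
    }
    return \%form;
}

# A separate filter that callers apply only when they actually want it.
sub strip_html_comments {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;    # non-greedy: each comment is removed on its own
    return $text;
}

my $form = parse_form('name=Ovid&lang=Perl&lang=PHP');
print "$_: @{ $form->{$_} }\n" for sort keys %$form;
```

Code that wants HTML in its form data simply never calls strip_html_comments, and nothing else has to be rewritten.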

Check your system calls

We've all seen it:

open DATA, $data;

If you failed to open the file, your code continues to silently run. If this is embedded in a large system, this could take a long time to track down. Sure, adding the "or die $!" is more work, but the extra cost of fire insurance is a blessing when your house burns down.
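
For comparison, a small sketch of the checked version. read_lines is a hypothetical helper and the file path is made up; the point is only that every system call reports its own failure, with the file name and $! in the message.

```perl
use strict;
use warnings;

# Read a whole file, dying with a useful message if anything goes wrong.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' for reading: $!\n";
    my @lines = <$fh>;
    close $fh
        or die "Cannot close '$path': $!\n";
    return @lines;
}

# The failure surfaces at the open, not three subroutines later.
my @lines = eval { read_lines('/no/such/dir/data.txt') };
print "Caught: $@" if $@;
```

The three-argument open with a lexical filehandle is used here because it also keeps a stray character in the file name from being treated as an open mode.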

Validation

Many newer programmers fail to realize that something is going to go wrong with their code. Maybe the user types a letter instead of the numbers you have on your menu choices. Maybe a function returns an array instead of a reference to one. Maybe, gasp, someone with malicious intent is trying to break your code (hopefully, they're your testing department).

Sometimes, you may think that validating your data is a waste of time. I remember one time that I was writing a program that would summarize commission data and the programmer who wrote the system that I was working on asked me why it was taking so long. I showed her my code and it had gobs of input validation. As it turns out, she had written a wrapper for this system which validated all data long before it got to me. In theory, I could have dispensed with my validation. However, what happens if the input data for the system changes and someone needs to rewrite that wrapper? We all know how easy it is to write buggy code and there's no guarantee that nice, clean data that enters my program today will be clean tomorrow. Remember, you're sleeping with every program that your program ever slept with (okay, that was a rotten analogy).

One of the beautiful things about strong data validation is that you control the error messages. Rather than having a program die a horrible death when it tries to divide by zero, you've already trapped that zero or undefined value and have a nice, useful message in the error log.
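
A minimal sketch of that idea, assuming a commission calculation along the lines of the story above. The sub name, argument names, and messages are all invented for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Validate both inputs before doing any arithmetic, so the error log
# says what was wrong instead of "Illegal division by zero".
sub commission_rate {
    my ( $commission, $sales ) = @_;

    for my $arg ( [ commission => $commission ], [ sales => $sales ] ) {
        my ( $name, $value ) = @$arg;
        croak "$name is required" unless defined $value;
        croak "$name '$value' must be a non-negative number"
            unless $value =~ /^\d+(?:\.\d+)?$/;
    }
    croak "sales must be greater than zero to compute a rate"
        if $sales == 0;

    return $commission / $sales;
}

print "rate: ", commission_rate( 50, 200 ), "\n";

# Bad input produces a message you wrote, pointing at the caller.
eval { commission_rate( 50, 0 ) };
print "Caught: $@" if $@;
```

croak reports the error from the caller's point of view, which is usually where the bad data came from.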

Factor out common elements

Do you ever find yourself rewriting the same snippet of code? Have you ever had to do a global search and replace on a program? The odds are, you have duplicated something that you should have factored out. Here's a beautiful JavaScript example our design department turned out:

    function changeLoc(formNum) {
        if ( document.forms[0].elements[formNum].options[document.forms[0].elements[formNum].selectedIndex].value == "Corporate Home" ) {
            parent.location.href = "http://www.somesite.com/";
        } else if ( document.forms[0].elements[formNum].options[document.forms[0].elements[formNum].selectedIndex].value != "nogo" || page != "") {
            top.i3.location.href = document.forms[0].elements[formNum].options[document.forms[0].elements[formNum].selectedIndex].value;
        }
        document.forms[0].elements[formNum].selectedIndex = 0;
    }

Ooh, that's miserable. After factoring out the appropriate form value:

    function changeLoc(formNum) {
        page = document.forms[0].elements[formNum].options[document.forms[0].elements[formNum].selectedIndex].value;
        if ( page == "Corporate Home" ) {
            parent.location.href = "http://www.somesite.com/";
        } else if ( page != "nogo" || page != "") {
            top.i3.location.href = page;
        }
        document.forms[0].elements[formNum].selectedIndex = 0;
    }

Much better. Now, if we need to tweak the page value at all, we only do it in one place.

For Perl, here's an example from a module I wrote recently (simplified for clarity):

    sub update_foo {
        my ( $self, $data ) = @_;
        my $id = $data->{ textID };
        delete $data->{ textID };

        if ( $id !~ /^\d+$/ ) {
            croak "textID '$id' in update_foo must be numeric.";
        }

        my ( $field_values, $values ) = $self->_format_update_data( $data );
        my $sql = "UPDATE giText SET $field_values WHERE textID = ?";
        push @$values, $id;
        my $return = $self->_update_database( $sql, $values );
        $self->{ _dbh }->commit if ! $self->{ _error };
        return $return;
    }

After rewriting this routine for the third time, I realized that the only thing changing was my ID and the table name. Needless to say, that quickly changed. Now, my "update" methods only validate the ID and supply the correct table name. They are then passed to a generic update method. If I ever need to update that, I only have one place to do it instead of three.
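
The real generic method isn't shown here, so what follows is only a guess at the general shape, written as plain functions that build the statement rather than talk to a database. build_update, update_bar, giImage, and imageID are invented; giText and textID come from the routine above.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical generic helper: builds the SQL and bind values for any table.
# Table and column names are interpolated into the statement, so they must
# come from your own code, never from user input.
sub build_update {
    my ( $table, $id_field, $data ) = @_;
    my %fields = %$data;                  # copy, so the caller's hash is untouched
    my $id     = delete $fields{$id_field};

    croak "$id_field in update of $table must be numeric"
        unless defined $id && $id =~ /^\d+$/;
    croak "no fields to update in $table" unless %fields;

    my @columns = sort keys %fields;      # sorted so the statement is stable
    my $set     = join ', ', map { "$_ = ?" } @columns;
    my $sql     = "UPDATE $table SET $set WHERE $id_field = ?";
    return ( $sql, [ @fields{@columns}, $id ] );
}

# Each specific routine now only supplies its table and ID column.
sub update_foo { my ($data) = @_; return build_update( 'giText',  'textID',  $data ) }
sub update_bar { my ($data) = @_; return build_update( 'giImage', 'imageID', $data ) }

my ( $sql, $values ) = update_foo( { textID => 7, title => 'Hello' } );
print "$sql\n";
```

The resulting statement and value list could then be handed to whatever routine actually executes and commits the update.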

Summary

The examples that I gave above were mostly focused on Perl. I did that because this is a Perl-related site and some of the monks who read this may only know Perl. However, the principles are not restricted to Perl. Hence the title 'use strict' is not Perl.

One of the things that really surprised me after I started learning about how to write code well is that I could often judge code quality of languages that I had never used. When I first started learning PHP, I could easily spot rotten programs. I don't know JavaScript well, but I'm constantly cleaning up our design department's JavaScript, despite the fact they know it much better than I. Good coding is not language specific.

Whether you are a brand-new programmer or a seasoned veteran, these principles will apply to virtually any programming language you use. Sure, you can't predeclare variables in PHP and COBOL only uses global variables, but that doesn't invalidate the other principles. If you get in the habit of spending a little time up front learning these things, you will be well-rewarded by writing better, tighter code that is much easier to maintain and has fewer bugs.

Cheers,
Ovid

Vote for paco!

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Replies are listed 'Best First'.
Re: 'use strict' is not Perl
by mdillon (Priest) on Aug 28, 2001 at 03:18 UTC

    the PHP equivalent of 'use strict' (at least with respect to variable declaration) is setting the 'register_globals' configuration option to false. in PHP 4.1, the default for this setting in the distributed php.ini files will change from true to false. i don't know if the PHP community will adopt this recommendation, or continue on with potentially broken and insecure code.

    also, using the E_NOTICE warning level will get error reporting similar to -w for Perl (variables used before they have an assigned value, potential precedence problems, etc.).

      the PHP equivalent of 'use strict' (at least with respect to variable declaration) is setting the 'register_globals' configuration option to false.

      Great steaming hairy bollocks! All that does is stop variables that are passed by GET or POST from being declared as globals.

        EXACTLY, thus not polluting everything with variables from all over the damn place. The goal (strict) is the same, even though the web centric context of PHP is different.

        It's going to be hard for them to get out of their years of bad habits. Maybe PHP 5 will solve some of this, but I doubt it.

        ()-()
         \"/
          `                                                     
        
Re: 'use strict' is not Perl
by patgas (Friar) on Aug 28, 2001 at 17:10 UTC

    I don't know if any other monks agree with me, but the title for this node sounds a lot more controversial than it should. My first reaction was "What?!? And it's by Ovid?!?" I'd suggest changing the title to something like "'use strict' is not just for Perl".

    But this is great information, and reminds me how much Perl (and this site) has influenced my other programming. My main job is to write ASP applications, and I admit I hardly ever turned on 'Option Explicit' or declared my variables until I started reading PM, and realized how horribly wrong that was. I'm still a while off from writing flawless code, but it's getting a lot better.

    "We're experiencing some Godzilla-related turbulence..."

      Since you're not the first to mention that title to me, I guess I should explain.

      I gave a lot of thought to the title and wondered if what I was naming it was appropriate. I finally decided that it was for a couple of reasons. The first is simple. 'use strict' isn't just Perl. It's a good programming concept that applies everywhere. It's only in Perl that we happen to use the words 'use strict' to describe and implement it.

      The second reason was a bit more sneaky. If I may be less than humble for a moment, I have to say that I felt like I was presenting some worthwhile material. Since I wanted monks to read it, I deliberately chose a provocative title. Of course, you see that a lot in my posts (Death to dot star!, use CGI or die;, etc). While I didn't want to deliberately sow confusion, I felt it was better to try to get people to read this -- and hopefully take it to heart.

      Cheers,
      Ovid


      Heh. I /msg'ed Ovid about that very thing maybe 5 minutes after the post went up. You even got the title I suggested exactly! :)

      ------
      /me wants to be the brightest bulb in the chandelier!

      Vote paco for President!

Re: 'use strict' is not Perl
by ducky (Scribe) on Aug 30, 2001 at 01:19 UTC

    ...nor should it be turned off by default.

    I, for one, am glad -w and use strict, or the functionality at least, will be default in perl 6. It's such a spinal macro typing the perl shebang including -w, two enters and "use strict;" that I sometimes forget to really stress the importance to newcomers to perl that I'm mentoring.

    Building on that note, I recently took another inductee under my wing - last night in fact. He had a strong desire to learn Perl but knew he really needed a project to use it on. One of his friends suggested he make his web site a little more dynamic and use PHP to ease him into Perl. PHP! That's like telling someone to use ol' MS-DOS so they'd be ready for Unix! So I gave him some space on my server with HTML::Mason. =)

    The upshot of the above is now he's got a fun project to work on, it's getting him comfortable with Perl at his own pace, and HTML::Mason defaults to use strict with warnings on! Hahaha, I almost feel naughty about it =)

    -Ducky

      ...nor should it be turned off by default.

      That would make one-liners very awkward.

        Which is why it'll be off by default for command-line programs. (I don't want to declare variables for one liners either, so...)
Re: 'use strict' is not Perl
by mischief (Hermit) on Oct 15, 2001 at 21:22 UTC

    One thing that hasn't been pointed out yet is in PHP, variables are private by default inside functions; you have to explicitly declare them as global for the rest of the program to access them. This means that you're effectively "using (a small part of) strict" inside functions. A small point, but I thought I'd post it for the sake of completeness.

Minor defense of VBScript's date handling
by dze27 (Pilgrim) on Aug 28, 2001 at 21:19 UTC

    A little offtopic, but how does VBScript mangle dates silently? I use VBScript regularly (I'm afraid perl is verboten for our production web site, I just use it for behind-the-scenes scripting) and I don't recall many surprises. The odd problem I've had has come when specifying a 6-digit date, if that's what you mean. Sometimes, depending on the locale setting on the server it may do something unanticipated to a yy-mm-dd or mm-dd-yy or dd-mm-yy date. Of course now we're in 2001 the two-digit year could be mistaken for a month or day. What I do is to always use yyyy-mm-dd format when specifying a date and let VB's internal format take care of everything. This way you don't run into any problems. Then use DateDiff, DateAdd, Year, Month, Day etc. to get all of what you need. Maybe I'm missing something, but I've found VBScript's date handling very convenient.

      If your date format is the American style mm-dd-yy and you pass VB a string with mm > 12 it will not complain like it should. It just silently assumes you mean dd-mm-yy.