Tutorial: Introduction to Object-Oriented Programming
by jreades (Friar) on Dec 10, 2002 at 13:07 UTC ( [id://218778]=perlmeditation )
Object-Oriented Tutorial

Prerequisites

The following knowledge will generally be assumed:
Introduction

I was trying to avoid having to write this section, since the philosophy that I wanted to capture with this document was teaching by doing, rather than by throwing a lot of terms at you and then telling you what I meant after you were thoroughly confused and intimidated. However, rob_au very rightly pointed out that I needed some sort of introduction to the idea behind object-oriented coding in order for anything that I was saying to make sense. So here goes...

At its (simplified) heart, OO programming is about creating discrete packets or groupings of data (what we call objects) that model some 'thing' in the application space. So, an employee application might use an Employee object to capture basic pieces of data about each employee -- their age, their Social Security or National Insurance Number, their job title, and so on -- while a zoo's application might use an Animal object with the relevant pieces of data needed to keep the animals alive and healthy -- their dietary needs, whether they are dangerous to humans, the number of legs, and so forth.

So rather than having to look up each piece of information one bit at a time, one piece of an application might hand over an employee object to another piece of the application, and the second component can just ask the employee object for the information it needs without having to know anything about where this data came from (a database, a flat file, a pipe, etc.) or how it was stored (comma-delimited, tab-delimited, database rows).

In fact, almost anything can be turned into an object if you look at it the right way. The question you always need to keep in the back of your mind is "Is this something that needs to be an object in order to improve either functionality or re-usability?" OO is a lot more work than a straight procedural script, but it can also be much more powerful.
Another way of looking at the difference between procedural scripts and object-oriented applications is that the former tend to focus on what you could call the verbs of a sentence: the user submits a form, the form is validated by a subroutine, and then inserted into a database. OO, on the other hand, looks more at the nouns: the user submits a form, the form is validated, and then saved to a database. The sentence is almost the same, but it reads very differently, and this reflects the real world of OO vs. procedural: often it is as much a judgement call as anything else when an application should be expressed as a set of objects rather than a set of scripts. Obviously there's a lot more to it than that, but hopefully this is enough to get you oriented.

To Begin: An Example

A lot of OO tutorials take the approach of explaining the terminology, and then introducing some examples to help you make sense of it all. I'm going to try the exact opposite -- using an example that will (with luck) make sense as a way of introducing the terminology.

The Project

I'm going to take as a starting point this node since it offers a good way to come to grips with when it might make sense to switch to OO vs. remaining in a procedural mind set. It also reveals a little of why OO is so damned hard.

So our fictional project is going to be to set up a system to handle user-submitted quotes on a Web site -- users should be able to submit new quotes, have them reviewed by an Administrator (for rudeness or duplication) and then see them show up on the Web site at a later date.

Before we even start coding, we should probably jot down some ideas about how a quote would work. Here are the things that come to mind for me... All quotes would have:
Of course, there are all kinds of problems with this first cut -- for instance, some quotes might not have a date, or maybe the date is extremely specific (June 26th, 1963 for JFK's "Ich bin ein Berliner") -- but we're going to keep it simple and just use an author, a phrase, and an approval.

A Few Prototype Functions

I'm assuming that you are familiar with subroutines and most of the basics of Perl programming, so you know why we'd want to look at capturing some of the ideas above as subroutines. So here's some basic code:
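A minimal sketch of what such prototype subroutines might look like, assuming a quote is held in a plain hash reference. The key names (phrase, author, approved) and the helper is_approved are assumptions made for illustration, not a definitive listing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A quote is just a hash reference; the key names are illustrative.
my $quote_ref = {
    phrase   => 'Ich bin ein Berliner',
    author   => 'John F. Kennedy',
    approved => 0,
};

# Each subroutine takes the hash reference as its explicit first argument.
sub get_phrase  { my ($ref) = @_; return $ref->{phrase}; }
sub get_author  { my ($ref) = @_; return $ref->{author}; }
sub is_approved { my ($ref) = @_; return $ref->{approved} ? 1 : 0; }

sub set_phrase {
    my ($ref, $phrase) = @_;
    $ref->{phrase} = $phrase;
    return $ref->{phrase};
}

print get_phrase($quote_ref), ' -- ', get_author($quote_ref), "\n";
print 'Approved? ', (is_approved($quote_ref) ? 'yes' : 'no'), "\n";
```

Note that every call has to pass the hash reference in by hand; that detail becomes relevant once methods enter the picture.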
Right now there's not a lot of utility in this script -- why would anyone test a local variable using a subroutine instead of just writing my $phrase = "Foo";?

Modules

Many of you are no doubt already familiar with modules: any time you've put a use at the top of your script you've imported a module for use in your script. You may well have written a few yourself to improve the re-usability of some handy tools that you developed along the way. Let's say that I expect my one quote to be used in many places and that I want to modularize it in a quasi-useful way. I might change the file above to look like this:
Now, my script would look like this:
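As a hedged sketch, the module and the calling script are combined into one file here so that the example runs as-is; in practice the package block would live in Quote.pm and the script would begin with use Quote;. The hash keys and the contents of the script's own $quote_ref are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a real project this block would be the file Quote.pm.
{
    package Quote;

    # Lives in the Quote namespace as $Quote::quote_ref.
    our $quote_ref = {
        phrase => 'Ich bin ein Berliner',
        author => 'John F. Kennedy',
    };

    sub get_phrase { return $quote_ref->{phrase}; }
    sub get_author { return $quote_ref->{author}; }
}

# The calling script. Its own $quote_ref does not clash with the one
# that belongs to package Quote.
my $quote_ref = 'something unrelated';

print Quote::get_phrase(), "\n";
print Quote::get_author(), "\n";
```

Run as-is, this prints the phrase on one line and the author on the next.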
This prints:
I have now successfully isolated my quote from the script that calls it. I could have two variables named $quote_ref (one in the script, one in the module) and they would never come into conflict with each other since one is in the main namespace, and the other is off in the Quote namespace. I could ask for this quote in other scripts and get all of this (dubious) utility over there as well. If I update my quote, then I don't need to waste my time updating every script that uses this module.

I now have a working module, but it's not really terribly useful since I'd have to create a new module (if I stuck to this system) for each quote. I can just see the fun of managing Quote::People::JFK::Berlin, Quote::Movies::MontyPython, etc. Ideally, my Quote module would be some kind of 'super-quote' and not only would it have a way to hold the data (the quote, its author, and so forth) for many different quotes, but it would also give me some handy subroutines to do useful things that are quote-related. In other words, what I really want is an abstract representation of all quotes whether they are funny, political, rude, or profound -- what I really want is a class.

Classes

The easiest way to think about a class is to think of it as a prototype, in code, for all quotes -- all of the variables and functions are there, they are just waiting to be filled in at runtime by a specific quote. Going back to my list above, I can see many of the handy things that I would want my quote class to have. And looking at the code above, I can see some of how I want that to work since, at its core, a class is very much like a module with a few special features. Here's what the Quote class might look like:
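A minimal sketch of such a class, assuming the three attributes settled on earlier (a phrase, an author, and an approval). The method names approve and is_approved are hypothetical; the package is kept in the same file as a short usage example so that it runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a real project this block would live in Quote.pm.
{
    package Quote;

    # Constructor: create an anonymous hash and bless it into the class.
    sub new {
        my $class = shift;
        my $self  = { phrase => undef, author => undef, approved => 0 };
        bless $self, $class;
        return $self;
    }

    # Every method receives the object as its first argument.
    sub set_phrase  { my $self = shift; $self->{phrase} = shift; return $self->{phrase}; }
    sub get_phrase  { my $self = shift; return $self->{phrase}; }
    sub set_author  { my $self = shift; $self->{author} = shift; return $self->{author}; }
    sub get_author  { my $self = shift; return $self->{author}; }
    sub approve     { my $self = shift; $self->{approved} = 1; return 1; }
    sub is_approved { my $self = shift; return $self->{approved} ? 1 : 0; }
}

my $quote = Quote->new();
$quote->set_phrase('Ich bin ein Berliner');
$quote->set_author('John F. Kennedy');
print $quote->get_phrase(), ' -- ', $quote->get_author(), "\n";
```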
That might now look thoroughly confusing, but it's helpful to compare it against our previous Quote package to see what's changed. There are really only a few key points:
But let's turn back to our script (the one that uses Quote.pm) to see how it has changed before I try to explain this any more.
This prints out:
There, now we've seen an object in action, but what happened?

Blessing

First, you can see that I created a new Quote object using the syntax Quote->new() and assigned it to a variable called $quote. I then called my subroutines (set_phrase, get_phrase and so on) using $quote as a kind of handle -- I no longer need to say Quote::get_phrase(), I just say $quote->get_phrase() and Perl knows which subroutine (or method as they are known in OO) to run.

Of course, in Perl there is always more than one way to do it (TMTOWTDI), and I should note that instead of saying Quote->new() I could also have written new Quote and left the rest of the script exactly as-is. It's mostly a matter of style; however, the preferred style is Quote->new() as it removes any ambiguity that can confuse either the reader/user or the compiler.

To understand a little more about what is happening, let's make a few changes to your Quote.pm file so that your new subroutine now looks like:
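One way such an instrumented constructor might be written is sketched below; the wording of the printed labels is an assumption, and the class is cut down to the constructor alone:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new {
        my $class = shift;
        my $self  = {};

        # ref() reports what kind of thing a reference points to.
        print 'Before bless: ', ref($self), "\n";
        bless $self, $class;
        print 'After bless:  ', ref($self), "\n";

        return $self;
    }
}

my $quote = Quote->new();
```

The first line reports HASH and the second reports Quote, which is the change that bless brings about.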
Run the script again, and you'll see:
Examining the sequence here, we can see that bless is somehow taking our $self hashref (whose nature we confirmed by printing out ref($self) and seeing HASH) and changing it to an object whose type is Quote (which we confirmed by printing out ref($self) right after calling bless). So, it seems that blessing is the means by which a hash reference (or any other type of reference) is promoted to an object of a particular class.

Methods

The next thing to turn back to is our subroutines in Quote. Why the my $self = shift;? First off, let's now introduce the right terminology -- a script has subroutines, an object or class has methods. There's no other real difference between the two -- a method is simply a subroutine associated with a particular class. It also turns out that part of the way objects work in Perl is that the first argument to a method is always an object reference. If you look back to our very first code we were doing much the same thing, except that there we had to be explicit about handing in the hash ref as the first argument of the subroutine. With objects, it's much the same, except that the reference is simply understood. Try this:
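A small sketch of the experiment, assuming a cut-down Quote class; the printed labels are illustrative. It prints the object reference inside the constructor and again inside a method:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        print "In new:        $self\n";
        return $self;
    }

    sub get_phrase {
        my $self = shift;    # the object arrives as the first argument
        print "In get_phrase: $self\n";
        return $self->{phrase};
    }
}

my $quote = Quote->new();
$quote->get_phrase();

# Calling the method as a plain subroutine and passing the object by hand
# reaches the same code with the same first argument.
Quote::get_phrase($quote);
```

Each line shows something of the form Quote=HASH(0x...), with the same address every time.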
You'll see the following:
Notice that the reference we have here is to the exact same hash ref that we created in our new subroutine right down to the memory address. So it follows that: a class' methods are really just subroutines whose first parameter is always a reference to the object of which they are a part. And there is a second, important consequence of this: calling one object's methods shouldn't return another object's data.
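To make the second point concrete, here is a short sketch that creates two objects from a cut-down Quote class and checks that each keeps its own data; the variable names are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;
    sub new        { my $class = shift; return bless {}, $class; }
    sub set_phrase { my $self = shift; $self->{phrase} = shift; return $self->{phrase}; }
    sub get_phrase { my $self = shift; return $self->{phrase}; }
}

# Two separate objects, each backed by its own anonymous hash.
my $first  = Quote->new();
my $second = Quote->new();

$first->set_phrase('Ich bin ein Berliner');
$second->set_phrase('Ask not what your country can do for you');

print 'First:  ', $first->get_phrase(),  "\n";
print 'Second: ', $second->get_phrase(), "\n";
```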
It will produce:
So what does it all mean?

You have now created and used several instances of the Quote class, but through all of this I'm sure that the benefits of "going OO" haven't been very obvious -- you've done a lot of work just to get to a point that would have taken you five minutes in a regular procedural script. There are several answers:
Once again, let's turn to a concrete example: say that you have decided a couple of things about how quotes will work:
Let's take a look at how our Quote class might change (our class is getting long, so I'm just going to show the methods that changed or are brand new):
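As a rough sketch of how validation can live inside the class: the two rules below (a phrase must be non-empty and at most 200 characters long, and an author must contain something other than whitespace) are hypothetical stand-ins chosen purely for illustration, as is the method name is_valid.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new {
        my $class = shift;
        return bless { phrase => undef, author => undef }, $class;
    }

    # Setters refuse bad input and return false instead of storing it.
    sub set_phrase {
        my ($self, $phrase) = @_;
        return 0 unless defined $phrase;
        return 0 unless length($phrase) > 0 && length($phrase) <= 200;
        $self->{phrase} = $phrase;
        return 1;
    }

    sub set_author {
        my ($self, $author) = @_;
        return 0 unless defined $author && $author =~ /\S/;
        $self->{author} = $author;
        return 1;
    }

    # The calling script only ever asks this one question.
    sub is_valid {
        my $self = shift;
        return (defined $self->{phrase} && defined $self->{author}) ? 1 : 0;
    }
}

my $quote = Quote->new();
$quote->set_phrase('');                  # rejected by the hypothetical rule
print 'Valid yet? ', ($quote->is_valid() ? 'yes' : 'no'), "\n";

$quote->set_phrase('Ich bin ein Berliner');
$quote->set_author('John F. Kennedy');
print 'Valid now? ', ($quote->is_valid() ? 'yes' : 'no'), "\n";
```

The script never sees the rules themselves, so they can be tightened or loosened inside the class without touching any caller.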
Notice that we are doing several useful things:
Now think about how your script can take advantage of these features:
Change the subroutines again as follows:
Our validation rules have changed substantially, but no code changes are required in the scripts -- they just carry on asking our Quote object are you a valid quote?.

Persistence, Inheritance, and Abstract Classes, Oh My

So now let's turn to saving our Quotes and try taking an OO approach here too:
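A hedged sketch of a two-package saver along these lines. The placeholder body of Saver::save, the printed messages, and the use of File::Temp for a scratch file are all assumptions made so that the example runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    package Saver;

    # Placeholder only: concrete subclasses are expected to override this.
    sub save {
        my $self = shift;
        warn ref($self), " does not implement save()\n";
        return 0;
    }
}

{
    package Saver::File;
    @Saver::File::ISA = qw(Saver);    # Saver::File is a Saver

    sub new {
        my ($class, $file) = @_;
        open(my $fh, '>>', $file) or return;
        print "Opened $file\n";
        return bless { file => $file, fh => $fh }, $class;
    }

    sub save {
        my ($self, $text) = @_;
        return print { $self->{fh} } $text, "\n";
    }

    # Called automatically by Perl when the object is being destroyed.
    sub DESTROY {
        my $self = shift;
        close $self->{fh} if $self->{fh};
        print "Closed $self->{file}\n";
    }
}

# A scratch file that is removed when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

my $saver = Saver::File->new($file) or die "Cannot open $file: $!";
$saver->save('Ich bin ein Berliner');
print "End of script\n";
```

Saver deliberately has no constructor in this sketch, so only the subclass can be instantiated.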
A lot of new concepts are being thrown at you here, and I'll address them in a moment below, but I first wanted to point out that I have created a file with two packages in it: Saver, and Saver::File. The script has now been updated to read:
Here are the key concepts:
Here's the output of the script:
Interesting, no? The file is closed 'after' the script exits. This is the DESTROY method at work -- Perl waits until it's sure that I'm not going to do anything else with my Saver object and then automatically calls $object->DESTROY(). If I hadn't specified my own DESTROY method, Perl would just have attempted to destroy the references and reclaim any spare memory, but my special DESTROY method tells Perl to close the file handle before allowing the object to be destroyed. This raises an important point about object-oriented code -- always keep in mind that objects are hard to destroy since Perl often has trouble knowing when you're done using them. If you're in the habit of keeping a lot of references lying around on the assumption that they are just pointers and use very little memory, you're going to find your OO Perl slowly eating its way through your system's memory. Aside from good reference hygiene, another way to manage this potential memory usage is to create re-usable objects by giving them, for instance, a reset() method that resets the state of the object and makes it ready for a new Quote/File/what have you. This technique isn't applicable in every situation, but where it can be particularly useful is where you have the potential for a lot of objects to be created (which is a comparatively expensive operation) and speed is of paramount importance. Drawing on my own job experience, I do a lot of ETL (Extract, Transform, and Load) work where I routinely handle plain-text files with over 2.5 million unique records. At this size, it not only becomes prohibitive from a memory standpoint to keep so many objects hanging around, but the overhead needed to create and destroy an object for each record (or each field within a record) becomes astronomical. Instead, we create a single object and constantly re-use it via a method that essentially resets the object to a pristine state. This is much faster and saves a lot of memory. 
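The re-use technique can be sketched as follows. The load method and the "author|phrase" record layout are hypothetical, invented only to give reset() something to do:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        return $self->reset();
    }

    # Return the object to a pristine state so that it can be re-used.
    sub reset {
        my $self = shift;
        %$self = (phrase => undef, author => undef);
        return $self;
    }

    # Hypothetical loader: parse one "author|phrase" record into the object.
    sub load {
        my ($self, $line) = @_;
        $self->reset();
        chomp $line;
        @{$self}{qw(author phrase)} = split /\|/, $line, 2;
        return $self;
    }

    sub get_phrase { my $self = shift; return $self->{phrase}; }
    sub get_author { my $self = shift; return $self->{author}; }
}

my @records = (
    "John F. Kennedy|Ich bin ein Berliner\n",
    "Monty Python|Nobody expects the Spanish Inquisition\n",
);

# One object is created up front and recycled for every record.
my $quote   = Quote->new();
my $address = "$quote";

for my $line (@records) {
    $quote->load($line);
    print $quote->get_author(), ': ', $quote->get_phrase(), "\n";
}
```

Only one object is ever constructed, however many records pass through the loop.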
The next important concept in the code above is the idea of inheritance -- Saver::File inherits from Saver. Inheritance is specified using the @ISA array, so the line: @Saver::File::ISA = qw(Saver); tells us that Saver::File is a Saver (ooooh, a Perl mnemonic!). But what does inheritance do? In the example above, inheritance doesn't give us a great deal (although I'll touch on some interesting side effects in the section on abstract classes), but it does mean that if we were to give Saver some useful methods then Saver::File would automatically inherit them. Let's try an example by adding the following to the Saver package:
Then, in Saver::File we're going to change our save method to read:
And our script would be changed to:
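The three changes described above (a serialize method in Saver, a save method in Saver::File that calls it, and a script that hands over a Quote object) might fit together as in the sketch below. The pipe-separated output format and the scratch file are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    package Quote;
    sub new        { my $class = shift; return bless {}, $class; }
    sub set_phrase { my $self = shift; $self->{phrase} = shift; return 1; }
    sub set_author { my $self = shift; $self->{author} = shift; return 1; }
    sub get_phrase { my $self = shift; return $self->{phrase}; }
    sub get_author { my $self = shift; return $self->{author}; }
}

{
    package Saver;

    # Available to every subclass. Note the coupling: this method has to
    # know the names of Quote's methods.
    sub serialize {
        my ($self, $quote) = @_;
        return join('|', $quote->get_author(), $quote->get_phrase());
    }
}

{
    package Saver::File;
    @Saver::File::ISA = qw(Saver);

    sub new {
        my ($class, $file) = @_;
        open(my $fh, '>>', $file) or return;
        return bless { fh => $fh }, $class;
    }

    sub save {
        my ($self, $quote) = @_;
        # serialize() is not defined in this package; Perl finds it in
        # Saver by following @ISA.
        return print { $self->{fh} } $self->serialize($quote), "\n";
    }

    sub DESTROY { my $self = shift; close $self->{fh} if $self->{fh}; }
}

my $quote = Quote->new();
$quote->set_phrase('Ich bin ein Berliner');
$quote->set_author('John F. Kennedy');

my (undef, $file) = tempfile(UNLINK => 1);
my $saver = Saver::File->new($file) or die "Cannot open $file: $!";
$saver->save($quote);
print $saver->serialize($quote), "\n";
```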
Inheritance is what allows me to call $object->serialize() in the Saver::File class (where no such method exists) and have it run the appropriate method in the Saver super-class (where it does). Please note that what I have done is actually bad OO technique for the following reason: the Saver class now needs to know about the methods of the Quote class in order to do its job. If I were to change get_phrase to be get_quote in the Quote class, then I'd have broken the Saver class as well. What I did was for illustrative purposes only.

The other important thing that I've done is to create what's called an abstract class -- a class that is never supposed to be directly used. Notice that you can't write my $saver = new Saver; but you can write my $saver = Saver::File->new($file);. The concept of the abstract class is quite advanced, but it's very powerful. Essentially I am creating a class that exposes (allows the user to call) one or more key methods, but only as placeholders. The methods of the abstract class don't actually do anything useful.

If you're wondering why on earth anyone would create a class that has methods that don't work, then you're probably not alone. The abstract class is intimately bound up with the ideas of casting and of inheritance (which is why I talked about inheritance first). To use an analogy, when you cast a sculpture in metal, you take an object in one form (say a bronze block) and pour it into a mold so that it assumes another shape. The bronze is still the same, but it certainly looks different. In a similar way, I can cast a Saver::File object as a Saver object because Saver::File inherits from Saver. What this means is that at the same time as my Saver::File object retains all of the functionality of the Saver::File class, it can also be used anywhere that a Saver object is called for. In other words, it looks, to any script that wants a Saver object, exactly like a Saver object.
So if I don't care about where something is being saved, then I can just call $object->save() and let the class worry about the details. This is (sort of) the idea of encapsulation -- if my script doesn't need to know how something is saved, then that functionality (and any underlying data related to that functionality) should be hidden from the outside world (encapsulated). In fact, the idea of encapsulation says a lot more, and it states that I should never try to access another object's data directly but should always use the methods provided. Since the objects that we are using are really just anonymous hashes, I could very easily reach in and say $quote->{phrase} = "Foo" rather than writing $quote->set_phrase("Foo"). However, not only is this bad manners, but it's also bad OO since not only are you making an assumption about how I've written my object, you are also bypassing all of the validation rules that I wrote into my class. Anyway, back to the purpose of our abstract class -- let's see what would happen if we decided to move the Quote file into a database (so that we don't have to read in the entire file every time). I could create my Database saver module like so:
Now there is some horrible DBI code in there (I don't have a db installed on the machine that I'm writing this on, and my DBI is very rusty), but that's not what I'm trying to get at. The key thing to gain from this additional class is the following: in order to switch my code from using a flat file to using a database I have to change exactly one line. This:
becomes this:
And the rest of my code rolls on exactly as it did before even though I have completely changed the underlying architecture. That is quite a powerful technique to add to your toolset and is one of the main reasons that developers get really excited when they start talking about being able to make something object-oriented. Notice too that Saver::Database can also use the serialize method defined in the Saver super-class.

Another little tidbit that I'd like to point out is the use of what's called a class variable. The variable $sql (defined by the line our $sql = "...") is shared by every object of class Saver::Database. This means that a change made to the value of $sql in one object would be visible to every other object. Class variables are often used to define things that are either immutable (will not normally be changed by objects -- as is the case with our SQL statement) or that need to be used as counters (how many objects of class "Foo" have I created?). There are other uses for class variables as well, but those are the simplest.

Further Reading

There's a great deal more to OO than what I've outlined here (and some even more powerful concepts that I haven't delved into), but this should give you a good framework for getting from I know OO is important but I don't really know much about it to I can create and use simple classes. From here, I'd recommend that you read the other tutorials on OO design and programming in Perl that are available on the Web since they are more rigorously correct than mine and look at more advanced concepts. Here are a few links:
A Brief Discussion of Style

Several astute people have picked up on my obvious Java pedigree -- I learned OO from Perl, but spent a couple of years as a Java developer where I picked up some rather non-Perlish habits rooted in a strongly-typed language. For the sake of comprehensiveness, I'd like to cover a couple of points where my style differs significantly from that of your standard Perl OO developer.

Java methods are structured so that you can't call a method and decide, pretty much on the fly, what it will do. For this reason I got into the habit of using what are known as getter and setter methods (get_phrase, set_author, and so on). A lot of Perl programmers prefer to write their simple attribute accessor methods the following way:
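A sketch of one such combined accessor, assuming a method named phrase on a cut-down Quote class. The parentheses around the assignment are my addition to keep the precedence of ?: and = unambiguous:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new { my $class = shift; return bless {}, $class; }

    # With an argument this acts as a setter; without one, as a getter.
    sub phrase {
        my $self = shift;
        return @_ ? ($self->{phrase} = shift) : $self->{phrase};
    }
}

my $quote = Quote->new();
$quote->phrase('Ich bin ein Berliner');    # set
print $quote->phrase(), "\n";              # get
```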
This is quite tricky coding-wise, and I'll do my best to make it intelligible since it needs some unpacking... First, we remove the object reference from the subroutine's @_ array using shift. Anything left is assumed to be part of the arguments being passed to the subroutine/method. Next, we test @_ by implicitly calling it in a scalar context -- if it's empty, it will return 0 (false), if it has one or more elements it will return > 0 (true). In this simple case, we are assuming that there is only a single element remaining in @_, so if the @_ ? test is true, we shift the remaining element out of @_ and assign it to $self->{phrase}. If there is nothing in @_, we just evaluate $self->{phrase}. This has the fortunate side-effect of making the return value from our subroutine the value of $self->{phrase}, and this is how $object->phrase() can act as both a getter and a setter method. This is obviously quite a nice little shortcut, but IMO it has a couple of problems:
But this is my personal opinion, and only you can decide whether you prefer the faster, more Perl-like style, or the more rigorous, less ambiguous Java-like style. And of course, if you were feeling particularly generous you could offer the programmer both styles and then let them choose the style with which they are most comfortable. TMTOWTDI.

Another Java habit that I've picked up (and this is again related to the getter/setter methodology) is my general avoidance of hashes and arrays as ways of setting multiple parameters simultaneously. I tend to feel that if you are setting two attributes in a method then you should do so explicitly with two scalars. However, there's absolutely nothing stopping you from creating a method that follows the following style:
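For illustration, a method in that style might take a list of key/value pairs. The method name set and the list of permitted keys are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new { my $class = shift; return bless {}, $class; }

    # Set several attributes at once from a list of key/value pairs.
    # Keys outside the known list are silently ignored.
    sub set {
        my ($self, %args) = @_;
        for my $key (qw(phrase author)) {
            $self->{$key} = $args{$key} if exists $args{$key};
        }
        return $self;
    }

    sub get_phrase { my $self = shift; return $self->{phrase}; }
    sub get_author { my $self = shift; return $self->{author}; }
}

my $quote = Quote->new();
$quote->set(
    phrase => 'Ich bin ein Berliner',
    author => 'John F. Kennedy',
);
print $quote->get_author(), ': ', $quote->get_phrase(), "\n";
```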
Again, it's perfectly valid Perl and many people like to write things this way. And perhaps more seriously, several people have suggested that my reluctance to use die() is just plain wrong (see Bad Lessons section below). They are, for the most part, right. When I last did a lot of Perl OO work (this piece is something of a refresher for me) I wasn't very comfortable with the following syntax:
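A minimal sketch of that die/eval pattern, assuming a setter that dies on an empty phrase; the rule and the error text are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Quote;

    sub new { my $class = shift; return bless {}, $class; }

    # Die on bad input rather than returning a false value.
    sub set_phrase {
        my ($self, $phrase) = @_;
        die "Phrase must not be empty\n"
            unless defined $phrase && length($phrase) > 0;
        $self->{phrase} = $phrase;
        return 1;
    }
}

my $quote = Quote->new();

# eval traps the die; the trailing 1 makes success easy to detect.
my $ok    = eval { $quote->set_phrase(''); 1 };
my $error = $@;

if (!$ok) {
    print "Could not set phrase: $error";
}
```

Copying $@ into a lexical straight after the eval is a common precaution, since later code can overwrite it.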
I'm still not very comfortable with it, but at least I understand it now (of course, I may have got the syntax completely wrong as I'm still not familiar with it). I have one tiny quibble with this approach -- eval{} lacks a degree of granularity that would be very helpful in distinguishing between errors. This method forced me to check what's in $@ and then decide what to do, rather than working from the assumption that if something died, then it was probably for a good reason. Again, this is nothing more than a matter of style, and if it's commonplace for objects to die() and for those errors to be trapped by eval{} then I'd suggest going with the common way rather than my idiosyncratic way (I'm in a 12-step program now).

An alternate system, proposed by gjb, uses the Exception and Class::Exception modules. I haven't worked with these modules, but based on their names alone I'd guess that they offer a promising, more Java-like system for handling errors with the desired degree of granularity.

Some Bad Lessons

It's also important to note that there are a number of bad lessons taught both by my code and by some people's usage of OO in Perl. Here are a few of them:
Credits

This tutorial now directly incorporates helpful feedback from demerphq, gjb, and Abigail-II, and implicitly incorporates feedback from a number of other people as well.

Changelog

Dec 10, 2002 -- changed method calls from getPhrase to get_phrase to hide my Java background and address demerphq's observation that this style is often confusing to non-English speakers. Also added section on OO style and tried to address issue of how objects die().

Dec 11, 2002 -- incorporated feedback from Abigail-II regarding use strict and the better way to call private methods.

That's it for now... I'm sure that I will be updating this again with more feedback from those with greater experience/knowledge/expertise in fairly short order.