As a follow-up to my earlier node, "Is there an ideal module size or subroutine size?", I offer my thoughts on refactoring my own web application code.

Setup: My CGI::Application web app consists of three modules, Users, Projects and Files. Recently, I had to add functionality to allow for the creation of multiple users, attached to one or more projects, or not attached at all. I also had to add similar functionality when creating a project: it should be possible to add a pile of user names (we're using E-Mail addresses) as new members of this project.

Development: Realizing that there would be some duplicated work, I added the functionality to the Create Project run mode, got it working, then hacked the same code (cargo cult, I know) into Create User, and got that to work.

Refactoring: Once the code was all working, I printed out the two modules, spread them out over a large counter top, and started marking them up. I highlighted the package statements and each of the method names and drew boxes around the duplicated hunks of code. I spent a long time looking at repeated patterns in the code, separating the run modes from the utility routines.

I moved the utility routines into a separate Util module, ending up with Users::Util and Projects::Util, and fixed the various problems -- missing modules in the two Util modules, extraneous modules in the main modules.

Having done a lot of work on the Projects and Users modules, I then printed out the Files module and found that I could do a lot of similar work there, and ended up creating a Files::Util module. I also found that there were a few routines that would be used in several of the modules, and created a top level Util module that all of the modules used.
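
To illustrate the kind of extraction described above, here is a minimal sketch rather than the actual application code. The package names (MyApp::Util, MyApp::Projects, MyApp::Users) and the routine parse_email_list are hypothetical, and the CGI::Application plumbing is left out so that the example runs with core Perl (5.10 or later). The point is only that the duplicated hunk lives in one place and both run modes stay thin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shared utility package; the names here are illustrative
# and do not come from the original application.
package MyApp::Util;

# Split a blob of user input into a de-duplicated list of lower-cased
# e-mail addresses, keeping first-seen order. The address check is
# deliberately crude (something@something), not a real validator.
sub parse_email_list {
    my ($raw) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { lc $_ }
           grep { /\A[^\s\@]+\@[^\s\@]+\z/ }
           split /[\s,;]+/, ( $raw // '' );
}

package MyApp::Projects;

# Stand-in for the Create Project run mode: delegates the parsing.
sub create_project {
    my ( $name, $raw_members ) = @_;
    my @members = MyApp::Util::parse_email_list($raw_members);
    return { project => $name, members => \@members };
}

package MyApp::Users;

# Stand-in for the Create User run mode: same parsing, zero or more projects.
sub create_users {
    my ( $raw_users, @projects ) = @_;
    return [ map { +{ email => $_, projects => [@projects] } }
             MyApp::Util::parse_email_list($raw_users) ];
}

package main;

my $project = MyApp::Projects::create_project( 'Apollo',
    'Ann@Example.com, bob@example.com; ann@example.com' );
print join( ', ', @{ $project->{members} } ), "\n";
```

In a real layout each package would sit in its own file, and the run modes would call the utility routine the same way.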

Testing: It's hard to test your own code, but I try. I have Log::Log4perl turned to DEBUG for all of my modules, and I tail the log file in one window, and tail httpd-ssl-error in another window, then bash away using all of the features that I can think of. If I fix an error, I go back and test all of the functionality around that error.

Conclusion: You sure get some weird looks from co-workers when you stand over twenty-odd pages of code, printed two-up, for several days running. But I'm now very happy with the structure of the code, and I look forward to a code review. The main modules shrank by a total of 388 lines, but the new Util modules added 492 lines, for a net gain of 104 lines. Still, it's worth it -- the code's better organized, with run modes in the main modules and utility routines in the Util sub-modules. And my average module size went from about 900 lines to about 800 lines -- that's OK. At least perltidy is happy with my code, and I have tests for some of the utility modules.

Oh, did I forget to mention the 1800 lines of Template::Toolkit web pages? Eh, never mind.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: My code refactoring journey
by stark (Pilgrim) on Aug 30, 2007 at 09:56 UTC
    I just wanted to mention a tool that can be helpful for this kind of work:

    CPD: A Copy and Paste detector. It is written in Java but also works for Perl.

      This looks like an interesting tool, and I have no objections to a tool *written* in Java, except that this one also appears to be written *for* Java, and it appears one needs to create rulesets in order to use this. Very neat suggestion, though.

      Heh -- and now I have another itch to scratch -- a software tool that locates copied and pasted code. Hmm .. I do have a grid engine at my disposal ..
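
A minimal sketch of what such a tool could look like, assuming a very naive approach: normalize each line, slide a fixed-size window over the result, and report windows that occur in more than one place. This is not how CPD works internally (it is token-based); the routine name find_duplicates, the window size and the sample sources are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return groups of locations where $window consecutive normalized lines match.
# %$sources maps a label (for example a file name) to that file's text.
sub find_duplicates {
    my ( $sources, $window ) = @_;
    my %where;    # joined window text => list of "label:line" locations

    for my $label ( sort keys %$sources ) {
        my @lines;
        my $n = 0;
        for my $line ( split /\n/, $sources->{$label} ) {
            $n++;
            $line =~ s/#.*//;     # crude comment strip; wrong for '#' inside strings
            $line =~ s/\s+//g;    # ignore all whitespace differences
            next if $line eq '';
            push @lines, [ $n, $line ];
        }
        my $last_start = @lines - $window;    # negative means no full window
        for my $i ( 0 .. $last_start ) {
            my $key = join "\n",
                map { $_->[1] } @lines[ $i .. $i + $window - 1 ];
            push @{ $where{$key} }, join( ':', $label, $lines[$i][0] );
        }
    }

    # Keep only windows seen at two or more locations.
    return [ grep { @$_ > 1 } map { $where{$_} } sort keys %where ];
}

# Hypothetical sample input: the same routine pasted into two modules.
my @body = ( 'sub a {', '  my $x = shift;', '  return $x + 1;', '}' );
my %src  = (
    'Users.pm'    => join( "\n", @body ),
    'Projects.pm' => join( "\n", '# copied', map( "    $_", @body ) ),
);

for my $group ( @{ find_duplicates( \%src, 3 ) } ) {
    print join( ' <-> ', @$group ), "\n";
}
```

Overlapping windows are reported separately, so a long pasted hunk shows up as a run of adjacent matches; merging those runs is left out to keep the sketch short.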

      Alex / talexb / Toronto

        Although there is no explicit ruleset for Perl available, it works quite well. I tried it before a ruleset for PHP was available; maybe it works even better now...

        Just give it a try.

        And you do not need a grid engine for this: checking a medium-sized project with CPD runs really fast on an average workstation.

      My understanding of the Git version-control system is that it detects quite well where code has been copied, renamed, moved, or transcribed.

      It's maybe not as oriented to the task as CPD, but it is another approach to the same problem (with further benefits).

      -David

Re: My code refactoring journey
by andreas1234567 (Vicar) on Aug 31, 2007 at 09:50 UTC
    Testing: It's hard to test your own code, but I try.
    True, but the fact that you try and even have some unit tests is a sign of an emerging quality process.

    Selenium IDE allows you to create automated tests for web applications. It is a "record and playback" test framework that integrates with the Firefox web browser, all released under the Apache 2.0 Licence. Really cool.

    --
    Andreas