McMahon has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks...

I recently sold an article to Better Software magazine. This is the first technical article I've ever written, and I'm asking for the monks' help reviewing my code.

There are two scripts: one is a trivial implementation of File::Find to make an inventory report; the other uses List::Compare to create a difference report on inventories created by the first script. The point of doing this is to identify files missing from one inventory to the next inventory; and to identify new files in the second inventory.

The audience for the article is manual software testers aka QA people thinking about starting to learn a scripting language. It's an audience comfortable and familiar with computers, but completely new to (and possibly uncomfortable about) programming. The point is to show that writing simple, powerful scripts is Not That Hard, so I have lots of comments and have kept the syntax as clean as possible.

Do I have enough comments? Is the syntax reasonably clear? Am I giving the reader enough information to figure out what's going on? Have I made any mistakes?

Your comments are welcome-- particularly newbies and those who've come to Perl from other languages.
################################################# #!/usr/bin/perl #InventoryReport.pl ###Good Perl practice: use warnings; use strict; ###CPAN module finds files recursively use File::Find; ###Report file: open (OUT, ">InventoryFile.txt"); ###Top-level directory to report on: my $directory = '//InstallBox/C'; ###invoking File::Find, printing path, then filename: find (\&wanted, $directory); sub wanted { print OUT "$File::Find::dir/"; print OUT "$_\n"; } __END__ ################################################# ################################################# #!/usr/bin/perl #CompareReport.pl ###good Perl practice: use warnings; use strict; ###invoke the CPAN module: use List::Compare; ###open some files: open (OUT, ">CDiffs.txt"); open (FILE1, "OldInventory.txt"); open (FILE2, "NewInventory.txt"); ###idiomatic: lowercase each pathname string after loading the arrays— ###(this is a Windows system, might not want to “lc” for Unix): my @file1 = <FILE1>; $_ = lc for @file1; my @file2 = <FILE2>; $_ = lc for @file2; ###load the files for List::Compare: my $lc = List::Compare->new(\@file1, \@file2); ###List::Compare features to identify unique elements in each file: my @file1only = $lc->get_Lonly; my @file2only = $lc->get_Ronly; ###talk to the user: print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n"; print OUT "IGNORING FILES IN THE RECYCLE BIN.\n"; print OUT "IGNORING ALL HISTORY FILES.\n"; print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; ###selection criteria is a set of easy Regular Expressions: foreach my $file1(@file1only) { unless (($file1 =~ "tmp") or ($file1=~"temp") or ($file1 =~ "recyc +ler") or ($file1 =~ "history")) { print OUT $file1; } } ###separator is a couple of newlines: print OUT "\n\n"; ###talk to the user about the second file: print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n"; print OUT "IGNORING FILES IN THE RECYCLE BIN.\n"; print OUT "IGNORING ALL HISTORY FILES.\n"; print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; print "\n\n"; ###selection for the second inventory file: foreach my $file2(@file2only) { unless (($file2 =~ "tmp") or ($file2=~"temp") or ($file2 =~ "recyc +ler") or ($file2 =~ "history")) { print OUT $file2; } } __END__ ##############################################

Replies are listed 'Best First'.
Re: Code review for magazine article?
by tilly (Archbishop) on Apr 08, 2004 at 22:07 UTC
    Every open should have an error check as described in perlstyle for the reasons given there.

    Never use unless with logical operators. Rewrite that using if and and and not. It always looks reasonable as you're writing it, but afterwards it is unnecessarily hard to read, and when you actually are trying to debug it, you will quickly get confused in trying to work out the conditions. No, I didn't believe this advice when I was handed it. After a couple of nasty debugging sessions I recognized the limitations of my brain and took it to heart.

    Other than that, most of the style things that I would fix are justified by your proposed audience.

Re: Code review for magazine article?
by runrig (Abbot) on Apr 08, 2004 at 22:12 UTC
    This:
    sub wanted { print OUT "$File::Find::dir/"; print OUT "$_\n"; }
    is equivalent to this:
    sub wanted { print OUT "$File::Find::name\n"; }
    ...and the second is technically more correct unless you're sure your code is running on an O/S where the path separator is a '/'. Though if it's just a demo you might want to show more of what's available:
    sub wanted { print OUT "Directory: $File::Find::dir\n"; print OUT "Basename: $_\n"; print OUT "Full Path: $File::Find::name\n"; }
Re: Code review for magazine article?
by graff (Chancellor) on Apr 08, 2004 at 23:38 UTC
    I should point out that if your script really starts with a line like this:
    ##################################
    followed by the "shebang" line, then it will not run as an executable in any sort of unix(-like) shell. The shebang line must be the very first line.

    Apart from that, my initial snap reaction is that it would be better (more appealing to the novice, without being more cryptic) if there were less repetition: the two rather ugly sets of print statements that "talk to the user", each one followed by very similar-looking foreach loops, are a sure sign that you'd be better off with a subroutine that you call twice with different parameters each time. When newbies catch on to subroutines -- which is not that hard to do -- it strikes them as the coolest thing and they love it immediately. You shouldn't deny them that pleasure.

Re: Code review for magazine article?
by QM (Parson) on Apr 08, 2004 at 22:49 UTC
    sub wanted { print OUT "$File::Find::dir/"; print OUT "$_\n"; }
    Why are you adding a slash after the slash? According to the documentation:
    $File::Find::dir = /some/path/ $_ = foo.ext $File::Find::name = /some/path/foo.ext

    Other than for expository reasons, I would expect to see:

    print OUT "$File::Find::name";
    instead of mucking with leaning toothpicks.

    ###selection criteria is a set of easy Regular Expressions: foreach my $file1(@file1only) { unless (($file1 =~ "tmp") or ($file1=~"temp") or ($file1 =~ "recyc +ler") or ($file1 =~ "history")) { print OUT $file1; } }
    I would reformat this to see the parallels:
    ###selection criteria is a set of easy Regular Expressions: foreach my $file1(@file1only) { unless (($file1 =~ "tmp") or ($file1 =~ "temp") or ($file1 =~ "recycler") or ($file1 =~ "history")) { print OUT $file1; } }
    And I would probably lose the lc in favor of m//i. Which, by the way, you can't say "temp"i, you have to use m"temp"i.

    I'm pretty sure no one uses m// anymore unless they're avoiding toothpicks (m{/lib/dir}), or obfu-ing (m atempa).

    Now, are you sure you want "temp"? Because that will match "lastemployee"also. Perhaps you need "^temp$" or "\btemp\b"?

    print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; print "\n\n";
    Do you need the OUT filehandle on that last print?

    print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n";
    I would put something around the file names, to set them off. Perhaps 'Tmp' and 'Temp' instead?

    And are you comparing with the directory path, or the file path? Maybe the regex should be

    $file1 =~ /\btemp\W\w)/; # temp is not the filename
    Or maybe you want to rethink the output of InventoryReport.pl so that the filename is easy to split from the dir name, and then dotted together for the final output?

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Code review for magazine article?
by Grygonos (Chaplain) on Apr 08, 2004 at 20:31 UTC
    here's what I would do... basically.. just be super explicit... avoid using functions that defaulty take $_ as a parameter... for example your original use of lc. You may even (probably bad practice for a magazine article) include an explicit version (haha) and a condensed version...to show how powerful (and/or confusing) perl can be when written in a golfing fashion. Code looks alright.. but that's with my limited experience and no use of File::Find ever ... and also no List::Compare use.

    Grygonos
Re: Code review for magazine article?
by borisz (Canon) on Apr 08, 2004 at 23:11 UTC
    I whould like to see reformated lines like this:
    foreach my $file2 (@file2only) { unless ( ( $file2 =~ "tmp" ) or ( $file2 =~ "temp" ) or ( $file2 =~ +"recycler" ) or ( $file2 =~ "history" ) ) { print OUT $file2; } }
    to this:
    foreach my $file2 (@file2only) { unless ( ( $file2 =~ "tmp" ) or ( $file2 =~ "temp" ) or ( $file2 =~ "recycler" ) or ( $file2 =~ "history" ) ) { print OUT $file2; } }
    Im not sure if this is what you mean at all, if file2 = 'historyfile' it is matched by $file2 =~ /history/. I think you mean
    foreach my $file2 (@file2only) { unless ( ( $file2 eq "tmp" ) or ( $file2 eq "temp" ) or ( $file2 eq "recycler" ) or ( $file2 eq "history" ) ) { print OUT $file2; } }
    or
    foreach my $file2 (@file2only) { unless ( $file2 =~ /^tmp|temp|recycler|history$/ ) { print OUT $file2; } }
    Boris
Re: Code review for magazine article?
by Anonymous Monk on Apr 08, 2004 at 22:20 UTC

    Using comments in code meant to be published seems really ugly to me (could just be me). Myself, I much prefer seeing the whole body of code (with each line preceeded with its line number), followed by a detailed line-by-line description of what is going on. It is so much easier to follow this way. So you might do something like this (I doubt my explanations are satisfactory, I just threw this together in a matter of 15 minutes or so):