bobf has asked for the wisdom of the Perl Monks concerning the following question:

I saw this in a piece of code that I was reviewing today:

# first argument is a -f; collect it and ignore it for now
my $optionalArgument = shift;
# get filename of the config file
my $configFile = shift;
In reply, I added this comment:
Eventually this should use one of the Getopt modules (like Getopt::Long) - parsing the cmd line manually can lead to bugs, especially as code is refactored.
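For illustration, here's a minimal Getopt::Long sketch of that same interface. The -f option and the config-file value come from the snippet under review; the option's long name and the usage message are assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical equivalent of the hand-rolled shifts: -f (or --file)
# takes the config filename. Order on the command line no longer
# matters, and unknown switches are rejected instead of silently
# landing in the wrong variable.
my $configFile;
GetOptions( 'file|f=s' => \$configFile )
    or die "Usage: $0 -f <configfile>\n";
die "Usage: $0 -f <configfile>\n" unless defined $configFile;

print "config file: $configFile\n";
```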
The response that I got back was essentially, "we use this method in a lot of our code - how can it lead to bugs?"

I rattled off a few of the advantages that sprang to mind:

After looking back at the list, I realized that I was arguing for using named parameters and Getopt::Long - not explaining why parsing the command line manually is a Bad Idea.

I admit that my initial comment was more of a conditioned knee-jerk response than a well thought out reply. To be honest, I haven't really thought about it. I always use Getopt::Long when I need to parse the command line, but apparently I'm using it to gain several advantages, not to avoid disadvantages. Was I incorrect when I said "parsing the cmd line manually can lead to bugs"?

How do you parse the command line, and why? What are the risks for not using a module?

Thanks in advance for the input. pun intended ;-)

Re: Parsing the command line: manual or module?
by Velaki (Chaplain) on Aug 17, 2006 at 17:48 UTC

    In a word: Consistency.

    By adhering to a standard method of parsing the command line, you ensure that future programs will conform to accepted standards, that best practices are followed, and that existing code is maintainable by any and all resources.

    Additionally, using a module such as Getopt::Long is advantageous in that it enforces consistent behavior for command line options, e.g.  -v -f filename, which is notoriously time-consuming to code well by hand. It also spares the user a cryptic -h when a long option like --history would be understood more easily.

    In all fairness, TMTOWTDI, but why not use a well-tested, code-proven module, like Getopt::Long? I see only advantages; no disadvantages with it -- other than maybe a small learning curve.

    Pax vobiscum,
    -v.

    "Perl. There is no substitute."
Re: Parsing the command line: manual or module?
by andyford (Curate) on Aug 17, 2006 at 17:52 UTC
    Not at all trying to be disingenuous, but if you invert your arguments, they become exactly "the risks of not using a module". At first I was kidding; by the time I finished, I realized I was serious. Here's a quick inversion of some of your points, just to give some flavor:
    • positional params are inflexible and therefore hard to maintain
    • positional params are error-prone because they are order-dependent
    • positional params are difficult from the caller's perspective because callers have to remember the exact order of the arguments
    • arbitrary and/or short param names inhibit memory and clarity
    • positional params require manual validation (required/optional/string/integer/etc)
    • hand-rolled command-line options make it difficult to use flags and named params simultaneously

    andyford
    or non-Perl: Andy Ford

      You're absolutely right. You've pointed out more clearly, however, something I'd already noticed when reading the root node: the response did not actually indicate any way bugs could be introduced. These are problems, but not opportunities for more bugs; rather, they come across as characteristics of a bug. That's not to say that hand-rolled option parsing isn't prone to introducing bugs: it is, as Fletch indicated. It's just that the OP's reply never spelled out why.

      In other words, hand-rolled option parsing such as is described here is itself a bug. It introduces its own problems at runtime. It should be fixed as a bug. Luckily, it's a bug with a known, relatively easy solution.

      On the other hand, I think that the OP's "positive" approach was more diplomatic than the "negative" approach taken in your rephrasing, andyford. What you posted is excellent for illustration purposes in answering the original question, but is not how I'd address the questions of a coworker (if I was thinking properly at the time) because it might be perceived as accusatory.

      print substr("Just another Perl hacker", 0, -2);
      - apotheon
      CopyWrite Chad Perrin

Re: Parsing the command line: manual or module?
by Fletch (Bishop) on Aug 17, 2006 at 17:48 UTC

    The risks are the same as any other time you avoid code reuse. Just off the top of the head:

    • You're unnecessarily rewriting code which already exists
    • The rewritten code will be duplicated in multiple places
    • Different copies of the code may drift (scripts A and B get tweaked to add some behavior, but C and D languish with a bug that was fixed in F, while E, Q, and U get a completely different implementation of the same functionality)
      I would add:
      • Getopt::Long is the standard in the perl community, and when you hire new perl programmers they'll expect to see something like it in use (if they're any good).

      Actually, this isn't a bad interview question to ask a prospective employer: if they're writing command-line perl scripts and not using a Getopt::* module, they probably have other problems as well.

      Your third point is sort of an outgrowth of your second (here's hoping you don't edit that unordered list in such a way that this sentence becomes meaningless). It's a darned good point: drifting code is one of the biggest disadvantages of DRY principle violation. This is also the major opportunity for actual bugs to be introduced due to the by-hand specification of command line option parsing in every single program individually.

      Plus, y'know, a good programmer should be lazy enough about stuff like this to want to do it "right" in the first place, since it's less work to use someone else's command line option parsing module than to write your own code every time.

      Of course, Getopt::Std is simpler to use, so that might end up being the really lazy answer.
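As a concrete comparison, here's a minimal Getopt::Std sketch; the specific options (-v flag, -f with a value) are assumptions for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# getopts() fills %opts: the ':' after 'f' means -f takes a value,
# while -v is a bare boolean flag.
my %opts;
getopts( 'f:v', \%opts ) or die "Usage: $0 [-v] -f <configfile>\n";

print "verbose on\n"       if $opts{v};
print "config: $opts{f}\n" if defined $opts{f};
```

It really is the lazy option: one line of setup, and the parsed switches land in a hash.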

      print substr("Just another Perl hacker", 0, -2);
      - apotheon
      CopyWrite Chad Perrin

Re: Parsing the command line: manual or module?
by davido (Cardinal) on Aug 17, 2006 at 17:56 UTC

    I'll add another reason to advocate code reuse: collaboration.

    Thousands of developers have used Getopt::Long. It has been proven, tested, debugged, refined, pondered, enhanced, and applied through countless test and real-world use cases. There is no way that a home-made solution will have evolved through as rigorous a refining forge as a core module. Code reuse, and in particular the use of widely popular modules, is inherently and consistently safer than inventing your own solution to a problem that was solved ages ago.

    If you've got a new problem, not addressed by a trusted and proven module, you earn the fun of inventing your own solution. But command line parameter parsing has been done before, the right way, a lot. Unless you've got a unique need not solved by existing solutions, there's no need to risk making a mistake building your own approach to a problem that's already been solved.


    Dave

Re: Parsing the command line: manual or module?
by talexb (Chancellor) on Aug 17, 2006 at 18:38 UTC

    Before I wrote my code in Perl, I was a C programmer. One of the first C applications that I wrote started off as a prototype, and as often happens, the prototype became the working piece of Production code.

    One of the routines started out with a few parameters, then grew, finally needing eight or ten parameters. Every time I'd add another, I'd think, "Gee, this is getting really unwieldy".

    Well, that's hindsight.

    In C, you have no choice but to pass in a long list of parameters ... but in Perl, there's no need to cripple your code with that kind of limitation. As soon as a function requires more than two or three parameters, make them an arg hash. If it's the command line arguments, use Getopt::Long.
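A minimal sketch of the arg-hash idiom; the function and its parameters are hypothetical, invented only to illustrate the pattern:

```perl
use strict;
use warnings;

# A named-argument ("arg hash") interface: callers pass key/value
# pairs, so argument order no longer matters and new parameters can
# be added later without breaking existing call sites.
sub render_page {
    my (%args) = @_;
    my $title = defined $args{title} ? $args{title} : 'Untitled';
    my $width = defined $args{width} ? $args{width} : 80;
    my $body  = defined $args{body}  ? $args{body}  : '';
    return sprintf "%-${width}s\n%s\n", $title, $body;
}

# Order-independent call; 'width' falls back to its default:
print render_page( body => 'Hello', title => 'Greeting' );
```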

    The risks of not using a module are that you'll reinvent your own wheel. This is not a bad thing, except that your solution isn't going to have the attention paid to it that the equivalent module did, and it'll take longer.

    That could pay off in the long run, but it might lead to difficult discussions with your manager about why the three week development schedule has expanded out to eight months. Part of being a Senior Developer is knowing when to write it yourself, and when to use something that someone else has written.

    Let CPAN make you look good today.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      I want to say there's a refactoring (from the book of the same name) that is recommended for when a routine starts getting too many parameters. I want to say something along the lines of encapsulating some (or all) of the arguments into an object, and/or possibly moving the behavior onto that object so the current implementer becomes a client using an instance of the new class.

      (Unfortunately I don't have my copy at hand, but someone may chime in that actually remembers it or does have a copy nearby . . .)

      Update: Found it: Introduce Parameter Object, p295. The (simple) example given is a series of calls which all take a start and end Date; the refactoring is to create a DateRange class which encapsulates both.
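A rough Perl sketch of that refactoring (the class and its methods are invented for illustration; Fowler's example uses Date objects, simplified here to plain numbers such as epoch seconds):

```perl
use strict;
use warnings;

# Introduce Parameter Object: instead of every routine taking a
# separate ($start, $end) pair, bundle the pair into a small
# DateRange class and pass one object around.
package DateRange;

sub new {
    my ( $class, $start, $end ) = @_;
    die "start must not follow end" if $start > $end;
    return bless { start => $start, end => $end }, $class;
}

sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Behavior can migrate onto the new object, as the refactoring suggests:
sub contains {
    my ( $self, $t ) = @_;
    return $t >= $self->start && $t <= $self->end;
}

package main;

# One object instead of two positional arguments:
my $range = DateRange->new( 1_000, 2_000 );
print $range->contains(1_500) ? "in range\n" : "out of range\n";
```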

      You can pass multiple arguments via a struct in C in just the same way you use an arg hash in Perl. You can use getopt(3C) in C in just the same way you use Getopt::Long in Perl. Similar techniques exist for other languages.

        Yeah, I know that -- but every time you want to add a parameter you have to modify the struct definition and do a make, so it's not a simple thing to do. I didn't want to confuse the post with that situation.

        In Perl, if you want to add something to an arg hash, the caller and the callee need to be modified; no one else cares, and that's the way it should be.

        I deleted a paragraph from my original post that talked about how I wrote a device independent video graphics module for two graphics cards (heh), CGA (if you can call 640x400 useful) and Hercules (720x348 or something like that -- ok, wikipedia says it was 720x350, close enough). To use these two cards, I would call a subsystem with pointers to each of a dozen functions, when obviously a pointer to a structure containing function pointers would have been a far more efficient way to implement that.

        Like I said, it was my first big project. My coding standards have improved immensely since then -- hey, I discovered make back then and thought it was a pretty advanced tool. It was only later that I discovered it had been ported from Unix.

        Alex / talexb / Toronto

        "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        Updated Sunday August 20, 2006 at 1222: After re-reading, realized that my purple prose needed a little clarification.

Re: Parsing the command line: manual or module?
by GrandFather (Saint) on Aug 17, 2006 at 19:36 UTC

    In a way it is like parsing CSV or HTML/XML with regexen - it's easy for the easy stuff, but you will get bitten by the edge cases. With a good module someone has already thought about the edge cases and provided ways of managing them.

    The downside with the few command line parsing modules I've glanced at is that they all focus on *nix style command line conventions. They simply don't handle DOS/Windows conventions (that I've noticed). Because of that I tend to use a command line parsing "template" chunk of code that gets pasted (along with a help/error exit routine) into whatever new script I'm writing that needs command line processing. I should at least generate a module from it, but it hasn't happened yet.

    The fish hooks in command line processing come from duplicate flag processing, quoted parameters, and interspersed flags and parameters. Handling defaults, required parameters, help processing, and error handling tend to be related issues. By the time you've handled all that lot there is a fair chunk of code involved. Add in the test suite and you really have something worth a decent sized module. At that point letting someone else do the work starts to seem worthwhile!
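Two of those fish hooks, duplicate flags and repeated parameters, are handled declaratively by Getopt::Long; the option names here are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# '=s@' with an array destination collects every occurrence of the
# option; a trailing '+' turns a flag into a counter, so repeating
# it (e.g. -v -v) raises the count rather than being a silent dupe.
my @includes;
my $verbose = 0;
GetOptions(
    'include|I=s' => \@includes,   # -I lib -I t/lib  collects both
    'verbose|v+'  => \$verbose,    # -v -v            counts to 2
) or die "bad options\n";

print "includes: @includes\n";
print "verbosity: $verbose\n";
```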


    DWIM is Perl's answer to Gödel
Re: Parsing the command line: manual or module?
by CountZero (Bishop) on Aug 18, 2006 at 06:11 UTC
    By its very own words and form it shows that this small piece of script is already buggy!
    # first argument is a -f; collect it and ignore it for now
    my $optionalArgument = shift;
    1. If one knows that the first argument is always -f, why collect and save it? It is always there, so deal with it and don't force your users to type unnecessary arguments!
    2. If this argument is always there, why is it then called $optionalArgument?
    3. If the argument is really optional, shifting the nonexistent optional argument will really screw up your code!
    So the question is not "How could this code be buggy?" but "How could this code be not buggy?"

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Parsing the command line: manual or module?
by tilly (Archbishop) on Aug 18, 2006 at 02:14 UTC
    Whenever caller and callee have to exactly synchronize, with no error checking, you will tend to get bugs because people will sometimes make mistakes.

    The hand-rolled parsing in this case requires exactly this kind of synchronization, and lacks error checks.

    Using Getopt::Long allows for more flexibility and better error checking. On the one hand you'll get fewer errors because, for instance, getting arguments in the wrong order won't be a problem. On the other hand, when there are errors, you are more likely to be told about them, so they won't survive. Both are Good Things.
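A small sketch of that error-reporting behavior, using GetOptionsFromArray (a Getopt::Long function that parses a supplied list instead of @ARGV); the option name and the deliberate typo are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Getopt::Long reports failure instead of silently mis-assigning:
# an unknown switch makes the parse return false (and warn), so
# the mistake surfaces immediately rather than lurking as a bug.
my $file;
my $ok = GetOptionsFromArray(
    [ '--flie', 'x.conf' ],        # note the typo: --flie
    'file=s' => \$file,
);
print $ok ? "parsed ok\n" : "caught the bad option\n";
```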

    Less code, easier maintenance, etc. are just icing on the cake.

Re: Parsing the command line: manual or module?
by graff (Chancellor) on Aug 18, 2006 at 00:41 UTC
    # first argument is a -f; collect it and ignore it for now
    my $optionalArgument = shift;
    # get filename of the config file
    my $configFile = shift;
    If that's all there was in terms of handling command-line args (if there really was no checking, and no reporting about expected and invalid usage), then the script is nearly unusable. What if the first arg isn't "-f"? What if the next one isn't the name of a config file?

    And as already pointed out, if there's a chance the script will need to be adapted someday to handle additional variations on its behavior, it will be more unusable (and unmaintainable as well) until some sort of Getopt treatment is brought into play.

    I use either Getopt::Std or Getopt::Long in many of the command-line scripts I write. Even though I haven't taken the time to memorize all the techniques, and have to refer to a previous script or to the module's perldoc output just about every time I use them, it still saves me time and makes it easier to add new options to my scripts when I need to.

    (Having said that, I'll confess that there are also a few occasions when I somehow conclude that I can handle what's needed myself, without Getopt -- but even then, I at least allow for flexibility in the ordering of args, verify that args are as expected, and die with an appropriate error and usage summary when they aren't.)

Re: Parsing the command line: manual or module?
by eyepopslikeamosquito (Archbishop) on Aug 18, 2006 at 13:01 UTC

    You might also consider TheDamian's Getopt::Euclid:

    Getopt::Euclid uses your program's own documentation to create a command-line argument parser. This ensures that your program's documented interface and its actual interface always agree.
    Both Getopt::Euclid and Getopt::Clade are discussed in Perl Best Practices.

Re: Parsing the command line: manual or module?
by odha57 (Monk) on Aug 18, 2006 at 13:02 UTC
    For all of you, thanks for a great thread! I am fairly new to this site, but have been working with Perl for about 10 years to do various things in telecom labs. So I am pretty much self taught, bumping into new things as the need arises. I had never run across Getopt::Long. I just read the documentation and see that I can do things a better (and easier) way. Thanks!