throop has asked for the wisdom of the Perl Monks concerning the following question:
Are there any good tips on breaking up really long if / elsif / else blocks?
Case in point. My code performs natural language processing. There's a routine, driveBase. It takes a token (essentially, an English word or some other delimited string). Given a token it doesn't recognize, it recursively snips off prefixes and suffixes, looking for a recognized token. Once it's found one, it deduces the part of speech of the original token. It records this, plus some other bookkeeping taken from the base token.
Thing is, English is really irregular and there is a long list of special cases. So I've got the following code, with 29 clauses in the if / elsif / else.
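To make the strip-and-retry idea concrete, here is a minimal sketch of it. The lexicon, the affix lists and the name find_base are all made up for illustration; the real driveBase handles far more cases and records much more bookkeeping.

```perl
use strict;
use warnings;

# Hypothetical data for illustration only, not the real lexicon.
my %lexicon  = ( DOFF => 'VERB' );
my @prefixes = qw(COUNTER RE UN);
my @suffixes = ( [ ABLE => 'ADJECTIVE' ], [ ING => 'GERUND' ] );

# Returns (base, part_of_speech), or an empty list if nothing is recognized.
sub find_base {
    my ($tok) = @_;

    # A recognized token is its own base.
    return ( $tok, $lexicon{$tok} ) if exists $lexicon{$tok};

    # Snip a prefix and retry; a prefix leaves the part of speech unchanged.
    for my $prefix (@prefixes) {
        if ( my ($rest) = $tok =~ /^\Q$prefix\E(.+)$/ ) {
            my ( $base, $pos ) = find_base($rest);
            return ( $base, $pos ) if defined $base;
        }
    }

    # Snip a suffix and retry; the suffix decides the part of speech.
    for my $suffix (@suffixes) {
        my ( $ending, $pos ) = @$suffix;
        if ( my ($rest) = $tok =~ /^(.+)\Q$ending\E$/ ) {
            my ($base) = find_base($rest);
            return ( $base, $pos ) if defined $base;
        }
    }

    return;    # nothing recognized
}

my ( $base, $pos ) = find_base('COUNTERDOFFABLE');
print "$base $pos\n";    # prints "DOFF ADJECTIVE"
```

With this toy data, COUNTERDOFFABLE loses its prefix, DOFFABLE loses its suffix, and DOFF is found in the lexicon, so the original token is reported as an adjective built on DOFF.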
    use strict;
    # driveBase drives single tokens to baseform. Its returned value is
    # true for success. It will always inscribe $wordCache{tok}{group} if
    # it isn't already done.
    sub driveBase{
        my($tok, $Args) =@_;
        return $wordCache{$tok} if $wordCache{$tok}{inscript};
        my $checkPrefix         # Check the prefix unless explicitly told not to.
          = defined($Args->{checkPrefix}) ? $Args->{checkPrefix} : 1;
        my $skipPrefix = ! $checkPrefix;
        my($base,$atail,$ntail) = &split_token($tok);
        my($sing, $unprefBase, $savgrp, $savbase);
        if($atail and $ntail ne ''){               # If there's both an atail and an ntail.
            &driveBase("$base,$atail", $Args);     # The base,tail can be a different group than the base.
            &equivWord($tok, "$base,$atail", {ntail => $ntail})}  # Treat GYROSCOPE,CONTROL MOMENT,3 same as GYROSCOPE,CONTROL MOMENT
        # Assume that if $wordCache{$base,,$ntail}{group}, it's the same as $wordCache{$base}{group}
        # E.g. if SOFTGOODS LAB is a type of LAB, then it has the same group as LAB, unless we're told otherwise.
        elsif(!$atail and $ntail ne ''){           # Has a numeric tail but no alpha tail.
            &driveBase($base, $Args);
            &equivWord($tok, $base, {ntail => $ntail})}
        elsif($wordCache{$base} and
              $wordCache{$base}{group} and
              $wordCache{$base}{group} ne 'UNKNOWN' and
              $base ne $tok){
            unless(member($wordCache{$base}{group}, qw(TIMEPERIOD PARTNO RANGE EXTENSION)) or
                   $Args -> {nocomplain}){
                warn_once('driveBase', "WordCache should have been filled in when we asserted '$atail' isa type of '$base' ")};
            &equivWord($tok, $base, {atail => $atail});
            1}
        elsif(not ($atail eq '' and $ntail eq '')){  # There's either an atail or ntail but the form hasn't been seen before.
            &driveBase($base, $Args);
            &equivWord($tok, $base, {atail =>$atail, ntail=>$ntail})}
        elsif($checkPrefix and                       # This is the main check for prefixed forms
              &is_prefixed_partspeechAnnotated($base)){  # wordCache{base} gets filled in.
            unless($tok eq $base){   #$prefixLessFm Mostly, base WILL eq tok, but catch the odd case.
                &equivWord($tok, $base, {atail => $atail})};
            1}
        # If $checkPrefix is false, then we've already stripped a prefix,
        # so it can't be an abbreviation, a number or an uncannical form.
        elsif($checkPrefix and $canon{$tok} and $canon{$tok} ne $tok){
            &driveBase($canon{$tok}, {%$Args, checkPrefix=>0});
            &equivWord($tok, $canon{$tok}, {})}
        elsif(($sing) = $tok =~ /(.+)S$/ and $canon{$sing} and $canon{$sing} ne $tok){  # So REQTS => REQT => REQUIREMENT
            &equivWord($tok, $canon{$sing}, {plural=>1})}
        elsif($checkPrefix and $abbrev{$tok}){   # Not normally invoked, as abbrevs have already been expanded usually.
            &equivWord($tok, $abbrev{$tok}, {abbreviation =>1})}
        elsif($checkPrefix and
              (&part_number_p($base) or
               $atail eq 'PARTNO')){
            # P/N and PN: <partno> get transformed in immediate_transforms; see the improves file
            &baseInscript($tok, {group => 'PARTNO', root => $tok})}
        elsif($checkPrefix and &power_loc_p($tok)){
            &baseInscript($tok, {group => 'POWERLOC', root => $tok})}
        elsif($checkPrefix and &is_loc_code($tok)){
            &baseInscript($tok, {group => 'LOC_CODE', root => $tok})}
        elsif($checkPrefix and $savbase = &is_measure_physics_frac($tok)){
            $num_item{$tok} = 1 unless defined($num_item{$tok});
            &baseInscript($tok, {group => $savbase,
                                 num_item =>1,
                                 root => $tok})}
        elsif($savbase = &is_gerund($base, 1)){
            warn_once('GroupOf2', "Gerund $tok with base $base and tails '$atail', '$ntail'") if $atail or $ntail;
            &equivWord($tok, $savbase, {group=> 'GERUND', atail => $atail})}
        elsif($savbase = &is_pastverb($base, 1)){
            &equivWord($tok, $savbase, {group=> 'VERB',   # Even if the baseform is NVRB.
                                        atail => $atail,
                                        verbtense => 'PAST'})}
        elsif(&is_prefixed_verb($base, 1)){
            warn_once('GroupOf2.b', "Why wasn't $base of $tok found earlier?")}
        elsif($savbase = &is_ize_verb($base, 1)){
            &driveBase($savbase, {checkPrefix => 0});     # usually a no-op, but every once in a while...
            &equivWord($tok, $savbase, {group=> 'VERB',   # Even if the baseform is NVRB.
                                        atail => $atail})}
        elsif($sing = &is_3rdpersonverb($base, 1)){       # BOXES Could be either the plural or the 3rd person sing.
            &equivWord($tok, $sing, {s_ending => 1, atail => $atail})}
        elsif($savbase = &is_derived_noun($base, 1)){     # '-TION' words etc., REFUSAL, REBUTTAL, GOODNESS, BUSINESS
            &equivWord($tok, $savbase, {group => 'NOUN',  # A few NVRBS like POSITION and CONDITION will be called out explicitly.
                                        atail => $atail},
                       {no_use =>[qw(actorform)]})}
        elsif($savbase = &is_derived_adj($base, 1)){
            &equivWord($tok, $savbase, {group => 'ADJECTIVE', atail => $atail},
                       {no_use =>[qw(actorform)]})}
        elsif($savbase = &is_adverb($base, 1)){
            &equivWord($tok, $savbase, {group => 'ADVB', atail => $atail})}
        elsif($types{$base}){               # presumably this token will get gobbled
            warn_once('GroupOf2.c',         # by one of the neighboring tokens.
                      "Why does modifier $base have a tail $atail") if $atail;
            &baseInscript($tok, {group => 'MODIFIER', root => $tok})}
        elsif($sing = &singularOfNoun($tok, 1)){
            &equivWord($tok, $sing, {plural=>1})}
        elsif($base =~ /^[a-z,A-Z]$/){
            &baseInscript($tok, {group => 'LETTER', root => $tok})}
        # if $checkPrefix is false, then we've already stripped a prefix off the tok, so it can't be a roman numeral
        elsif($checkPrefix and (&number_p($tok) or &roman_number_p($tok))){
            &baseInscript($tok, {group => 'NUMBER', root => $tok})}
        elsif(is_alphnum($base)){
            &baseInscript($tok, {group => 'ALPHNUM', root => $tok})}
        elsif($base =~ /^$punctuation_tokens$/){
            &baseInscript($tok, {group => 'PUNCTUATION', root => $tok})}
        # So COUNTERDOFFABLE gets tested as DOFFABLE and returns ADJECTIVE
        elsif($checkPrefix and
              $unprefBase and                              # Set in an earlier test, above
              $savgrp = &group_of($unprefBase, 1) and      # This time, try to derive a group for the base.
              &member($metagroup{$savgrp}, 'NVRB', 'NOUN','VERB','ADJECTIVE','ADVB', 'GERUND')){}
        else{
            &baseInscript($tok, {group => 'UNKNOWN', root => $tok})};
        $wordCache{$tok}}
The tests are all different. The order in which the tests are done is (sometimes) important. Some of the intermediate results from one test are re-used in a later test. The code runs great.
But the function runs 114 lines, over 7000 characters. I'd like to break it up into smaller functions (or make it smaller in some other way). I don't expect anybody to refactor my code for me. But can anybody suggest a coding approach that is more maintainable, more compact, and still readable?
throop
==============================
Update: I considered using dispatch tables, but they didn't seem to me to fit this problem, because:
- I wanted tight control over the order of tests
- Many of my tests involve function calls, not just eq tests
- Intermediate results from earlier tests are saved in 'my' variables and used by later tests.

Or am I missing something?
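For comparison, an ordered list of test/action pairs (an array rather than a hash-keyed dispatch table) may be able to accommodate all three points. The sketch below is only an illustration under made-up assumptions: %canon here holds toy data, %cache stands in for %wordCache, and the rule names and the classify sub are hypothetical, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real lexicon; not the original data.
my %canon = ( REQT => 'REQUIREMENT' );
my %cache;    # plays the role of %wordCache in this sketch

# Each rule is [ name, test, action ]. Both subs receive the same state
# hash, so a test can stash a value that its action, or a later rule, reads.
my @rules = (
    [ canonical =>
        sub { my $s = shift; exists $canon{ $s->{tok} } },
        sub { my $s = shift; $cache{ $s->{tok} } = { root => $canon{ $s->{tok} } } },
    ],
    [ plural_of_canonical =>
        sub {
            my $s = shift;
            ( $s->{sing} ) = $s->{tok} =~ /(.+)S$/;    # saved for the action
            defined $s->{sing} && exists $canon{ $s->{sing} };
        },
        sub {
            my $s = shift;
            $cache{ $s->{tok} } = { root => $canon{ $s->{sing} }, plural => 1 };
        },
    ],
    [ number =>
        sub { my $s = shift; $s->{tok} =~ /^\d+$/ },
        sub { my $s = shift; $cache{ $s->{tok} } = { group => 'NUMBER', root => $s->{tok} } },
    ],
    [ unknown =>    # fallback, equivalent to the final else
        sub { 1 },
        sub { my $s = shift; $cache{ $s->{tok} } = { group => 'UNKNOWN', root => $s->{tok} } },
    ],
);

# Try the rules top to bottom and stop at the first whose test succeeds.
sub classify {
    my ($tok) = @_;
    my %state = ( tok => $tok );
    for my $rule (@rules) {
        my ( $name, $test, $action ) = @$rule;
        next unless $test->( \%state );
        $action->( \%state );
        return $name;
    }
    return;
}

print classify($_), "\n" for qw(REQT REQTS 42 FROB);
```

The array keeps the test order explicit, each test is an arbitrary sub so it can call any function, and the shared state hash takes over the job of the 'my' variables. Whether this reads better than 29 elsif clauses is a judgment call.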
throop
Replies are listed 'Best First'.

Re: Really Long if/elsif/else blocks
- by MidLifeXis (Monsignor) on May 30, 2008 at 18:38 UTC
- by throop (Chaplain) on May 30, 2008 at 19:32 UTC
- by MidLifeXis (Monsignor) on May 30, 2008 at 20:19 UTC
- by perreal (Monk) on May 30, 2008 at 22:10 UTC

Re: Really Long if/elsif/else blocks
- by zentara (Cardinal) on May 30, 2008 at 18:10 UTC
- by Fletch (Bishop) on May 30, 2008 at 18:32 UTC

Re: Really Long if/elsif/else blocks
- by apl (Monsignor) on May 30, 2008 at 19:25 UTC
- by throop (Chaplain) on May 30, 2008 at 20:12 UTC
- by graff (Chancellor) on May 31, 2008 at 04:56 UTC
- by apl (Monsignor) on May 31, 2008 at 10:54 UTC

Re: Really Long if/elsif/else blocks
- by educated_foo (Vicar) on May 31, 2008 at 03:04 UTC
- by mandarin (Hermit) on May 31, 2008 at 20:29 UTC
- by MidLifeXis (Monsignor) on May 31, 2008 at 22:47 UTC
- by educated_foo (Vicar) on Jun 01, 2008 at 03:31 UTC
- by Anonymous Monk on Jun 01, 2008 at 12:05 UTC
- by parv (Parson) on Jun 01, 2008 at 08:05 UTC

Re: Really Long if/elsif/else blocks
- by pc88mxer (Vicar) on May 30, 2008 at 22:45 UTC

Re: Really Long if/elsif/else blocks
- by Zen (Deacon) on May 30, 2008 at 21:46 UTC

Re: Really Long if/elsif/else blocks
- by doom (Deacon) on May 31, 2008 at 00:04 UTC

Re: Really Long if/elsif/else blocks
- by John M. Dlugosz (Monsignor) on May 31, 2008 at 01:40 UTC

Re: Really Long if/elsif/else blocks
- by Starky (Chaplain) on Jun 05, 2008 at 06:06 UTC