Are there any good tips on breaking up really long if / elsif / else blocks?
Case in point: my code performs natural language processing. There's a routine, driveBase, that takes a token (essentially, an English word or some other delimited string). Given a token it doesn't recognize, it recursively snips off prefixes and suffixes, looking for a recognized token. Once it has found one, it deduces the part of speech of the original token. It records this, plus some other bookkeeping taken from the base token.
Thing is, English is real irregular and there is a long list of special cases. So I've got the following code, with 29 clauses in the if / elsif / else.
use strict;
# driveBase drives single tokens to baseform. Its returned value is
# true for success. It will always inscribe $wordCache{tok}{group} if
# it isn't already done.
sub driveBase{
    my($tok, $Args) =@_;
    return $wordCache{$tok} if $wordCache{$tok}{inscript};
    my $checkPrefix # Check the prefix unless explicitly told not to.
        = defined($Args->{checkPrefix}) ? $Args->{checkPrefix} : 1;
    my $skipPrefix = ! $checkPrefix;
    my($base,$atail,$ntail) = &split_token($tok);
    my($sing, $unprefBase, $savgrp, $savbase);
    if($atail and $ntail ne ''){ # If there's both an atail and an ntail.
        &driveBase("$base,$atail", $Args); # The base,tail can be a different group than the base.
        &equivWord($tok, "$base,$atail", {ntail => $ntail})} # Treat GYROSCOPE,CONTROL MOMENT,3 same as GYROSCOPE,CONTROL MOMENT
    # Assume that if $wordCache{$base,,$ntail}{group}, it's the same as $wordCache{$base}{group}
    # E.g. if SOFTGOODS LAB is a type of LAB, then it has the same group as LAB, unless we're told otherwise.
    elsif(!$atail and $ntail ne ''){ # Has a numeric tail but no alpha tail.
        &driveBase($base, $Args);
        &equivWord($tok, $base, {ntail => $ntail})}
    elsif($wordCache{$base} and
          $wordCache{$base}{group} and
          $wordCache{$base}{group} ne 'UNKNOWN' and
          $base ne $tok){
        unless(member($wordCache{$base}{group}, qw(TIMEPERIOD PARTNO RANGE EXTENSION)) or
               $Args -> {nocomplain}){
            warn_once('driveBase', "WordCache should have been filled in when we asserted '$atail' isa type of '$base' ")};
        &equivWord($tok, $base, {atail => $atail});
        1}
    elsif(not ($atail eq '' and $ntail eq '')){ # There's either an atail or ntail but the form hasn't been seen before.
        &driveBase($base, $Args);
        &equivWord($tok, $base, {atail =>$atail, ntail=>$ntail})}
    elsif($checkPrefix and # This is the main check for prefixed forms
          &is_prefixed_partspeechAnnotated($base)){ # wordCache{base} gets filled in.
        unless($tok eq $base){ #$prefixLessFm Mostly, base WILL eq tok, but catch the odd case.
            &equivWord($tok, $base, {atail => $atail})};
        1}
    # If $checkPrefix is false, then we've already stripped a prefix,
    # so it can't be an abbreviation, a number or an uncannical form.
    elsif($checkPrefix and $canon{$tok} and $canon{$tok} ne $tok){
        &driveBase($canon{$tok}, {%$Args, checkPrefix=>0});
        &equivWord($tok, $canon{$tok}, {})}
    elsif(($sing) = $tok =~ /(.+)S$/ and $canon{$sing} and $canon{$sing} ne $tok){ # So REQTS => REQT => REQUIREMENT
        &equivWord($tok, $canon{$sing}, {plural=>1})}
    elsif($checkPrefix and $abbrev{$tok}){ # Not normally invoked, as abbrevs have already been expanded usually.
        &equivWord($tok, $abbrev{$tok}, {abbreviation =>1})}
    elsif($checkPrefix and
          (&part_number_p($base) or $atail eq 'PARTNO')){
        # P/N and PN: <partno> get transformed in immediate_transforms; see the improves file
        &baseInscript($tok, {group => 'PARTNO', root => $tok})}
    elsif($checkPrefix and &power_loc_p($tok)){
        &baseInscript($tok, {group => 'POWERLOC', root => $tok})}
    elsif($checkPrefix and &is_loc_code($tok)){
        &baseInscript($tok, {group => 'LOC_CODE', root => $tok})}
    elsif($checkPrefix and $savbase = &is_measure_physics_frac($tok)){
        $num_item{$tok} = 1 unless defined($num_item{$tok});
        &baseInscript($tok, {group => $savbase, num_item =>1, root => $tok})}
    elsif($savbase = &is_gerund($base, 1)){
        warn_once('GroupOf2', "Gerund $tok with base $base and tails '$atail', '$ntail'")
            if $atail or $ntail;
        &equivWord($tok, $savbase, {group=> 'GERUND', atail => $atail})}
    elsif($savbase = &is_pastverb($base, 1)){
        &equivWord($tok, $savbase, {group=> 'VERB', # Even if the baseform is NVRB.
                                    atail => $atail,
                                    verbtense => 'PAST'})}
    elsif(&is_prefixed_verb($base, 1)){
        warn_once('GroupOf2.b', "Why wasn't $base of $tok found earlier?")}
    elsif($savbase = &is_ize_verb($base, 1)){
        &driveBase($savbase, {checkPrefix => 0}); # usually a no-op, but every once in a while...
        &equivWord($tok, $savbase, {group=> 'VERB', # Even if the baseform is NVRB.
                                    atail => $atail})}
    elsif($sing = &is_3rdpersonverb($base, 1)){ # BOXES Could be either the plural or the 3rd person sing.
        &equivWord($tok, $sing, {s_ending => 1, atail => $atail})}
    elsif($savbase = &is_derived_noun($base, 1)){ # '-TION' words etc., REFUSAL, REBUTTAL, GOODNESS, BUSINESS
        &equivWord($tok, $savbase, {group => 'NOUN', # A few NVRBS like POSITION and CONDITION will be called out explicitly.
                                    atail => $atail},
                   {no_use =>[qw(actorform)]})}
    elsif($savbase = &is_derived_adj($base, 1)){
        &equivWord($tok, $savbase, {group => 'ADJECTIVE', atail => $atail},
                   {no_use =>[qw(actorform)]})}
    elsif($savbase = &is_adverb($base, 1)){
        &equivWord($tok, $savbase, {group => 'ADVB', atail => $atail})}
    elsif($types{$base}){ # presumably this token will get gobbled
        warn_once('GroupOf2.c', # by one of the neighboring tokens.
                  "Why does modifier $base have a tail $atail") if $atail;
        &baseInscript($tok, {group => 'MODIFIER', root => $tok})}
    elsif($sing = &singularOfNoun($tok, 1)){
        &equivWord($tok, $sing, {plural=>1})}
    elsif($base =~ /^[a-z,A-Z]$/){
        &baseInscript($tok, {group => 'LETTER', root => $tok})}
    # if $checkPrefix is false, then we've already stripped a prefix off the tok, so it can't be a roman numeral
    elsif($checkPrefix and (&number_p($tok) or &roman_number_p($tok))){
        &baseInscript($tok, {group => 'NUMBER', root => $tok})}
    elsif(is_alphnum($base)){
        &baseInscript($tok, {group => 'ALPHNUM', root => $tok})}
    elsif($base =~ /^$punctuation_tokens$/){
        &baseInscript($tok, {group => 'PUNCTUATION', root => $tok})}
    # So COUNTERDOFFABLE gets tested as DOFFABLE and returns ADJECTIVE
    elsif($checkPrefix and
          $unprefBase and # Set in an earlier test, above
          $savgrp = &group_of($unprefBase, 1) and # This time, try to derive a group for the base.
          &member($metagroup{$savgrp}, 'NVRB', 'NOUN','VERB','ADJECTIVE','ADVB', 'GERUND')){}
    else{
        &baseInscript($tok, {group => 'UNKNOWN', root => $tok})};
    $wordCache{$tok}}
The tests are all different. The order in which the tests are done is (sometimes) important. Some of the intermediate results from one test are re-used in a later test. The code runs great.
But the function runs 114 lines, over 7000 characters. I'd like to break it up into smaller functions (or make it smaller in some other way). I don't expect anybody to refactor my code for me. But can anybody suggest a coding approach that is more maintainable, more compact, and still readable?
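One direction that might be worth weighing is to pull runs of clauses that share a guard (here, the many `$checkPrefix and ...` tests) into a helper that returns early. The sketch below is only an illustration under stated assumptions: `looks_like_partno`, `looks_like_number`, `classify_whole_token`, `classify_any_token` and `group_for` are hypothetical names with toy logic, not the real routines, and the grouping is only safe for clauses whose relative order does not matter. It uses the `//` operator, so it assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical, simplified recognizers standing in for part_number_p, number_p, etc.
sub looks_like_partno { $_[0] =~ /^\d+-\d+$/ }
sub looks_like_number { $_[0] =~ /^\d+$/ }

# Tests that only make sense while no prefix has been stripped yet.
# Each test returns as soon as it matches, so order is still top to bottom.
sub classify_whole_token {
    my ($tok) = @_;
    return 'PARTNO' if looks_like_partno($tok);
    return 'NUMBER' if looks_like_number($tok);
    return;    # nothing matched: undef in scalar context
}

# Tests that apply whether or not a prefix was stripped.
sub classify_any_token {
    my ($tok) = @_;
    return 'LETTER' if $tok =~ /^[A-Za-z]$/;
    return;
}

# The top-level routine shrinks to a short chain of phases.
sub group_for {
    my ($tok, $check_prefix) = @_;
    my $group;
    $group = classify_whole_token($tok) if $check_prefix;
    $group //= classify_any_token($tok);
    return $group // 'UNKNOWN';    # plays the role of the final else
}

print group_for('12-34', 1), "\n";    # PARTNO
print group_for('12-34', 0), "\n";    # UNKNOWN
```

The shared guard is written once instead of on every clause, and each helper stays short enough to read at a glance. Clauses that interleave guarded and unguarded tests in an order that matters would have to stay in one chain or be split into more phases.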
throop
==============================
Update: I considered using dispatch tables, but they didn't seem to me to fit this problem, because:
- I wanted tight control over the order of tests
- Many of my tests involve function calls, not just eq tests
- Intermediate results from earlier tests are saved in 'my' variables and used by later tests.
Or am I missing something?
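For comparison, here is a minimal sketch of an ordered variant of a dispatch table: an array of [name, test, action] triples rather than a hash, which appears to be compatible with all three points above. Everything in it is an assumption made for illustration: `classify`, the `%seen` scratch hash, the rule names and the tiny `%canon` and `%abbrev` tables are hypothetical stand-ins, not the real data or the real driveBase logic.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real lookup tables.
my %canon  = (REQT => 'REQUIREMENT');
my %abbrev = (ASSY => 'ASSEMBLY');

sub classify {
    my ($tok, $check_prefix) = @_;
    my %seen;    # intermediate results shared between rules

    # An array keeps the rules in a fixed order, tried top to bottom.
    # Each rule is [ name, test, action ]; tests and actions are closures,
    # so they can call any function and read or write %seen.
    my @rules = (
        [ canon  => sub { $check_prefix and $canon{$tok} and $canon{$tok} ne $tok },
                    sub { 'CANON:' . $canon{$tok} } ],
        [ plural => sub { ($seen{sing}) = $tok =~ /(.+)S$/;
                          $seen{sing} and $canon{ $seen{sing} } },
                    sub { 'PLURAL:' . $canon{ $seen{sing} } } ],
        [ abbrev => sub { $check_prefix and $abbrev{$tok} },
                    sub { 'ABBREV:' . $abbrev{$tok} } ],
        [ letter => sub { $tok =~ /^[A-Za-z]$/ },
                    sub { 'LETTER' } ],
    );

    for my $rule (@rules) {
        my ($name, $test, $action) = @$rule;    # $name is handy for debug output
        return $action->() if $test->();        # first matching rule wins
    }
    return 'UNKNOWN';    # plays the role of the final else
}

print classify('REQTS', 1), "\n";    # PLURAL:REQUIREMENT
print classify('ASSY',  0), "\n";    # UNKNOWN
```

Whether this reads better than a plain if / elsif chain is a judgment call. It mainly pays off if the rules need to be logged, reordered, or tested one at a time.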
throop
In reply to Really Long if/elsif/else blocks by throop