perlmeditation
sundialsvc4
<p>
Like everyone else, I have been reading the “10**21” threads with some academic interest – but also, I confess, some confusion when talk is made of “4 gigabyte lookup-tables.” If the poor Romans actually had to do <em>that,</em> they’d never have been able to count anything. An <em>algorithm</em> for converting Roman numerals to decimal is trivially easy: just <tt>reverse</tt> the string <i>(say ...)</i> so that you intepret it from right-to-left instead of left-to-right.
<code>
use strict;
use warnings;
my %letters = ('I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
'C' => 100, 'D' => 500, 'M' => 1000);
my $roman = "MCCMLXXIV"; #1874
my $result = 0;
my $high_score = 0;
foreach my $letter ( split(//, reverse($roman)) ) {
my $score = $letters{$letter};
if ($score >= $high_score) {
$result += $score;
}
else {
$result -= $score;
}
$high_score = $score if $score > $high_score;
}
print "$result\n";
</code>
</p><p>
When the string is viewed from left-to-right, parsing seems difficult because in a well-formed Roman string like <tt>MCCMLXXIV</tt> you don’t know what <tt>C</tt> means until you encounter <tt>M</tt>. However, that’s not the way the Romans did it. <em>They</em> probably also read the string from right-to-left, as was and is the standard for many human languages of the region. Now, the rule is one that any human could do in his head: “if the digit is as big or bigger than the biggest one you have seen so-far, reading from right to left, add it to the total; otherwise, subtract.” They did not use arbitrary combinations of letters, although the algorithm shown above would accept them. (As an example, I doubt that they actually used <tt>IM</tt> to represent <tt>999</tt>.) In fact, the rules that they <em>did</em> use make the process similar to “making change” given various denominations of currency, and this can’t be accidental.
</p><p>
So, while the talk of this being a “10**21 problem” is an interesting one, and the latest installment (#3) about whacking the TLB of the processor is even more so, it seems like an incredible amount of work to unleash on a problem that has a trivial algorithmic solution. I am certainly not “dissing” that thread for its informative discussion of how to deal with problems that <em>would</em> require massive search, but Roman Numerals does not seem to be a good example to use here, unless I am <em>entirely</em> missing the point. (Which possibility I do not deny.) The proper solution for Roman Numerals would be to “find a better algorithm,” then optimize-the-hell out of <em>that</em> algorithm. I cannot imagine that any solution would be faster, especially when the time required to populate those gigabytes of data is included, but I say this only because a nearly cost-free algorithm to solve <em>this</em> problem readily exists, as just shown. For problems that lack such alternatives, the issues discussed in that thread are of course hugely relevant, and the solutions, very informative.
</p><p>
- - - - -
</p><p>
For the “true die-hard golfers” among us, the same algorithm can be expressed, even without any conditional branches at all other than for the scan-loop, as a finite-state machine (FSM) with twenty-six-element tables containing numbers that are either positive or negative depending on the table. The machine’s state would, of course, represent the largest digit, if any, yet seen when scanning from right to left. For instance, in the table for state <tt>LARGEST_SEEN_IS_V</tt> <i>(i.e.</i> all tables except <tt>LARGEST_SEEN_IS_I</tt> and <tt>INITIAL_STATE</tt>), the entry for <tt>I</tt> would now be <tt>(-1)</tt>. A “real” Perl implementation would use a C-language subroutine. If an <tt>ERROR_STATE</tt> is provided, such an algorithm could also detect and reject “syntax error” inputs (as the algorithm shown above cannot do). Which is exactly why we build language-parsers and lexical-scanners this way . . .
</p><p>
- - - - -
</p><p>
Clearly, the intent of the main thread must be to present efficient solutions to problems that actually <em>do</em> consist of arbitrary-string inputs, for which all possible strings are equally probable, under conditions where arbitrarily-large amounts of physical RAM are presumed to be available. Problems that, unlike Roman Numerals, do not present a trivial algorithmic solution. Problems for which an efficient solution in the Perl language (or at least, the Perl environment) is required.
</p><p>
- - - - -
</p><p>
<i>(Postscript:</i> I recently helped with a geocaching-like team building challenge for highly-paid “alpha-male” executives, in which you had to use a GPS to find nearby things in an unfamiliar town and obtain clues. One of the things we took them to was a statue which had a date written in Roman Numerals. They just needed to get that date. No big deal, any third-grader could do that, right? Uh uh. I was speechless to watch several(!) of them busily searching-for and downloading a “Roman Numerals <em>app”</em> to let them decipher the clue ... which they had utterly no idea how to do on their own. They <em>could not read</em> Roman Numerals! So they sat there, locating and downloading – in a couple of cases, buying – an app for their cell-phones. <i>“Oy vey!”)</i>
</p>