perlmeditation
eyepopslikeamosquito
<P>
Competing in a modern <a href="http://codegolf.com/">codegolf</a> tournament
can be a lonely experience. You play alone. You analyze alone.
After an occasional breakthrough, you celebrate alone,
your hard-won solution a closely guarded secret.
There is no nineteenth hole, no place to share your
solutions with your buddies over a beer.
</P>
<P>
Accordingly, and given the codegolf
<a href="http://codegolf.com/roman-to-decimal">Roman-to-Decimal course</a>
has been open for more than two years now,
I've decided to share my solutions to that game here.
You'll get to look over my shoulder as I describe
the circumstances and thought processes behind these solutions.
I hope you'll find my journey through this competition both
interesting and instructive, especially so for the serious, aspiring golfer.
</P>
<readmore>
<P><B>Rules of the Game</B></P>
<P>
In this game, you must convert a Roman numeral to its integer value.
The input numeral, provided on stdin followed by a single newline, is
guaranteed to be in the range <C>I</C> to <C>MMMCMXCIX</C> (1 to 3999).
For example, given <C>IV</C> on stdin, your program should print <C>4</C> to stdout.
</P>
<P><B>Test Driven Golf</B></P>
<P>
I always like to start a golf game by writing a comprehensive test program.
That saves time in the long run, enabling you to easily verify the many and varied
solutions you must concoct to be competitive at golf.
So, just like any journeyman test-first
developer, I began this game by writing such a test program. A little searching on the CPAN
quickly uncovered the [mod://Roman] module,
which I used to write the following test program:
<CODE>
use strict;
use warnings;
use Roman;
my $testprog = shift or die "usage: $0 program-file\n";
my $datafile = 'tt.txt';
my $cmd = "perl $testprog <$datafile";
sub build_file {
my $contents = shift;
open(my $fh, '>', $datafile) or die "open '$datafile': $!";
print $fh "$contents\n";
close($fh);
}
print "Testing $testprog, size=", -s $testprog, " bytes.\n";
for my $i (1..3999) {
my $r = uc(roman($i));
print "i=$i r=$r\n";
build_file($r);
my $got = `$cmd`;
chomp($got);
$got eq $i or die "expected '$i' got '$got'\n";
}
print "successful\n";
</CODE>
</P>
<P>
Since this tests <I>all</I> valid input values
(1..3999), any solution that passes this test program should be correct.
Unfortunately, there are usually holes (no pun intended) in the test programs used by the codegolf web site
because, in their fully automated flavour of golf, the test program must complete
in less than four seconds, so it's usually not feasible to exhaustively test all possible inputs.
Their test program for this game, however, is a pretty good one because it includes
test cases for all six of the nastier edge cases (<CODE>IV, IX, XL, XC, CD, CM</CODE>),
plus eight random ones, making it improbable for an incorrect solution to be accepted.
</P>
<P><B>Getting Started</B></P>
<P>
In any golf game, I try to get a complete working solution as quickly as possible.
This method of golfing boosts morale in that you
always have a working solution -- and enjoy frequent positive feedback each
time you shorten it.
</P>
<P>
I'd just finished playing the [id://594299|Fonality Golf Challenge],
so I began this game simply by inverting Ton's winning HART
magic formula from the Fonality game:
<CODE>
s!.!y$IVCXL426(-:$XLMCDIVX$dfor$$_.=5x$&*8%29628;${"$$_
"}=-$_!egfor-4e3..0;print${<>}
</CODE>
86 strokes. Hmmm. The leading score posted at this early stage
was a remarkable 60 strokes by perl internals hacker [robin].
This hot start made it immediately clear that
Ton's astonishing and magical decimal-to-roman formula was going nowhere
in this new game, where you must convert the other way.
That was a relief to me because I didn't relish a repeat of the ill feeling surrounding
the [id://592386] thread.
</P>
<P>
Since I'd already downloaded the CPAN Roman module, I next took a look at the
algorithm it employed, namely:
<CODE>
sub arabic {
my($arg) = shift;
isroman $arg or return undef;
my($last_digit) = 1000;
my($arabic);
foreach (split(//, uc $arg)) {
my($digit) = $roman2arabic{$_};
$arabic -= 2 * $last_digit if $last_digit < $digit;
$arabic += ($last_digit = $digit);
}
$arabic;
}
</CODE>
Though obviously not golfed, I was impressed by this function, especially this line:
<CODE>
$arabic -= 2 * $last_digit if $last_digit < $digit;
</CODE>
because this clever trick enables the algorithm to make one pass
through the input string, one character at a time.
</P>
<P>
<blockquote>
<P>
<I>
Nowadays we have a principle in Perl,
and we stole the phrase Huffman coding for it,
from the bit encoding system where you have different sizes for characters.
Common characters are encoded in a fewer number of bits,
and rarer characters are encoded in more bits.
We stole that idea as a general principle for Perl,
for things that are commonly used,
or when you have to type them very often -- the
common things need to be shorter or more succinct.
</I>
</P>
<P align="right">
<small>
-- <a href="https://www.linuxvoice.com/interview-larry-wall/">Larry Wall</a>
</small>
</P>
</blockquote>
</P>
<P>
My prior golfing experience told me that any algorithm that essentially
just traverses a string one character at a time is usually very competitive.
And this very common operation of traversing a string one character at a time
should be golf-able in just about any language -- assuming the language
designers <I>Huffman-coded</I> common operations, as Larry Wall did.
</P>
<P>
The primary tactical problems to be solved with this algorithm are how
to shorten the code surrounding <CODE>$last_digit</CODE>
and how to perform the <CODE>roman2arabic</CODE> lookup.
</P>
<P>
<blockquote>
<I>
Regexes always win (Mtv's law of golf)
</I>
</blockquote>
</P>
<P>
Golfing this promising algorithm proved straightforward once I remembered
Mtv's law and used a regex with side-effects (<CODE>$'</CODE>) for the lookup table:
<CODE>
I1V5X10L50C100D500M1e3=~$_,$\-=2*$}x($}<$')-($}=$')for<>=~/./g;print
</CODE>
Notice that this solution uses the well-known golfing trick, invented by Dutch
golfing maestro Eugene van der Pijll, of using <CODE>$\</CODE> as the accumulator,
allowing you to print it with a bald
<CODE>print</CODE>, rather than <CODE>print$a</CODE> say, thus saving two strokes.
</P>
<P>
Getting to 68 with very little effort made this first approach look very promising.
Indeed, this simple algorithm formed the basis of the winning solutions
in all four languages (Perl, Python, Ruby, PHP), albeit with many
twisty tactical turns along the way.
</P>
<P><B>An Interesting Diversion</B></P>
<P>
<blockquote>
<P>
<I>
How many different algorithms did you try? Me, I tried it four or five different ways until I stumbled upon one that seemed shorter
</I>
</P>
<P align="right">
<small>
-- Expert golfer [Jasper] offers sound advice to novice golfer [petdance] in [id://593648]
</small>
</P>
</blockquote>
</P>
<P>
In golf, it's desirable to have multiple irons in the fire; that is, to
have multiple working solutions using different approaches. Why?
Because it's often possible to produce a solution shorter than all of them
by combining the best ideas from the various approaches.
A related lesson I took from this game is the value of writing solutions
in all four languages (Perl, Python, Ruby, PHP); the act of doing that often
uncovered algorithmic and tactical improvements that could be transferred
from one language to another.
</P>
<P>
So, to broaden possible solutions, I played around with
a <CODE>tr</CODE> (aka <CODE>y</CODE>) transform reminiscent of the Fonality game:
<CODE>
#!perl -lp
s#\d#!($\+=$.*$&*(2cmp$'))#eg,$..=0while y/MDCLXVI/CLXVI51/
</CODE>
70 strokes. But I had to work very hard.
I spent far too long fiddling with this approach because
I found it fascinating. My golfing instincts were telling me, however, that this approach
was far less promising than the previous simpler one, so I reluctantly gave
it up at this early stage and never came back to it.
</P>
<P>
Since symbolic references were so effective in the Fonality game,
I naturally tried a similar approach early in this game.
Using xor, a common golfing technique, the
required <C>$I, $V, $X, $L, $C, $D, $M</C> variables can be built
fairly easily, as illustrated
by the following test program, <C>showv.pl</C>:
<CODE>
#!perl -n
++$I;$$_=$.*=$^F^=7for V,X,L,C,D,M;
print"I=$I V=$V X=$X L=$L C=$C D=$D M=$M\n";
</CODE>
Running <CODE>echo hello|perl showv.pl</CODE> displays:
<CODE>
I=1 V=5 X=10 L=50 C=100 D=500 M=1000
</CODE>
So far, so good. However, I could find no remotely competitive solution
using this approach. For example, though this one does work:
<CODE>
#!perl -p
++$I;$$_=$.*=$^F^=7for VXLCDM=~/(.)/g;
s!!($$1<${_&$'}?'-':'+').$$1!eg;$_=eval
</CODE>
it's 87 strokes in length. Ouch.
So I also abandoned this interesting approach early in this game.
This shows how fickle golf can be: a winning approach in the
Fonality game proved to be a complete dud in this very similar one.
</P>
<P>
<B>Update:</B> Much later I found an improved 74 stroke symref-based solution:
<CODE>
$b=++$I;$$_=$b*=$^F^=7for V,X,L,C,D,M;
$\+=$n-$\%$n*2while$n=${+getc};print
</CODE>
</P>
<P>
<blockquote>
<I>
Functional approaches never win
</I>
</blockquote>
</P>
<P>
Much as I enjoy functional programming, this style rarely wins golf tournaments.
For fun, I tried a couple of functional solutions, such as:
<CODE>
use List::Util 'reduce';
print reduce{$a+$b*($b+1<=>$a)}map{/./;$|--&&(M999D499C99L49X9V4I=~$&+$')*y///c}reverse<>=~/((.)\2*)/g
</CODE>
Though there are surely shorter functional-style solutions
(<B>update:</B> see, for example, [id://766382|this one found much later]), I quickly became discouraged
and gave up on this approach quite early.
</P>
<P><B>Into the Top Ten</B></P>
<P>
<blockquote>
<P>
<I>
Of course. The true&tested method of "Can't possibly work, let's try it anyway."
</I>
</P>
<P align="right">
<small>
-- Eugene van der Pijll during <a href="http://perlgolf.sourceforge.net/cgi-bin/PGAS/post_mortem.cgi?id=9">Infix to RPN game</a>
</small>
</P>
</blockquote>
</P>
<P>
This classic Eugene van der Pijll quote is the single best piece of advice
I could give to any aspiring golfer. Remembering Eugene's aphorism helped me time and
time again during this game, especially later when I started golfing in languages
I hardly knew (Python, Ruby and PHP).
</P>
<P>
Reverting to my original 68 stroke solution:
<CODE>
I1V5X10L50C100D500M1e3=~$_,$\-=2*$}x($}<$')-($}=$')for<>=~/./g;print
</CODE>
that <CODE>$}</CODE> "previous value" variable was starting to get <I>really</I> annoying.
It would be nice to get rid of it somehow. But how?
I wonder if it's possible to exploit perl's evaluation order somehow.
Well, follow Eugene's advice and just try it!
<CODE>
$\+=$'-2*$'*(-$'>I1V5X10L50C100D500M1e3=~$_-$')for<>=~/./g;print
</CODE>
Yes! No more <CODE>$}</CODE>. Down to 64 and on the leaderboard now! Woo hoo!
At this point, I allowed myself a few punches in the air with my fist and even
jumped up from my desk and ran madly around the room celebrating like a soccer player
who just kicked a goal.
As for <I>why</I> this solution works, I'll leave that as an exercise
for the keen student of perl internals.
</P>
<P>
The top ten, after four weeks of play:
<CODE>
1st 60 robin Perl
2nd 61 arpad Perl
3rd 63 bearstearns Perl
4th 64 kounoike Perl
5th 64 eyepopslikeamosquito Perl
6th 73 flagitious Ruby
7th 73 yvl Ruby
8th 73 bitsweat Ruby
9th 73 jojo Perl
10th 75 shinh Perl
</CODE>
</P>
<P>
After the euphoria had died down, I realized I was now completely stuck.
</P>
<P><B>What's the rush?</B></P>
<P>
<blockquote>
<P>
<I>
Why is everyone in such a rush?
</I>
</P>
<P align="right">
<small>
-- Peter Norvig in <a href="https://norvig.com/21-days.html">Teach Yourself Programming in Ten Years</a>
</small>
</P>
<P>
<I>
"Learning Perl in 24 Minutes Unleashed, in a Nutshell for Dummies" is the one I have
</I>
</P>
<P align="right">
<small>
-- f00li5h cited at <a href="http://perl.net.au/wiki/Perl_Humour">perl.net.au Humour page</a>
</small>
</P>
<P>
<I>
My head is hurting, my right eye feels like it's going to pop like a mosquito drinking from an expresso addict with high blood pressure, I want to crawl somewhere damp and dark and quiet and I consider never to touch a keyboard again.
</I>
</P>
<P align="right">
<small>
-- `/anick, after hacking all night on the last day of the TPR 02 Perl Golf tournament
</small>
</P>
</blockquote>
</P>
<P>
Then it dawned on me: what's the rush?
With this new form of golf, there's no need to risk golf-induced eye damage,
as `/anick unfortunately suffered,
in a desperate attempt to shave one more stroke before the game closes.
You can simply, Zen-like,
just keep chipping away at your solution at your own pace.
So I just kept whittling away whenever I had some spare time.
</P>
<P>
Some months later, I stumbled upon an unusual <CODE><=></CODE> spaceship operator
solution:
<CODE>
$/=\1;$\+=-$'*(-$'<=>I1V5X10L50C100D500M1e3=~$_-$')for<>;print
</CODE>
Notice that, unlike earlier algorithms,
this one requires one more iteration than the string length.
For this purpose, I utilized the trailing newline, simply changing
<CODE><>=~/./g</CODE> to <CODE><>=~/./sg</CODE> initially, at the cost of a single stroke.
Later, I got that stroke back by noticing that <CODE>$/=\1;</CODE> combined
with <CODE><></CODE> is one stroke shorter.
</P>
<P>
62 strokes! Only three strokes from the lead now, with "bearstearns" the new leader on 59.
</P>
<P>
After further months drifted by, I whittled two more strokes:
<CODE>
$\+=$'-2*$'%(I1V5X10L50C100D500M1e3=~$_*$')for<>=~/./g;print
</CODE>
60 strokes!
I felt a bit embarrassed at not finding this simple trick sooner.
Not to worry, no rush.
Only one from the lead and getting excited now.
</P>
<P>
For some time now I was aware that changing the table lookup string from:
<CODE>
I1V5X10L50C100D500M1e3
</CODE>
to:
<CODE>
M999D499C99L49X9V4I
</CODE>
saved three strokes -- if you could combine it with an expression you had
to perform anyway to add one to <CODE>$'</CODE>
(as was achieved in [id://600680]).
The trouble is, I could not find an economical way to add one to the shortened
lookup string in this game. The best I found is the following 60 stroker:
<CODE>
$\+=$z-2*$z%($z=M999D499C99L49X9V4I=~$_+$')for<>=~/./g;print
</CODE>
</P>
<P>
As indicated at [id://600665], I was also playing around with replacing the
lookup table with a magic formula. At first, I struggled to fit the
magic formulae I had found into a short solution to this course, but eventually
wound up with a flood of 60 strokers, such as:
<CODE>
$\+=$z-2*$z%($z=1 .E.(3^77%ord)%7>>y/VLD//)for<>=~/./g;print
$\+=$z-2*$z%($z=.5**y/VLD//.E.(3^77%ord)%7)for<>=~/./g;print
$\+=$z-2*$z%($z=5**y/VLD//.E.(42^88*ord)%5)for<>=~/./g;print
$\+=$z-2*$z%($z=5**y/VLD//.E.71248%ord()%5)for<>=~/./g;print
$\+=$z-2*$z%($z=5**y/VLD//.E.(3&57532/ord))for<>=~/./g;print
$\+=$z-2*$z%($z=.1.E.(3^85%ord)%7>>y/VLD//)for<>=~/./g;print
</CODE>
I now had many different 60 stroke solutions, but the lead was at 59!
</P>
<P><B>Never give up</B></P>
<P>
<blockquote>
<P>
<I>
But it's not over then ... I pretty soon found both the basic 33-solutions by permuting ways to do that a bit ... In fact, there's an almost 32 solution ... which fails on a stderr message and does not work on arguments that start with a number and contain a letter. But it has to be tried at least. How many of the 33 people did?
</I>
</P>
<P align="right">
<small>
-- Golfing legend Ton Hospel <a href="http://www.mail-archive.com/fwp@perl.org/msg01640.html">showing</a> why he's No. 1; he's always searching for improvements
</small>
</blockquote>
</P>
<P>
Alas, I couldn't improve any of these solutions and I've still got no idea
what the leading golfers on 59 were doing.
</P>
<P>
In utter frustration, I took off for a long two hour walk to clear my head.
After about an hour of walking, in desperation, I tried a completely different approach.
I remember walking along calculating its golf score in my head, again and again, and
kept coming up with 58!
Surely, it couldn't be. Usually, when I have these "epiphanies", there's something
I've overlooked and the solution fails dismally when I actually test it.
With trembling fingers, I typed in the 58 stroke solution:
<CODE>
$\+=$z-2*$z%($z=VLD=~$_*5+IXCM=~$_."E@-")for<>=~/./g;print
</CODE>
1, 2, 3, ... 99, 100, ... 3999. Success, all tests pass! Time for another celebration.
</P>
<P>
For the first time in my life, I'm leading a Perl golf tournament!
This solution, requiring no magic formulae at all, is not that hard to find and
I remember feeling quite surprised at the time that noone else had found it
after more than a year of play.
</P>
<P>
<blockquote>
<I>
Magic formulae always win (update to Mtv's law)
</I>
</blockquote>
</P>
<P>
When I started writing Python, Ruby and PHP solutions to this problem,
I fell back to the magic formula approach, mainly due to my ignorance
of general golfing tricks in these languages. Curiously, I was then able to apply
the tricks learnt there to improve the Perl magic formula based solutions
and so beat the 58 stroke regex-based solution described above.
</P>
<P>
In the next installment, I'll describe how I golfed this problem in
Python and Ruby.
</P>
<P><B>References</B></P>
<P>
<ul>
<li> [id://761053]
<li> [id://762180]
<li> [id://763105]
<li> [id://811919]
<li> [id://814900]
<li><a href="http://codegolf.com/">Golf competitions in Perl, Ruby, Python or PHP</a>
<li> <a href="http://phpgolf.org/">Golf competitions in PHP</a>
<li><a href="http://perlgolf.sourceforge.net/">TPR Golf Contests</a>
<li><a href="http://terje2.frox25.no-ip.org/~golf-info/Book.html">Terje/mtv pdf book about Perl Golf</a>
<li> [id://903641]
<li> [id://594299]
<li><a href="http://groups.google.com/group/pl.comp.lang.perl/browse_frm/thread/6e0afdfe37ec51be/dbf17d1a9c9adee0?hl=en&fwc=1">Original Polish Golf where Ton first used his magic formula</a>
<li> [id://600665]
<li> [id://592386]
</ul>
</P>
<P><B>References Added Later</B></P>
<P>
<ul>
<li> <a href="http://golf.shinh.org/p.rb?Roman+numeral">Roman numeral golf at golf.shinh.org</a>
<li> [id://995190]
<li> [id://1083046] (describes HPC techniques for finding golfic magic formulae)
<li> <a href="https://code.golf/roman-to-arabic">code.golf</a> (roman to arabic)
<li> [id://1184330]
</ul>
</P>
<P>
<small>
Updated 26-apr 2009: Minor improvements to wording. March 2016: Added Larry Wall quote re Huffman coding.
</small>
</P>
</readmore>