perlmeditation
chunlou
<p>Every now and then, someone asked about Perl and math. Sometimes some Perl
users asked how they could use Perl to do more math stuff, or folks from other
fields asked how Perl might complement their math work. Here we'll represent
a few handy references, comparisons and illustrations on a few math modules and
software useful to Perl. It serves more as a brief overview of sample of
math modules and software, nothing comprehensive or exhaustive. One glaring
omission is bioperl and bioinformatics, which is too diverse and too huge a
field to be considered as just "math."</p>
<readmore>
<p>Normally, if you search <a href="http://www.cpan.org">CPAN</a> for Perl math
modules. You probably should look under Math::* or Statistics::*, along with AI::*,
Algorithm::*, Crypt::*, Date::*,
Graph::*, GraphViz::*, Inline::*, etc. The <a href="http://www.gnu.org/">GNU
Project</a> is also a good place to look for external software useable by Perl.</p>
<h2>Math Modules & Software</h2>
<p>Here is a sample of modules and software useful to do math with Perl.</p>
<table border="1" cellpadding="3" cellspacing="0">
<tr>
<td align="center"><font size="2"><b>Module/Software</b></font></td>
<td align="center"><font size="2"><b>Type</b></font></td>
<td align="center"><font size="2"><b>Availability</b></font></td>
<td align="center"><font size="2"><b>Remarks</b></font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Inline::C]</font></td>
<td><font size="2">General Programming. Gateway to other C stuff</font></td>
<td><font size="2">Any OS</font></td>
<td><font size="2">Probably not your first choice to do math in Perl</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Inline::Octave] & <a href="http://www.octave.org/">Octave</a></font></td>
<td><font size="2">Matrix algebra and numerical analysis</font></td>
<td><font size="2">Inline::Octave not avail. from ActiveState. Octave avail. for any OS</font></td>
<td><font size="2">Open source version of Matlab. Fast. Simple syntax.
Octave can be used interactively or as a scripting language. Other commercial
counterparts: Gauss, APL, etc.</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Math::Cephes]</font></td>
<td><font size="2">General engineering math library</font></td>
<td><font size="2">Any OS</font></td>
<td><font size="2">Eigensystem only for real symmetric matrices</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Math::LP]</font></td>
<td><font size="2">Linear programming</font></td>
<td><font size="2">Not avail. for ActiveState. DOS version LP Solve avail. (somewhere)</font></td>
<td><font size="2">Free. Not sure if LP Solve itself is still actively
maintained. Commercial counterparts: CPLEX, Lindo, Minos, AMPL, etc.</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Math::Pari]</font></td>
<td><font size="2">Number theory</font></td>
<td><font size="2">Any OS</font></td>
<td><font size="2">Good for cryptographic analysis</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://Math::MatrixReal]</font></td>
<td><font size="2">Matrix algebra (for real values only)</font></td>
<td><font size="2">Any OS</font></td>
<td><font size="2">Pure Perl implementation. Eigensystem only for real symmetric matrices</font></td>
</tr>
<tr>
<td align="center"><font size="2">[cpan://PDL]</font></td>
<td><font size="2">Perl Data Language</font></td>
<td><font size="2">2.4.0 latest. 2.3.1 binary for Windows</font></td>
<td><font size="2">Gives Perl more math-oriented syntax</font></td>
</tr>
<tr>
<td align="center"><a href="http://www.perlmonks.org/www.r-project.org"><font size="2">R</font></a></td>
<td><font size="2">Statistical software</font></td>
<td><font size="2">Any OS</font></td>
<td><font size="2">Used interactively or as a scripting language. Very versatile for
plotting (you can see sample plots in this <a href="http://www.ipd.uka.de/EIR/otherwork/jccpprt_computer2000.pdf">report</a>). Latest version 1.7.1. Open source version of Splus. Bi-directional interface for
*nix & COM Server interface for Windows (for R v.1.7.0) avail. from <a href="http://www.perlmonks.org/www.omegahat.org">Omegahat</a>.
Other commerical counterparts: SPSS, SAS, etc.</font></td>
</tr>
</table>
<h2>Syntax Comparison</h2>
<p>It may be informative to compare the mathematical syntax between modules and
software. The following compares the syntax to multiple a 2-by-2 matrix with a
vector.</p>
<table border="1" cellpadding="3" cellspacing="0">
<tr>
<td align="center"><font size="2"><b>Module/Software</b></font></td>
<td align="center"><font size="2"><b>Code</b></font></td>
</tr>
<tr>
<td align="center"><font size="2">Math::Cephes</font></td>
<td><code>
use Math::Cephes::Matrix qw(mat);
$a = mat [[1, 2], [3, 4]];
$b = [1, 1];
print "@{$a->mul($b)}";
# print "3 7"
</code></td>
</tr>
<tr>
<td align="center"><font size="2">Math::RealMatrix</font></td>
<td>
<code>
use Math::MatrixReal;
$a = Math::MatrixReal->new_from_rows( [[1, 2], [3, 4]] );
$b = Math::MatrixReal->new_from_cols( [ [1, 1] ] );
print $a*$b;
# print "[ 3.000000000000E+000 ]
# [ 7.000000000000E+000 ]"
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">PDL</font></td>
<td>
<code>
use PDL;
$a = pdl [[1, 2],[3, 4]];
$b = pdl [[1], [1]];
print $a x $b;
# print "[
# [3]
# [7]
# ]"
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">Octave</font></td>
<td>
<code>
>> a = [1, 2; 3, 4];
>> b = [1; 1];
>> a*b
ans =
3
7
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">R</font></td>
<td>
<code>
> a = matrix(c(1,2,3,4), ncol=2, byrow=T)
> b = matrix(c(1,1), ncol=1)
> a%*%b
[,1]
[1,] 3
[2,] 7
</code>
</td>
</tr>
</table>
<p>On the math alone, the syntax of all the modules and software above seems
fairly clean. Octave and R have much easier to read overall language syntax
compared to Perl, however.</p>
<p>A distinguished feature of many math-oriented languages are their "vectorized
operation" and "subscripting" capability. Take extracting a sub
matrix as an example:</p>
<table border="1" cellpadding="3" cellspacing="0">
<tr>
<td align="center"><font size="2"><b>Module/Software</b></font></td>
<td align="center"><font size="2"><b>Code</b></font></td>
</tr>
<tr>
<td align="center"><font size="2">Math::Cephes</font></td>
<td><code>
use Math::Cephes::Matrix qw(mat);
$mat = mat [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
$mat = $mat->coef;
for my $i (1..2) { # 0 first index
print "@{$mat->[$i]}[1..2]\n";
}
# print "5 6
# 8 9"
</code></td>
</tr>
<tr>
<td align="center"><font size="2">Math::RealMatrix</font></td>
<td>
<code>
use Math::MatrixReal;
$mat = Math::MatrixReal->new_from_rows(
[[1, 2, 3], [4, 5, 6], [7, 8, 9]] );
for my $i (2..3) { # 1 first index
for my $j (2..3) {
print $mat->element($i, $j), " ";
}
print "\n";
}
# print "5 6
# 8 9"
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">PDL</font></td>
<td>
<code>
use PDL;
$mat = pdl [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
print $mat->slice("1:2,1:2"); # 0 first index
# print "[
# [5 6]
# [8 9]
# ]"
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">Octave</font></td>
<td>
<code>
mat = [1,2,3; 4,5,6; 7,8,9];
mat(2:3, 2:3)
ans =
5 6
8 9
</code>
</td>
</tr>
<tr>
<td align="center"><font size="2">R</font></td>
<td>
<code>
> mat = matrix(1:9, ncol=3, byrow=T)
> mat[2:3, 2:3]
[,1] [,2]
[1,] 5 6
[2,] 8 9
</code>
</td>
</tr>
</table>
<p>Vectorized operation is a very convenient feature. Consider the following R's code:</p>
<code>
> vec = 1:10
> vec
[1] 1 2 3 4 5 6 7 8 9 10
> vec %% 2
[1] 1 0 1 0 1 0 1 0 1 0
> vec[vec %% 2 == 1]
[1] 1 3 5 7 9
> vec[vec %% 2 == 1] + 1
[1] 2 4 6 8 10
</code>
<p>Since Pari was mentioned and it's somewhat a class of its own, here's an example of Pari in its own context:</p>
<code>
#! /usr/local/bin/perl -w
use strict;
# -----------------------------------------------------------
# RSA algorithm -- assymetrical\public-key cryptography
# -----------------------------------------------------------
use Math::Pari qw(gcd PARI) ;
# -----------------------------------------------------------
# m -- message
my $m = 'Perl' ;
print "original: $m\n" ;
my $tmpl = 'C*' ;
my @m = unpack($tmpl, $m) ; # string -> unsigned char values
print "coded: @m\n" ;
# n = pq -- in RSA, p & q = prime, each 1024 bits/308 digits long
my $p = PARI("prime(".int(rand 50).")") ;
my $q = PARI("prime(".int(rand 50).")") ;
my $n = $p*$q ; # $n = Pari's obj
# choose a random number r, s.t.
# 1 < r < (p-1)(q-1) = b
# gcd(r, b) = 1 -- relative prime
my $b = ($p-1)*($q-1) ;
my $r ;
do {$r = int rand $b ; } until (gcd($r,$b) == 1) ;
$r = PARI $r ;
# rk = 1 mod (p-1)(q-1) -- k = private key; (n, r) public
my $k = (1/$r)%$b ; # Pari's math operators, since vars = Pari
# encrypt -- c = (m ^ r) mod n
my @c ;
map { $c[$_] = ($m[$_]**$r)%$n } 0..$#m ; # Perl: ** for power
print "ciphered: @c\n" ;
# decrypt -- m = (c ^ k) mod n
my @d ;
map { $d[$_] = PARI("($c[$_]^$k)%$n") } 0..$#c ; # Pari: ^ for power
print "deciphered: @d\n" ;
print "decoded: " . pack($tmpl, @d) . "\n" ;
__END__
original: Perl
coded: 80 101 114 108
ciphered: 18431 6512 5843 7236
deciphered: 80 101 114 108
decoded: Perl
</code>
<p>Sometimes if somehow you can't or don't have direct mechanism to interface
between Perl and your external math software but your software can be operated
through command line, you can always try <b>interfacing through command line</b>.
Here's an example for Perl and R.</p>
<code>
#! /usr/local/bin/perl -w
use strict ;
R("getwd()");
sub R {
my $Rpath = "C:\\R\\rw\\bin\\" ;
my $Rcmd = $Rpath . "rterm --vanilla --quiet --slave" ;
my $Rscript = shift ;
$Rscript =~ s/(\r|;\r)/ ;/gm ;
$Rscript =~ s/<-/=/gm ; # \r or <- will break "echo"
return `echo $Rscript | $Rcmd` ;
}
</code>
<p>Or if you only have the DOS LP Solve executable, you can do:</p>
<code>
my $dir = 'D:\\tmp\\prog\\math\\lp32';
my $lp_solve = "d:\\mydir\\lp_solve.exe";
my $lpfile = "d:\\mydir\\model.lp";
(my $lp = << " EOF") =~ s/^\s+//gm;
min: 8 x1 + 15 x2 ;
c1: 10 x1 + 21 x2 > 156 ;
c2: 2 x1 + x2 > 22 ;
EOF
open OUTFILE, "+>$lpfile";
print OUTFILE $lp;
close OUTFILE;
$output = qx($lp_solve < $lpfile);
$output =~ s/\r//gm;
print $lp;
print $output;
system("del $lpfile");
</code>
<p>That's why Perl is called glue language.</p>
<h2>Benchmarks</h2>
<p>When comes to math, people often care about speed. So, let's benchmark. </p>
<code>
use strict;
use warnings;
my ($a1,$a2,$a3,$ref);
for my $x (0..20) {
for my $y (0..20) {
$ref->[$x][$y] = rand 100;
}
}
use Math::Cephes::Matrix qw(mat);
$a1 = mat $ref;
use Math::MatrixReal;
$a2 = Math::MatrixReal->new_from_rows( $ref );
use PDL;
use PDL::Slatec;
$a3 = pdl $ref;
use Benchmark qw(cmpthese);
cmpthese(100,
{
cephes=>sub{$a1->inv()},
matrixreal=>sub{$a2->inverse()},
pdl=>sub{matinv($a3)}
}
);
__END__
Rate matrixreal cephes pdl
matrixreal 5.28/s -- -94% -99%
cephes 83.3/s 1479% -- -87%
pdl 625/s 11744% 650% --
</code>
<p>Notice for any math modules using compiled code, your mileage can be greatly
affected by how you compile your module.</p>
<p><a href="http://www.sciviews.org/other/benchmark.htm">Here</a> you can find benchmark
for Octave and R along with other math packages.</p>
<h2>Sample Usage Scenarios</h2>
<p>In case you're not sure how Perl and other math modules and software are
supposed to be used together in a project, here are a couple of generic
illustrations (not rules).</p>
<table border="1" cellpadding="3" cellspacing="0">
<tr>
<td> </td>
<td align="center"><b>Perl</b></td>
<td align="center"><b>Math Module / Software</b></td>
<td align="center"><b>Remarks</b></td>
</tr>
<tr>
<td>Data Storage</td>
<td>
<p align="center"><font size="1" color="#808080">N/A</font></td>
<td align="center"><font size="1" color="#808080">N/A</font></td>
<td align="left">
<p align="left"><font size="2">Many people store their data in
spreadsheet. Better way will be to store data in database, such as Oracle,
MySQL, MS SQL, etc. This part doesn't have much to do with Perl and the
math module and software per se.</font></td>
</tr>
<tr>
<td>Data Access</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="left"><font size="2"> This part means "insert,"
"update," and "delete" data. You will probably be
using [cpan://DBI] for that. If your math software can do it directly and
you don't need to do much Data Manipulation stuff below, you can possibly
leave Perl out of the picture here.</font></td>
</tr>
<tr>
<td>Data Manipulation</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="left"><font size="2"> Data manipulation means getting the
raw data into the right format for Data Analysis, such as extracting a
series of dates from a log file, turning some poorly formatted text file
into a CSV file, etc. When the task gets tricky, Perl is often a better
choice than doing it in a spreadsheet or a math software. That also means
you will probably be using a lot of <b>Regular Expressions</b>.</font></td>
</tr>
<tr>
<td>Data Analysis</td>
<td><font size="2" color="#808080">Maybe</font></td>
<td align="center"><u><b>YES</b></u></td>
<td align="left"><font size="2"> Perl doesn't have the speed compared
to C or clean syntax compared to Octave or extensive math modules to do
heavy duty math. Data analysis is often best handled by external
specialized software. But of course, if you use Perl for everything else
and the math stuff is relatively simple, "outsourcing" the math
work could be overkill and counterproductive.</font></td>
</tr>
<tr>
<td>Reporting</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="left"><font size="2"> Some database vendor (such as Oracle)
or math/statistical software (SAS, for instance) has "report
generator." If such a report generator can readily provide what you
need, no reasons why you shouldn't use it. But when heavy twisting is
needed to get what you want out of a report generator, Perl may be a
viable alternative. </font>
<p><font size="2">Often you can use Perl to create a report in HTML (which
is usually easier than, say, PDF) and convert the HTML to whatever other
format you want (such as Word or PDF).</font></p>
<p><font size="2">Some people may ask what about XML. Well, XML is for
data representation, not visual representation. In terms of development
and design process, XML is probably a consideration closer to Data Storage
than Reporting, i.e. something you should consider and design much earlier
on before you think about reporting.</font></p>
</td>
</tr>
</table>
<p>An increasing common scenario is where people fetch their data (in form of
XML, let's say) over the Internet or some network and then analyze the data upon
download (stock market or DNA data, for example).</p>
<table border="1" cellpadding="3" cellspacing="0">
<tr>
<td> </td>
<td align="center"><b>Perl</b></td>
<td align="center"><b>Math Module / Software</b></td>
<td align="center"><b>What <i>may</i> be used<br/> </b><font size="1">(merely
listing possibilities, not recommendation)</font></td>
</tr>
<tr>
<td>XML Fetch</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="center">
<p align="center"> [cpan://LWP], [cpan://XML::libXML]</td>
</tr>
<tr>
<td>Data Manipulation</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="center"> Regular expressions</td>
</tr>
<tr>
<td>Data Analysis</td>
<td><font size="2" color="#808080">Maybe</font></td>
<td align="center"><u><b>YES</b></u></td>
<td align="center"> Modules/software aforementioned</td>
</tr>
<tr>
<td>Reporting</td>
<td><u><b>YES</b></u></td>
<td align="center"><font size="2" color="#808080">Maybe</font></td>
<td align="center">[cpan:// Text::Template], [cpan://HTML::Template], [cpan://XML::libXSLT]</td>
</tr>
</table>
<p>That's all for now. Hope it helps.</p>
</readmore>
<p>
<br>
__________________<br>
Update: Additional example added to Syntax Comparison.
</p>