SAS alternatives:
GNU R, like S+ - a dedicated statistical language similar in
purpose to SAS, but intentionally similar to S, another
statistical language more popular in academia than in
business. Most people like the esthetics of S better than
SAS, but SAS is very widely ported, marketed, and supported,
and is exceedingly complete.
SPSS and Minitab are similar in purpose to SAS as well,
but I know little about them, so I won't say anything. Search
Google.
This page at stat.cwru.edu has an excellent index of sites
related to statistics software.
What SAS is:
For Perlers who know little about SAS but seem to need to
have some opinion of it: SAS talks to many databases,
presents to the user an extended dialect of SQL enhanced
for statistical tasks, and provides a massive library of
statistical and data processing functions. It imports data
from arbitrary formats (part of the language is specifying
input format, and it is effective as a parser), and it works
across databases and flat files of arbitrary format.
It works with extremely large data sets efficiently.
No special syntax or logic is needed to work with datasets
far larger than would fit into virtual memory, though
many operations require a large amount of temporary space.
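To make the "larger than memory" point concrete, here is a minimal sketch of the general idea, not of how SAS does it internally: read one record at a time and keep only running totals. It is written in Python for brevity, and the same loop is easy to write in Perl. The function name, the column name "amount", and the sample data are all made up for illustration. The update rule is Welford's one-pass method for mean and variance.

```python
import csv
import io
import math

def streaming_stats(rows, column):
    """One pass, constant memory: count, mean, sample standard deviation."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    stddev = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, stddev

# Hypothetical data. In practice `rows` would be csv.DictReader(open(path)),
# which reads the file lazily, one record at a time.
sample = io.StringIO("amount\n2\n4\n4\n4\n5\n5\n7\n9\n")
n, mean, stddev = streaming_stats(csv.DictReader(sample), "amount")
print(n, mean, round(stddev, 4))
```

Nothing here depends on the size of the file, only on the size of one record. Operations like sorting or joining are the ones that need the large temporary space mentioned above.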
The companies I've worked for, or know people at,
often process terabytes a day. One place, in the medical
information industry, has thousands of people on staff
doing statistics on data, a good chunk of them using SAS.
SAS is used for anything related to statistics - sometimes
marketing, but also drug interaction research, stock
market speculation, financial planning, insurance
(a major purchaser of supercomputers), and numerous
other things. It tends to be used by people who are
statisticians but not necessarily programmers. SAS is very
old and very mature. It was originally written in FORTRAN,
and lived for a long period of time as a mix of FORTRAN and
C, though I'm told the FORTRAN parts have been rewritten.
Like a lot of old software, it is very reliable and
of very high quality, and has evolved a lot through
continued pressure, though it may not be very consistent.
Other statistics programs have moved in on SAS lately -
web based applications that push down buggy ActiveX
plugins, and attempt to graft "5GL" logic onto the process,
making design visual. They are extremely limited,
extremely buggy, and dumb. They try to do queries for
you, but screw it up, so you have to muck through their
busted SQL trying to fix it, without the option of
rewriting it, since the tool would no longer be able to
understand the SQL and would then no longer be visual.
MicroStrategy is an example. Its output looks pretty, and
it does simple things easily, but man... I'm just trying to
put what SAS is into perspective.
To actually answer your question - is Perl a viable
alternative to SAS - I'd say "no". Perl could not replace
SAS. They are too different: SAS is only marginally a
language and is primarily a library of integrated
routines with a lot of backend, and SAS is very good at
what it does. If you wanted to know whether Perl could
replace SAS for your particular application of SAS, that
is an entirely different question, and it depends on what
you're doing with SAS. Very likely you're using only a
small portion of SAS, which makes replacing it much easier.
Still, if you're employing non-programmer statisticians,
they won't be comfortable with Perl. Better to use R (still
far less complete, but at least specialized). If you
yourself have some basic statistical things that you want
to do and you're able to program in a "traditional" C-like
programming language, you'll find yourself writing a lot
more Perl than you would SAS to do the same job, but PDL
(Perl Data Language), PDL::R (some R functions for PDL),
and lots of things under Math:: in CPAN will go a long way.
You'll need a database - no bones about it - and it will
need to do subqueries.
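As a rough illustration of why subqueries matter for this kind of work, here is a sketch that finds customers whose total spend is above the average customer total. It uses Python's built-in sqlite3 module and an in-memory database. The table, column names, and data are invented for the example. From Perl you would send the same SQL through DBI with a driver such as DBD::SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", 10.0), ("ann", 30.0), ("bob", 5.0), ("cat", 50.0), ("cat", 25.0)],
)

# The inner SELECT computes one total per customer, the middle SELECT
# averages those totals, and the outer query keeps customers above it.
query = """
    SELECT customer, SUM(amount) AS total
    FROM sales
    GROUP BY customer
    HAVING SUM(amount) > (
        SELECT AVG(t.total)
        FROM (SELECT SUM(amount) AS total FROM sales GROUP BY customer) AS t
    )
    ORDER BY customer
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```

Without subquery support you end up pulling the per-customer totals into your program, averaging them there, and issuing a second query, which is exactly the extra code you're trying to avoid.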
If you're just learning statistics: You can go the
traditional way, and buy a book on statistics and a
calculator with statistical functions (or equivalent
software), in which case you're exposed to performing
the functions and not so concerned with processing
large amounts of data. If you just have a lot of data
to process, you probably don't need statistics at all -
a good database application will do you. Somewhere in the
middle, a lot of statistics tasks are very common:
finding products that sell well together and should be
co-promoted, or optimizing variables (number of flights
an airline should make between two cities in a month,
price to market a product at), computing customer churn
and optimizing customer service for maximum profit
(minimum churn, minimum cost). You wait on hold for an
hour before you can get a rep and you think the company
is just really busy? It is all completely intentional.
They know exactly how much customer service costs and they
know how much business they will lose when they provide
different levels and they've intentionally picked exactly
that level of service. Most people have no idea what
a prominent role statistics plays in their consumer
experience...
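The "products that sell well together" task mentioned above can be approximated in its simplest form by counting how often each pair of products shows up in the same order. This is only a sketch under that assumption. Real analyses also normalize for how popular each product is on its own. The function name and the basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair one
        # canonical order, so ("beer", "chips") is never counted as
        # ("chips", "beer").
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Made-up orders, one list of products per order.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["beer", "chips"],
    ["bread", "jam"],
    ["beer", "chips", "bread"],
]
pairs = co_purchase_counts(baskets)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

With real sales data the baskets would come out of the database, and the counting itself could be pushed into SQL with a self-join on the order id.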
Anyway, I hope this background and these pointers help
with whatever you're trying to do. If you expound on
what you're trying to do, someone will probably be able
to give a less broad, more helpful tutorial.
-scott