PerlMonks  

Re: SAS vs Perl?

by scrottie (Scribe)
on Aug 14, 2003 at 15:41 UTC


in reply to SAS vs Perl?

SAS alternatives:

GNU R, like S+ - a dedicated statistical language similar in purpose to SAS, but intentionally similar to S, another statistical language that is more popular in academia than in business. Most people like the aesthetics of S better than those of SAS, but SAS is very widely ported, marketed, and supported, and is exceedingly complete.

SPSS and Minitab are similar in purpose to SAS as well, but I know little about them, so I won't say anything. Search Google.

This page at stat.cwru.edu has an excellent index of sites related to statistics software.

What SAS is:

To Perlers who know little about SAS but seem to need to have some opinion of it: SAS talks to many databases, presents to the user an extended dialect of SQL enhanced for statistical tasks, and provides a massive library of statistical and data processing functions. It imports data from arbitrary formats - part of the language is specifying input format, and it is effective as a parser - and it works across databases and flat files of arbitrary format. It works with extremely large data sets efficiently. No special syntax or logic is needed to work with datasets far larger than would fit into virtual memory, though many operations require a large amount of temporary space.

The companies I've worked for, or know people at, often process terabytes a day. One place, in the medical information industry, has thousands of people on staff doing statistics on data, a good chunk of them using SAS. SAS is used for anything related to statistics - sometimes marketing, but also drug interaction research, stock market speculation, financial planning, insurance (a major purchaser of supercomputers), and numerous other things. It tends to be used by people who are statisticians but not necessarily programmers.

SAS is very old and very mature. It was originally written in FORTRAN, and lived for a long time as a mix of FORTRAN and C, though I'm told the FORTRAN parts have been rewritten. Like a lot of old software, it is very reliable and of very high quality, and has evolved a lot through continued pressure, though it may not be very consistent.

Other statistics programs have moved in on SAS lately - web-based applications that push down buggy ActiveX plugins and attempt to graft "5GL" logic onto the process, making design visual. They are extremely limited, extremely buggy, and dumb. They try to do queries for you, but screw it up, so you have to muck through their busted SQL trying to fix it, without the option of rewriting it, as the tool would no longer be able to understand the SQL and it would then no longer be visual. Microstrategy is an example. Its output looks pretty, and it does simple things easily, but man... I'm just trying to put what SAS is into perspective.

To actually answer your question - is Perl a viable alternative to SAS - I'd say "no". Perl could not replace SAS. They are too different: SAS is only marginally a language and is primarily a library of integrated routines with a lot of backend, and it is very good at what it does. If you wanted to know if Perl could replace SAS for your particular application of SAS, that is an entirely different question, and it depends on what you're doing with SAS. Very likely you're using only a small portion of SAS, which makes replacing it much easier. Still, if you're employing non-programmer statisticians, they won't be comfortable with Perl. Better to use R (still far less complete, but at least specialized). If you yourself have some basic statistical things that you want to do and you're able to program in a "traditional" C-like programming language, you'll find yourself writing a lot more Perl than you would SAS to do the same job, but PDL (Perl Data Language), PDL::R (some R functions for PDL), and lots of things under Math:: on CPAN will go a long way. You'll need a database - no bones about it - and it will need to do subqueries.
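
To give a feel for the "a lot more Perl" point, here is a minimal sketch of an ordinary least-squares line fit in plain core Perl, with no CPAN modules. The sub name linear_fit and the sample numbers are made up for illustration; a statistics package gives you this (plus diagnostics) as a built-in, and PDL would likely shorten it considerably.

```perl
use strict;
use warnings;

# Hypothetical helper: ordinary least-squares fit of y = intercept + slope * x.
# Takes two array refs of equal length; returns (intercept, slope).
sub linear_fit {
    my ($x, $y) = @_;
    my $n = @$x;
    die "need at least two points\n" if $n < 2;
    die "x and y differ in length\n" if $n != @$y;

    # Accumulate the sums the closed-form solution needs.
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1) {
        $sx  += $x->[$i];
        $sy  += $y->[$i];
        $sxx += $x->[$i] ** 2;
        $sxy += $x->[$i] * $y->[$i];
    }

    my $denom = $n * $sxx - $sx ** 2;
    die "all x values are identical\n" if $denom == 0;

    my $slope     = ($n * $sxy - $sx * $sy) / $denom;
    my $intercept = ($sy - $slope * $sx) / $n;
    return ($intercept, $slope);
}

# Made-up sample data lying exactly on y = 1 + 2x.
my @xs = (1, 2, 3, 4);
my @ys = (3, 5, 7, 9);
my ($intercept, $slope) = linear_fit(\@xs, \@ys);
printf "intercept=%.2f slope=%.2f\n", $intercept, $slope;
```

Note that this holds everything in memory and reports nothing beyond the two coefficients; standard errors, residuals, and data larger than RAM are all extra work you would have to do yourself.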

If you're just learning statistics: You can go the traditional way, and buy a book on statistics and a calculator with statistical functions (or equivalent software), in which case you're exposed to performing the functions and not so concerned with processing large amounts of data. If you just have a lot of data to process, you probably don't need statistics at all - a good database application will do you. Somewhere in the middle, a lot of statistics tasks are very common: finding products that sell well together and should be co-promoted, optimizing variables (number of flights an airline should make between two cities in a month, price to market a product at), or computing customer churn and optimizing customer service for maximum profit (minimum churn, minimum cost). You wait on hold for an hour before you can get a rep and you think the company is just really busy? It is all completely intentional. They know exactly how much customer service costs, they know how much business they will lose at each level of service, and they've intentionally picked exactly that level. Most people have no idea what a prominent role statistics plays in their consumer experience...
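
As a rough illustration of the "products that sell well together" task, here is a minimal sketch in core Perl that counts how often each pair of products shows up in the same order. The sub name pair_counts and the orders are invented for the example, and raw pair counts are only a first step; real co-promotion analysis would also weigh how popular each product is on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each pair of products
# appears together in the same order (basket).
# Takes a list of array refs; returns a hash ref keyed by "itemA|itemB".
sub pair_counts {
    my (@baskets) = @_;
    my %count;
    for my $basket (@baskets) {
        # De-duplicate, then sort so each pair has one canonical key.
        my %seen;
        my @items = grep { !$seen{$_}++ } @$basket;
        @items = sort @items;
        for my $i (0 .. $#items - 1) {
            for my $j ($i + 1 .. $#items) {
                $count{"$items[$i]|$items[$j]"}++;
            }
        }
    }
    return \%count;
}

# Made-up orders, purely for illustration.
my @orders = (
    [qw(bread butter jam)],
    [qw(bread butter)],
    [qw(butter jam)],
);
my $counts = pair_counts(@orders);

# Print pairs, most frequent first, ties broken alphabetically.
for my $pair (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    printf "%-15s %d\n", $pair, $counts->{$pair};
}
```

This keeps every pair in a hash, so it is fine for a few thousand products but is exactly the kind of thing that stops scaling long before a dedicated package or a database would.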

Anyway, I hope this background and these pointers help with whatever you're trying to do. If you expound on what you're trying to do, someone will probably be able to give a less broad, more helpful tutorial.

-scott

Replies are listed 'Best First'.
Re: Re: SAS vs Perl?
by BazB (Priest) on Aug 14, 2003 at 21:31 UTC

    scrottie++.

    gunglichen, you don't explain what your application is and how you currently use SAS, so it's hard for me to make specific comments, but I'm going to drivel on anyway :-)

    As well as Perl, I use SAS at work - it's used as the basis of our (multi-terabyte) datawarehouse, which I personally think is pretty horrific, and for marketing/customer research analysis on substantial amounts of data where it really shines.

    I'm not really a fan of SAS - I generally don't use it as a statistical package, but as a datawarehouse/datastorage system.
    Statisticians, analysts and pharmaceutical users seem to be the type of folk that will get the most from SAS - it's where SAS grew up, and it shows. There seems to be a push for SAS to move into the database/warehouse area, but I'm not too impressed.

    The basic SAS does not handle parallel processing, concurrency, transactions and the like that you'd expect from an RDBMS.
    There are additional SAS packages that help, but I'd rather use an RDBMS.

    If you want to calculate regressions, aggregations, perform summarisations, and more analytical functions than I understand, SAS is one of those bits of kit that'll do the job.

    Perl is neither an RDBMS nor a complicated statistics package.
    It's a case of the right tool for the job.
    Some of the SAS programmers I work with try and do everything in SAS, and it gets nasty quickly.
    My colleagues are getting bored of me telling them they should be using Perl for that ;-)

    SAS does have text processing capabilities, but I almost always extract the data and munge it using Perl, or stick it through some other bespoke software.

    If you're using SAS to perform transforms, data storage, basic reporting, comparisons etc, maybe Perl plus some kind of database (from CSV to RDBMS depending on requirements) would do the job.
    If you're storing, retrieving and querying data, use an RDBMS (and maybe Perl as a glue language).
    If you want hardcore statistics, use SAS.

    There is no way you could hope to replicate all of the statistical and reporting functionality that SAS offers, or even a respectable subset of it, using Perl.

    Cheers.

    BazB


    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.
