perl_user_52 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Long-time reader, first-time poster. I apologise in advance for the following wall of text and understand that this in itself has the potential to reduce the propensity for people to read/respond, or to prompt them to just type TLDR, etc. (NB: I couldn't figure out how to \n while writing this.) I have a question about whether I should be investing time in advancing my Perl skills to perform certain tasks, or whether a language such as C or C# would be more appropriate. (If a similar question has been answered before, or if anyone can provide a reference to an answer, feel free to post the link.) Background: My qualifications are in mathematics and anything I know about computer science is self-taught on a 'needs basis'. I work in an area where we play around with data, and I learnt Perl because clients kept sending us data in unusable formats and it was suggested that Perl would be a good tool for data cleaning. Once cleaned, the data would be loaded into SQL Server. I consider myself a novice in Perl and an advanced SQL Server user. I am writing this because SQL Server is starting to frustrate me, both in how long it takes to execute complex PROCs on large data sets and in the limitations of the language itself. For instance, if I could use Perl's regex capabilities instead of SQL constructs such as LIKE, I believe this could add significant value to the analysis I am trying to do. Here is the problem: I do not have a significant computer science background and was wondering whether Perl is an appropriate language to pursue further, or whether I should start building up some C or C# skills. I say C because I eventually want to learn Objective-C and write business apps (maybe for the iPhone/iPad), and I think Perl is not a good tool for this (but I don't have a good enough background in programming to say this for certain).
From what I understand, both Perl and C/C# can be used to query SQL Server, and once you have the data you can start manipulating it in the language itself. As such, on the basis of the information provided, are there any suggestions as to which language would be most appropriate to invest time in learning, in terms of power (what it can do), speed (how fast it can do it), and ease of use (i.e. can be learnt by searching Google), when it comes to connecting to SQL Server and then messing around with the data? And if I really want to get into writing business apps (I'm not sure if I will go down the iPhone/iPad path), is it worth just learning something other than Perl, so that I can manipulate data and also create these apps in the same language?

Replies are listed 'Best First'.
Re: data manipulation / sql server / business applications
by martell (Hermit) on May 01, 2011 at 09:40 UTC

    Hello,

    I think you are asking several questions at once. I cannot comment on the performance differences between Perl and C/C# because I'm a long-time Perl user. I never bothered with C/C#, following the proverb "The fastest route to your goal is the route you know."

    Is perl suitable for data cleansing?

    Perl is perfect for pattern matching and for extracting data from one source and writing it to another. So if your cleansing consists of extracting data from a file and reworking it, and you know where and what to look for, you'll be fine with Perl. I have never had a situation where I couldn't find a Perl module that would give me access to a certain data source (Excel, XML, txt, CSV, databases, ...). From there it is simply a matter of coding and learning pattern matching. Pattern matching skills are never wasted, because most languages have very similar concepts.
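
    As a rough illustration of this kind of cleansing, here is a minimal sketch using only core Perl. The input layout (pipe-delimited "LAST, first|date|amount" rows), the day-first reading of dd/mm/yyyy dates, and the clean_row helper are all assumptions made up for the example, not anything taken from the original data.

```perl
use strict;
use warnings;

# Clean one hypothetical "LAST, first|date|amount" row.
# Returns a hash ref, or nothing if the row cannot be parsed.
sub clean_row {
    my ($line) = @_;

    # Split on the delimiter and trim whitespace around each field.
    my @fields = split /\|/, $line;
    return unless @fields == 3;
    for (@fields) { s/^\s+//; s/\s+$//; }
    my ($name, $date, $amount) = @fields;

    # "LAST, first" -> two parts with normalised capitalisation.
    my ($last, $first) = $name =~ /^([^,]+?)\s*,\s*(.+)$/
        or return;
    $_ = ucfirst(lc $_) for $last, $first;

    # Accept dd/mm/yyyy (assumed day-first) or yyyy-mm-dd; emit ISO dates.
    if ($date =~ m{^(\d{2})/(\d{2})/(\d{4})$}) {
        $date = "$3-$2-$1";
    }
    elsif ($date !~ /^\d{4}-\d{2}-\d{2}$/) {
        return;
    }

    # Strip currency symbols and thousands separators; keep digits and dot.
    $amount =~ tr/0-9.//cd;
    return unless $amount =~ /^\d+(?:\.\d+)?$/;

    return {
        last   => $last,
        first  => $first,
        date   => $date,
        amount => sprintf('%.2f', $amount),
    };
}

# Made-up sample rows, including one that should be rejected.
my @raw = (
    '  Smith, John |01/05/2011| $1,200.50 ',
    'DOE,jane|2011-05-02|300',
    'bad line without enough fields',
);

for my $line (@raw) {
    if (my $row = clean_row($line)) {
        print join("\t", @{$row}{qw(last first date amount)}), "\n";
    }
    else {
        warn "Skipping unparseable line: $line\n";
    }
}
```

    The cleaned rows come out tab-separated, ready for a bulk load, and anything that does not fit the expected shape is reported rather than silently passed through.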

    The power of Perl for this kind of task lies in the fact that it can be used for anything from very simple scripts to quite complex ones, without forcing you to program in a certain paradigm. It may sound like a cliché, but believe me, this is a real advantage.

    Perl is not the perfect choice if you are combining many different data sources on a more continuous basis to construct one record from many records. An ETL tool is the way to go for that (for example, Pentaho is a nice solution that provides a free version). The programming effort to access different data sources and combine the records is too big. But that is the same for other languages. However, pattern matching is often weak in those tools. I have often found myself using an ETL tool and Perl together in 2 or 3 steps.

    Nor is Perl very strong at fuzzy matching problems, i.e. "Does this look similar, with a certain confidence level?" Questions of this kind are difficult to answer, and Perl doesn't have much in the toolbox for them.
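
    To make concrete what such a "similar with a certain confidence level" question involves, here is a minimal sketch of one simple similarity measure, the Levenshtein edit distance, in plain Perl with core modules only. The function names and the way the distance is scaled to a 0..1 score are choices made for this illustration; real fuzzy matching of names and addresses usually needs considerably more than this.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein edit distance via dynamic programming, keeping only
# the previous row of the table in memory.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Scale the distance to a score between 0 and 1 (1 means identical).
sub similarity {
    my ($s, $t) = @_;
    my $longest = max(length($s), length($t));
    return 1 if $longest == 0;
    return 1 - edit_distance($s, $t) / $longest;
}

# One missing letter out of ten characters.
printf "%.2f\n", similarity(lc 'Jon Smith', lc 'John Smith');   # prints 0.90
```

    Deciding which threshold counts as "similar enough" is the hard part, and it depends entirely on the data.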

    I eventually want to learn objective C

    Sorry, different goal, different answer. If you want to combine learning Objective-C with data cleansing, Objective-C is the way to go.

    Kind regards

    Martell

Re: data manipulation / sql server / business applications
by roboticus (Chancellor) on May 01, 2011 at 12:37 UTC

    perl_user_52:

    I use C, C++, C# and perl regularly. For the things you're talking about, I reach for perl first. It's great for cleaning data, it's easy to hook to databases, and it's the best language (IMO) for knocking together things quickly.

    As far as writing apps for the iPhone, I don't have one, so I have no opinion on that.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: data manipulation / sql server / business applications
by ww (Archbishop) on May 01, 2011 at 15:02 UTC

    Downvoted, not for content (even though it is made longer by discursive asides), but for failure to read the directions (aka RTFM) which accompany the text-input box:

    Posts are HTML formatted. Put <p> </p> tags around your paragraphs.

    Worse, the preview page, which, as a new user you were required to visit, adds, above the text input box, this:

    If something looked unlike you expected it to you might need to check out Writeup Formatting Tips

    Alternatively, you can solve your problem with "\n" by reading Markup in the Monastery.

    Your "wall of text" is almost as unreadable as a wall of code.

Re: data manipulation / sql server / business applications
by locked_user sundialsvc4 (Abbot) on May 01, 2011 at 16:14 UTC

    In my 30+ years in this business, I have come to consider programming languages to be tools, and I’m always interested in finding a new one that approaches some useful set of problems in some new and useful way.   What your professional tool-box contains is very important, and it will never consist of only one tool.   But I daresay that you will find, as so many others have already found, that Perl will become “a well-worn tool, picked-up again and again and again.”   I myself was a latecomer to Perl, and I regret that.   It’s not the only tool that I use, but Perl has certainly become a hands-down favorite for tasks big and small.   (Especially those “small” tasks that become big.)

    The reason is that Perl is a practical, pragmatic language that was born from a well-felt need to do a time-consuming task better than any other tools could do at the time.   It is also well-supported by the CPAN library (http://search.cpan.org) which, at this particular moment in time, has “67,491 Uploads; 22,520 Distributions; 92,835 Modules; contributed by 8,929 Uploaders” (including a great many people that you will find on a very regular basis right here).   This is an extremely impressive set of tools indeed, all of them well-tested and free, and generally cross-platform so that you can move from one type of hardware to something altogether different, and still be able to use the same tool in the same way.   (In the words of the credit-card commercials, “Priceless.™”)

    In short:   invest the time to become familiar with this language, and with this web-site.   Your effort will be promptly and richly rewarded.   There is indeed “a practical, pragmatic reason” for “what all the fuss has been about, for such a very long time now.”

    P.S.:   As for what will be the tool(s) that are ultimately used in the pad-computing space, I believe that it is really a bit too early to tell yet.   This is just the most-recent game changer to surface, and at this point developers are being invited to write platform-specific applications in platform-specific tools (e.g. Objective-C).   I know from experience that this won’t be the way that the game ends, because what developers ultimately need is cross-platform capability.   But it is not yet clear to me which tool(s) will dominate.   What I do know, however, is that “one size does not fit all,” and that it can be a mistake either in the short run or the long run to try to “wedge” a tool into a particular hole.   There will always be many tools in your box, and the ability to work fluently with many tools is always going to be a skill that is prized.   If you find yourself waiting on the sidelines wondering “which one will win,” you might be Waiting for Godot.   Keep your head up and your eyes wide open, and always make it a point to at least sample whatever’s on the new dishes as they pass by.   You might not care to eat them, but at least know what they taste like.