in reply to data manipulation / sql server / business applications
Hello,
I think you are asking several questions at once. I cannot comment on the performance differences between Perl and C/C#, because I am a long-time Perl user. I never bothered with C/C#, following the proverb "The fastest route to your goal is the route you know."
Is perl suitable for data cleansing?
Perl is perfect for pattern matching and for extracting data from one source and writing it to another. So if your cleansing consists of extracting data from a file and reworking it, and you know where and what to look for, you'll be fine with Perl. I have never had a situation where I couldn't find a Perl module that gave me access to a certain data source (Excel, XML, txt, CSV, databases, ...). From there it is simply a matter of coding and learning pattern matching. Pattern matching skills are never wasted, because most languages have very similar concepts.
The power of Perl for this kind of task lies in the fact that it can be used for anything from very simple scripts to quite complex ones, without forcing you to program in a particular paradigm. It may sound like a cliché, but believe me, this is a real advantage.
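To give an idea of what such a cleansing script looks like, here is a minimal sketch. The record layout (name;phone;date), the sample data, the helper name clean_record and the assumption that slash dates are MM/DD/YYYY are all made up for illustration; your real data will need its own rules.

```perl
use strict;
use warnings;

# Hypothetical raw records (name;phone;date) in inconsistent formats.
my @raw = (
    "  Smith, John ;(555) 123-4567; 03/07/2011",
    "DOE,jane;555.987.6543;2011-12-25 ",
);

# Normalise one record; returns the cleaned line.
sub clean_record {
    my ($line) = @_;

    # Split on the separator and trim whitespace around every field.
    my @fields = split /;/, $line;
    s/^\s+|\s+$//g for @fields;
    my ($name, $phone, $date) = @fields;

    # "Last, First" becomes "First Last" with consistent capitalisation.
    if ($name =~ /^([^,]+?)\s*,\s*(.+)$/) {
        my ($last, $first) = ($1, $2);
        $name = ucfirst(lc($first)) . ' ' . ucfirst(lc($last));
    }

    # Keep only the digits, then reformat 10-digit numbers.
    (my $digits = $phone) =~ s/\D//g;
    if ($digits =~ /^(\d{3})(\d{3})(\d{4})$/) {
        $phone = "$1-$2-$3";
    }

    # Assumption: slash dates are MM/DD/YYYY; rewrite them as YYYY-MM-DD.
    if ($date =~ m{^(\d{2})/(\d{2})/(\d{4})$}) {
        $date = "$3-$1-$2";
    }

    return join ';', $name, $phone, $date;
}

print clean_record($_), "\n" for @raw;
```

In a real script the lines would come from a file handle or a module for the data source instead of a hard-coded list, but the pattern matching part stays the same.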
Perl is not the perfect choice if you are combining many different data sources on a more continuous basis to construct one record from many records. An ETL tool is the way to go for that (for example, Pentaho is a nice solution that provides a free version). The programming effort needed to access the different data sources and combine the records is too big, but that is the same for other languages. However, pattern matching is often weak in those tools. I have often found myself using an ETL tool and Perl together, in 2 or 3 steps.
Perl is not very strong at fuzzy matching problems either, i.e. "Does this look similar, with a certain confidence level?". This kind of question is difficult to answer, and Perl doesn't have much in the toolbox for it.
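To make the fuzzy matching question concrete, here is a minimal sketch in plain Perl of one common measure, the Levenshtein edit distance, turned into a rough similarity score between 0 and 1. The function names, the sample strings and the 0.85 threshold are assumptions for illustration only; a score like this is a crude stand-in for a real confidence level.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein distance: the number of single-character insertions,
# deletions and substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Case-insensitive similarity: 1 means identical, 0 means nothing in common.
sub similarity {
    my ($s, $t) = @_;
    my $longest = max(length($s), length($t));
    return 1 if $longest == 0;
    return 1 - edit_distance(lc($s), lc($t)) / $longest;
}

# Hypothetical threshold; tune it against your own data.
my $threshold = 0.85;
for my $pair (["Jon Smith", "John Smith"], ["Jon Smith", "Jane Doe"]) {
    my ($x, $y) = @$pair;
    my $score = similarity($x, $y);
    printf "%-10s vs %-10s  %.2f  %s\n", $x, $y, $score,
        $score >= $threshold ? "probably the same" : "different";
}
```

This only answers "how many keystrokes apart are these strings"; it knows nothing about nicknames, swapped fields or phonetic spelling, which is why this kind of problem stays hard.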
I eventually want to learn objective C
Sorry, different goal, different answer. If you want to combine learning Objective-C with data cleansing, Objective-C is the way to go.
Kind regards
Martell