PerlMonks  

thechartist's scratchpad

by thechartist (Monk)
on Dec 29, 2017 at 04:57 UTC ( [id://1206392]=scratchpad )

101 Perl PDL Exercises for Data Analysis (March 2019 with PDL 2.019)

Long before "data science" became a fashionable topic in computing, Perl hackers were cleaning and analyzing data; they have been doing so since Perl was written. The following tutorial provides Perl solutions to the problems posed in 101 NumPy Exercises for Data Analysis.

My purpose is to demonstrate that Perl not only has the necessary tools to complete common data analysis tasks, but you are likely to get better performance out of Perl, with minor effort.

The philosophy of Perl has always been "There is more than one way to do it." Data analysis is no exception. While PDL has excellent functionality "out of the box", you might find it more effective to use individual CPAN modules to solve particular problems.

One other project to keep an eye on is RPerl -- a restricted subset of Perl that compiles to C++. It promises to give the best of both worlds: rapid prototyping to minimize developer time, with efficient code generation to minimize computational resources. At the time of this writing, RPerl appears to work only on Ubuntu Linux.

One current area of weakness is Machine Learning. This isn't so much due to any flaw in Perl as to accidents of history. There are bindings to some modern ML libraries (MXNet), but the Perl community currently lacks extensive experience using them. Hopefully this tutorial will start to change that.

This document assumes you know some basic programming -- loops, conditionals, variables, etc. Perl syntax is similar to that of any C-derived language. Perl has a few fundamental data types:

  1. Scalars: single items, such as a string or a number. Prefixed by '$'.
  2. Lists/Arrays: an ordered collection of items. Prefixed by '@'. Arrays are variables; the values an array holds are lists.
  3. Hashes: a collection of key/value pairs. Prefixed by '%'. Hashes can be converted to lists, and vice versa.

    Examples:

    $foo = 99;                                  # Assigns the integer 99 to $foo. Scalar context.
    @foo = ('Jack', 5, 'Jill', 4, 'John', 7);   # A list assigned to @foo, values separated by commas.
    %foo = ('Jack', 5, 'Jill', 4, 'John', 7);   # A list read as key/value pairs. Better ways to write this exist.

    There are others (typeglobs and references), but they will not be needed for the exercises that follow.
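The remark that better ways exist to write the hash most likely refers to the `=>` ("fat comma") operator. The following is a minimal sketch in plain Perl (no PDL needed) that reuses the names from the examples above; the `@flat` and `%copy` variables are introduced here purely for illustration.

```perl
use strict;
use warnings;

# The fat comma (=>) acts like a comma but also quotes the bare word on
# its left, which makes the key/value pairing easier to read.
my %foo = (Jack => 5, Jill => 4, John => 7);

# A hash flattens to a list of key/value pairs (in no guaranteed order)...
my @flat = %foo;

# ...and a list of pairs can be assigned back to a hash.
my %copy = @flat;

# A single hash element is a scalar, so it is read with '$'.
print "Jack: $foo{Jack}\n";
print "Pairs in the flattened list: ", scalar(@flat) / 2, "\n";
print "John via the copy: $copy{John}\n";
```

The `use strict` and `use warnings` lines are optional for one-liners but are the usual starting point for a script file.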

    As always in Perl, there is more than one way to do anything. With PDL, you can simply invoke the Perl interpreter at the command line (as for any other Perl script), or use a REPL (Read, Evaluate, Print, Loop) interface such as the perldl or pdl2 shell for interactive analysis. The exercises below show each PDL one-liner as entered at the command shell, but the code between the quotation marks should also work at the REPL. On a Unix shell, wrap the code in single quotes so that the shell does not expand the '$' variables; Windows cmd.exe needs double quotes instead.

    1. Q. Import PDL and print the version.

    Answer:

    $ perl -MPDL -e 'print $PDL::VERSION;'

    2. Create a 1D array of numbers from 0 to 9

    Answer:

    $ perl -MPDL -e '$arr = sequence(10); print $arr;'

    3. Q. Create a 3×3 array of all True's (the PDL answer below uses 1 to stand for true).

    Answer:

    $ perl -MPDL -e '$arr = ones(3,3); print $arr;'

    4. Q. Extract all odd numbers from arr = [0,1,2,3,4,5,6,7,8,9].

    Answer:

    $ perl -MPDL -e '$arr = sequence(10); $odd = where($arr, ($arr % 2) == 1); print $odd;'

    5. Q. Replace all odd numbers in arr (from question 4) with -1.

    Input: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] Output: [ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]

    Answer:

    $ perl -MPDL -e '$arr = sequence(10); $odd = $arr->where($arr % 2 == 1); $odd .= -1; print $arr;'

    6. Q. Replace all odd numbers in arr with -1 without changing arr.

    Input: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1]

    Answer:

    $ perl -MPDL -e '$arr = sequence(10); $out = $arr->copy; $odd = $out->where($out % 2 == 1); $odd .= -1; print $arr, $out;'

    The replacement is done on $out, a copy of $arr, so $arr itself is left unchanged.

    Note: in PDL, the '.=' operator is overloaded as a special assignment operator that writes values into an existing piddle. In ordinary Perl it is used for string concatenation.
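The behaviour of '.=' is easier to follow in script form. This is a minimal sketch, assuming PDL is installed; it takes an independent copy with copy() (one possible approach, chosen here for illustration) so that the original piddle is never written to.

```perl
use strict;
use warnings;
use PDL;

my $arr = sequence(10);

# Plain '=' on a piddle only copies the reference, so copy() is used to
# get independent data that can be changed without touching $arr.
my $out = $arr->copy;

# where() returns a view that stays connected to $out.
my $odd = $out->where($out % 2 == 1);

# '.=' writes into the existing view, so the change flows back into $out.
# A plain '=' here would only rebind $odd and leave $out untouched.
$odd .= -1;

print "arr: $arr\n";   # still holds 0 through 9
print "out: $out\n";   # odd values replaced by -1
```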

    7. Q. Convert a 1D array to a 2D array with 2 rows.

    Input: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] Output: [ [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] ]

    Answer:

    $ perl -MPDL -e '$seq = sequence(10); $seq_1 = $seq->reshape(5,2); print $seq_1;'
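One detail worth knowing is that reshape() changes the piddle it is called on, so in the one-liner $seq itself ends up two-dimensional. The sketch below, which assumes PDL is installed, reshapes a copy instead; the variable name `$grid` is made up for this example.

```perl
use strict;
use warnings;
use PDL;

my $seq = sequence(10);

# reshape() works in place, so reshape a copy to keep $seq one-dimensional.
# Dimensions are given as (columns, rows).
my $grid = $seq->copy->reshape(5, 2);

print join('x', $seq->dims),  "\n";   # 10
print join('x', $grid->dims), "\n";   # 5x2

# at() takes the column index first, then the row index.
print $grid->at(0, 1), "\n";          # first element of the second row: 5
```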

    8. Q. Stack arrays a and b vertically

    Input: a = [0,1,2,3,4,5,6,7,8,9] b = [1,1,1,1,1,1,1,1,1,1] Output: [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]

    Answer:

    $ perl -MPDL -e '$arr_a = sequence(10); $arr_b = ones(10); $out = pdl($arr_a, $arr_b)->reshape(5,4); print $out;'

    9. Q. Stack the arrays a and b horizontally.

    Output: [[0, 1, 2, 3, 4, 1, 1, 1, 1, 1], [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]]

    Answer:

    $ perl -MPDL -e '$arr_a = sequence(10)->reshape(5,2); $arr_b = ones(10)->reshape(5,2); print append($arr_a, $arr_b);'
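Both kinds of stacking can also be written with glue(), which joins piddles along a chosen dimension. This is offered as an alternative sketch rather than the author's method, and it assumes PDL is installed; `$horizontal` and `$vertical` are names invented for the example.

```perl
use strict;
use warnings;
use PDL;

my $arr_a = sequence(10)->reshape(5, 2);   # 5 columns, 2 rows
my $arr_b = ones(10)->reshape(5, 2);

# PDL lists the column dimension first, so dimension 0 runs across a row
# and dimension 1 runs down the rows.
my $horizontal = $arr_a->glue(0, $arr_b);  # side by side, like append()
my $vertical   = $arr_a->glue(1, $arr_b);  # one on top of the other

print join('x', $horizontal->dims), "\n";  # 10x2
print join('x', $vertical->dims),   "\n";  # 5x4
```

Using glue() avoids the reshape step in the vertical case, because the inputs keep their two-row shape and are simply joined along the row dimension.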
