I've been trying to slowly learn PDL over the last few months. While I'm aware of some available documentation (the PDL::* perldoc, The PDL Book, etc.) I've found the beginner documentation to be lacking. Therefore, I thought it would be a good idea to start 'porting' some numpy documentation to PDL for new users such as myself.
I've started with 100 numpy exercises, and this is the work in progress port to Perl/PDL.
As I'm still learning PDL, some solutions may be less than optimal, while others do not currently have solutions as they are outside my of level of competency. Therefore, I'm posting this WIP to PM to ask for comments and contributions.
As with most of Perl, there is more than one way to do it for most of these. I've decided to keep the $var->function() syntax as much as possible to easily be able to chain operations.
Thanks in advance!
100 PDL Exercises
- 1. Load the PDL library:
-
use PDL;
- 2. Print the PDL version:
-
use PDL::Version;
print $PDL::Version::VERSION;
- 3. Create a null vector of size 10:
-
my $z = zeros(10);
print $z;
- 4. How to find the memory size of any matrix?
-
use PDL::Core ':Internal'
my $z = zeros(10);
my $size = howbig($z->get_datatype) * $z->nelem;
print $size;
- 5. How to get the documentation of the numpy add function from the command line?
-
# To get top level PDL help
perldoc PDL
- 6. Create a null vector of size 10 and set the fifth value to 1:
-
my $z = zeros(10);
$z->slice(4) .= 1;
print $z;
or with PDL::NiceSlice loaded:
my $z = zeros(10);
$z(4) .= 1;
print $z;
Note: It will be assumed that PDL::NiceSlice will be loaded from now on.
- 7. Create a vector with values ranging from 10 to 49:
-
my $z = 10 + sequence(40);
print $z;
- 8. Reverse a vector (first element becomes last)
-
my $z = sequence(10);
$z = $z(-1:0);
print $z;
- 9. Create a 3x3 matrix with values from 0 to 8:
-
my $z = sequence(3,3);
print $z;
- 10. Find indices of non-zero elements in the vector [1,2,0,0,4,0]:
-
my $vec = pdl [1,2,0,0,4,0];
my $nz = $vec->which;
print $nz;
- 11. Create a 3x3 identity matrix:
-
my $z = identity(3);
print $z;
- 12. Create a 3x3x3 matrix of random values:
-
my $z = random(3,3,3);
print $z;
- 13. Create a 10x10 matrix of random values and find the minimum and maximum values:
-
my $z = random(10,10);
print $z->min, $z->max;
- 14. Create a random vector of size 30 and find the mean value:
-
my $z = random(30);
print $z->avg;
- 15. Create a 2D matrix with 1 on the border and 0 inside:
-
my $z = ones(10,10);
$z(1:8,1:8) .= 0;
print $z;
- 16. How to add a border (filled with 0's) around an existing matrix?
-
my $a = random(8,8);
my $x = zeros(2 + $a->dim(0), 2 + $a->dim(1));
$x(1:8,1:8) .= $a;
print $x;
- 17. What is the result of the following expression?
-
n/a
- 18. Create a 5x5 matrix with values 1,2,3,4 just below the diagonal:
-
my $z = identity(5) * (1 + sequence(5));
$z->where($z > 4) .= 0;
$z = $z->transpose->rotate(1)->transpose;
print $z;
- 19. Create a 8x8 matrix and fill it with a checkerboard pattern:
-
my $z = zeros(8,8);
$z("0::2","0::2") .= 1
$z("1::2","1::2") .= 1
print $z;
- 20. Consider a (6,7,8) shape matrix, what is the index (x,y,z) of the 100th element?
-
my $z = random(6,7,8);
my $hundreth = $z->clump($z->ndims)->(100);
# TODO: find index of $hundreth in $z
- 21. Create a checkerboard 8x8 matrix using the tile function:
-
n/a
- 22. Normalize a 5x5 random matrix:
-
my $z = random(5,5);
$z = (($z - $z->min) / ($z->max - $z->min));
print $z;
- 23. Create a custom dtype that describes a color as four unsigned bytes (RGBA):
-
n/a
- 24. Multiply a 5x3 matrix by a 3x2 matrix (real matrix product):
-
my $x = ones(3,5);
my $y = ones(2,3);
print $x x $y;
- 25. Given a 1D matrix, negate all elements which are between 3 and 8, in place.
-
my $z = sequence(10);
$z->where($z <= 8 & $z>= 3) *= -1;
print $z;
- 26. What is the output of the following script?
-
n/a
- 27. Consider an integer vector Z, which of these expressions are legal?
-
my $z = sequence(long, 10);
$z ** $z;
2 << $z >> 2;
$z <- $z;
1j * $z;
$z / 1 / 1;
$z < $z > $z;
- 28. What are the result of the following expressions?
-
print pdl(0) / pdl(0);
print pdl(0) // pdl(0);
print float int pdl(NaN);
- 29. How to round away from zero a float matrix?
-
$z = 20 * random(10) - 10;
$z->where($z < 0) .= - $z->where($z < 0)->abs->ceil;
$z->where($z > 0) .= $z->where($z > 0)->ceil;
print $z;
- 30. How to find common values between two matrix?
-
my $z1 = long 256 * random(10);
my $z2 = long 256 * random(10);
print intersect $z1, $z2;
- 31. How to ignore all numpy warnings (not recommended)?
-
n/a for PDL
- 32. Is the following expressions true?
-
n/a
- 33. How to get the dates of yesterday, today and tomorrow?
-
# No built in PDL time/date functions
my $yesterday = time() - (60 * 60 * 24);
my $today = time();
my $tomorrow = time() + (60 * 60 * 24);
- 34. How to get all the dates corresponding to the month of July 2016?
-
n/a
- 35. How to compute ((A+B)*(-A/2)) in place?
-
my $a = ones(3);
my $b = 2 * ones(3);
my $c = 3 * ones(3);
print ($a + $b) * (- $a/2)
- 36. Extract the integer part of a random matrix using 5 different methods:
-
my $z = 10 * random(10);
print $z->ceil;
print $z->floor;
print byte $z;
print long $z;
print longlong $z;
- 37. Create a 5x5 matrix with row values ranging from 0 to 4:
-
my $z = xvals zeros(5,5);
print $z;
- 38. Consider a generator function that generates 10 integers and use it to build an matrix:
-
n/a
- 39. Create a vector of size 10 with values ranging from 0 to 1, both excluded:
-
my $z = (sequence(12) / 11)->slice("1:10");
print $z;
- 40. Create a random vector of size 10 and sort it:
-
my $z = random(10)->qsort;
print $z;
- 41. How to sum a small matrix faster than np.sum?
-
n/a
- 42. Consider two random matrices A and B, check if they are equal:
-
my $a = random(10);
my $b = random(10);
print $a == $b;
- 43. Make an array immutable (read-only):
-
n/a
- 44. Consider a random 10x2 matrix representing cartesian coordinates, convert them to polar coordinates:
-
use PDL::Complex;
my $z = random(2,10);
my $p = Cr2p($z);
print $p;
- 45. Create random vector of size 10 and replace the maximum value by 0:
-
my $z = random(10);
$z->where($z == $z->max) .= 0;
print $z;
- 46. Create a structured array with x and y coordinates covering the [0,1]x[0,1] area.
-
n/a
- 47. Given two arrays, X and Y, construct the Cauchy matrix C (Cij =1/(xi - yj)):
-
TODO
- 48. Print the minimum and maximum representable value for each data type:
-
# This cannot be done directly, but you can extract the underlying
# C type used for each PDL type:
print byte->realctype;
print short->realctype;
print ushort->realctype;
print long->realctype;
print longlong->realctype;
print indx->realctype;
print float->realctype;
print double->realctype;
- 49. How to print all the values of an array?
-
# Set maximum print limit to one million elements
$PDL::toolongtoprint = 1_000_000;
$z = zeros(1000,1000);
print $z;
- 50. Find the nearest value from a given value in an array:
-
TODO
- 51. Create a structured array representing a position (x,y) and a color (r,g,b):
-
n/a
- 52. Consider a random vector with shape (100,2) representing coordinates, find point by point distances:
-
my $z = random(10,2);
my ($x,$y) = ($z(:,0), $z(:,1));
my $d = (($x - $x->transpose)->ipow(2)) +
(($y - $y->transpose)->ipow(2));
print $d;
- 53. How to convert a float (32 bits) array into an integer (32 bits) in place?
-
my $z = float 1000 * random(10);
$z = long $z;
- 54. Consider the following file:
-
1,2,3,4,5
6,,,7,8
,,9,10,11
How to read it?
my $z = rcols "data.csv", { COLSEP => ',' }, [];
$z = $z->transpose; # optional (PDL is column major)
print $z;
- 55. What is the equivalent of enumerate for numpy arrays?
-
n/a
- 56. Generate a generic 2D Gaussian-like array:
-
my $z = grandom(10,10); # correct?
print $z;
- 57. How to randomly place p elements in a 2D array?
-
my $p = 3;
my $z = zeros(10,10);
my $i = indx $z->nelem * random($p)
$z->clump($z->ndims)->($i) .= 1;
print $z;
- 58. Subtract the mean of each row of a matrix:
-
my $z = random(5, 10);
$z = $z - $z->avgover->transpose;
print $z;
- 59. How to I sort an array by the nth column?
-
TODO
- 60. How to tell if a given 2D array has null columns?
-
TODO
- 61. Find the nearest value from a given value in an array:
-
TODO
- 62. Considering two arrays with shape (1,3) and (3,1), how to compute their sum using an iterator?
-
n/a
- 63. Create an array class that has a name attribute:
-
n/a
- 64. Consider a given vector, how to add 1 to each element indexed by a second vector (be careful with repeated indices)?
-
TODO
- 65. How to accumulate elements of a vector (X) to an array (F) based on an index list (I)?
-
TODO
- 66. Considering a (w,h,3) image of (dtype=ubyte), compute the number of unique colors:
-
my ($w, $h) = (16, 16);
my $i = byte 256 * random($w, $h, 3);
my $uniqcol = $i->uniq->nelem;
print $uniqcol;
- 67. Considering a four dimensions array, how to get sum over the last two axis at once?
-
TODO
- 68. Considering a one-dimensional vector D, how to compute means of subsets of D using a vector S of same size describing subset indices?
-
TODO
- 69. How to get the diagonal of a dot product?
-
my $z1 = random(10, 10);
my $z2 = random(10, 10);
print $z1->inner($z2);
- 70. Consider the vector [1, 2, 3, 4, 5], how to build a new vector with 3 consecutive zeros interleaved between each value ?
-
my $z = pdl [1,2,3,4,5];
my $nz = 3;
my $x = zeros($z->dim(0) * $nz);
$x("0::$nz") .= $z;
print $x;
- 71. Consider an array of dimension (5,5,3), how to multiply it by an array with dimensions (5,5)?
-
my $z1 = ones(5,5,3);
my $z2 = 2 * ones(5,5);
print $z1 * $z2;
- 72. How to swap two rows of an array?
-
my $z = sequence(5,5);
$z(0:1,) .= $z(1:0,)->sever;
print $z;
- 73. Consider a set of 10 triplets describing 10 triangles (with shared vertices), find the set of unique line segments composing all the triangles:
-
TODO
- 74. Given an array C that is a bincount, how to produce an array A such that np.bincount(A) == C?
-
TODO
- 75. How to compute averages using a sliding window over an array?
-
TODO
- 76. Consider a one-dimensional array Z, build a two-dimensional array whose first row is (Z[0],Z[1],Z[2]) and each subsequent row is shifted by 1 (last row should be (Z[-3],Z[-2],Z[-1])
-
TODO
- 77. How to negate a boolean, or to change the sign of a float inplace?
-
my $z = long 2 * random(10);
$z = not $z;
print $z;
$z = -5 + sequence(10);
$z = -1 * $z;
print $z;
- 78. Consider 2 sets of points P0,P1 describing lines (2d) and a point p, how to compute distance from p to each line i (P0[i],P1[i])?
-
TODO
- 79. Consider 2 sets of points P0,P1 describing lines (2d) and a set of points P, how to compute distance from each point j (P[j]) to each line i (P0[i],P1[i])?
-
TODO
- 80. Consider an arbitrary array, write a function that extract a subpart with a fixed shape and centered on a given element (pad with a fill value when necessary):
-
TODO
- 81. Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]:
-
$z = 10 * random(15);
$len = 4;
my @r = ();
push @r, $z($_:$_ + $len-1) for (0 .. $z->nelem - $len)
$r = pdl @r;
print $r;
- 82. Compute a matrix rank:
-
my $z = 10 * random(10,10);
my ($u, $s, $v) = $z->svd;
my $rank = $s->where($s > 1e-10);
print $rank;
- 83. How to find the most frequent value in an array?
-
TODO
- 84. Extract all the contiguous 3x3 blocks from a random 10x10 matrix:
-
my $z = long 5 * random(10,10);
my $dim = 3;
my (@out, $out);
for my $i ( 0 .. $z->dim(0) - $dim - 1) {
for my $j ( 0 .. $z->dim(1) - $dim - 1) {
push @out, $z($i:$i+$dim,$j:$j+$dim);
}
}
$out = pdl @out;
print $out;
- 85. Create a 2D array subclass such that Z[i,j] == Z[j,i]:
-
TODO
- 86. Consider a set of p matrices wich shape (n,n) and a set of p vectors with shape (n,1). How to compute the sum of of the p matrix products at once? (result has shape (n,1))
-
TODO
- 87. Consider a 16x16 array, how to get the block-sum (block size is 4x4)?
-
TODO
- 88. How to implement the Game of Life using PDL arrays?
-
TODO
- 89. How to get the n largest values of an array:
-
my $z = 10 * random(20);
my $n = 3;
print $z->qsort->(-$n:);
- 90. Given an arbitrary number of vectors, build the cartesian product (every combinations of every item)
-
TODO
- 91. How to create a record array from a regular array?
-
n/a?
- 92. Consider a large vector Z, compute Z to the power of 3 using 3 different methods:
-
my $z = random(5e7);
$z ** 3;
$z->ipow(3);
$z->power(3,0);
- 93. Consider two arrays A and B of shape (8,3) and (2,2). How to find rows of A that contain elements of each row of B regardless of the order of the elements in B?
-
TODO
- 94. Considering a 10x3 matrix, extract rows with unequal values (e.g. [2,2,3]):
-
TODO
- 95. Convert a vector of ints into a matrix binary representation:
-
my $z = pdl [0,1,2,3,15,16,32,64,128];
my $bits = ($z->transpose & (2 ** xvals(9)));
$bits->where($bits > 0) .= 1;
print $bits;
- 96. Given a two dimensional array, how to extract unique rows?
-
my $z = long 2 * random(3,6);
print $z->uniqvec;
- 97. Considering 2 vectors A & B, write the einsum equivalent of inner, outer, sum, and mul function:
-
TODO
- 98. Considering a path described by two vectors (X,Y), how to sample it using equidistant samples:
-
TODO
- 99. Given an integer n and a 2D array X, select from X the rows which can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows which only contain integers and which sum to n:
-
TODO
- 100. Compute bootstrapped 95% confidence intervals for the mean of a 1D array X (i.e., resample the elements of an array with replacement N times, compute the mean of each sample, and then compute percentiles over the means):
-
TODO
edit: link to PDL for those who do not know what it is
Re: RFC: 100 PDL Exercises (ported from numpy)
by vr (Curate) on May 03, 2018 at 17:12 UTC
|
I'll pick an easy one of "TODO" first and will add some more later, in no order -- so there'll be a few edits, return for more :-)
99. Given an integer n and a 2D array X, select from X the rows which can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows which only contain integers and which sum to n:
use strict;
use warnings;
use PDL;
my $x = pdl([1.0, 0.0, 3.0, 8.0],
[2.0, 0.0, 1.0, 1.0],
[1.5, 2.5, 1.0, 0.0]);
my $n = 4;
my $mask = ( $x == $x-> rint )-> andover
&
( $x-> sumover == $n );
print $x-> transpose
-> whereND( $mask )
-> transpose;
--------------------------
Added:
--------------------------
print $PDL::Version::VERSION;
Is it officially recommended way? What's the benefit over usual $PDL::VERSION;? I also wonder what's the $VERSION = eval $VERSION; in that tiny module.
--------------------------
4. How to find the memory size of any matrix?
print $z-> info( '%M' );
--------------------------
35. How to compute ((A+B)*(-A/2)) in place?
There's maybe a typo or two in Python solution (what's the "C"?). But to operate inplace, I'd do this:
my $a = ones(3);
my $b = 2 * ones(3);
$b += $a;
$a /= -2;
$b *= $a;
print $b;
IIRC, combined assignment operators are overloaded to work inplace, but I can't find a reference right now, will do it later.
--------------------------
53. How to convert a float (32 bits) array into an integer (32 bits) in place?
Good question. The ceil and floor convert double to long inplace. Not sure if PDL allows to do so for 32-bit types.
--------------------------
45. Create random vector of size 10 and replace the maximum value by 0:
More efficient:
my $z = random( 10 );
$z( $z-> maximum_ind ) .= 0;
print $z;
--------------------------
64. Consider a given vector, how to add 1 to each element indexed by a second vector (be careful with repeated indices)?
my $z = zeroes( 10 );
my $i = pdl( 1, 3, 5, 3, 1 );
indadd( 1, $i, $z );
print $z;
--------------------------
81. Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]:
my $z = 1 + sequence 14;
my $len = 4;
print $z-> lags( 0, 1, 1 + $z-> nelem - $len )
-> slice( '','-1:0' );
--------------------------
87. Consider a 16x16 array, how to get the block-sum (block size is 4x4)?
If I understand the task correctly, and now that I've learned about lags:
my $x = sequence 16, 16;
print $x-> lags( 1, 4, 4 )
-> slice( '', '', '-1:0' )
-> xchg( 0, 1 )
-> sumover
-> lags( 0, 4, 4 )
-> slice( '', '-1:0' )
-> sumover;
(Sigh...) Utilizing benefits of idle commuting and thinking things over:
my $x = sequence 16, 16;
print $x-> reshape( 4, 4, 4, 4 )
-> reorder( 0, 2, 1, 3 )
-> clump( 2 )
-> sumover
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Thanks for all of these
I assume you wouldn't mind me merging the solutions into my 'master' POD, with credit of course.
Just from reading your solutions I learned a number of new functions, including maximum_ind (and it's sister functions maximum_n_ind, minimum_ind, minimum_n_ind and max2d_ind) and lags. This is what I hoped would happen, as everyone seems to know a different subset of PDL functionality.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Hi, mxb, of course I woudn't mind.
+ Note, your recipe #66 doesn't do what's expected, -- but to shift bytes left manually, as Python guys do, isn't nice neither. I'd do this:
pdl> $x = sequence 2,2,3 # "2x2 planar RGB" image, 4 unique colors
pdl> $x = $x-> glue( 0, $x ) # "4x2 RGB" image, 4 unique colors
pdl> $x = $x-> glue( 1, $x ) # "4x4 RGB" image, still 4 unique colors
pdl> $x-> set( 2,2,2, 100 ) # make them 5
pdl> p$x
[
[
[0 1 0 1]
[2 3 2 3]
[0 1 0 1]
[2 3 2 3]
]
[
[4 5 4 5]
[6 7 6 7]
[4 5 4 5]
[6 7 6 7]
]
[
[ 8 9 8 9]
[ 10 11 10 11]
[ 8 9 100 9]
[ 10 11 10 11]
]
]
pdl> p $x-> clump(2)-> transpose-> uniqvec-> getdim( 1 )
5
-----------------
As to combined assignment operators working in-place, here is simple experiment (Windows), either line #1 or #2 un-commented on different runs:
use strict;
use warnings;
use feature 'say';
use PDL;
my $x = zeroes 1e8;
my $y = ones 1e8;
$x = $x + $y; # 1
#$x += $y; # 2
say qx{ typeperf "\\Process(perl)\\Working Set Peak" -sc 1 }
=~ /.+"(.+)"/s;
__END__
>perl pdl180504.pl
2427752448.000000
>perl pdl180504.pl
1627779072.000000
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by bliako (Monsignor) on May 03, 2018 at 14:08 UTC
|
Good idea!
Here is my take on #100 (TODO)
The long version as a standalone program:
The short version:
# get a random sample with replacement from input vector
# 2nd param is a RNG object which will select N random
# elements (with replacement) from input vector
# 3rd param is OPTIONAL: the size of the sample,
# default is the size of the input vector
sub resample {
my $original_sample = $_[0];
my $arng = $_[1];
my $No = $_[2] || $original_sample->getdim(0);
my $indices = ($No * $rng->get_uniform($No))->floor();
my $newsample = $original_sample->slice($indices);
return $newsample;
}
my $M = 1000; # num bootstrap resamplings
my $X = ... ; # input data piddle (1D)
my $R = $X->getdim(0); # size of bootstrap resamples
my $rng = PDL::GSL::RNG->new('taus')->set_seed($seed);
my $means = zeroes($M);
for(my $i=0;$i<$M;$i++){
# get a re-sample from original data with size R
# use our RNG to do the re-sampling:
my $asample = resample($X, $rng, $R);
$means->set($i, PDL::Ufunc::avg($asample));
}
# now sort the means vector and pick the elements at
# the confidence intervals specified, e.g. 5%
my $sorted_means = PDL::Ufunc::qsort($means);
my $confidence_intervals_values = [
$sorted_means->at(int(0.05*$M)),
$sorted_means->at(int(0.95*$M))
];
EDIT 1:
The above script can easily be parallelised. Input data is readonly (X). Each worker writes results (mean, stdev) to the same piddle but at different array locations guaranteed. So, there is no need for locking afaics.
Unfortunately I can not get it to parallelise with threads because PDL::GSL::RNG seems not to like threads. The RNG is needed in order to get a random sample from the original data on every bootstrap iteration. In fact each worker/thread can have its own RNG, not a copy but a different RNG local to each thread. However, even like this I get the dreaded pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug
Any ideas?
EDIT 2: the parallelised version as a standalone program:
corrections welcome (especially on good practices about how to parallelise) | [reply] [Watch: Dir/Any] [d/l] [select] |
|
Hi, bliako, thank you so much for detailed answer, my statistics skills were (hopefully) auto-vivified :). After following your links and code in earnest, I felt brave enough to make some experiments and write a comment, but in the process I discovered something strange ;).
----------
First, my impression is that solutions to exercices were supposed to be simple (as, KISS). So, perhaps to translate, almost verbatim, Python solution to PDL, answer to #100 can be:
use strict;
use warnings;
use feature 'say';
use PDL;
my $n = 100; # input sample size
my $m = 1000; # number of bootstrap repeats
my $r = $n; # re-sample size
my $x = random $n;
my $idx = random $r, $m;
$idx *= $n;
say $x-> index( $idx )
-> avgover
-> pctover( pdl 0.05, 0.95 );
__END__
[ 0.4608755 0.55562806]
Interesting, here, PDL DWIMs for me -- no need to floor an index to thread over a piddle (just as with Perl's array indices). I also stand corrected in "floor converts to Long in-place" -- it rounds in-place, but piddle stays Double.
This 'never to explicitly loop in vectorized language' answer, unfortunately, hides the ugly truth that for very large data we can end with huge R x M matrices of random indices and equally huge (equally unnecessary) matrices of all re-samplings, and thus die because of 'Out of memory!'.
I was experimenting with this or that (PDL's automatic parallelization, in particular), which I'm skipping now, because next is something weird.
Consider this version of the above, which avoids 2-dimensional index matrix and results of re-samplings, but is still un-parallel:
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;
srand( 123 );
my $time = time;
my $n = 30000; # input sample size
my $m = 10000; # number of bootstrap repeats
my $r = $n; # re-sample size
my $x = random $n;
my $avg = zeroes $m;
for ( 0 .. $m - 1 ) {
my $idx = random $r;
$idx *= $n;
$avg-> set( $_, $x-> index( $idx )-> avg )
}
say $avg-> pctover( pdl 0.05, 0.95 );
say time - $time;
__END__
[0.49384165 0.49941814]
6.11959099769592
Next is solution where I'm starting to try to parallelize, but because of selected parameters (single thread) I'm not only expecting no gain, but due to overhead it must be slower. And yet:
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;
use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;
srand( 123 );
my $time = time;
my $n = 30000; # input sample size
my $m = 10000; # number of bootstrap repeats
my $r = $n; # re-sample size
my $x = random $n;
my $avg = zeroes $m;
share_pdls x => $x, avg => $avg;
threads-> create( sub {
my ( $x, $avg ) = retrieve_pdls qw/ x avg /;
for ( 0 .. $m - 1 ) {
my $idx = random $r;
$idx *= $n;
$avg-> set( $_, $x-> index( $idx )-> avg )
}
});
$_-> join for threads-> list;
say $avg-> pctover( pdl 0.05, 0.95 );
say time - $time;
__END__
[0.49384165 0.49941814]
4.57857203483582
Why is that? :) I tried to insert
use PDL::Parallel::threads qw/ share_pdls retrieve_pdls /;
share_pdls x => $x, avg => $avg;
( $x, $avg ) = retrieve_pdls qw/ x avg /;
into no-threads solution (does retrieve_pdls set any flags that speed things up? Nope.)
$ perl -v
This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-li
+nux-thread-multi
(with 1 registered patch, see perl -V for more detail)
$ perl -MPDL -E 'say $PDL::VERSION'
2.019
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
vr your code is superior than the long code I have posted!
If I may add: using oddpctover() might be preferred because it does not interpolate when there is no data at the exact percentile position.
Regarding the time difference when running with and without "use threads", I have discovered that avg() is the culprit. If you use x-> index( $idx )->at(0) rather than x-> index( $idx )->avg the performance is the same (which means idx() is also excluded as possible cause).
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
|
# https://www.perlmonks.org/?node_id=1214227
use strict;
use warnings;
use feature 'say';
use PDL;
use PDL::Parallel::threads qw(retrieve_pdls);
use threads;
use MCE::Shared;
use Time::HiRes 'time';
srand( 123 );
my $time = time;
my $n = 30000; # input sample size
my $m = 10000; # number of bootstrap repeats
my $r = $n; # re-sample size
my $x = random( $n ); $x->share_as('x');
my $avg = zeroes( $m ); $avg->share_as('avg');
my $seq = MCE::Shared->sequence( 0, $m - 1 );
sub parallel_task {
srand;
my ( $x, $avg ) = retrieve_pdls('x', 'avg');
while ( defined ( my $seq_n = $seq->next() ) ) {
my $idx = random $r;
$idx *= $n;
$avg->set( $seq_n, $x->index( $idx )->avg );
}
}
threads->create( \¶llel_task ) for 1 .. 4;
# ... do other stuff ...
$_->join() for threads->list();
say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';
__END__
# Output
[0.49395242 0.49936752]
1.28744792938232 seconds
Afterwards, re-validated PDL with MCE and released 1.847. The effort is mainly for folks running Perl lacking threads support. Here it is, PDL and MCE::Shared running similarly.
# https://www.perlmonks.org/?node_id=1214227
use strict;
use warnings;
use feature 'say';
use PDL; # must load PDL before MCE::Shared
use MCE::Hobo;
use MCE::Shared 1.847;
use Time::HiRes 'time';
srand( 123 );
my $time = time;
my $n = 30000; # input sample size
my $m = 10000; # number of bootstrap repeats
my $r = $n; # re-sample size
# On Windows, the non-shared piddle ($x) is unblessed in threads.
# Therefore, constructing the piddle inside the worker.
# UNIX platforms benefit from copy-on-write. Thus, one copy.
my $x = ( $^O eq 'MSWin32' ) ? undef : random( $n );
my $avg = MCE::Shared->pdl_zeroes( $m );
my $seq = MCE::Shared->sequence( 0, $m - 1 );
sub parallel_task {
$x = random( $n ) unless ( defined $x );
while ( defined ( my $seq_n = $seq->next() ) ) {
my $idx = random $r;
$idx *= $n;
# $avg is a shared piddle which resides inside the shared-
# manager process or thread. The piddle is accessible via the
# OO interface only.
$avg->set( $seq_n, $x->index( $idx )->avg );
}
}
MCE::Hobo->create( \¶llel_task ) for 1 .. 4;
# ... do other stuff ...
MCE::Hobo->wait_all();
# MCE sets the seed of the base generator uniquely between workers.
# Unfortunately, it requires running with one worker for predictable
# results (i.e. no guarantee in the order which worker computes the
# next input chunk).
say $avg->pctover( pdl 0.05, 0.95 );
say time - $time, ' seconds';
__END__
# Output
[0.49387191 0.49937053]
1.29038286209106 seconds
Regards, Mario
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
|
|
This 'never to explicitly loop in vectorized language' answer, unfortunately, hides the ugly truth that for very large data we can end with huge R x M matrices of random indices and equally huge (equally unnecessary) matrices of all re-samplings, and thus die because of 'Out of memory!'.
This can be a real issue! Keeping such index-sets around for exactly the right amount of time will help, which in turn might be assisted by making little functions that do individual operations on subsets (which a captured index-set ndarray which goes out of scope on finishing). Another thing that will help is the forthcoming loop fusion, discussed at https://github.com/PDLPorters/pdl/issues/349: non-slice operations will become lazy, and on evaluation will potentially get put together into a new, loop-fused operation.
Something else that would help here, as also discussed on #349, is more generalised first-class "index operations". Ideas and contributions here, on the GitHub issue, on the PDL mailing lists, or any other means are most welcome!
One other thought is that, for larger ndarrays (because POSIX threads have a startup cost), the use of vectorised operations is the way to harness multiple cores for free (for operations that support this), which a Perl for-loop cannot achieve.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by VinsWorldcom (Prior) on May 03, 2018 at 11:34 UTC
|
++BRILLIANT!
I've been using R for statistics and AI/ML just because it was used in the class I took. I've seen PDL, but was unable to grok the documentation as a beginner - same boat as you it seems. I just saw your meditation so haven't gone through it yet, but I always install the Strawberry Perl with PDL so I have it and I'll take a look when I have some free time.
Thanks again!
| [reply] [Watch: Dir/Any] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by choroba (Cardinal) on May 03, 2018 at 13:22 UTC
|
Very nice idea! ++
Wouldn't GitHub be a more suitable place where people can create pull requests with fixed exercises?
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [Watch: Dir/Any] [d/l] |
|
Thanks!
I originally posted it to PM as it's one of the few active Perl forums, so I hoped there would be people with PDL experience around.
I wasn't too sure of the reaction or whether people would be interested in helping. Seems like I have my answer now! :P
I have no objection to putting the completed documents on Github. They are in POD format, so could easily add a makefile to generate html/pdf/etc.
Thanks to everyone who is contributing solutions.
| [reply] [Watch: Dir/Any] |
|
shhhhh, PM needs the traffic ;-)
| [reply] [Watch: Dir/Any] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by pryrt (Abbot) on May 03, 2018 at 13:04 UTC
|
That's great! I have made a few attempts to learn PDL, and while I've been able to do a few things, I think going through this exercise will help me solidify some ideas.
Once the TODOs get fleshed out, I think this should be added to the Tutorials. I would suggest wrapping the code/answers in <spoilers> tags for the final version of the tutorial. Also, for (at least some of) the "n/a" or numpy-specific questions, it may be possible to convert them to perl equivalents.
| [reply] [Watch: Dir/Any] [d/l] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by Laurent_R (Canon) on May 03, 2018 at 14:38 UTC
|
| [reply] [Watch: Dir/Any] |
47. Construct the Cauchy matrix
by Ea (Chaplain) on May 18, 2018 at 15:32 UTC
|
PM is a good choice for this project. It's got the discussion of the best solution(s) to the question. Hopefully someone jumps in an offers a better solution than mine. It feels a little ugly explicitly stating the indices.
Given two arrays, X and Y, construct the Cauchy matrix C (Cij =1/(xi - yj))
First, what's a Cauchy matrix? Ahh, this question is just how to construct a matrix from 2 arrays, where no elements from one array are in the other. Just create a sequence of number for the first array and then make the second array 0.5 more than the first array to get the inputs. Here's a brute force method.
use PDL;
use PDL::NiceSlice;
my $x = sequence(8);
my $y = $x + 0.5;
my ($nx, $ny) = (nelem($x), nelem($y));
my $C = zeroes($nx, $ny);
for (my $i = 0; $i < $nx; $i++) {
for (my $j = 0; $j < $ny; $j++) {
$C($i,$j) .= 1/($x($i) - $y($j));
}
}
print $C;
I like the PDL::NiceSlice for indexing. It makes sense to me. I could have also created the matrix with
my $C = outer($x, $y); or gotten the size of the arrays with $x->getdim(0) and if I grokked
threading rather than just skimming PDL::Threading, this might look way cooler.
NB the ".=" in the assignment breaks the link between the matrix and the 2 arrays. It's important.
That was just the Cauchy matrix. Some people want the Cauchy determinant (as long as the 2 arrays are the same size). Easy! Just import PDL::MatrixOps
use PDL::MatrixOps;
print det $C;
Sometimes I can think of 6 impossible LDAP attributes before breakfast.
YAPC::Europe::2018 — Hmmm, need to talk to work about sending me.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
This is Much Better
When I was young, I coded in for loops, but when I read the documentation, I put away these childish things and learned to Thread Broadcast
$x = sequence(8);
$y = sequence(7) + 0.5;
$c = 1/($x->dummy(0,$y->nelem)->transpose - $y->dummy(0,$x->nelem));
TaDAAA!!
You create 2 vectors (of different sizes), inflate it along the other dimension using dummy to fit the other vector's size (using nelem), flip one of them using transpose
and do the calculation in one line. PDL takes care of the loops and does it faster than you can in Perl.
I had to transpose $x to get the same result as the for loops above, in order to prove they are the same by
pdl> p $C - $c
Ea
Sometimes I can think of 6 impossible LDAP attributes before breakfast.
Mojoconf was great!
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Excellent use of broadcasting (the feature formerly and confusingly known as "threading")! To make that broadcast even better, you'd dummy not by the nelem, but explicitly with dim(1) (or 0) as someone else on here did.
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by thechartist (Monk) on Mar 10, 2019 at 01:52 UTC
|
I was working on something very similar (different question set, but same idea). I am working from 101 NumPy Exercises for Data Analysis (Python).
The only significant difference is I am trying to do them in one-liner format; Python's semantically meaningful whitespace makes doing
command line scripting painful. Very quick data analyses can be done at the terminal in Perl, if you know what PDL modules to use.
I posted a few examples on my public scratchpad. I don't have whole lot written, but some guidance and feedback would be helpful. PDL is
a lot of fun to use!
| [reply] [Watch: Dir/Any] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by jtym (Novice) on Oct 30, 2018 at 13:10 UTC
|
93. Consider two arrays A and B of shape (8,3) and (2,2). How to find rows of A that contain elements of each row of B regardless of the order of the elements in B? (★★★)
use strict;
use warnings;
use PDL;
my $A = floor(random(3,8) * 6);
my $B = floor(random(2,2) * 6);
my $C = $A->in($B->slice(',0'))->sumover * $A->in($B->slice(',1'))->su
+mover ;
my $rows = which($C > 0);
print $rows;
| [reply] [Watch: Dir/Any] [d/l] |
Re: RFC: 100 PDL Exercises (ported from numpy)
by Anonymous Monk on May 04, 2018 at 13:27 UTC
|
GitHub the thing, pump it up on blogs.perl.org and the perl reddit channel and watch the PRs come in for the TODOs. Seriously. | [reply] [Watch: Dir/Any] |
|
Duh. I found it on reddit. :-( Need more coffee.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
|
|
|
|
Re: RFC: 100 PDL Exercises (ported from numpy)
by jtym (Novice) on Oct 29, 2018 at 13:38 UTC
|
I'm thinking it might be an idea to just put the remaining ones we can't do on stack-overflow. | [reply] [Watch: Dir/Any] |
|
|