Re: Perl PDL slower than python numpy (Updated2)

in reply to Perl PDL slower than python numpy

Edit2. Here is a simple PDL vs. numpy performance test in earnest. I hope the Python code really does what I intended, so no more blunders. I don't see any "numpy is better than PDL" as claimed, except that the PDL devs didn't bother to optimize "zeroes": it looks like it really fills every 8 bytes, cell by cell by cell... Too bad if initializing lots of arrays with zeroes is mission-critical. (That said, of course PDL isn't perfect.) Python3 and numpy are from the 18.04 LTS repositories.

import time
import numpy as np
d = (1000,500,500)
t = time.time(); x = np.zeros(( d )); print( time.time() - t )
t = time.time(); y = np.ones(( d ));  print( time.time() - t )
t = time.time(); z = x / y;           print( time.time() - t )

1.1920928955078125e-05
0.6920394897460938
1.205686330795288


use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
use PDL;
$PDL::BIGPDL = $PDL::BIGPDL = 1;
my @d = (1000,500,500);
my $t;
$t = time; my $x = zeroes( @d ); say time - $t;
$t = time; my $y = ones( @d );   say time - $t;
$t = time; my $z = $x / $y;      say time - $t;

0.727283954620361
0.730240821838379
0.971168994903564

-------

Edit. I was wrong about 'float32' being default for numpy, sorry. My answer doesn't explain the observed speed comparison.
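
The default can be checked directly instead of googled. A small sketch (the shape is arbitrary):

```python
import numpy as np

a = np.zeros((2, 3))                     # no dtype given
b = np.zeros((2, 3), dtype=np.float32)   # single precision on request

print(a.dtype, a.itemsize)  # float64 8
print(b.dtype, b.itemsize)  # float32 4
```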

-------

As quick googling shows, the default data type for numpy is 32-bit "single precision" "float". The PDL default is 64-bit "double precision" "double". Hence the illusory 2x speed difference. Your example is just trivial allocation/arithmetic in the underlying C, after all. In PDL, you can specify the data type in the constructor, as typing "?zeroes" in the interactive shell will explain (and as the session dump in this post shows).

Complete cmd.exe window dump to show versions, etc.:

----------------------------------------------
 Welcome to Strawberry Perl PDL Edition!
 * URL - http://strawberryperl.com + http://pdl.perl.org
 * to launch perl script run:      perl c:\my\scripts\pdl-test.pl
 * to start PDL console run:       pdl2
 * to update PDL run:              cpanm PDL
 * to install extra module run:    cpanm PDL::Any::Module
          or if previous fails:    ppm PDL::Any::Module
 * or you can use dev tools like:  gcc, g++, gfortran, gmake
 * see README.TXT for more info
----------------------------------------------
Perl executable: C:\berrybrew\strawberry-perl-5.30.1.1-64bit-PDL\perl\bin\perl.exe
Perl version   : 5.30.1 / MSWin32-x64-multi-thread
PDL version    : 2.019

C:\berrybrew\strawberry-perl-5.30.1.1-64bit-PDL>pdl2
Unable to get Terminal Size. The Win32 GetConsoleScreenBufferInfo call didn't work. The COLUMNS and LINES environment variables didn't work. at C:/berrybrew/strawberry-perl-5.30.1.1-64bit-PDL/perl/vendor/lib/Term/ReadLine/readline.pm line 410.
load_rcfile: got $HOME = C:\berrybrew\strawberry-perl-5.30.1.1-64bit-PDL\data
load_rcfile: loading PDL/default.pdl
Perldl2 Shell v0.008
      PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
      'COPYING' in the PDL distribution. This is free software and you
      are welcome to redistribute it under certain conditions, see
      the same file for details.
Loaded plugins:
  CleanErrors
  Commands
  Completion
  CompletionDriver::INC
  CompletionDriver::Keywords
  CompletionDriver::LexEnv
  CompletionDriver::Methods
  DDS
  FindVariable
  History
  LexEnv
  MultiLine::PPI
  NiceSlice
  PDLCommands
  Packages
  PrintControl
  ReadLineHistory
Type 'help' for online help
Type Ctrl-D or quit to exit
Loaded PDL v2.019
pdl>
pdl> $PDL::BIGPDL = 1
pdl> use Time::HiRes 'time'; *t = \&time
pdl> @d=(500,500,500)    # 8 GB RAM here, let's avoid swapping
pdl>
pdl> $t=t; $x=zeroes@d; $y=ones@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.2526330947876
pdl> $t=t; $x=zeroes@d; $y=ones@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.69825196266174
pdl> $t=t; $x=zeroes@d; $y=ones@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.59618711471558
pdl> p $z2->info
PDL: Double D [500,500,500]
pdl>
pdl> $t=t; $x=zeroes double,@d; $y=ones double,@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.64064288139343
pdl> $t=t; $x=zeroes double,@d; $y=ones double,@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.6656858921051
pdl> $t=t; $x=zeroes double,@d; $y=ones double,@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.68068408966064
pdl>
pdl> $t=t; $x=zeroes float,@d; $y=ones float,@d; $z=$x+$y; $z2=$x/$y; p t-$t
1.11372804641724
pdl> $t=t; $x=zeroes float,@d; $y=ones float,@d; $z=$x+$y; $z2=$x/$y; p t-$t
0.841649055480957
pdl> $t=t; $x=zeroes float,@d; $y=ones float,@d; $z=$x+$y; $z2=$x/$y; p t-$t
0.83014702796936
pdl> p $z2->info
PDL: Float D [500,500,500]
pdl>


Re^2: Perl PDL slower than python numpy by sgt (Deacon) on Sep 22, 2020 at 15:09 UTC
As quick googling shows, default data type for numpy is 32-bit "single precision" "float"

Are you really sure? I was surprised by your claim that numpy uses a C float by default. I am a longtime Perl and C hacker and no Python expert, but I view the C float type as kind of archaic. I use many math libs, and the trend is to go past the C double. It would be surprising for a modern lib like numpy to use such a default. Also, float() in Python means floating point, not C float.

Note that I do not really care if tool X is faster than tool Y when the sun, Jupiter and the moon are aligned. But I did a bit of web search and could not _quickly_ come up with a definite answer, so I decided to check. It seems that the numpy default is a C double:

% steph@kerangi (/tmp/cpanm_t.d) %
% python3.8 -c 'import math as m; print(m.sin(float(1)))'
0.8414709848078965
% steph@kerangi (/tmp/cpanm_t.d) %
% python3.7
Python 3.7.7 (default, Apr 10 2020, 07:59:19)
[GCC 9.3.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.float
<class 'float'>
>>> np.float64
<class 'numpy.float64'>
>>> np.float32
<class 'numpy.float32'>
>>> print(np.sin(float(1)))
0.8414709848078965
>>> print(np.sin(np.float64(1)))
0.8414709848078965
>>> print(np.sin(np.float32(1)))
0.84147096
>>>

I think that PDL is really a fantastic piece of software and that it is pretty fast. One possible pitfall, common to all C extensions to Perl, is going back and forth too many times between Perl and C, as that can make a computation much slower. It is often possible to avoid it.

hth
cheers
--sgt
Re^3: Perl PDL slower than python numpy by etj (Deacon) on Apr 19, 2022 at 23:03 UTC
A single-precision `float` still has utility, including in machine-learning (where higher precision isn't useful, whereas speed is), and graphics (same consideration).
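
The speed side of that trade-off comes largely from moving half as many bytes, which matches the roughly 2x gap between the "double" and "float" runs in the PDL session in the parent post. A small illustrative sketch in numpy (the shape is hypothetical and kept small):

```python
import numpy as np

d = (50, 50, 50)                      # hypothetical small shape, 125000 elements
x64 = np.ones(d)                      # float64 by default
x32 = np.ones(d, dtype=np.float32)    # single precision on request

print(x64.nbytes, x32.nbytes)         # 1000000 500000

# The precision cost shows up in the trailing digits of the result.
print(np.sin(np.float64(1)), np.sin(np.float32(1)))
```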
Re^2: Perl PDL slower than python numpy by syphilis (Archbishop) on Sep 22, 2020 at 14:36 UTC
My answer doesn't explain the observed speed comparison.

Might it simply be that numpy is optimized to recognize that $X + $Y is $Y, and $X / $Y is $X? (Whereas PDL goes to the trouble of doing the arithmetic.)

Cheers,
Rob
Re^2: Perl PDL slower than python numpy (Updated2) by etj (Deacon) on Apr 19, 2022 at 22:55 UTC
2022 updates to this excellent note:

With 2.058, `zeroes` was optimised so that the ndarray is initialised using `memset` (see https://github.com/PDLPorters/pdl/issues/274 for discussion and measurements).

With 2.077, the PDL shells have a `with_time { code... }` function to make this sort of measurement easier.

Observation: the first command of the above snippet is slightly quicker than the rest. This may be because the following ones, in assigning to e.g. `$y`, trigger the destruction of the previous contents of that variable, which is likely to take some time.
Re^2: Perl PDL slower than python numpy by fanasy (Sexton) on Sep 22, 2020 at 11:49 UTC
I have checked the numpy documentation; the default type is float64.
Re^2: Perl PDL slower than python numpy by fanasy (Sexton) on Sep 22, 2020 at 11:50 UTC
However, the thread should be closed. I have compared again, and numpy is better than PDL on large vectors. Thanks! fan
