jmricher70 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl monks, I am new to Perl, but I managed to write a program that computes the Mandelbrot set. I wanted to make it more efficient by using one of three technologies: XS, SWIG or Inline::C.

In each case I compiled with the '-fopenmp' flag and linked against the '-lgomp' library, but for the XS and Inline::C methods it doesn't work: even if I set the OMP_NUM_THREADS variable to a value greater than one, only one thread ever runs.

For SWIG I could see two or four threads running.

Here, for example, is the code for XS, in the files mandelbrot.c and mandelbrot.h that I put in the Mandelbrot directory:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int escapes(double cr, double ci, int it) {
    double zr = 0;
    double zi = 0;
    double zrtmp;
    int i;

    for(i=0; i<it; i++) {
        // z <- z^2 + c
        zrtmp = zr*zr - zi*zi + cr;
        zi = 2*zr*zi + ci;
        zr = zrtmp;
        if (zr*zr + zi*zi > 4) {
            return 1;
        }
    }
    return 0;
}

void mandel(double xmin, double xmax, int xstep, double ymin, double ymax, int ystep, int iters) {
    int yc;

    // array of string to store result
    char *m = (char *) malloc(ystep * (xstep + 1) * sizeof(char));

    #pragma omp parallel for
    for(yc=0; yc<ystep; yc++) {
        double y = yc*(ymax-ymin)/ystep + ymin;
        int xc;
        for(xc=0; xc<xstep; xc++) {
            double x = xc*(xmax-xmin)/xstep + xmin;
            escapes(x, y, iters);
            if (escapes(x, y, iters)) {
                m[yc * (xstep + 1) + xc] = ' ';
            } else {
                m[yc * (xstep + 1) + xc] = 'X';
            }
        }
        // add end of string
        m[yc * (xstep+1) + xstep] = '\0';
    }

    for(yc=0; yc<ystep; yc++) {
        printf("%s\n", &m[yc * (xstep+1)]);
    }
    free(m);
}

// mandelbrot.h
void mandel(double xmin, double xmax, int xstep, double ymin, double ymax, int ystep, int iters);

And here is mandelbrot_xs.pl, which uses the Mandelbrot module:

use warnings;
use Mandelbrot;

Mandelbrot::mandel(-2.0, 1.0, 256, -1.0, 1.0, 256, 100000);

I followed the following steps:

  1. h2xs -n Mandelbrot -O -x -F '-I..' Mandelbrot/mandelbrot.h
  2. cd Mandelbrot
  3. I modified Makefile.PL with
    LIBS => ['-lgomp'], # e.g., '-lm'
    CCFLAGS => '-fopenmp',
  4. perl Makefile.PL
  5. make
  6. make install
  7. cd ..
  8. perl mandelbrot_xs.pl

In the end it compiles and runs, but always with ONE thread, although I am using:

export OMP_NUM_THREADS=2
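
One way to narrow this down, offered as a sketch rather than a definitive diagnosis: a compiler that has OpenMP enabled defines the `_OPENMP` macro, and a compiler that does not simply ignores `#pragma omp` lines. The helper names below (`openmp_enabled`, `max_threads`, `threads_seen`, `report_openmp`) are hypothetical and not part of the original module. Compiled into the same extension with the same flags, they would show whether '-fopenmp' actually reached the compiler.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if this file was compiled with OpenMP enabled
   (for example gcc -fopenmp), 0 otherwise. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Upper bound on threads for a parallel region; 1 when OpenMP is off. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Counts the threads that really enter a parallel region.
   Without OpenMP both pragmas are ignored and the block runs once. */
int threads_seen(void)
{
    int count = 0;
#pragma omp parallel
    {
#pragma omp atomic
        count++;
    }
    assert(count >= 1); /* at least the calling thread ran the block */
    return count;
}

/* Prints a one-line summary that can be compared across XS, SWIG and Inline::C builds. */
void report_openmp(void)
{
    printf("OpenMP compiled in: %s, max threads: %d, threads seen: %d\n",
           openmp_enabled() ? "yes" : "no", max_threads(), threads_seen());
}
```

If `report_openmp()` prints "no" from inside the XS or Inline::C build, the flag never reached the compile step, and OMP_NUM_THREADS cannot have any effect.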

Re: Why openMP is not taken into account by XS and inline::C
by marioroy (Prior) on Aug 19, 2015 at 07:37 UTC

    Update: It is -fopenmp that isn't mentioned in any files under the _Inline/build directory, not -lgomp.

    Inline::C is ignoring -fopenmp, which causes warnings behind the scenes stating that the omp pragma is unknown. This can be seen by specifying BUILD_NOISY and CLEAN_AFTER_BUILD. Afterwards, I looked inside the _Inline/build directory and was unable to find -fopenmp mentioned anywhere in the log files. The -lgomp was there, though.

    use Inline 'C' => Config => BUILD_NOISY => 1;
    use Inline 'C' => Config => CCFLAGSEX => '-O3 -fopenmp';
    use Inline 'C' => Config => LIBS => '-lgomp';
    use Inline 'C' => <<'END_C', CLEAN_AFTER_BUILD => 0;
    // C code
    END_C

    Running in parallel is possible nonetheless: the script below uses MCE instead of OpenMP, and I saw many cores computing simultaneously. The serial code takes 12.4 seconds, while the parallel version completes in 3 seconds on a CentOS 7 VM configured with 4 real cores.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Inline 'C' => Config => CCFLAGSEX => '-O3';
    use Inline 'C' => <<'END_C';

    #include <stdio.h>
    #include <stdlib.h>

    int escapes( double cr, double ci, int it )
    {
        double zr = 0;
        double zi = 0;
        double zrtmp;
        int i;

        for(i=0; i<it; i++) {
            // z <- z^2 + c
            zrtmp = zr*zr - zi*zi + cr;
            zi = 2*zr*zi + ci;
            zr = zrtmp;
            if (zr*zr + zi*zi > 4) {
                return 1;
            }
        }
        return 0;
    }

    SV* mandel(
        int yc_beg, int yc_end, double xmin, double xmax, int xstep,
        double ymin, double ymax, int ystep, int iters
    ) {
        int yc, len;
        SV *buf = newSVpvn("", 0);

        // array of string to store result
        char *m = (char *) malloc(ystep * (xstep + 1) * sizeof(char));

        for(yc = yc_beg; yc <= yc_end; yc++) {
            double y = yc*(ymax-ymin)/ystep + ymin;
            int xc;
            for(xc=0; xc<xstep; xc++) {
                double x = xc*(xmax-xmin)/xstep + xmin;
                escapes(x, y, iters);
                if (escapes(x, y, iters)) {
                    m[yc * (xstep + 1) + xc] = ' ';
                } else {
                    m[yc * (xstep + 1) + xc] = 'X';
                }
            }
            // add end of string
            m[yc * (xstep+1) + xstep] = '\0';
        }

        for(yc=yc_beg; yc<=yc_end; yc++) {
            sv_catpv(buf, (char *) &m[yc * (xstep+1)]);
            sv_catpv(buf, (char *) "\n");
        }

        free(m);

        return sv_2mortal(buf);
    }

    END_C

    use MCE::Flow;
    use MCE::Candy;

    MCE::Flow::init(
        bounds_only => 1,
        max_workers => $ENV{'MCE_NUM_THREADS'} || 'auto',
        gather => MCE::Candy::out_iter_fh(\*STDOUT)
    );

    mce_flow_s sub {
        my ($mce, $sequence_ref, $chunk_id) = @_;
        my ($yc_beg, $yc_end) = @$sequence_ref;

        my $buf = mandel($yc_beg, $yc_end, -2.0, 1.0, 256, -1.0, 1.0, 256, 100000);

        MCE->gather($chunk_id, $buf);

    }, 0, 255;

    The output must be written in row order, which MCE::Candy::out_iter_fh takes care of. The output files are identical between C + OpenMP and Inline::C + MCE.
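
The chunking idea used above can be sketched in plain C, independent of MCE: each worker renders an inclusive row range, and concatenating the chunks in row order reproduces the full picture. The function name `render_rows` is hypothetical. Unlike the `mandel` function in the reply, this sketch allocates only the rows of its own chunk and returns a malloc'd string instead of an SV; both are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same escape-time test as in the thread:
   returns 1 if c = cr + i*ci escapes within `it` iterations. */
static int escapes(double cr, double ci, int it)
{
    double zr = 0, zi = 0;
    for (int i = 0; i < it; i++) {
        double zrtmp = zr * zr - zi * zi + cr;  /* z <- z^2 + c */
        zi = 2 * zr * zi + ci;
        zr = zrtmp;
        if (zr * zr + zi * zi > 4)
            return 1;
    }
    return 0;
}

/* Renders rows yc_beg..yc_end (inclusive) into a newly allocated,
   NUL-terminated string with one '\n' per row. The caller frees it.
   Returns NULL if allocation fails. */
char *render_rows(int yc_beg, int yc_end,
                  double xmin, double xmax, int xstep,
                  double ymin, double ymax, int ystep, int iters)
{
    assert(0 <= yc_beg && yc_beg <= yc_end && yc_end < ystep);

    size_t rows = (size_t)(yc_end - yc_beg + 1);
    size_t stride = (size_t)xstep + 1;          /* xstep chars plus '\n' */
    char *buf = malloc(rows * stride + 1);
    if (buf == NULL)
        return NULL;

    for (int yc = yc_beg; yc <= yc_end; yc++) {
        double y = yc * (ymax - ymin) / ystep + ymin;
        char *row = buf + (size_t)(yc - yc_beg) * stride;
        for (int xc = 0; xc < xstep; xc++) {
            double x = xc * (xmax - xmin) / xstep + xmin;
            row[xc] = escapes(x, y, iters) ? ' ' : 'X';
        }
        row[xstep] = '\n';
    }
    buf[rows * stride] = '\0';
    return buf;
}
```

Because every row depends only on its own index, rendering rows 0..127 and 128..255 separately and printing them in that order gives the same text as a single call for 0..255. That independence is what lets the chunks be computed by separate workers.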

    Sincerely, Mario

Re: Why openMP is not taken into account by XS and inline::C
by Anonymous Monk on Aug 19, 2015 at 03:54 UTC
    Where is the XS?
Re: Why openMP is not taken into account by XS and inline::C
by anonymized user 468275 (Curate) on Aug 25, 2015 at 09:53 UTC
    I wonder a bit about the 100000 iterations. That will certainly be accurate, but it makes me wonder how you plan to output the results. The problem with too much accuracy is that it can't be rendered in an animation, and there is of course no such thing as absolute accuracy for the Mandelbrot set. Some reasons to consider fewer iterations (say 200 instead of 100000) include:

    1) for a zoom sequence you will refine the output to an arbitrarily greater accuracy anyway as you go along (update: store the last term and iteration count for each seed and increase the iteration count by, say, 8 iterations per frame for a 25 fps zoom, so that after the first frame only 8 extra iterations are needed per member pixel; garbage collect zoomed-out seeds)

    2) there is no point in having greater accuracy than you can either render or see (as accuracy increases, the detail will tend to disappear when rendered, although too few iterations, e.g. only 50, would produce blotchiness), and raw numeric output has limited usefulness given the limitations of digital processing versus infinite detail

    3) you want to stop iterating as soon as you practically can for performance reasons.
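
The incremental scheme from point 1 could look roughly like this in C. It is a minimal sketch: the `SeedState` struct and the `seed_init` / `seed_advance` functions are hypothetical names, and the budget of 200 initial iterations plus 8 per frame is taken from the numbers suggested above.

```c
#include <assert.h>

/* Per-pixel state kept between frames. */
typedef struct {
    double cr, ci;   /* the seed c */
    double zr, zi;   /* last term z_n reached so far */
    int    iters;    /* iterations performed so far */
    int    escaped;  /* 1 once |z|^2 > 4 has been observed */
} SeedState;

/* Starts a seed at z = 0 with no iterations done. */
void seed_init(SeedState *s, double cr, double ci)
{
    s->cr = cr;
    s->ci = ci;
    s->zr = 0.0;
    s->zi = 0.0;
    s->iters = 0;
    s->escaped = 0;
}

/* Runs up to `extra` further iterations, resuming from the stored term.
   Stops early once the seed escapes. Returns the escaped flag. */
int seed_advance(SeedState *s, int extra)
{
    assert(extra >= 0);
    for (int i = 0; i < extra && !s->escaped; i++) {
        double zrtmp = s->zr * s->zr - s->zi * s->zi + s->cr;  /* z <- z^2 + c */
        s->zi = 2 * s->zr * s->zi + s->ci;
        s->zr = zrtmp;
        s->iters++;
        if (s->zr * s->zr + s->zi * s->zi > 4)
            s->escaped = 1;
    }
    return s->escaped;
}
```

A first frame would call `seed_advance(&s, 200)` for every pixel, and each later frame would call `seed_advance(&s, 8)` only for pixels still counted as members. Seeds that have already escaped cost nothing further, which also covers point 3.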

    One world, one people