georgi has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I'm hoping that somebody out there can help me with what's been a real bugbear for the last few days:

BACKGROUND:

I need to write a subroutine that (among other things) turns thousands of TIF files into dozens of PDF files on a 4-processor Intel machine running Windows Server 2003. I'm using ActiveState Perl 5.8, and when I try to parallelize this process, my filehandles seem to get horribly crossed when I really don't think they should. In the end, it appears that the parent thread's printing to STDOUT is showing up in the child thread's filehandle... (?!?)

I hope somebody here can give me the insight I need, in order to be able to get this working...

APPROACH:

The proper solution (rather than ImageMagick, which is prohibitively inefficient) is:

  1. use tiff2ps to generate postscript (creates large files)
  2. edit the postscript to fix the page-size
  3. have ghostscript turn the PS into a PDF file.
Obviously, there's a good opportunity (and necessity) to take advantage of the multiple processors here.
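
For reference, step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name gs_command is hypothetical, and the executable name gswin32c (the console build of Ghostscript on Windows) is assumed to be on the PATH. The switches shown are standard Ghostscript options.

```perl
use strict;
use warnings;

# Hypothetical helper: build the Ghostscript argument list for one
# PS -> PDF conversion. The default executable name is an assumption;
# pass a third argument to override it for your install.
sub gs_command {
    my ($pspath, $pdfpath, $gs) = @_;
    $gs = 'gswin32c' unless defined $gs;
    return (
        $gs,
        '-q',                     # suppress the startup banner
        '-dBATCH',                # exit once the input has been processed
        '-dNOPAUSE',              # do not prompt between pages
        '-sDEVICE=pdfwrite',      # write PDF output
        "-sOutputFile=$pdfpath",
        $pspath,
    );
}

# Usage sketch:
#   my @cmd = gs_command($pspath, $pdfpath);
#   system(@cmd) == 0 or die "Ghostscript failed: $?";
```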

I originally tried Parallel::ForkManager and, lately, ithreads. Both exhibited the same strange behavior; namely, unrelated filehandles appear to get crossed in different processes... (!!)

CODE:

    ############################################################
    # ConvertTiffsToPS
    #
    # Convert a list of tiff files to PS files
    #
    # RETURNS:
    #   A list of the paths of the postscript files
    ############################################################
    use threads;
    use threads::shared;

    sub ConvertTiffsToPS {
        my (@tiffpaths) = @_;
        my @pspaths;    # to be filled in.
        my @threads;
        foreach my $tiffpath (@tiffpaths){
            (my $pspath = $tiffpath) =~ s/\.tiff?$/\.PS/i;
            push @pspaths, $pspath;
            print "Converting $tiffpath to $pspath\n";
            my $thread = async { Tiff2PS($tiffpath, $pspath) };
            push @threads,$thread;
        }
        print("Waiting for child processes...");
        print join(",\n",map {$_->join()} @threads);
        print("child processes complete...");
        confess "Somethings wrong".Dumper(\@tiffpaths,\@pspaths)
            unless (scalar(@tiffpaths) == scalar(@pspaths));
        # return the list of PS files
        return @pspaths;
    }

    # Take a tifffile, and produce a PS file next to it.
    sub Tiff2PS {
        my ($tiffpath,$pspath) = @_;
        #local $| = 1;  # has no visible effect either way...
        print qq("$TIFF2PS_COMMAND" "$tiffpath" |\n);
        open TIFF2PS, qq("$TIFF2PS_COMMAND" "$tiffpath"|)
            or confess "Can't run $TIFF2PS_COMMAND";
        open PSOUT, ">$pspath"
            or confess "Can't open $pspath for writing!\n";
        my $flag = 0;
        foreach my $line (<TIFF2PS>) {
            print PSOUT $line;
            # Add the following line only once...
            if (!$flag && $line =~ /^%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/o) {
                my ($w,$h) = ($3-$1, $4-$2);
                # Fix the pagesize, since GS wants everything to be 8.5x11" portrait
                print PSOUT "<< /PageSize [$w $h] >> setpagedevice\n";
                # Short-circuit prevents expensive regexp match afterwards
                $flag = 1;
            }
        }
        close TIFF2PS;
        close PSOUT;
        if (! -e $pspath) {
            warn("**TIFF2PS Problem: $!");
            confess "Tiff2PS Error: $!";
        } else {
            print "$pspath created...\n";
        }
        return $pspath;
    }

PROBLEM:

While dozens of tiff files end up being converted correctly, at least one of them always seems to look like this:

    Converting c:/4-1-13/00000003/00000014.TIF to c:/4-1-13/00000003/00000014.PS
    %!PS-Adobe-3.0 EPSF-3.0
    %%Creator: tiff2ps
    %%Title: c:/4-1-13/00000003/00000013.TIF
    %%CreationDate: Sat Sep 03 16:41:24 2005
    %%DocumentData: Clean7Bit
    ...

Note that the first line of this PS file contains a line of output from the PARENT THREAD (which is clearly not PostScript), that was printed to STDOUT... WTF???

When I do the same thing with Parallel::ForkManager, I see the same result (though it appears to hang interminably on occasion). If I use Parallel::ForkManager with 0 forks (for debugging), it works just fine...

What's going on here? How can I avoid this craziness?

Thanks in advance for your advice...

--Georgi

Re: win32, threads, and filehandles, oh my!!
by BrowserUk (Patriarch) on Sep 03, 2005 at 23:15 UTC

    Try using lexical filehandles instead of globs.

    open my $TIFF2PS, qq("$TIFF2PS_COMMAND" "$tiffpath"|)
        or confess "Can't run $TIFF2PS_COMMAND";
    open my $PSOUT, ">$pspath"
        or confess "Can't open $pspath for writing!\n";

    Also, if you're not using shared vars, there is no need for threads::shared.
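
The lexical-filehandle suggestion, carried through the rest of the copy loop, might look like the sketch below. The helper name copy_ps_with_pagesize is hypothetical, and it takes an already-open input handle so it can be tried without tiff2ps. It also reads with while instead of foreach, since foreach (<FH>) pulls the whole output into memory before the loop starts. Whether this alone cures the crossed output is not something this sketch can promise.

```perl
use strict;
use warnings;

# Copy PostScript from an already-open input handle to $pspath, adding a
# setpagedevice line after the first %%BoundingBox comment (the same logic
# as the original Tiff2PS). Both handles are lexicals, so nothing here goes
# through a package-global bareword such as TIFF2PS or PSOUT.
sub copy_ps_with_pagesize {
    my ($in, $pspath) = @_;
    open my $out, '>', $pspath
        or die "Can't open $pspath for writing: $!";
    my $done = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        if (!$done && $line =~ /^%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/) {
            my ($w, $h) = ($3 - $1, $4 - $2);
            print {$out} "<< /PageSize [$w $h] >> setpagedevice\n";
            $done = 1;    # only insert the page size once
        }
    }
    close $out or die "Error closing $pspath: $!";
    return $pspath;
}

# Usage sketch, inside Tiff2PS:
#   open my $tiff2ps, qq("$TIFF2PS_COMMAND" "$tiffpath"|)
#       or die "Can't run $TIFF2PS_COMMAND: $!";
#   copy_ps_with_pagesize($tiff2ps, $pspath);
#   close $tiff2ps;
```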


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: win32, threads, and filehandles, oh my!!
by GrandFather (Saint) on Sep 04, 2005 at 00:11 UTC

    Try using threads->create in place of async:

    print "Converting $tiffpath to $pspath\n";
    # my $thread = async { Tiff2PS($tiffpath, $pspath) };
    my $thread = threads->create (\&Tiff2PS, ($tiffpath, $pspath));

    Perl is Huffman encoded by design.

      For future reference, async() is just a procedural alias for threads->new():

      ## From threads.pm
      sub async (&;@) { unshift @_,'threads'; goto &new }

      And create() is just an alias for new():

      *create = \&new;

Re: win32, threads, and filehandles, oh my!!
by davidrw (Prior) on Sep 04, 2005 at 00:03 UTC
    Is this a one-time task? If so, can you just do poor-man's multitasking and (for the sake of discussion I'll assume you have 3 processors) just start 3 perl processes, each covering a different third of the files to convert?
    (Though it seems like you're really close with the much cooler threaded solution -- but just in case it comes down to a time crunch and you have to bust out the sledgehammer for the square peg....)
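
One way to split the work for the separate-processes approach is sketched below. The helper name slice_for_worker and the idea of passing a worker index and worker count to each process are assumptions for illustration; the files are dealt out round-robin so every process gets a roughly equal share.

```perl
use strict;
use warnings;

# Return the files that worker number $index (0-based) out of $workers
# should handle, dealing the list out round-robin.
sub slice_for_worker {
    my ($index, $workers, @files) = @_;
    return map { $files[$_] }
           grep { $_ % $workers == $index } 0 .. $#files;
}

# Usage sketch (hypothetical script name and directory): start
#   perl convert.pl 0 3
#   perl convert.pl 1 3
#   perl convert.pl 2 3
# and in convert.pl:
#   my ($index, $workers) = @ARGV;
#   my @mine = slice_for_worker($index, $workers, sort glob('c:/scans/*.tif'));
```

Sorting the list first matters: every process has to see the files in the same order, or the slices could overlap or miss files.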