Hello all,

I'm hoping that somebody out there can help me with what's been a real bugbear for the last few days:

BACKGROUND:

I need to write a subroutine that (among other things) turns thousands of TIF files into dozens of PDF files on a four-processor Intel machine running Windows Server 2003. I'm using ActiveState Perl 5.8, and when I try to parallelize this process, my filehandles seem to get horribly crossed when I really don't think they should be. In the end, it appears that the parent thread's output to STDOUT is showing up in a child thread's filehandle... (?!?)

I hope somebody here can give me the insight I need, in order to be able to get this working...

APPROACH:

The proper solution (rather than ImageMagick, which is prohibitively inefficient) is:

  1. use tiff2ps to generate PostScript (creates large files)
  2. edit the PostScript to fix the page size
  3. have Ghostscript turn the PS into a PDF file.

Obviously, there's a good opportunity (and necessity) to take advantage of the multiple processors here.
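
To make step 2 concrete, here is a minimal sketch of the page-size edit on its own, separated from any file or pipe handling. The helper name fix_page_size and the sample input are made up for illustration. The sketch assumes the PostScript carries a %%BoundingBox comment with four integer coordinates, which is what the regex in the posted code expects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes the lines of a
# PostScript file and returns a copy with a setpagedevice line inserted right
# after the first %%BoundingBox comment.
sub fix_page_size {
    my @lines = @_;
    my @out;
    my $done = 0;    # set once the page size has been inserted
    for my $line (@lines) {
        push @out, $line;
        if (!$done && $line =~ /^%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/) {
            # Width and height are the differences of the box corners.
            my ($w, $h) = ($3 - $1, $4 - $2);
            push @out, "<< /PageSize [$w $h] >> setpagedevice\n";
            $done = 1;
        }
    }
    return @out;
}

# Illustrative input only; real tiff2ps output is much longer.
my @ps = (
    "%!PS-Adobe-3.0 EPSF-3.0\n",
    "%%BoundingBox: 0 0 612 792\n",
    "showpage\n",
);
print fix_page_size(@ps);
```

Keeping the edit as a pure function makes it easy to check in a single process, independently of the threading problem described below.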

I tried Parallel::ForkManager originally and, more recently, ithreads. Both exhibit the same strange behavior: unrelated filehandles appear to get crossed between different processes... (!!)

CODE:

############################################################
# ConvertTiffsToPS
#
# Convert a list of tiff files to PS files
#
# RETURNS:
#   A list of the paths of the postscript files
############################################################
use threads;
use threads::shared;
use Carp qw(confess);
use Data::Dumper;

sub ConvertTiffsToPS {
    my (@tiffpaths) = @_;
    my @pspaths;    # to be filled in.
    my @threads;

    foreach my $tiffpath (@tiffpaths) {
        (my $pspath = $tiffpath) =~ s/\.tiff?$/\.PS/i;
        push @pspaths, $pspath;
        print "Converting $tiffpath to $pspath\n";
        my $thread = async { Tiff2PS($tiffpath, $pspath) };
        push @threads, $thread;
    }

    print("Waiting for child processes...");
    print join(",\n", map { $_->join() } @threads);
    print("child processes complete...");

    confess "Something's wrong" . Dumper(\@tiffpaths, \@pspaths)
        unless (scalar(@tiffpaths) == scalar(@pspaths));

    # return the list of PS files
    return @pspaths;
}

# Take a tiff file, and produce a PS file next to it.
# ($TIFF2PS_COMMAND is set elsewhere in the script; not shown here.)
sub Tiff2PS {
    my ($tiffpath, $pspath) = @_;

    #local $| = 1; # has no visible effect either way...
    print qq("$TIFF2PS_COMMAND" "$tiffpath" |\n);
    open TIFF2PS, qq("$TIFF2PS_COMMAND" "$tiffpath"|)
        or confess "Can't run $TIFF2PS_COMMAND";
    open PSOUT, ">$pspath"
        or confess "Can't open $pspath for writing!\n";

    my $flag = 0;
    foreach my $line (<TIFF2PS>) {
        print PSOUT $line;

        # Add the following line only once...
        if (!$flag && $line =~ /^%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/o) {
            my ($w, $h) = ($3 - $1, $4 - $2);

            # Fix the pagesize, since GS wants everything to be 8.5x11" portrait
            print PSOUT "<< /PageSize [$w $h] >> setpagedevice\n";

            # Short-circuit prevents expensive regexp match afterwards
            $flag = 1;
        }
    }
    close TIFF2PS;
    close PSOUT;

    if (! -e $pspath) {
        warn("**TIFF2PS Problem: $!");
        confess "Tiff2PS Error: $!";
    }
    else {
        print "$pspath created...\n";
    }
    return $pspath;
}

PROBLEM:

While dozens of tiff files end up being converted correctly, at least one of them always seems to look like this:

Converting c:/4-1-13/00000003/00000014.TIF to c:/4-1-13/00000003/00000014.PS
%!PS-Adobe-3.0 EPSF-3.0
%%Creator: tiff2ps
%%Title: c:/4-1-13/00000003/00000013.TIF
%%CreationDate: Sat Sep 03 16:41:24 2005
%%DocumentData: Clean7Bit
...

Note that the first line of this PS file contains a line of output from the PARENT THREAD (which is clearly not PostScript), that was printed to STDOUT... WTF???
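
For anyone trying to reproduce this, a small check like the following can flag the affected files after a run. It is only a sketch: the helper name starts_with_ps_header is made up, and it assumes that every good file begins with the "%!PS" header, as the tiff2ps output above does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the first line of the file begins with the
# "%!PS" header, 0 otherwise (including for an empty file).
sub starts_with_ps_header {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $first = <$fh>;
    close $fh;
    my $ok = defined($first) && $first =~ /^%!PS/;
    return $ok ? 1 : 0;
}

# Usage: pass the generated .PS paths on the command line.
# Reports go to STDERR so they stay separate from the STDOUT chatter.
for my $path (@ARGV) {
    print STDERR "Suspect file: $path\n" unless starts_with_ps_header($path);
}
```

This only detects the symptom; it does not say anything about why the stray line gets there.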

When I do the same thing with Parallel::ForkManager, I see the same result (though it appears to hang indefinitely on occasion). If I use Parallel::ForkManager with 0 forks (for debugging), it works just fine...

What's going on here? How can I avoid this craziness?

Thanks in advance for your advice...

--Georgi


In reply to win32, threads, and filehandles, oh my!! by georgi
