nglenn has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to speed up some conversions from HTML to WordXML by using threads. I was hoping it would be fairly straightforward to spawn several threads, each running its own instance of Microsoft Word, but my threaded code throws errors:

use strict;
use threads;
use Win32::OLE;

my $threadNum = 2;
my $directoryLocation = 'C:\Users\nate\Desktop\websites';
opendir(DIR, $directoryLocation);
my @files = grep(/\.html$/,readdir(DIR));
closedir(DIR);

my @threads;
my $counter = 0;
for(1..$threadNum){
    my @fileSlice = @files[$counter..$counter+$#files/$threadNum];
    push @threads, threads->create(\&convertToWord,@fileSlice);
    $counter += $#files/$threadNum+1;
}
$_->join for(@threads);

sub convertToWord(){
    my $Word = Win32::OLE->new('Word.Application', 'Quit');
    #convert some stuff to wordXML here, not shown to save space
}

Actually, the line that starts Microsoft Word isn't needed to make the program fail. The Perl interpreter crashes and prints the following:

Free to wrong pool 25f3778 not 213fd8 during global destruction.

When I say crash, I don't mean that the program just exits early, I mean that a window pops up telling me that the Perl Command Line Interpreter has stopped working. Any suggestions?

Re: How to use Threads
by Zucan (Beadle) on Jan 09, 2011 at 03:54 UTC
    Out of curiosity, do any files get converted successfully? Maybe even just one? My testing shows that the error pops up at the end, as things are being cleaned up.

    Check out the following thread: free to wrong pool while global destruction : windows perl environment

    It looks like Win32::OLE isn't thread-safe, so you have to write your threads with that in mind. One drawback is that you may not be able to run more than one of these conversions at the same time, since doing so would likely aggravate Win32::OLE's thread-safety problems.

    Good luck!

Re: How to use Threads
by trwww (Priest) on Jan 09, 2011 at 04:38 UTC

    Your operating system is already multithreaded. Your task is better suited to using that than the in-language thread support:

    opendir(DIR, $directoryLocation);
    my @files = grep(/\.html$/,readdir(DIR));
    closedir(DIR);

    my @cmd = qw( start perl c:/html2wordxml.pl );
    foreach my $file ( @files ) {
        system( @cmd => $file );
    }

    In other words, make a program that knows how to reformat a single file, and then call that program repeatedly without waiting for the last call to finish.
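
    Launching every file at once could start a large number of Word instances. A minimal sketch of a throttled variant follows, assuming the single-file converter lives at c:/html2wordxml.pl as above. The names worker_command and run_throttled and the limit of 2 jobs are made up for illustration. It relies on the Windows-only system(1, LIST) form described in perlport, which spawns the process and returns without waiting, so the launching part is skipped on other platforms. It has not been tested against Word.

```perl
use strict;
use warnings;

# Hypothetical helper: the command line for converting one file.
# The script path is the one used in the reply above.
sub worker_command {
    my ($file) = @_;
    return ('perl', 'c:/html2wordxml.pl', $file);
}

# Hypothetical helper: run at most $max converters at a time.
sub run_throttled {
    my ($max, @files) = @_;
    my $running = 0;
    for my $file (@files) {
        # Once the limit is reached, block until one child finishes.
        if ($running >= $max) {
            wait();
            $running--;
        }
        # On Windows, system(1, LIST) returns immediately with a
        # process id that wait/waitpid can later reap (see perlport).
        my $pid = system(1, worker_command($file));
        if (defined $pid && $pid > 0) {
            $running++;
        }
        else {
            warn "Could not start converter for $file\n";
        }
    }
    # Reap whatever is still running.
    1 while wait() != -1;
    return;
}

if ($^O eq 'MSWin32') {
    my $directoryLocation = 'C:\Users\nate\Desktop\websites';
    opendir(my $dh, $directoryLocation)
        or die "Cannot open $directoryLocation: $!";
    my @files = grep { /\.html$/ } readdir($dh);
    closedir($dh);
    run_throttled(2, @files);    # 2 is an arbitrary limit
}
```

    As in the original loop, the file names come straight from readdir, so the converter has to be run from the directory that holds them or be given full paths.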

Re: How to use Threads
by roboticus (Chancellor) on Jan 09, 2011 at 13:22 UTC

    nglenn:

    Have you considered using separate processes (i.e., fork) instead? If you're not trying to share data between the tasks, then starting a new process for each conversion makes sharing problems like this moot. There's even a nice module Fork::Manager that helps you with the bookkeeping of how many jobs you want to run in parallel, when to submit new ones, etc.

    Finally, since you're converting HTML to WordXML, why use Microsoft Word at all? I'm just curious, as I would think that cutting Word out of the loop may speed things up. If I had to do the HTML to WordXML conversion, I think I'd try making a minimal WordXML template, use HTML::Parser to disassemble the HTML document, and spit out the XML. Of course, not having done so, it may be a lot more complicated than I'm guessing. I've done a similar task with Excel once, though, and it was pretty easy. That task took an XML document from a database and whacked it with an XSLT transformation to spit out an HTML document that Excel had no difficulty digesting. (For me, the difficulty was learning enough XSLT to do the job. Actually, I didn't learn it, I just cargo-culted something together.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      (i.e., fork)

      Remember that if the OP is using Word, he is (almost certainly) running on Windows, and on Windows fork is emulated using threads. So with a forking solution, whether via the built-in fork or via some module like Parallel::ForkManager (there doesn't seem to be a Fork::Manager module?), he would experience exactly the same problems as he is encountering now.
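
      One way to see the emulation at work is to look at what fork returns. perlfork documents that on Windows the parent gets a negative pseudo-process id, because the "child" is really a thread inside the same interpreter process. A small sketch follows; the helper name describe_fork_pid is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the value returned by fork().
sub describe_fork_pid {
    my ($pid) = @_;
    return 'fork failed' unless defined $pid;
    return 'child'       if $pid == 0;
    return $pid < 0
        ? 'parent of a pseudo-process (thread-based fork emulation)'
        : 'parent of a real process';
}

my $pid = fork();
if (defined $pid && $pid == 0) {
    exit 0;    # child (or pseudo-child): nothing to do in this demo
}
waitpid($pid, 0) if defined $pid;
print describe_fork_pid($pid), "\n";
```

      On a Unix-like system the parent should report a real process. On Windows perl it should report a pseudo-process, which is why anything that misbehaves under threads can be expected to misbehave under fork there as well.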


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How to use Threads
by Gangabass (Vicar) on Jan 09, 2011 at 15:11 UTC

    Try to require Win32::OLE; inside your thread sub.

    sub convertToWord { require Win32::OLE; ...... }
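
    A fuller sketch of that idea is below. It assumes the problem comes from Win32::OLE being loaded in the main interpreter before the threads are created, so nothing here loads it at compile time and each thread pulls it in for itself. Whether that actually avoids the crash would need confirming on a Windows machine with Word installed. The names split_into_chunks and convert_to_word are made up for illustration, and the conversion itself stays elided as in the original post. The chunking helper also replaces the fractional slice arithmetic of the original loop with whole-number sizes.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into $n contiguous chunks whose
# sizes differ by at most one. Returns a list of array references.
sub split_into_chunks {
    my ($n, @items) = @_;
    my $base  = int(@items / $n);    # minimum size of each chunk
    my $extra = @items % $n;         # the first $extra chunks get one more
    my @chunks;
    for my $i (0 .. $n - 1) {
        my $size = $base + ($i < $extra ? 1 : 0);
        push @chunks, [ splice(@items, 0, $size) ];
    }
    return @chunks;
}

# Thread body: Win32::OLE is loaded here, inside the thread,
# instead of with "use" at the top of the script.
sub convert_to_word {
    my @files = @_;
    require Win32::OLE;
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: ", Win32::OLE->LastError();
    # convert each file in @files to WordXML here (not shown,
    # as in the original post)
    return scalar @files;
}

if ($^O eq 'MSWin32') {
    require threads;
    my $directoryLocation = 'C:\Users\nate\Desktop\websites';
    opendir(my $dh, $directoryLocation)
        or die "Cannot open $directoryLocation: $!";
    my @files = grep { /\.html$/ } readdir($dh);
    closedir($dh);

    # One thread per non-empty chunk; 2 matches $threadNum above.
    my @workers = map  { threads->create(\&convert_to_word, @$_) }
                  grep { @$_ } split_into_chunks(2, @files);
    $_->join for @workers;
}
```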