I was running a simple task with an almost one-shot, throw-away script, and therefore using basic tools. Because there were thousands of files to process, I decided to parallelize. With a small test suite and "no-op" SSCCE code, sometimes the output is OK:

use strict;
use warnings;
use feature 'say';
use threads;
use Thread::Queue;
use CAM::PDF;
use File::Find;

my $q = Thread::Queue-> new;

my @gang = map async( sub {
    while ( defined( my $f = $q-> dequeue )) {
        say threads-> tid, ' ', $f;
        my $pdf = CAM::PDF-> new( $f ) or die;
    }
}), 1 .. 2;

find( sub { -f and /\.pdf$/i and $q-> enqueue( $File::Find::name ) }, './1' );

$q-> end;
$_-> join for @gang;

__END__
1 ./1/1/106/10627.pdf
2 ./1/1/107/10703.pdf
2 ./1/1/186/18673.pdf
1 ./1/1/209/20946.pdf
2 ./1/1/26/2656.pdf
1 ./1/1/33/3384.pdf
2 ./1/1/57/5742.pdf
1 ./1/1/58/5869.pdf
2 ./1/1/63/6395.pdf
1 ./1/1/70/7099.pdf
1 ./1/1/74/7466.pdf

But sometimes not (example 1, one worker dead):

1 ./1/1/106/10627.pdf
2 ./1/1/107/10703.pdf
Thread 2 terminated abnormally: *****Undefined subroutine &Compress::Zlib::ParseParameters called at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/Compress/Zlib.pm line 366.
1 ./1/1/186/18673.pdf
1 ./1/1/209/20946.pdf
1 ./1/1/26/2656.pdf
1 ./1/1/33/3384.pdf
1 ./1/1/57/5742.pdf
1 ./1/1/58/5869.pdf
1 ./1/1/63/6395.pdf
1 ./1/1/70/7099.pdf
1 ./1/1/74/7466.pdf

Example 2 (both workers dead, but for a different reason):

1 ./1/1/106/10627.pdf
2 ./1/1/107/10703.pdf
Thread 2 terminated abnormally: *****Global symbol "@ISA" requires explicit package name (did you forget to declare "my @ISA"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 342.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 343.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 351.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 374.
Compilation failed in require at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/CAM/PDF.pm line 5608.
Thread 1 terminated abnormally: *****Global symbol "@ISA" requires explicit package name (did you forget to declare "my @ISA"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 342.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 343.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 351.
Global symbol "@basedict" requires explicit package name (did you forget to declare "my @basedict"?) at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/Text/PDF/Filter.pm line 374.
Compilation failed in require at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/site/lib/CAM/PDF.pm line 5608.

Actually, CAM::PDF "as is" is coded to issue only a single warning when this require fails (large source file!), but with the filter undefined it becomes broken and useless and floods the terminal with thousands of further warnings. Therefore I prepended that line (CAM/PDF.pm line 5608) with

die '*****' . $@;

so the output is as shown above. My impression is that the threads are trying to read the same source files at the same time: CAM::PDF requires Text::PDF::Filter, which requires Compress::Zlib. Hence some sort of race condition happens, and reading from a file fails or is only partial.

Is that even possible? I thought that files can be opened for reading safely by different processes, and that the OS would "arbitrate" "parallel" access to them. Is that not the case in general, or only with require?

If it's not the case, then is it common knowledge (which I missed) that the main thread should take care to "pre-require" all modules possibly needed by several workers before spawning them?
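To make the "pre-require" idea concrete, here is a minimal sketch. It assumes that the three modules named in the error output above are the only ones the workers load lazily, and that loading them in the main thread is enough. Neither assumption has been checked against these files. The `preload` helper and the use of `@ARGV` as the file list are hypothetical, for illustration only.

```perl
use strict;
use warnings;
use feature 'say';
use threads;
use Thread::Queue;

# Hypothetical helper: load each named module in the calling thread.
# Dies right here if a module is missing or fails to compile.
sub preload {
    for my $module ( @_ ) {
        ( my $file = "$module.pm" ) =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        require $file;
    }
}

# Modules taken from the error messages above. They are loaded before any
# worker exists, so each new thread starts with them already in %INC and a
# later "require" inside a worker has nothing left to compile.
preload( qw( Compress::Zlib Text::PDF::Filter CAM::PDF ) );

my $q = Thread::Queue->new;

my @gang = map {
    threads->create( sub {
        while ( defined( my $f = $q->dequeue )) {
            say threads->tid, ' ', $f;
            my $pdf = CAM::PDF->new( $f ) or die "cannot read $f";
        }
    });
} 1 .. 2;

# Assumption for this sketch: file names come from the command line
# instead of File::Find.
$q->enqueue( $_ ) for @ARGV;
$q->end;
$_->join for @gang;
```

If the lazy require inside the workers is what breaks, a failure should now show up once, in the main thread, instead of randomly in the workers. The sketch does not settle whether require itself is thread-safe.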

(Note, if someone wants to run tests: the PDFs are of the "compressed xref table" variety (they are a client's files, which I won't share). With other files (simple xref table), the sub containing line 5608 won't be called, i.e. Text::PDF::Filter won't be required, on simply reading a file.)


In reply to Why isn't this code thread-safe? (Is "require" thread-safe??) by vr
