in reply to Problem opening file

The first thing I would do is add $^E to the error message:

open my $fhLock, ">", $lockfile or die "Failed to open $lockfile: $! [$^E]";

That may give you a more platform-specific message relating to the error.
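As a minimal sketch of the difference (the path below is a made-up one that should not exist, not one of yours): $! holds the C-runtime errno text, while $^E holds the OS-specific extended error. On Windows that is the Win32 last-error text; on most other platforms $^E is simply the same as $!.

```perl
use strict;
use warnings;

# Hypothetical path, chosen only because it should not exist.
my $path = 'no-such-directory/loadqueue.lck';

my( $errno_text, $extended_text ) = ( '', '' );
if( open my $fh, '<', $path ) {
    close $fh;
}
else {
    # Copy both at once; later system calls can overwrite them.
    ( $errno_text, $extended_text ) = ( "$!", "$^E" );
    warn "Failed to open $path: $errno_text [$extended_text]\n";
}
```

The exact wording of both messages depends on the platform, so treat the output as a diagnostic hint rather than something to match against.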

The second thing I would do is ensure that my threads don't terminate (die) from trappable errors.

How I would go about that would depend upon the structure and usage of the thread procedure; to advise further, I'd need to see the code.
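Without your code this can only be a generic sketch, but the usual shape is to wrap each unit of work in a block eval inside the thread procedure, so that a die is reported and the thread carries on. The names process_item and worker, and the item values, are invented for illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical unit of work; dies on a trappable error.
sub process_item {
    my( $item ) = @_;
    die "cannot process $item\n" if $item =~ /bad/;
    return "done $item";
}

# Thread procedure: each item runs inside eval, so one failure is
# logged and skipped instead of terminating the whole thread.
sub worker {
    my @items = @_;
    my @results;
    for my $item ( @items ) {
        my $result;
        if( eval { $result = process_item( $item ); 1 } ) {
            push @results, $result;
        }
        else {
            warn "Thread ", threads->tid, " trapped: $@";
            push @results, "failed $item";
        }
    }
    return @results;
}

# Ask for list context so join returns every result.
my $thread  = threads->create( { context => 'list' }, \&worker, qw( a bad b ) );
my @results = $thread->join;
print "$_\n" for @results;
```

Whether to skip, retry or requeue the failed item depends on what the thread is doing, which is why I'd need to see the real code.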

I'd also like to see several examples of the pathnames that have failed, unaltered.

I'm assuming that server & share in the error message posted are italicised because they've been redacted for security reasons?

If so, and if you have a few samples of the failing paths available, continue to obscure them if that is really necessary, but do so in a way that doesn't destroy any patterns in the information. E.g. if the server names are engineering1, engineering2 etc., then change them to dept1, dept2 etc.

I guess I can short-circuit that by asking: is it the same server(s) or share(s) that keep failing? But historically, there are often clues in real information that get lost in such redactions.

In a very generic way, unless $lockfile is a shared variable (I hope not), then there is little scope for 'threads' to be the problem here.

The more likely cause is coincidental concurrency at the file system level (which would equally occur if you were using separate processes). That prediction is strengthened by the very name of the variable.

To resolve that would require cross-concurrency logging. I.e. each thread or process would need to log its file system activity (at least opens & closes; possibly reads and writes also), in a timely (strictly chronological) order.

Without seeing the code and structure of the application it is hard to advise on that; but one method is to have a queue (e.g. Thread::Queue) and another thread.

Your work threads log their activity by writing to the queue, and the logging thread simply loops over that queue writing the messages out to a common log file.
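A rough sketch of that arrangement, under assumptions: the log file name, the log_fs helper, the message format and the dummy paths are all invented here, and the workers only log pretend opens and closes rather than touching any files.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw( time );

my $logfile = 'fsactivity.log';    # hypothetical log file name
my $Qlog    = Thread::Queue->new;

# Logging thread: the only writer to the log file. It writes messages
# in the order they were enqueued and stops when it dequeues undef.
my $logger = threads->create( sub {
    open my $fhLog, '>>', $logfile
        or die "Failed to open $logfile: $! [$^E]";
    while( defined( my $msg = $Qlog->dequeue ) ) {
        print {$fhLog} $msg, "\n";
    }
    close $fhLog;
} );

# Work threads call this around each file system operation.
sub log_fs {
    my( $action, $path ) = @_;
    $Qlog->enqueue(
        sprintf '%.6f tid=%d %s %s', time, threads->tid, $action, $path
    );
}

my @workers = map {
    threads->create( sub {
        my( $n ) = @_;
        my $path = "dummy$n.lck";    # hypothetical path, never opened
        log_fs( 'open',  $path );
        log_fs( 'close', $path );
    }, $_ );
} 1 .. 3;

$_->join for @workers;
$Qlog->enqueue( undef );    # tell the logger there is nothing more to come
$logger->join;
```

Each line carries a timestamp and thread id taken at the moment of enqueueing, so activity from different threads against the same path can be lined up afterwards.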

Anyway, that's all mostly speculation. If you want more help, I'd need sight of the code.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

Re^2: Problem opening file
by SimonPratt (Friar) on Jul 09, 2015 at 16:42 UTC

    Thanks for responding, BrowserUk, this is really helpful.

    $lockfile is definitely not a shared variable

    Yes, I did redact the server and share names. The boss requires that we continue to redact this type of information for security reasons. I can tell you that the server name is the same in every instance, but the share name has been different each time. Here are all of the examples I have in my log files:

    20150509-stderr.log:Thread 6 terminated abnormally: Failed to open \\PRODSERVERA\BEYC\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.
    20150523-stderr.log:Thread 7 terminated abnormally: Failed to open \\PRODSERVERA\FRT4\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.
    20150704-stderr.log:Thread 10 terminated abnormally: Failed to open \\PRODSERVERA\JPGE\INQUEUE\loadqueue.lck: Invalid argument at service.plx line 596.

    The logging is being handled by a Thread::Queue (actually by two - one for STDOUT and one for STDERR), so all of the log entries are in chronological order, although I don't capture specific file system activity at the moment.

    In each case, the pattern around what happened is different. In the case of BEYC, the faulting thread loaded files for BEYC, then did a whole bunch of other recipients for the next 12 hours, then crashed on the very next BEYC file to come in. For FRT4, it loaded a bunch of files, with the last file being loaded using a library call and crashing out on the very next file that needed to load. For JPGE, it again loaded a bunch of files successfully, but the last file loaded was passed out to a new Perl instance in a system call before crashing on the next file.

      Okay. Try adding $^E to your logging; it might make for a clearer picture.

      It would be easier to suggest something if you posted the thread code, but at a very minimum I'd rewrite that open something like this:

      my $retries = 5;
      my $fhLock;
      while( $retries and not open $fhLock, ">", $lockfile ) {
          warn "Failed to open $lockfile: $! [$^E]";
          sleep 1;
          --$retries;
      }

      And then wait until it happens again. That should tell you whether it's a temporary, transitory problem or not; and perhaps shed more light on the cause.



        Thanks for your suggestions, I've rewritten the open call as per your and marinersk's suggestions and will also add $^E to the output.

        I'll have a chat with the boss today as to how much of the code I can share with you and try to get another update in after lunch.

Re^2: Problem opening file
by SimonPratt (Friar) on Aug 24, 2015 at 08:29 UTC

    Hi, BrowserUk

    Just wanted to say thanks for your suggestion to add $^E to the output. The error has occurred again and the additional information was completely unexpected; however, it is totally understandable, and I can now close out the issue.

    Just for posterity, the full error (with $^E output in bold) is "Failed to open \\servername\sharename\INQUEUE\loadqueue.lck: Invalid argument An unexpected network error occurred".