Re: Unexpected output from fork (Win32)

by Jenda (Abbot)
on Aug 09, 2004 at 12:51 UTC


in reply to Unexpected output from fork (Win32)

$| only affects output, so input is still buffered. When the first thread executes my $cname = <IN>, it reads not just the first line but the first 4KB of the file, and the next read takes its line from that buffer, not from the disk. It seems that in your case you were lucky and the 4KB chunks ended at newlines, but I don't think you should take that for granted. If I try your script on a file generated by

open OUT, '>', 'forklist.txt'; print OUT "computer$_\n" for (1..10000); close OUT;
I do get results like:
...
r1835,checked by -2196
computer1836,checked by -2196
computer1837,checked by -2196
...
computer1965,checked by -2196
computer1966,checked by -2196
computer196ter1250,checked by -4140
computer1251,checked by -4140
computer1252,checked by -4140
...
computer1673,checked by -3928
computer1674,cheputer2713,checked by -3496
computer2714,checked by -3496
...
computer2843,checked by -3496
computer2844,checked by -3496
computeomputer2128,checked by -3120
computer2129,checked by -3120
computer2130,checked by -3120
...

Actually, the way you use $|, it only affects STDOUT! Even the OUT handle is buffered! You'd better

use FileHandle; ... OUT->autoflush();
That way you know which handle is unbuffered. $| looks like something global, which it's not: it affects only the currently select()ed output handle!
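A minimal sketch of the difference, assuming two hypothetical scratch files (the file names and variable names are not from the original script). Setting $| while STDOUT is selected leaves other handles buffered, whereas calling autoflush on a handle names exactly the handle it affects:

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush() method on file handles

# Hypothetical scratch files, used only for this sketch.
my $buffered_file = 'autoflush_demo_buffered.txt';
my $flushed_file  = 'autoflush_demo_flushed.txt';

open my $buffered, '>', $buffered_file or die "Cannot open $buffered_file: $!";
open my $flushed,  '>', $flushed_file  or die "Cannot open $flushed_file: $!";

# STDOUT is the selected handle here, so this does nothing for the two
# file handles opened above.
$| = 1;

# Explicitly unbuffer one handle only.
$flushed->autoflush(1);

print {$buffered} "computer1,checked\n";
print {$flushed}  "computer1,checked\n";

# Check the sizes on disk while both handles are still open.
my $buffered_size = ( -s $buffered_file ) || 0;
my $flushed_size  = ( -s $flushed_file )  || 0;

print "buffered handle: $buffered_size bytes on disk before close\n";
print "autoflush handle: $flushed_size bytes on disk before close\n";

close $buffered or die "Cannot close $buffered_file: $!";
close $flushed  or die "Cannot close $flushed_file: $!";
unlink $buffered_file, $flushed_file;
```

The line written through the buffered handle reaches the disk only when the buffer fills or the handle is closed, which is why output from several processes can interleave in mid-line.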

You need to change your code to

  1. read the input file in only one thread
  2. flock the output filehandle before writing to it (and set autoflush correctly)

You may either read the first $no_of_chunks/$no_of_threads names into an array, spawn the first child, empty the array in the parent, read the next chunk, and so on; or read the file in the main thread and send the server names to the threads via pipes, Thread::Queue, shared variables, or ...
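One way the pipe variant could look is sketched below. It is only an illustration under assumptions: the file names, the worker count, and the "checked by" output line are made up for the example, and it was written with a Unix-style fork() in mind, so the Win32 emulation may behave differently. The parent is the only process that reads the input file, and each child takes an exclusive flock on the output before writing.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use IO::Handle;

# Hypothetical file names and worker count, chosen only for this sketch.
my $input_file  = 'forklist_pipes_demo.txt';
my $output_file = 'results_pipes_demo.txt';
my $workers     = 4;

# Build a small input file and an empty output file so the sketch is self-contained.
open my $list, '>', $input_file or die "Cannot write $input_file: $!";
print {$list} "computer$_\n" for 1 .. 200;
close $list or die "Cannot close $input_file: $!";
open my $empty, '>', $output_file or die "Cannot create $output_file: $!";
close $empty or die "Cannot close $output_file: $!";

my ( @writers, @pids );
for my $n ( 1 .. $workers ) {
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: keep only the read end of its own pipe.
        close $writer;
        close $_ for @writers;    # write ends inherited from earlier workers

        open my $out, '>>', $output_file or die "Cannot append to $output_file: $!";
        $out->autoflush(1);       # each print is written while the lock is held
        while ( my $name = <$reader> ) {
            chomp $name;
            flock( $out, LOCK_EX ) or die "flock failed: $!";
            seek( $out, 0, SEEK_END );
            print {$out} "$name,checked by $$\n";
            flock( $out, LOCK_UN );
        }
        close $out;
        exit 0;
    }

    # Parent: keep only the write end.
    close $reader;
    $writer->autoflush(1);
    push @writers, $writer;
    push @pids,    $pid;
}

# Only the parent ever reads the input file; names go out round-robin.
open my $in, '<', $input_file or die "Cannot read $input_file: $!";
my $i = 0;
while ( my $name = <$in> ) {
    print { $writers[ $i++ % $workers ] } $name;
}
close $in;

close $_ for @writers;        # EOF on the pipe tells each child to finish
waitpid( $_, 0 ) for @pids;

# Collect the results for inspection, then remove the scratch files.
open my $res, '<', $output_file or die "Cannot read $output_file: $!";
my @lines = <$res>;
close $res;
unlink $input_file, $output_file;

printf "%d lines written by %d workers\n", scalar @lines, $workers;
```

All children are forked before the parent opens the input file, so no child inherits a half-read input handle.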

Update (2 minutes after submit) : BrowserUk was quicker :-)

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

Re^2: Unexpected output from fork (Win32)
by maa (Pilgrim) on Aug 09, 2004 at 14:25 UTC

    thanks for your informative reply, Jenda

    Yes, after reading perlvar again I see why my $|++ shouldn't work (although adding it did get rid of duplicate entries). I couldn't, however, find any mention of the size of the input buffer used by <> or readline(); they both simply promise to read and return up to the next $/ (or EOF) when evaluated in scalar context. Can you point me to the apt document, please? I (wrongly) assumed that, as the seek pointer is shared, I'd get whole records, but you've disproved that :-)

    I tried using an array containing all the input already but that has its own problems when you use fork() - perhaps it's time I tried to use threads; :-) Then I can share the array.

    - Mark

      A couple of things.

      First, as Win32 pseudo-forks are threads, you can (apparently) use threads::shared to share an array (or other data) between them:

      From threads::shared POD:

      DESCRIPTION

      By default, variables are private to each thread, and each newly created thread gets a private copy of each existing variable. This module allows you to share variables across different threads (and pseudoforks on Win32). It is used together with the threads module.

      Though I admit I've never actually tried this.

      Second, doing the equivalent of your OP code using threads is much simpler.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      It seems the seek pointer is shared but the caches are not, which IMHO doesn't make sense. Either the handles should be completely separate or they should share the cache.

      I did not mean to share the array. The main thread would read the first tenth of the computer names into an array and fork() off a child, the child would have a copy of the array and would start processing those servers. In the meantime the main thread would empty its copy of the array, read the next tenth and spawn another child. And so forth.

      Of course this means that you will have the complete list of computer names in memory, which may or may not be the best thing to do.

      Jenda

        There is a good explanation for why the file handles are shared but the buffers are not. The buffering is done by Perl, in the IO layer, so each process gets its own memory and buffers. The low-level read is done by the OS, which shares the file handle. There is a cache in the OS too, but it has no effect on the behavior of the processes other than making reads faster.
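The gap between Perl's buffer and the OS-level file position can be observed from a single process. The following is a small sketch, assuming a hypothetical scratch file built like the test file earlier in the thread: tell() reports Perl's logical position, while sysseek() bypasses the buffer and reports where the OS thinks the handle is.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

# Hypothetical scratch file, used only for this sketch.
my $file = 'buffer_demo.txt';
open my $gen, '>', $file or die "Cannot write $file: $!";
print {$gen} "computer$_\n" for 1 .. 10000;
close $gen or die "Cannot close $file: $!";

open my $in, '<', $file or die "Cannot read $file: $!";
my $first = <$in>;    # asks for one line, but Perl fills its whole buffer

# Perl's logical position: just past the first line.
my $logical_pos = tell($in);

# The OS-level offset: past everything Perl has read ahead into its buffer.
my $os_pos = sysseek( $in, 0, SEEK_CUR );

print "first line: $first";
print "logical position: $logical_pos, OS position: $os_pos\n";

close $in;
unlink $file;
```

The OS position is the one shared between processes after a fork, so a second process picks up reading from there, wherever in a line that happens to be.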

        Hi, I've already tried a very similar approach, but found that some of the operations would inexplicably fail after about 200 fork()s. I was doing 10 fork()s at a time and then waitpid()-ing them. However, Windows NT Task Manager seemed to think that the perl process was 'leaking' handles, though I couldn't track the leak down.

        The 'threads' count in Task Manager went up and down exactly as expected, as did 'handles' for a short while; then it started to go down by less, so the number of open handles started growing.

        I'm never (realistically) going to need to check logs on more than 5000 workstations at once, so I could try dividing the total number by 10 and slicing the array for each fork, since I think forking too much might be one of the problems. My alternative is to try ithreads.
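The array-slicing idea might be sketched roughly as follows. This is not the original script: the file names, the slice count, and the "checked by" line are assumptions for illustration, and it was written against a Unix-style fork(), so it says nothing about the handle growth seen on Windows NT. The parent reads and closes the list before forking, then forks once per slice, which keeps the number of fork()s fixed at ten regardless of list length.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use IO::Handle;

# Hypothetical file names and slice count, chosen only for this sketch.
my $input_file  = 'forklist_slices_demo.txt';
my $output_file = 'results_slices_demo.txt';
my $slices      = 10;

# Build a small input file and an empty output file so the sketch is self-contained.
open my $list, '>', $input_file or die "Cannot write $input_file: $!";
print {$list} "computer$_\n" for 1 .. 250;
close $list or die "Cannot close $input_file: $!";
open my $empty, '>', $output_file or die "Cannot create $output_file: $!";
close $empty or die "Cannot close $output_file: $!";

# Read the whole list once, in the parent, and close the input before forking.
open my $in, '<', $input_file or die "Cannot read $input_file: $!";
chomp( my @names = <$in> );
close $in;

# Round up so that at most $slices children are forked.
my $per_slice = int( @names / $slices ) + ( @names % $slices ? 1 : 0 );

my @pids;
while ( my @slice = splice( @names, 0, $per_slice ) ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: works on its own copy of @slice.
        open my $out, '>>', $output_file or die "Cannot append to $output_file: $!";
        $out->autoflush(1);
        for my $name (@slice) {
            flock( $out, LOCK_EX ) or die "flock failed: $!";
            seek( $out, 0, SEEK_END );
            print {$out} "$name,checked by $$\n";
            flock( $out, LOCK_UN );
        }
        close $out;
        exit 0;
    }
    push @pids, $pid;
}
waitpid( $_, 0 ) for @pids;

# Collect the results for inspection, then remove the scratch files.
open my $res, '<', $output_file or die "Cannot read $output_file: $!";
my @results = <$res>;
close $res;
unlink $input_file, $output_file;

printf "%d lines written by %d children\n", scalar @results, scalar @pids;
```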
