PerlMonks  

Poor randomness with File::Temp and fork().

by BazB (Priest)
on Jun 27, 2004 at 16:47 UTC ( [id://369992]=perlquestion )

BazB has asked for the wisdom of the Perl Monks concerning the following question:

Greetings once again, fellow Monks.

I have some code which forks a number of children to carry out a number of tasks in parallel.
Each child calls File::Temp::tempfile() to create a temporary file, writes some (non-Perl) code to it, and closes the filehandle. An external command is then called to execute the code in the temporary files.

The code regularly fails to create a unique filename - File::Temp croaks after 10 attempts to guess another unique name (the 10 attempts is a constant hardcoded in the module's source).
After rummaging around in File::Temp's guts and inserting some print statements for debugging, it seems that each of the children tries to use the same strings for the temporary filename(s).

I've worked around this by modifying File::Temp to include the process ID ($$) on the end of any filenames, but I feel this is a less-than-ideal solution.
Security and/or paranoia isn't a primary concern, so having process IDs in filenames doesn't worry me too much.

Could someone explain to me why the temporary filenames generated in multiple processes aren't exactly random, and suggest any other solutions/hacks to work around the problem?

Cheers,

BazB.


If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
That way everyone learns.

Replies are listed 'Best First'.
Re: Poor randomness with File::Temp and fork().
by dws (Chancellor) on Jun 27, 2004 at 17:04 UTC

    After rummaging around in File::Temp's guts and inserting some print statements for debugging, it seems that each of the children tries to use the same strings for the temporary filename(s).

    Under the covers, the random number generator is used to produce a random set of characters to replace the "XXXX" that you can pass in the filename template. But if you're forking, each child is going to inherit the same starting seed, possibly leading to each child generating the same "random" name. Oops.

    One way around this might be to reseed the random number generator for each child after you fork (using srand(), and passing it the child's process id).

    Another way might be to use the optional "SUFFIX" argument and stick the process id in the suffix. Try something like

    ($fh, $filename) = tempfile($template, SUFFIX => ".$$")
    and see if it works for you. Using the PID as a suffix is a fairly standard convention.
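
A minimal sketch of both suggestions combined, assuming a Unix-like system with fork(). The helper names child_tempfile and run_children, the 'jobXXXXXX' template, and the pipe used to report each filename back to the parent are illustrative assumptions, not part of the original program. Each child reseeds with its own process ID and also puts the PID in the suffix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use POSIX ();

# Hypothetical helper: meant to run in the child, right after fork().
sub child_tempfile {
    my ($dir) = @_;
    srand($$);    # reseed so siblings stop sharing the inherited sequence
    my ($fh, $filename) = tempfile('jobXXXXXX', DIR => $dir, SUFFIX => ".$$");
    print {$fh} "-- non-Perl code would go here\n";
    close $fh or die "close $filename: $!";
    return $filename;
}

# Hypothetical driver: fork $n children in parallel, collect their filenames.
sub run_children {
    my ($n, $dir) = @_;
    my @kids;
    for (1 .. $n) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            print {$writer} child_tempfile($dir), "\n";
            close $writer;
            POSIX::_exit(0);    # skip END blocks and buffers inherited from the parent
        }
        close $writer;
        push @kids, [$pid, $reader];
    }
    my @names;
    for my $kid (@kids) {
        my ($pid, $reader) = @$kid;
        my $name = <$reader>;
        close $reader;
        waitpid($pid, 0);
        die "child $pid reported no filename" unless defined $name;
        chomp $name;
        push @names, $name;
    }
    return @names;
}

my $warmup = rand();    # seed the generator in the parent before forking
my $dir = tempdir(CLEANUP => 1);
print "$_\n" for run_children(4, $dir);
```

Calling srand() with no argument in the child should work as well; the PID is used here only because it is what the reply suggests.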

Re: Poor randomness with File::Temp and fork().
by Joost (Canon) on Jun 27, 2004 at 16:58 UTC
    Well, as a workaround I'd say something like this should work:
    my $tmp = File::Temp->new( TEMPLATE => "temp".$$."XXXXX", DIR => '/tmp', SUFFIX => '.dat');
    At least you won't have to modify File::Temp.

    Though I would expect File::Temp to do something similar for me, I'm not sure that it actually guarantees a unique filename for fork()ed siblings. Maybe you should file a bug report. (Don't forget a working example demonstrating the bug.)

Re: Poor randomness with File::Temp and fork().
by atcroft (Abbot) on Jun 27, 2004 at 17:00 UTC

    I am not sure this is the case, but here is what I suspect is going on. If you are calling srand() before forking (or not calling srand), then the children start with the same random seed, and thus will get the same sequence of pseudorandom numbers from rand(), which I suspect is being used to generate the random filename in File::Temp. Calling srand inside the forked child might be a way of getting around this issue.

    Hope that helps.

      If you are calling srand() before forking (or not calling srand) then the children have the same random seed as they start
      Actually, if you don't call srand at all, the children will have different seeds as long as you don't call rand() before the fork() either (since perl 5.004). See perldoc -f srand:
      If srand() is not called explicitly, it is called implicitly at the first use of the "rand" operator.
      In this case, it's probably safest to call srand after the fork() anyway.

Re: Poor randomness with File::Temp and fork().
by Zaxo (Archbishop) on Jun 28, 2004 at 00:12 UTC

    Here's a demonstration of several processes inheriting rand's seed. The srand solution suggested seems necessary if anything seeds rand before fork.

    #!/usr/bin/perl
    my ($foo, %kid) = rand;
    for (1..10) {
        $kid{my $cpid = fork} = undef;
        defined $cpid or delete( $kid{""}), next;
        $cpid and next;
        print rand, $/;
        exit(0);
    }
    delete $kid{+wait} while %kid;
    __END__
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    0.292121652604209
    With srand; inserted as the first thing in each child, we get, say,
    0.458016411009343
    0.922388048683391
    0.938946679075347
    0.358693821662204
    0.88636225422761
    0.519188953171582
    0.459808857952343
    0.982476360703917
    0.626396972061233
    0.824876680345373
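
The same comparison can be checked programmatically rather than by eye. This is only a sketch under assumptions: the helper name first_rands, the seed value 42, and the use of pipes to send each child's value back to the parent are all illustrative and not part of the demonstration above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork $n children and return each child's first rand() value.
# If $reseed is true, each child calls srand() before using rand().
sub first_rands {
    my ($n, $reseed) = @_;
    my @kids;
    for (1 .. $n) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            srand() if $reseed;
            print {$writer} rand(), "\n";
            close $writer;
            POSIX::_exit(0);    # leave without flushing buffers inherited from the parent
        }
        close $writer;
        push @kids, [$pid, $reader];
    }
    my @values;
    for my $kid (@kids) {
        my ($pid, $reader) = @$kid;
        my $line = <$reader>;
        close $reader;
        waitpid($pid, 0);
        die "child $pid reported nothing" unless defined $line;
        chomp $line;
        push @values, $line;
    }
    return @values;
}

srand(42);    # the parent seeds the generator before forking
print "inherited: @{[ first_rands(3, 0) ]}\n";
print "reseeded:  @{[ first_rands(3, 1) ]}\n";
```

The parent never calls rand() between forks, so every child in the first run starts from the same generator state.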

    After Compline,
    Zaxo

Re: Poor randomness with File::Temp and fork().
by BazB (Priest) on Jun 27, 2004 at 19:39 UTC

    Thanks for all the replies.

    I must admit, I had overlooked using the template or suffix options that File::Temp offers by default.

    I do have calls to rand in other parts of the code (and calls to File::Temp prior to the fork()s), but I was just a little surprised that exactly the same sequences were occurring across so many processes.

    dws suggests that calling srand in each child will re-seed the rand function - I'll give that one a shot too, although I did find the warning in the perldoc for 5.6.1 a little concerning:

    Do not call srand multiple times in your program unless you know exactly what you're doing and why you're doing it.
    ...and I wasn't completely sure what I was doing :-)
    The docs with 5.8.x aren't quite so stern.

    Cheers,

    BazB.



      I would suggest having the fork()ed children read some data from /dev/random or /dev/urandom and using that, rather than giving them all their own copy of the pRNG in the same state and so producing the same data. Alternatively, <advert>Net::Random</advert>
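
One way to read that suggestion is to use the device's bytes as the seed. A minimal sketch, assuming a system that provides /dev/urandom; the helper name urandom_seed and the choice of a 4-byte seed are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read 4 bytes from a random device and
# return them as an unsigned 32-bit integer suitable for srand().
sub urandom_seed {
    my ($device) = @_;
    $device = '/dev/urandom' unless defined $device;
    open my $fh, '<:raw', $device or die "open $device: $!";
    my $got = read($fh, my $bytes, 4);
    close $fh;
    die "short read from $device" unless defined $got && $got == 4;
    return unpack 'L', $bytes;
}

# In a forked child this would be the first thing to run.
srand(urandom_seed());
printf "first value after reseeding: %s\n", rand();
```

/dev/random can be passed instead, though on some systems it may block until enough entropy is available.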
Re: Poor randomness with File::Temp and fork().
by tachyon (Chancellor) on Jun 27, 2004 at 23:49 UTC

    The real guts of File::Temp can be seen in "Re: Avoiding race condition with sysopen", and the function there should create valid handles in forked kids without issues. As it is 15 lines to File::Temp's 1800, it is easier to see what is going on.

    As always YMMV.

    cheers

    tachyon

Re: Poor randomness with File::Temp and fork().
by metadoktor (Hermit) on Jun 28, 2004 at 19:38 UTC
    If you want true randomness you could buy yourself a generator that connects via USB or you could interface to Fourmilab's HotBits random number generator. Fourmilab is the website of the founder of Autodesk.

    metadoktor

    "The doktor is in."

      Net::Random uses Hotbits as one of its sources.

      But my own local USB randomness device would be very useful - do you have any pointers to one?

Node Type: perlquestion [id://369992]
Approved by Happy-the-monk
Front-paged by Enlil