rovf has asked for the wisdom of the Perl Monks concerning the following question:

My application basically does a

IPC::Run::run([$someProg,...],'>output.txt');
It does not have any knowledge of what program $someProg is (i.e. the programs to be executed here are not written by me), and the app is supposed to run unattended. It is no problem if $someProg takes arbitrarily long to execute, but the following is: if the program happens to loop forever, AND inside the loop writes something to stdout, it simply fills up the disk.

It is OK to limit the size of the output file. For instance, under Unix I would do a ulimit -f before invoking my Perl program, but we are on Windows.

Is there a way to limit the size of a created file, so that Windows would abort a process if it creates a file larger than the limit? Failing that, is there another way to prevent large files from being created?

I already had the idea of using a pipe,

.... '|perl guarded_tee.pl output.txt'
where guarded_tee.pl would work like a usual 'tee', but would close the pipe after it has received more than a certain number of bytes. But aside from this not being a very elegant solution, I don't know enough about the internals of the Windows piping system, or about the hidden traps which might be lurking there. So maybe someone could either point out a different solution, or verify that my 'tee' solution should work without risks in every circumstance.
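
For concreteness, here is a rough sketch of what I imagine guarded_tee.pl would look like (the default byte limit and the buffer size are just placeholders I made up):

#!/usr/bin/perl
# guarded_tee.pl -- sketch only: copy stdin to a file, but stop reading
# once a byte limit has been reached. Limit and buffer size are placeholders.
use strict;
use warnings;

my ( $outfile, $limit ) = @ARGV;
$limit ||= 10 * 1024 * 1024;                 # assumed default: 10 MB

open my $out, '>', $outfile or die "Cannot open $outfile: $!";
binmode STDIN;                               # copy raw bytes, no CRLF translation
binmode $out;

my $written = 0;
while ( my $read = sysread( STDIN, my $buf, 64 * 1024 ) ) {
    my $room = $limit - $written;
    last if $room <= 0;
    $buf = substr $buf, 0, $room if $read > $room;
    print {$out} $buf or die "write failed: $!";
    $written += length $buf;
}
close $out;

Exiting after the limit closes my end of the pipe, but I don't know what the producer does at that point -- which is exactly the part I'm unsure about.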

-- 
Ronald Fischer <ynnor@mm.st>

Re: Windows-specific: Limiting output of created files
by BrowserUk (Patriarch) on Apr 03, 2009 at 13:37 UTC

    I don't know of any Windows equivalent of ulimit -f. (ie. per process filesize limits.)

    Your limiting tee idea seems the simplest to implement, but be aware that unless the writing process is checking for failing writes, it may not notice the pipe has gone away.

    Eg: When the second instance of perl below terminates, the first instance blithely continues to write to the pipe. It doesn't cause any memory growth, but it does use a substantial amount of cpu until it decides to stop writing of its own accord.

    perl -E"say $_ for 1..100e6" | perl -E"while(<>){ $n += length; last if $n > 1024**2; warn qq[$n\n];} +"

    If the first instance was checking for the success of its writes, it would notice the pipe going away:

    perl -E"say or die $^E for 1 .. 100e6" | perl -E"while(<>){ $n += length; last if $n > 1e5; warn qq[$n\n]; }" ... 99990 99996 The pipe is being closed at -e line 1.

    But if you can't change the process, that's probably not useful info :(


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The tee could attempt to kill the sender after the limit is reached, or simply refuse to copy to output. This is antisocial behavior, but it might cause less mayhem than if Windows tries "naive" error handling routines when you close the incoming handle. (ok, maybe I should say "brain-dead")
        The tee could attempt to kill the sender after the limit is reached,

        The problem then is: how does tee find out the pid of the producer process?

        or simply refuse to copy to output.

        It could, but then both processes might never end. Even if the producer process will eventually reach a natural end, it will take far longer, because of all the buffering and handshaking involved in the pipe.

        if Windows tries "naive" error handling routines when you close the incoming handle.

        If the producer application is just ignoring write failures, then there doesn't appear to be any "naive error handling" involved. Indeed, the reason the cpu usage climbs after the pipe is severed is that the producer application starts running much more quickly. With no buffering and handshaking involved, and in the absence of any other limiting factors, its write loop simply runs faster than it did while the writes were succeeding.

        The problem is it might never finish.

        The nice thing about the piped-open overseer process is that it covers pretty much every eventuality, save the possibility that the producer process stops writing, but doesn't close the pipe or terminate, before it reaches the limit specified in the overseer. Of course, that case can also be handled using a thread or can_read.
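
        For the record, a rough sketch of the thread variant (the byte limit, the idle timeout and the dummy producer command are only placeholders; I haven't tried this against the OP's real programs):

        use strict;
        use warnings;
        use threads;
        use threads::shared;

        my $limit      = 1e6;        # assumed byte limit
        my $idle_limit = 60;         # assumed seconds of silence before giving up

        my $last_read :shared = time;
        my $done      :shared = 0;

        # Dummy producer; stands in for the real $someProg.
        my $pid = open my $in, '-|', qq[perl -E"say for 1 .. 100e6"]
            or die "piped open failed: $!";

        # Watchdog thread: kills the producer if it goes quiet but never
        # closes the pipe or terminates.
        my $watchdog = threads->create( sub {
            until ( $done ) {
                sleep 1;
                if ( time() - $last_read > $idle_limit ) {
                    kill 3, $pid;
                    last;
                }
            }
        } );

        my $n = 0;
        while ( <$in> ) {
            $last_read = time;
            $n += length;
            print;
            if ( $n > $limit ) { kill 3, $pid; last }
        }
        $done = 1;
        close $in;
        $watchdog->join;

        Killing the producer closes its end of the pipe, so the blocked readline in the main loop returns and the overseer can finish cleanly.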


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Windows-specific: Limiting output of created files
by BrowserUk (Patriarch) on Apr 03, 2009 at 14:09 UTC

    An alternative would be to use a piped open to monitor the output; then you can kill the process when the limit is reached. (Note: command wrapped for posting):

    perl -E" $pid = open IN, '-|', qq[perl -E\"say for 1 .. 100e6\"]; while(<IN>){ $n+=length; print; kill(3,$pid),last if $n>1e5 }" > output.file

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Windows-specific: Limiting output of created files
by cdarke (Prior) on Apr 03, 2009 at 13:03 UTC
    The only way I know of is to implement 'NTFS disk quotas'; there are plenty of hits on Google.
    Your piping system should work OK - Windows implements anonymous pipes at the Win32 API level. If the app is writing relatively short text records to stdout with "\r\n" line terminators then I don't see an issue. From the MSDN: "If the anonymous read pipe handle has been closed and WriteFile attempts to write using the corresponding anonymous write pipe handle, the function returns FALSE and GetLastError returns ERROR_BROKEN_PIPE." It is bound to slow things up, but by how much I couldn't say.
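    As a rough illustration of that from the Perl side (a hypothetical writer, not the OP's $someProg): a producer that checks its writes sees the broken pipe as soon as the reader end is closed, with $^E carrying the Win32 error:

    use strict;
    use warnings;

    $| = 1;                              # unbuffered, so the failure shows up promptly
    for my $i ( 1 .. 100_000_000 ) {
        # In text mode Windows adds the \r for us; once the read end of the
        # anonymous pipe is gone, print fails and $^E reports the broken pipe.
        print "$i\n" or die "write failed: $^E\n";
    }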
      The only way I know of is to implement 'NTFS disk quotas'

      Thanks; but from what I have seen, Disk Quotas limit only the space per volume, not per individual file, so I don't think this would help.

      -- 
      Ronald Fischer <ynnor@mm.st>