What's the best way to read in a very large amount of data from an external program?
Pipes! See for example perlopentut for some piping examples.
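For what it's worth, a minimal sketch of reading from an external program through a pipe, in the style of perlopentut (the command name and argument here are placeholders, not anything from your setup):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open a read pipe from an external command.  The list form of open
# avoids the shell entirely.  "some_command" is a placeholder.
open(my $in, '-|', 'some_command', '--some-arg')
    or die "can't start some_command: $!";

while (my $line = <$in>) {
    chomp $line;
    # process each line as it arrives; if this loop falls behind,
    # the writer simply blocks until we catch up
}

close $in or warn "some_command exited with status $?";
```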
I think not, and AFAIK the only way to do this would be to modify the nfdump command to pause execution when blocked by my program (or use files, but I'd really prefer not to). I wonder if it's possible to block execution of the external program without modifying it.
Writing to a pipe whose buffer is full does in fact block, so unless the program writing the data takes special care to do non-blocking writes, pipes already do what you want.
Isn't that wonderful?
Yes, using pipes is the right way. I do that quite frequently, for example sorting a huge file with the shell sort command and redirecting the sorted output to a Perl program through a pipe. At the beginning, sort writes nothing out (it has to process at least part of the whole file before it can start printing anything); during that phase, the Perl program just sits idle, waiting for data to arrive. Later on, the sort command might be spitting out data lines faster than the Perl program can process them, but that's OK, because the pipe pauses sort so that the Perl program can consume the data at its own speed.
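A sketch of that setup (the file name is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pipe the output of the shell sort command into Perl.
# "huge_file.txt" is a placeholder for the real input file.
open(my $sorted, '-|', 'sort', 'huge_file.txt')
    or die "can't run sort: $!";

while (my $line = <$sorted>) {
    # sort emits nothing until it has chewed through the input,
    # so early on this loop just waits on the pipe; later, if we
    # read too slowly, the pipe makes sort wait for us instead
    chomp $line;
    # handle $line here
}

close $sorted or warn "sort exited with status $?";
```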
Ahh, I see (and mentally it all falls into place for me)! I don't know why this didn't occur to me before, since I've written socket code in Perl before and paid attention to blocking (and used non-blocking sockets), but I never really paid attention to file system blocking. I do remember reading long ago that pipes behave this way, but I have occasionally hit errors when using pipes in the unix shell which, as best I recall, were related to running out of memory. I guess it all depends on both the exact way a pipe is being used and the particular commands being piped. Piping a program/command which has to write in non-blocking mode into something which does not slurp the data in without hesitation will, I guess, cause an error if the pipe overflows. Use a command/program which does blocking writes and all will work nicely.
So all I need to do is check that my source program is opening the pipe in blocking mode, and if it's not, simply change the opening mode to blocking, and the 'pausing' should automagically fall into place. I feel a bit of a twerp for having to ask this, but hey ho, thanks for being gentle!
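Note that in Perl, filehandles and pipes are blocking by default, so there's usually nothing to change. But if you did want to check a handle and clear non-blocking mode, a sketch using Fcntl might look like this (the pipe here is just a stand-in for whatever filehandle you care about):

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# A made-up pipe, standing in for the real filehandle.
pipe(my $reader, my $writer) or die "pipe failed: $!";

# Fetch the current file status flags on the write end.
# (Perl's fcntl returns "0 but true" when the system call returns 0,
# so "or die" is safe here.)
my $flags = fcntl($writer, F_GETFL, 0)
    or die "can't get flags: $!";

if ($flags & O_NONBLOCK) {
    # Clear O_NONBLOCK so writes block when the pipe buffer is full,
    # instead of failing with EAGAIN.
    fcntl($writer, F_SETFL, $flags & ~O_NONBLOCK)
        or die "can't set flags: $!";
}
```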
Cheers, Pete