kr123 has asked for the wisdom of the Perl Monks concerning the following question:

I've got a PERL script that gets its data on STDIN from another program. My PERL script takes varying amounts of time to process each line of data, while the program writing to my script is very finicky and requires that its data be written out quickly.

Is there some way to increase the size of the STDIN buffer (or the buffer of filehandles in general) to accommodate this, or must I create an intermediary program to act as the buffer?

  • Comment on Increasing Buffer size of STDIN on PERL Program

Replies are listed 'Best First'.
Re: Increasing Buffer size of STDIN on PERL Program
by sgifford (Prior) on Sep 09, 2003 at 22:45 UTC
    Sure you can.
    #!/usr/bin/perl -w
    use strict;
    use FileHandle;
    use constant BUFSIZE => 12000;
    use vars qw($buf);
    $buf = "A" x BUFSIZE;
    STDIN->setvbuf($buf, _IOFBF, BUFSIZE)
        or die "Couldn't setvbuf\n";
    while (<>) {
        print "This is a line.\n";
    }
    produces:
    $ strace -e read -f perl /tmp/t101 </etc/passwd
    ...
    read(0, "root:x:0:0:root:/root:/bin/bash\n"..., 12000) = 1482
    This is a line.
    ...
    read(0, "", 12000) = 0
Re: Increasing Buffer size of STDIN on PERL Program
by Zaxo (Archbishop) on Sep 09, 2003 at 22:42 UTC

    The libc function setvbuf() does what you want. Perl exposes that function in the IO::Handle module. There is a note in the docs saying that setvbuf is not imported by default in Perl 5.8.0 and later, because of PerlIO's bypassing of stdlib.

    You should test for whatever portability matters to you.
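A minimal run-time probe along those lines, assuming nothing about the platform (the 16K buffer size here is an arbitrary choice for illustration):

```perl
use strict;
use warnings;
use IO::Handle;

# Probe for setvbuf at run time rather than assuming it exists; on
# perls built with PerlIO (5.8 and later) it may simply be unavailable,
# in which case the method call or the _IOFBF constant will die.
my $bufsize = 16384;            # arbitrary size for the example
my $buffer  = "\0" x $bufsize;
my $can_setvbuf = eval {
    STDIN->setvbuf($buffer, IO::Handle::_IOFBF(), $bufsize);
} ? 1 : 0;
print $can_setvbuf
    ? "setvbuf succeeded\n"
    : "setvbuf unavailable here; relying on default buffering\n";
```

Wrapping the call in eval means the script degrades gracefully instead of dying on perls where setvbuf was compiled out.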

    After Compline,
    Zaxo

Re: Increasing Buffer size of STDIN on PERL Program
by Roger (Parson) on Sep 10, 2003 at 00:08 UTC
    Try the following code....

    {
        local $/;    # turn off input record separator
        $input = <>;
    }
    This reads everything from <STDIN> into the variable $input in one go. I believe this is the quickest method of reading input.
      Hi O Monks, Changing $/ looks like a cool way of getting your data in. However, are there any occasions where you could run into problems? I notice you use "local"; would "my" be even safer within those brackets? Thank you for your reply.
        The reason I use local $/ rather than my $/ comes down to the difference between how local and my work.

        The keyword local has been around for a long time, well before my (which was only introduced in Perl 5).

        The difference is that local gives run-time (dynamic) scoping, while my gives compile-time (lexical) scoping. local $/ first saves the current value of $/, and when execution leaves the scope it automatically restores that previous value. my $/ will not work here at all; it is a compile-time error, because punctuation variables such as $/ cannot be declared with my and must be localized with local. That's a limitation (or feature?) of Perl: local reuses a variable that already exists in the global context.

        This method is not efficient if you have lots of data (hundreds of MB) pouring out of the other process, because it has to allocate enough memory to hold the entire input.
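A tiny demonstration of that save-and-restore behaviour (the variable names are made up for the example):

```perl
use strict;
use warnings;

$/ = "\n";                 # the usual input record separator
my $inside;
{
    local $/;              # undef inside this block: slurp mode
    $inside = defined($/) ? "defined" : "undef";
}
# back outside the block, $/ has been restored automatically
my $outside = (defined($/) && $/ eq "\n") ? "restored" : "clobbered";
print "inside: $inside, outside: $outside\n";
```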

      This is only advisable if the input data will fit into the available memory without compromising other processes at the same time.
      You don't want the system to start paging heavily or exhaust memory just for the sake of having all data in memory at once.

      Slurping files can sometimes be advantageous.
      Other times using read() or sysread() to read large blocks of data might do the trick.
      Personally I still find myself using a simple  while (<FH>){...} for most jobs, from logfiles to multi-gigabyte datasets.
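For the block-read approach, here is one hedged sketch using read() on an in-memory handle (sysread() works the same way on a real stream); the 8-byte chunk size is deliberately tiny so the demo exercises the carry-over logic:

```perl
use strict;
use warnings;

# Sketch: read fixed-size chunks and split lines yourself, carrying any
# partial trailing line over to the next chunk.
my @lines;
my $carry = '';
open my $fh, '<', \"alpha\nbeta\ngamma\n" or die $!;  # in-memory handle for the demo
while (my $n = read($fh, my $chunk, 8)) {
    $carry .= $chunk;
    while ($carry =~ s/^(.*?\n)//) {
        push @lines, $1;            # one complete line at a time
    }
}
push @lines, $carry if length $carry;   # trailing data without a newline
print scalar(@lines), " lines\n";
```

With a real pipe you would swap the in-memory handle for STDIN and pick a much larger chunk size.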

      Cheers,

      BazB


      If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
      That way everyone learns.

Re: Increasing Buffer size of STDIN on PERL Program
by Abigail-II (Bishop) on Sep 09, 2003 at 22:29 UTC
    That would be the buffer size of the pipe you are using. What the limit of that buffer is, and how to modify it, is system dependent. On some OSes it might require a reboot, or even a kernel rebuild.

    Oh, and it's Perl, P-e-r-l, just one capital.

    Abigail

Current Solution
by kr123 (Novice) on Sep 10, 2003 at 12:40 UTC

    Thanks for all the quick responses.

    Unfortunately slurping the data won't work in my application since the data is a neverending stream -- I'm getting data straight from syslog.

    I followed the initial suggestions of using setvbuf and found that on my Solaris 8 machine with Perl 5.6 my maximum usable buffer size is 10,240 characters.

    Though this isn't as large as I'd have liked (something bordering on 500K would be nice), the average length of my incoming syslog messages is 136 characters, so a buffer of this size lets me hold about 75 lines.
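If a bigger cushion than setvbuf allows is ever needed, the intermediary-program route from the original question can be sketched roughly like this: a relay that reads eagerly from the producer, queues the bytes in memory, and drains them whenever the slow consumer will accept a write. This is only a sketch under assumed requirements (the function name and chunk size are made up), not production code:

```perl
use strict;
use warnings;
use IO::Select;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# relay($in, $out): read from $in as eagerly as possible, queue the
# bytes in memory, and drain them to $out whenever it is writable.
sub relay {
    my ($in, $out) = @_;
    for my $fh ($in, $out) {
        my $flags = fcntl($fh, F_GETFL, 0) or die "fcntl: $!";
        fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "fcntl: $!";
    }
    my $pending = '';          # the unbounded in-memory buffer
    my $eof     = 0;
    my $rsel    = IO::Select->new($in);
    my $wsel    = IO::Select->new($out);
    while (!$eof || length $pending) {
        my ($r, $w) = IO::Select->select(
            $eof            ? undef : $rsel,
            length $pending ? $wsel : undef,
            undef,
        );
        if ($r && @$r) {
            # append up to 64K at the end of the queued data
            my $n = sysread($in, $pending, 65536, length $pending);
            $eof = 1 if defined $n && $n == 0;
        }
        if ($w && @$w) {
            # write whatever the consumer will take, trim it off the queue
            my $n = syswrite($out, $pending);
            substr($pending, 0, $n) = '' if defined $n && $n > 0;
        }
    }
}
```

In real use this would sit between the two programs (producer | relay | worker) with $in = \*STDIN and $out = \*STDOUT. Note the queue is unbounded, so a runaway producer can still exhaust memory.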

Slurp it all
by jonadab (Parson) on Sep 10, 2003 at 03:08 UTC

    Will the whole thing fit in memory? Just read from the filehandle in list context...

    my @input = <STDIN>;
    for (@input) {
        process_line($_);
    }

    Now your Perl script can take as long as it likes to process the data; the other program has finished and probably exited, but so what?


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/