intel has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, long time listener, first time caller. My question is on how to best throttle/cache a simple HTTP datastream.

I am using IO::Socket::INET to connect to a port on a machine and stream the data to STDOUT. The values in that stream arrive at unsteady intervals, and I would like to create a buffer/throttle/cache type of data structure that lets me output them at a stable, regular interval. Any ideas/modules/methods that you can think of that would make this work? I'm sure someone else has done it, but I'm really not sure how to go about it.

Since I haven't done much network programming with Perl, and I couldn't locate any howtos in the usual places (I may have just missed them), I thought I would ask my question here. Thanks in advance for your time.

Retitled by g0n from 'throttling'.

Re: how to throttle/cache a simple HTTP datastream
by merlyn (Sage) on Jul 18, 2005 at 17:54 UTC
    I'd set up a POE adaptor... one side would accept stuff from the HTTP socket, and append it to the end of a buffer, while the other side would tick at regular intervals taking off data from the front and streaming it to the client.
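    The two-sided shape described above (one side appending to a buffer, the other ticking at regular intervals) can be sketched without POE, using only core modules. This is a minimal illustration rather than the POE adaptor itself: `pace_stream` is a hypothetical helper name, and it assumes the stream is line-oriented and that one line per tick is the desired output rate.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time sleep);

# Hypothetical helper (not from the thread): read lines from $in as they
# arrive and emit at most one buffered line to $out every $interval seconds.
# Returns once $in has reached EOF and the buffer has drained.
sub pace_stream {
    my ($in, $out, $interval) = @_;
    my $sel     = IO::Select->new($in);
    my @buffer;               # complete lines waiting to go out
    my $partial = '';         # trailing bytes that do not yet end in "\n"
    my $eof     = 0;
    my $next    = time + $interval;

    until ($eof && !@buffer) {
        my $wait = $next - time;
        $wait = 0 if $wait < 0;

        # Input side: wait for data, but never past the next tick.
        my @ready = $eof ? () : $sel->can_read($wait);
        if (@ready) {
            my $n = sysread($in, my $chunk, 4096);
            if ($n) {
                $partial .= $chunk;
                push @buffer, $1 while $partial =~ s/^(.*?\n)//;
            }
            else {
                # EOF (a read error is treated the same way in this sketch).
                $eof = 1;
                push @buffer, "$partial\n" if length $partial;
                $partial = '';
            }
        }
        elsif ($eof && $wait > 0) {
            sleep($wait);     # nothing left to read, just wait for the tick
        }

        # Output side: on each tick, take one line off the front.
        if (time >= $next) {
            print {$out} shift(@buffer) if @buffer;
            $next += $interval;
        }
    }
    return;
}
```

    With a socket returned by `IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port)` as `$in` and `\*STDOUT` as `$out` (host and port being placeholders), this would print one line per interval. The caller would probably want autoflush enabled on the output handle.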

    Hmm, sounds like an interesting column idea, but I'd first need to know why, so I could motivate it (and then write it for you!).

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      That's actually what I was considering doing, based on what I understand POE is capable of, and after reading Rocco Caputo's documentation. The only problem is that I've had the fear struck into me about the complexity of using POE. But if that's really the best way to do it, then I would definitely love an excuse to learn.

      The reason for this little system is to capture stock values from a constantly updating HTTP stream and send them to a Java application. There are two streams, and I'd like to throttle them and use one as the primary and the other as an active/passive failover.

      Any recommended POE howtos? I've seen a few but some are far better than others.

Re: how to throttle/cache a simple HTTP datastream
by BrowserUk (Patriarch) on Jul 18, 2005 at 21:48 UTC

    This is pretty easy to do using threads, but any sort of demo requires the answers to a few questions:

    1. What data rates are involved from the source?
      • Peak data rate?
      • Required average data rate?
    2. What order of total volumes are involved?
    3. Is the datastream binary or textual?

      That is, will the consuming process be reading fixed-sized blocks of data or variable length lines?

    4. What should be done in the event of underruns?

      If the throttling process runs out of buffered data and so is unable to sustain the required data rate is this fatal to the consuming process or just inconvenient?

      If the data source can periodically 'dry up' for extended periods, then several strategies are possible depending upon the nature of the consuming process' tolerance to starvation.

      1. A large intermediate buffer.

        Basically, you delay the output until a substantial quantity of data has been buffered before starting to supply the consumer. In the extreme case, you read the whole of the source before supplying anything to the sink at the required rate.

      2. A feedback loop.

        You measure the rate at which data is being received over a shortish timescale and adjust the output rate up or down to ensure that, even when the data source dries up for an extended period, the sink receives something at regular intervals, even if the total throughput is reduced to a trickle until the source speeds up again.

        Alternatively, the sink may require fixed amounts of data to process, and in the event of the source slowing, it may be better to deliver those fixed size chunks at a slower rate until the source picks up again.

      3. Duplicate 'fill' data.

        For some processes, a steady rate of data may be more important than the actual data itself, in which case it may be better to repeat data in order to sustain the rate when the source dries up.

    5. What should be done in the event of overruns?

      A similar set of strategies is available for dealing with short periods of overproduction by the source.

      1. Extend the buffer.
      2. Increase the rate of output.
      3. Increase the volume of output.

      Ultimately, if the source produces faster than the sink consumes, it may be necessary to discard data periodically to prevent memory exhaustion.
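    Some of the underrun and overrun strategies above can be sketched as a small bounded buffer. Everything here is illustrative: the function names, the `max` and `on_empty` options, the choice to discard the oldest item on overrun, and the 25%/75% thresholds in the feedback function are all assumptions, not answers to the questions in the list.

```perl
use strict;
use warnings;

# Hypothetical bounded buffer. Options (both assumptions of this sketch):
#   max      => maximum number of items held before old ones are discarded
#   on_empty => 'skip' (return undef) or 'repeat' (re-send the last item)
sub new_buffer {
    my (%opt) = @_;
    return {
        items    => [],
        max      => $opt{max}      // 1000,
        on_empty => $opt{on_empty} // 'skip',
        last     => undef,   # most recent item handed to the consumer
        dropped  => 0,       # count of items discarded on overrun
    };
}

# Overrun handling: once the buffer exceeds max, discard from the front
# (oldest first) to prevent memory exhaustion.
sub buffer_put {
    my ($buf, $item) = @_;
    push @{ $buf->{items} }, $item;
    while (@{ $buf->{items} } > $buf->{max}) {
        shift @{ $buf->{items} };
        $buf->{dropped}++;
    }
    return;
}

# Underrun handling: when empty, either return undef or repeat the last
# item as 'fill' data, depending on the on_empty option.
sub buffer_take {
    my ($buf) = @_;
    if (@{ $buf->{items} }) {
        return $buf->{last} = shift @{ $buf->{items} };
    }
    return $buf->{on_empty} eq 'repeat' ? $buf->{last} : undef;
}

# Feedback loop: stretch the output interval when the buffer is running
# dry and shorten it when the buffer is filling up.
sub next_interval {
    my ($buf, $base) = @_;
    my $fill = @{ $buf->{items} } / $buf->{max};   # fraction full, 0 to 1
    return $base * 2 if $fill < 0.25;
    return $base / 2 if $fill > 0.75;
    return $base;
}
```

    Which policy is appropriate depends on the consumer. Discarding the oldest item favours freshness over completeness, which would be wrong for a consumer that needs every value.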

    A description of the application would clarify many of these questions.
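    Pending those answers, the threaded approach could look roughly like the following. This is only a sketch under several assumptions: a perl built with ithreads, Thread::Queue 3.01 or later (for `end`), a line-oriented stream, no handling of underruns or overruns beyond skipping a tick, and a hypothetical helper name `throttle_with_threads`.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw(sleep);

# Hypothetical helper: a reader thread pushes lines from $in onto a queue
# as fast as they arrive, while the calling thread emits at most one
# queued line to $out every $interval seconds.
sub throttle_with_threads {
    my ($in, $out, $interval) = @_;
    my $queue = Thread::Queue->new;

    # Producer: blocks on the input handle, never on the output rate.
    my $reader = threads->create(sub {
        while (defined(my $line = <$in>)) {
            $queue->enqueue($line);
        }
        $queue->end;    # signal that no more data will arrive
    });

    # Consumer: one non-blocking dequeue per tick.
    while (1) {
        sleep($interval);
        my $line = $queue->dequeue_nb;
        if (defined $line) {
            print {$out} $line;
        }
        elsif (!defined $queue->pending) {
            last;       # queue has been ended and is fully drained
        }
        # Otherwise the queue is merely empty for now: skip this tick.
    }

    $reader->join;
    return;
}
```

    The queue here is unbounded, so a source that outruns the sink for long enough will still exhaust memory. A limit or a discard policy would need to be added once the questions above are answered.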


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.