in reply to seek() functionality on pipes

So long as you only need to go forward and always relative to the start of the file, then just discard as many bytes as necessary to reach the point you want:

use constant CHUNK => 4*1024;

sub pseek {
    my( $p, $o ) = @_;
    my $discard;
    read( $p, $discard, CHUNK ), $o -= CHUNK while $o > CHUNK;
    read( $p, $discard, $o );
    return $o;
}
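
For instance, a minimal sketch of calling it on a decompression pipe (gzip -dc, the filename and the offsets are just placeholders):

# Sketch: skip the first 100_000 bytes of the decompressed stream, then read on.
open my $pipe, '-|', 'gzip -dc data.gz' or die "Cannot start gzip: $!";
binmode $pipe;
pseek( $pipe, 100_000 );
read( $pipe, my $record, 512 );   # this read starts at the new "position"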

If you need to do relative or backwards seeks, you're pretty much out of luck unless you can afford to read the whole file into a scalar and then open that scalar as a file:

open MEM, '+<', \$bigscalar or die $!;

In which case you can treat the result just as you would a normal file.
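
A minimal sketch of that approach (again with gzip -dc standing in for whatever producer is actually in use):

# Slurp the whole decompressed stream, then get a seekable handle onto it.
open my $pipe, '-|', 'gzip -dc data.gz' or die "Cannot start gzip: $!";
binmode $pipe;
my $bigscalar = do { local $/; <$pipe> };   # read everything into memory
close $pipe;

open my $mem, '+<', \$bigscalar or die $!;  # in-memory "file"
binmode $mem;
seek( $mem, -16, 2 );                       # seek() now works, even backwards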


Re^2: seek() functionality on pipes
by ikegami (Patriarch) on Jul 21, 2008 at 19:08 UTC

    read is not guaranteed to read the number of bytes you specify, especially when dealing with handles which aren't tied to files. I also added error handling and made the buffer size configurable.

    use constant BLK_SIZE => 16*1024;

    sub pseek {
        my( $p, $to_read, $blk_size ) = @_;
        $blk_size ||= BLK_SIZE;
        while ( $to_read ) {
            $blk_size = $to_read if $to_read < $blk_size;
            my $read = read( $p, my $discard, $blk_size );
            return $read if !$read;
            $to_read -= $read;
        }
        return 1;
    }
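
    A hypothetical call site, just to show what the return value means here ($pipe and $offset are placeholders): 1 on success, 0 if EOF is hit first, undef on a read error.

    my $ok = pseek( $pipe, $offset, 64*1024 );
    die  "read error: $!"            if !defined $ok;
    warn "hit EOF before offset\n"   if !$ok;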

    Update: Or maybe not. My testing shows that read does wait, but its documentation uses the same wording as sysread, which does not. As such, I wouldn't count on the observed behaviour.

    $ perl -e'$|=1; print "a"; sleep(10); print "b"' | perl -le'read(STDIN, $buf, 10); print $buf'
    ab
    $ perl -e'$|=1; print "a"; sleep(10); print "b"' | perl -le'sysread(STDIN, $buf, 10); print $buf'
    a

    Same results on Linux and Windows.

      Yes. Also, depending upon the OP's requirements, it might be better to use sysread rather than read. Most file format specs are in terms of bytes, not chars.

      I'm never quite sure whether Perl will start treating input as Unicode without a specific request to do so on the open. For example, does it recognise BOMs in an input stream and act upon them?
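
      A rough sketch of the skip loop built on sysread instead (the name syspseek is made up; untested, just to show the shape, and unlike buffered read, sysread really can return short reads):

      sub syspseek {
          my( $p, $to_skip ) = @_;
          while ( $to_skip > 0 ) {
              my $want = $to_skip < 16*1024 ? $to_skip : 16*1024;
              my $got  = sysread( $p, my $discard, $want );
              return $got if !$got;    # 0 at EOF, undef on error
              $to_skip -= $got;
          }
          return 1;
      }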


        I'm never quite sure whether Perl will start treating input as Unicode without a specific request to do so on the open.

        None of the default PerlIO layers do any conversions except :crlf.

        However, there could be action at a distance (such as from an open pragma).

        That's why binmode should be used on file handles containing binary data (such as the OP's compressed file). It avoids any such problem.

        For example, does it recognise BOMs in an input stream and act upon them?

        :encoding(UTF-16) uses the BOM to determine byte ordering, but something needs to tell open to use :encoding(UTF-16) first.
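
        For instance (filenames are placeholders):

        # Binary data: make sure no layer (:crlf, an open pragma, ...) gets to mangle it.
        open my $gz, '<', 'data.gz' or die $!;
        binmode $gz;

        # UTF-16 text: the layer has to be asked for explicitly; it then uses the BOM
        # to pick the byte order.
        open my $txt, '<:encoding(UTF-16)', 'notes.txt' or die $!;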

Re^2: seek() functionality on pipes
by HKS (Acolyte) on Jul 21, 2008 at 18:58 UTC
    The pseek() function is pretty much what I was looking for - thanks. The performance isn't great, but it allows me the flexibility to use whatever compression format I like without having to decompress to a file, read the new file in, and then remove it. Thanks for the help.

      See ikegami's improvements above. Also see my comments about using sysread rather than read, which still seems to give a substantial performance improvement, on my system at least.

      I don't think there is much that can be done about the performance. Increasing the read chunk size probably won't benefit much, as you are going to be limited by whatever buffers the system allocates to the pipe, which seems to be about 4k on my system.

      One thing that may improve it, even though it is counterintuitive, is to insert a brief sleep after each read in the loop, especially if the read did not return a full buffer.

      If the producing process is slightly slow, then attempting to read again too quickly is pointless: there may be nothing, or less than a full buffer load, available to read, and you could end up reading a few bytes each time, with a task switch required in between to permit the producer to produce some more.

      Adding a short sleep whenever a read fails to fill the buffer (even a sleep 0; may be enough) could improve throughput markedly. Something to experiment with on the target system and producer program.
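
      A rough sketch of that idea, using sysread so that short reads can actually occur; the sub name and the 0.01s pause are just placeholders to experiment with:

      use Time::HiRes qw( sleep );

      # Hypothetical variant: skip $to_skip bytes, pausing briefly after any short read.
      sub pseek_polite {
          my( $p, $to_skip, $blk_size ) = @_;
          $blk_size ||= 16*1024;
          while ( $to_skip > 0 ) {
              my $want = $to_skip < $blk_size ? $to_skip : $blk_size;
              my $got  = sysread( $p, my $discard, $want );
              return $got if !$got;             # 0 at EOF, undef on error
              $to_skip -= $got;
              sleep( 0.01 ) if $got < $want;    # short read: let the producer catch up
          }
          return 1;
      }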

