Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, this is my first post here. I would like to find a way to seek both forward AND backward in a gzip-compressed (.gz) file. I tried Compress::Zlib, but according to the docs it cannot seek backward: "Provides a sub-set of the seek functionality, with the restriction that it is only legal to seek forward in the compressed file. It is a fatal error to attempt to seek backward." A little background on what I want to accomplish: I would like to run a binary search on a huge .gz file, so I need to open the file and seek forward and backward to divide and conquer. Any input would be greatly appreciated. Thanks.

Re: seek backward on compressed file
by zwon (Abbot) on Jul 02, 2009 at 19:47 UTC

    I don't think such a module exists. To start reading data at a given position you have to decode the file from the start. So to seek backward you must either keep all decoded data up to the current position, which would require a lot of memory, or decode the file from the beginning on every backward seek, which would require a lot of CPU cycles.
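The second option described above (re-decoding from the beginning on every backward seek) can be sketched as follows. This is an illustration in Python rather than Perl, chosen because Python's standard `gzip` module happens to implement backward seeks in read mode exactly this way: it rewinds and decompresses again. The record layout and the `read_at` helper are assumptions made up for the demo, not anything from Compress::Zlib.

```python
import gzip
import io

def make_gzip_bytes(payload: bytes) -> bytes:
    """Compress payload into an in-memory gzip stream (demo data only)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(payload)
    return buf.getvalue()

def read_at(gz, offset: int, length: int) -> bytes:
    """Seek to an offset in the *uncompressed* data and read length bytes.

    A backward seek is legal here, but it is not free: the gzip module
    rewinds the stream and decompresses from the start up to `offset`.
    """
    gz.seek(offset)
    return gz.read(length)

# Hypothetical fixed-width records, 12 bytes each: b"record-0000\n"
payload = b"".join(b"record-%04d\n" % i for i in range(1000))
compressed = make_gzip_bytes(payload)

with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
    late = read_at(gz, 12 * 900, 11)   # forward seek
    early = read_at(gz, 12 * 10, 11)   # backward seek, triggers a rewind
```

The backward seek works, but each one costs a fresh decompression pass from the start of the file, which is the CPU trade-off described above.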

Re: seek backward on compressed file
by mzedeler (Pilgrim) on Jul 02, 2009 at 20:17 UTC

    Since the documentation says that the module doesn't support backward seeks, it seems that you have already provided the answer.

    Assuming that you can only seek forward, it's an interesting problem how to search efficiently. For starters, it's probably a good idea to check whether Compress::Zlib really seeks, or whether it just provides an abstraction that pretends to seek while decompressing everything under the hood. In that case, you're most likely better off doing a linear search.
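A linear search over a sorted compressed file, as suggested above, can be sketched like this. It is written in Python for brevity and assumes one tab-separated `key<TAB>value` record per line, sorted by key; that layout and the `linear_find` name are hypothetical, since the original post does not describe the file format.

```python
import gzip
import io

def linear_find(gz_lines, target: bytes):
    """Scan sorted key<TAB>value lines front to back; stop early once past target."""
    for line in gz_lines:
        key, _sep, value = line.rstrip(b"\n").partition(b"\t")
        if key == target:
            return value
        if key > target:      # sorted input: the key cannot appear later
            return None
    return None

# Demo data: zero-padded keys so that byte order matches numeric order.
raw = b"".join(b"key-%05d\tvalue-%d\n" % (i, i) for i in range(500))
blob = gzip.compress(raw)

with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
    hit = linear_find(gz, b"key-00042")

with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
    miss = linear_find(gz, b"key-00042x")
```

Only forward reads are used, so this pattern would carry over to a library that forbids backward seeks. Each lookup decompresses, on average, half the file.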

Re: seek backward on compressed file
by Anonymous Monk on Jul 02, 2009 at 19:31 UTC
    Maybe you should:
    1. ask the author
    2. search for another module

    AFAIK, zlib supports backward seeks only in read-only mode (see the zlib manual), so I don't know why Compress::Zlib doesn't support backward seeks at all.
Re: seek backward on compressed file
by suaveant (Parson) on Jul 02, 2009 at 20:34 UTC
    Assuming you are only searching on a small part of each record, you could create a key file that stores the keys and their offsets, much like a database index.

    If not, then maybe you could keep a subset of the keys, say 1 out of every 100. Then you could binary search to the largest stored key that is equal to or less than the one you need, and read forward through the (at most) 100 records in between. It's a reasonable trade-off.

                    - Ant
                    - Some of my best work - (1 2 3)
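The sparse key index suggested above can be sketched as follows, again in Python rather than Perl. Everything specific here is an assumption for illustration: the `key<TAB>value` line format, the `STEP` of 100, and the names `build_sparse_index` and `lookup`. Offsets are positions in the uncompressed data. To stay within a forward-only seek, each lookup reopens the stream and seeks forward once.

```python
import bisect
import gzip
import io

STEP = 100  # keep one key out of every STEP records (assumed value)

def build_sparse_index(gz, step=STEP):
    """One pass over the file: record every step-th key and its uncompressed offset."""
    keys, offsets = [], []
    offset = 0
    for n, line in enumerate(gz):
        if n % step == 0:
            keys.append(line.split(b"\t", 1)[0])
            offsets.append(offset)
        offset += len(line)
    return keys, offsets

def lookup(open_stream, keys, offsets, target, step=STEP):
    """Binary search the in-memory index, then scan at most `step` records."""
    slot = bisect.bisect_right(keys, target) - 1   # largest indexed key <= target
    if slot < 0:
        return None
    with open_stream() as gz:          # fresh stream, so the seek is forward-only
        gz.seek(offsets[slot])
        for _ in range(step):
            line = gz.readline()
            if not line:
                break
            key, _sep, value = line.rstrip(b"\n").partition(b"\t")
            if key == target:
                return value
            if key > target:
                break
    return None

# Demo data: 1000 sorted records with zero-padded keys.
raw = b"".join(b"key-%05d\tvalue-%d\n" % (i, i) for i in range(1000))
blob = gzip.compress(raw)

def open_stream():
    return gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb")

with open_stream() as gz:
    keys, offsets = build_sparse_index(gz)

found = lookup(open_stream, keys, offsets, b"key-00457")
absent = lookup(open_stream, keys, offsets, b"key-00457x")
```

The binary search itself runs over the small in-memory index, so it never needs a backward seek on the compressed file. The forward seek to the chosen offset still has to decompress everything before it, so this saves comparisons and repeated passes rather than decompression work per lookup.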

Re: seek backward on compressed file
by Marshall (Canon) on Jul 03, 2009 at 14:19 UTC
    I think that "seeking" is a problem on compressed files in general. Very few compression algorithms have the property that you can just "land" at a random place in the file and start reading. The usual problem is the ability to "re-sync". A .WAV file can't do this! If you have a huge .WAV file and somehow get "lost" in the middle of it, there is no way to recover and find the next segment, or at least not that I know of. Some types of linear data structures have "garbage" that you have to read through, but that "garbage" is actually a sync point.