Others have said something like this, but I thought I'd throw in my two cents and try to hit the nail on the head.

On the face of it, since the text you seek is compressed, random-access seeking is not possible. An offset of 150K into the compressed file is meaningless in the context of the uncompressed text. Read 150K of the compressed file, uncompress it, and maybe you're 300K into the uncompressed text, or maybe 1.5MB. You have to uncompress a bunch of it AND THEN do your seeks. Consider what BrowserUK quoted in the first reply to your post:

If file is open for reading, the implementation may still need to uncompress all of the data up to the new offset. As a result, gzseek() may be extremely slow in some circumstances.

In other words, some, or most, or all of the file must be uncompressed before you can do your random seeks -- and that's for each call to gzseek()! Performance will suck heavily.
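To make that concrete, here's a rough sketch. It's in Python rather than Perl, only because Python's standard gzip module shows the point in a few lines; the sample data and the read_at() helper are made up for illustration. The "seek" works, but only because the library quietly uncompresses everything from the start of the stream up to the offset you asked for.

```python
import gzip
import io

# Made-up sample data: 20,000 lines of repetitive text, which compresses well,
# so compressed and uncompressed offsets have little to do with each other.
text = b"".join(b"line %06d: the quick brown fox\n" % i for i in range(20000))
compressed = gzip.compress(text)

def read_at(blob, offset, length):
    """Return `length` uncompressed bytes starting at uncompressed `offset`.

    GzipFile.seek() reaches the offset by decompressing from the beginning
    of the stream, so the cost grows with the offset.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as gz:
        gz.seek(offset)
        return gz.read(length)

chunk = read_at(compressed, 150_000, 32)
print(len(text), len(compressed))          # uncompressed vs. compressed size
print(chunk == text[150_000:150_032])      # same bytes as a plain slice
```

Call read_at() a thousand times with scattered offsets and you've uncompressed most of the file a thousand times over.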

In your responses, you've made it clear that you're still looking for something that will do random seeks into a compressed file. So I repeat, IT'S NOT POSSIBLE. Anything that appears to do it is just a parlor trick. Whatever module you find or write will either uncompress the file a little at a time to find what you're looking for, or will uncompress the whole file and search through that.

If you can't get away from the size of the file, you might consider rethinking your approach. Can you turn the process around? Can you uncompress the file a block at a time, and then process the phrases as you read them, rather than seek each phrase separately in the file (which it sounds like you want to do)? If you really, truly have to search the whole file for each phrase, the fastest solution is probably to uncompress it yourself (keeping the compressed version so you don't have to re-compress it), do whatever you're doing, and then delete it.
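Turning the process around might look something like the sketch below. Again it's Python, and find_phrases() and the tiny sample data are hypothetical, just to show the shape of the idea: stream the uncompressed text once and check every phrase against each line as it goes by, instead of seeking around once per phrase. It assumes a phrase never spans a line break; if yours can, you'd need to carry some overlap from one chunk to the next.

```python
import gzip
import io

def find_phrases(fileobj, phrases):
    """Make one pass over a gzip stream and return {phrase: [line numbers]}."""
    hits = {p: [] for p in phrases}
    # gzip.open() accepts a filename or an already-open binary file object.
    with gzip.open(fileobj, "rt", encoding="utf-8") as gz:
        for lineno, line in enumerate(gz, start=1):
            for p in phrases:
                if p in line:
                    hits[p].append(lineno)
    return hits

# Tiny in-memory stand-in for the real compressed file.
sample = gzip.compress("alpha beta\ngamma delta\nbeta gamma\n".encode("utf-8"))
result = find_phrases(io.BytesIO(sample), ["beta", "gamma", "omega"])
print(result)
```

The file gets uncompressed exactly once, no matter how many phrases you're looking for, and nothing bigger than one line has to sit in memory.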

My two cents.

--marmot

In reply to Re: gzseek for perl filehandles by furry_marmot
in thread gzseek for perl filehandles by Anonymous Monk
