in reply to Re: File Size limits
in thread File Size limits

halley: "If you're really trying to access files larger than 9 petabytes (9*10^15 =~ 2^53) directly, using a scripting language, then I'd like to rent that time machine you're using."

Ha! I need this info for my CPAN distro, Search::Kinosearch. Since it's used to index collections of documents, the files that make up the index can grow quite large. I've been crafting workarounds which open a new file every time one approaches 2 GB, but that's getting tiresome.

Currently, Kinosearch uses a couple of DB_File tied hashes, and those work fine for files > 2 GB. However, I'm replacing them with more specialized file formats written in native Perl code, and I'd like to know how big I can allow these new files to grow. I'll probably store pointer data as a pair of packed 32-bit network-order ints and multiply them back together.
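
Something like this untested sketch is what I have in mind (the sub names are invented):

    use strict;
    use warnings;

    # Untested sketch of the pack-and-multiply scheme.  The offset stays
    # exact as long as it fits in the 53-bit mantissa of a double.
    sub offset_to_bytes {
        my ($offset) = @_;
        my $high = int( $offset / 2**32 );
        my $low  = $offset - $high * 2**32;
        return pack( 'NN', $high, $low );    # two 32-bit network-order ints
    }

    sub bytes_to_offset {
        my ($bytes) = @_;
        my ( $high, $low ) = unpack( 'NN', $bytes );
        return $high * 2**32 + $low;         # multiply the high word back up
    }

    my $offset = 3 * 2**32 + 42;             # comfortably past 4 GB
    die "round trip failed"
        unless bytes_to_offset( offset_to_bytes($offset) ) == $offset;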

If 32-bit Perl can handle integers up to 2**53 accurately (not as native integers, but in the mantissa of a double), I'm golden. 9 petabytes will suffice.
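
A quick way to see where the mantissa gives out, with a forced float so the result doesn't depend on integer width:

    $ perl -le 'printf "%.0f\n", 2.0**53 - 1'
    9007199254740991                         # still exact
    $ perl -le 'print 2.0**53 == 2.0**53 + 1 ? "collide" : "differ"'
    collide                                  # 2**53 + 1 rounds back down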

Thanks,
-- Marvin Humphrey

Re^3: File Size limits
by blokhead (Monsignor) on Aug 01, 2005 at 16:56 UTC
    I've been crafting workarounds which open a new file every time one approaches 2 GB, but that's getting tiresome.
    You may want to look at my module File::LinearRaid, which lets you access multiple sequential files through a single filehandle. It was conceived to help seamlessly overcome OS filesize limitations (among other things).
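
    From the synopsis, usage looks something like this (the paths and sizes are made up):

        use File::LinearRaid;

        # Several physical files presented as one logical filehandle.
        my $fh = File::LinearRaid->new( "+<",
            "index/part.0" => 1_000_000,
            "index/part.1" => 1_000_000,
        );
        seek $fh, 1_500_000, 0;    # lands inside the second physical file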

    One of the ideas I had with F::LR was that you could have an enormous logical file split into reasonably-sized physical files and use BigInts as (logical) seek offsets. Since the underlying (physical) seeks would still be "reasonably" sized, it should work... in theory! Unfortunately, I'm still stumped as to how to test this out. In fact, what I just outlined may even work in the module's current state -- I just don't know.
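
    The arithmetic behind the idea is simple enough. Here's a standalone, untested sketch of the mapping, assuming fixed-size physical chunks:

        use strict;
        use warnings;
        use Math::BigInt;

        my $chunk_size = Math::BigInt->new(2)->bpow(30);    # 1 GB physical chunks

        # Map a huge logical offset to (physical file index, local offset).
        # In list context, bdiv returns quotient and remainder.
        sub locate {
            my ($logical) = @_;                             # a Math::BigInt
            my ( $chunk, $local ) = $logical->copy->bdiv($chunk_size);
            return ( $chunk->numify, $local->numify );      # both now small
        }

        my ( $i, $off ) = locate( Math::BigInt->new("5000000000") );
        print "file $i, offset $off\n";                     # file 4, offset 705032704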

    Also, right now there is no mechanism to grow the logical file automatically, although you can manually append physical files to it.

    Anyway, if you think this module could work for you, let me know. I'd be happy to hear your feedback and suggestions.

    blokhead

      I'd actually checked out your module already. :) It looks like a nifty solution to a vexing problem.

      For Kinosearch, I'm content to have systems that can't deal with large files run up against the file size limit. If you're running a search engine where the index exceeds 2 GB, you're running a serious search engine, and you're probably not running it on a machine that can't handle files larger than 2 GB. My main reason for writing the workarounds was a (perhaps mistaken) impression about the maximum file size that 32-bit Perl's internal file manipulation routines can handle safely.

      WRT the problem of auto-growing a file, I was handling it with a scalar write buffer. That saved me from having to check whether I was exceeding the 2 GB limit on every single call to print(). Perhaps a similar solution could work for your module?
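
      In rough, untested form (the class and method names are invented), it looks something like this:

          package BufferedWriter;
          use strict;
          use warnings;

          use constant LIMIT    => 2**31 - 1;    # conservative per-file ceiling
          use constant FLUSH_AT => 2**16;        # flush the buffer every 64 KB

          sub new {
              my ( $class, @paths ) = @_;
              my $self = bless { paths => \@paths, buffer => '', written => 0 },
                  $class;
              $self->_open_next_file;
              return $self;
          }

          # The analog of print(): just append to a scalar buffer.
          sub print_bytes {
              my ( $self, $bytes ) = @_;
              $self->{buffer} .= $bytes;
              $self->_flush if length( $self->{buffer} ) >= FLUSH_AT;
          }

          # The limit check happens once per flush instead of once per print.
          # (A real version would also flush any remainder on close.)
          sub _flush {
              my ($self) = @_;
              my $len = length $self->{buffer};
              $self->_open_next_file if $self->{written} + $len > LIMIT;
              print { $self->{fh} } $self->{buffer};
              $self->{written} += $len;
              $self->{buffer} = '';
          }

          sub _open_next_file {
              my ($self) = @_;
              my $path = shift @{ $self->{paths} }
                  or die "ran out of file names";
              open( $self->{fh}, '>', $path ) or die "Can't open '$path': $!";
              $self->{written} = 0;
          }

          1;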

      -- Marvin Humphrey