in reply to File Size limits

You're referring to the largest integer Perl can still represent exactly before the switch to a floating point value costs you the least significant bits, and with them the ability to increment the integer meaningfully.

I don't know the exact answer offhand; a simple search (or a short test script) would find it for you. The question is somewhat academic, since the chances of exceeding it in practice are slim. Databases are about the only practical files of that size, and you'd use native code for the actual access.

If you're really trying to access files larger than 9 petabytes (9*10^15 =~ 2^53) directly, using a scripting language, then I'd like to rent that time machine you're using.
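
For what it's worth, here's a quick way to see that boundary for yourself -- a minimal sketch, assuming the usual IEEE 754 double for Perl's NV (53-bit mantissa):

use strict;
use warnings;

# Probe the NV (double) limit directly; native 64-bit IVs on a modern Perl
# would go further, but the question here is about the double's mantissa.
my $n = 2**53;                        # 9_007_199_254_740_992, roughly 9 petabytes
printf "2**53     = %.0f\n", $n;
printf "2**53 + 1 = %.0f\n", $n + 1;  # prints the same value -- the increment is lost
printf "2**53 - 1 = %.0f\n", $n - 1;  # still exact

# Or find the boundary empirically: keep doubling until adding 1 no longer
# changes the value. Start from 1.0 so the arithmetic stays in NV space.
my ( $limit, $bits ) = ( 1.0, 0 );
while ( $limit + 1 != $limit ) { $limit *= 2; $bits++; }
printf "first integer where +1 is lost: %.0f (2**%d)\n", $limit, $bits;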

Lastly, if you put [] around your http: link, it would be rendered directly clickable.

--
[ e d @ h a l l e y . c c ]

Re^2: File Size limits
by creamygoodness (Curate) on Aug 01, 2005 at 16:34 UTC
    halley: "If you're really trying to access files larger than 9 petabytes (9*10^15 =~ 2^53) directly, using a scripting language, then I'd like to rent that time machine you're using."

    Ha! I need this info for my CPAN distro, Search::Kinosearch. Since it's used to index collections of documents, the files which make up the index can grow quite large. I've been crafting workarounds which open new files every time you approach 2GB, but that's getting tiresome.

    Currently, Kinosearch uses a couple of DB_File tied hashes, and those work fine for files > 2 GB, but I'm replacing them with more specialized file formats using native Perl code, and I'd like to know how big I can allow these new files to grow. I'll probably store pointer data as a pair of packed network ints, then multiply to reconstruct the full offset.
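
    Roughly along these lines -- just a sketch of the idea, with made-up sub names, not the final format:

    use strict;
    use warnings;

    # Split a large file offset into two 32-bit halves and store them
    # big-endian ("network order") -- 8 bytes per pointer on disk.
    sub offset_to_bytes {
        my $offset = shift;
        my $hi = int( $offset / 2**32 );
        my $lo = $offset - $hi * 2**32;
        return pack( 'N N', $hi, $lo );
    }

    # Reverse it: unpack the two halves and multiply the high word back up.
    # Exact as long as the offset stays below 2**53.
    sub bytes_to_offset {
        my ( $hi, $lo ) = unpack( 'N N', shift );
        return $hi * 2**32 + $lo;
    }

    my $ptr = 3_000_000_000_000;    # ~3 TB, well past 2**32
    print "round trip ok\n" if bytes_to_offset( offset_to_bytes($ptr) ) == $ptr;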

    If 32-bit Perl can handle integers up to 2**53 accurately (not as native integers, but in the mantissa of a double), I'm golden. 9 petabytes will suffice.

    Thanks,
    -- Marvin Humphrey

      I've been crafting workarounds which open new files every time you approach 2GB, but that's getting tiresome.
      You may want to look at my module File::LinearRaid, which lets you access multiple (sequential) files seamlessly through a single filehandle. It was conceived to help overcome OS filesize limitations (among other things).

      One of the ideas I had with F::LR was that you could have an enormous logical file split into reasonably-sized physical files and use BigInts as (logical) seek offsets. Since the underlying (physical) seeks would still be "reasonably" sized, it should work... in theory! Unfortunately, I'm still stumped as to how to test this out. In fact, what I just outlined may even work in the module's current state -- I just don't know.
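
      The offset arithmetic itself is simple enough. Here's a rough sketch of the mapping, with an assumed 1 GB chunk size and an illustrative helper name (this is not F::LR's actual interface):

      use strict;
      use warnings;
      use Math::BigInt;

      # Map a huge logical offset onto (physical file index, offset within
      # that file), assuming the logical file is split into fixed-size chunks.
      my $CHUNK_SIZE = Math::BigInt->new(2)->bpow(30);    # 1 GB physical files

      sub logical_to_physical {
          my $logical = Math::BigInt->new(shift);               # BigInt keeps it exact
          my ( $index, $local ) = $logical->bdiv($CHUNK_SIZE);  # quotient, remainder
          return ( $index->numify, $local->numify );            # both small enough for a plain seek
      }

      my ( $file_no, $seek_to ) = logical_to_physical('5000000000000');    # 5 TB in
      print "file #$file_no, offset $seek_to\n";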

      Also, right now there is no mechanism to automatically grow the logical file, although there is a manual mechanism to append physical files to the big logical file.

      Anyway, if you think this module could work for you, let me know. I'd be happy to hear your feedback and suggestions.

      blokhead

        I'd actually checked out your module already. :) It looks like a nifty solution to a vexing problem.

        For Kinosearch, I'm content to have systems that can't deal with large files run up against the file size limit. If you're running a search engine whose index exceeds 2 GB, you're running a serious search engine, and you're probably not running it on a machine that can't deal with > 2 GB files. My main reason for writing the workarounds was a (perhaps mistaken) impression about the maximum file size that 32-bit Perl's internal file manipulation routines can handle safely.

        WRT the problem of auto-growing a file, I was handling it with a scalar write buffer, which saved me from having to check whether I was exceeding the 2 GB limit on every single call to print(). Perhaps a similar solution could work for your module?
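
        Something along these lines -- a simplified sketch of the idea, not Kinosearch's actual code:

        use strict;
        use warnings;

        # Accumulate output in a scalar and only check the size limit when the
        # buffer is flushed, rather than on every print().
        my $MAX_FILE_SIZE = 2**31 - 1;    # conservative 2 GB ceiling
        my $FLUSH_AT      = 2**16;        # flush every 64 KB

        my $buffer  = '';
        my $written = 0;

        sub buffered_print {
            my ( $fh, $data ) = @_;
            $buffer .= $data;
            flush_buffer($fh) if length($buffer) >= $FLUSH_AT;
        }

        sub flush_buffer {
            my $fh = shift;
            return unless length $buffer;
            die "file size limit exceeded"
                if $written + length($buffer) > $MAX_FILE_SIZE;
            print {$fh} $buffer or die "print failed: $!";
            $written += length $buffer;
            $buffer = '';
        }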

        -- Marvin Humphrey

Re^2: File Size limits
by creamygoodness (Curate) on Aug 01, 2005 at 18:14 UTC
    halley: "Lastly, if you put [] around your http: link, it would be rendered directly clickable."

    Thanks, done.

    -- Marvin Humphrey