If you are on Unix, I'd say just test it if you have a local filesystem that you don't mind if it gets full and it supports large files (i.e. > 2GB in size). Most unix filesystems will create a "sparse" (aka holey) file, without actually allocating all of the space. Just create a file, seek way the heck out there, and write one byte. It should only really be one disk block long, but appear in a directory listing as huge. I think this is true for NFS as well, but it may depend on the implementation.
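Something like this is all it takes (the path and offset are arbitrary examples, and seeking past 2GB assumes a perl built with large file support):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Seek far past end-of-file and write one byte. On most Unix
    # filesystems this creates a sparse file: huge apparent size,
    # almost no blocks actually allocated.
    my $path = '/tmp/sparse_test';          # arbitrary example path
    open my $fh, '>', $path or die "Can't open $path: $!";
    binmode $fh;
    seek $fh, 3 * 2**30, 0 or die "Can't seek: $!";   # 3 GB out; 0 == SEEK_SET
    print {$fh} 'x';
    close $fh or die "Can't close $path: $!";

    # Compare the apparent size with the blocks actually allocated.
    my @st = stat $path;
    printf "apparent size: %d bytes\n", $st[7];
    printf "allocated:     %d 512-byte blocks\n", $st[12];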
You're referring to the largest integer that Perl can store without losing the least significant bits to floating point conversion, and with them the ability to increment the value meaningfully.
I don't know the exact answer. A simple search would find it for you. The question is somewhat academic, as the chances of exceeding it in practice are slim. Databases are about the only practical files of that size, and you'd use native code for the actual access.
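If you do want to pin it down empirically, a quick probe along these lines (just a sketch, assuming the usual IEEE doubles) will find the cutoff:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Keep doubling until adding 1 no longer changes the value;
    # with IEEE 754 doubles the loop should stop at 2**53.
    my $n    = 1;
    my $bits = 0;
    while ( $n + 1 != $n ) {
        $n *= 2;
        $bits++;
    }
    printf "integers stay exact up to 2**%d (%.0f)\n", $bits, $n;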
If you're really trying to access files larger than 9 petabytes (9*10^15 =~ 2^53) directly, using a scripting language, then I'd like to rent that time machine you're using.
Lastly, if you put [] around your http: link, it would be rendered directly clickable.
-- [ e d @ h a l l e y . c c ]
halley: "If you're really trying to access files larger than 9 petabytes (9*10^15 =~ 2^53) directly, using a scripting language, then I'd like to rent that time machine you're using."
Ha! I need this info for my CPAN distro, Search::Kinosearch. Since it's used to index collections of documents, the files which make up the index can grow quite large. I've been crafting workarounds which open new files every time you approach 2GB, but that's getting tiresome.
Currently, Kinosearch uses a couple DB_File tied hashes, and those work fine for files > 2 GB, but I'm replacing them with more specialized file formats using native Perl code, and I'd like to know how big I can allow these new files to grow. I'll probably store pointer data as a pair of packed network ints and multiply.
If 32-bit Perl can handle integers up to 2**53 accurately (not as native integers, but in the mantissa of a double), then I'm golden. 9 petabytes will suffice.
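Roughly the packing scheme I have in mind (just a sketch; the helper names and the example offset are made up):

    use strict;
    use warnings;

    # Split an offset into two 32-bit halves, store them as packed
    # network-order ints, and multiply the high half back out when
    # reading. Exact so long as the offset stays below 2**53.
    sub pack_offset {
        my $offset = shift;
        my $high   = int( $offset / 2**32 );
        my $low    = $offset - $high * 2**32;
        return pack 'NN', $high, $low;
    }

    sub unpack_offset {
        my ( $high, $low ) = unpack 'NN', shift;
        return $high * 2**32 + $low;
    }

    my $offset = 32 * 2**30 + 12345;    # a little past 32 GB
    my $packed = pack_offset($offset);
    die "round trip failed" unless unpack_offset($packed) == $offset;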
Thanks,
-- Marvin Humphrey
"I've been crafting workarounds which open new files every time you approach 2GB, but that's getting tiresome."
You may want to look at my module File::LinearRaid, which lets you access multiple (sequential) files seamlessly through a single filehandle. It was conceived to help overcome OS filesize limitations (among other things).
One of the ideas I had with F::LR was that you could have an enormous logical file split into reasonably-sized physical files and use BigInts as (logical) seek offsets. Since the underlying (physical) seeks would still be "reasonably" sized, it should work... in theory! Unfortunately, I'm still stumped as to how to test this out. In fact, what I just outlined may even work in the module's current state -- I just don't know.
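To make the splitting concrete: mapping a logical offset onto a (physical chunk, offset within chunk) pair is just division and remainder. This isn't F::LR's real interface, only an illustration with a made-up locate() helper and an arbitrary 1 GB chunk size:

    use strict;
    use warnings;

    # Map a logical offset in a huge "virtual" file onto the index of
    # the physical chunk it falls in, plus the offset within that chunk.
    my $chunk_size = 2**30;    # 1 GB per physical file

    sub locate {
        my $logical = shift;
        my $chunk   = int( $logical / $chunk_size );
        my $within  = $logical - $chunk * $chunk_size;
        return ( $chunk, $within );
    }

    my ( $chunk, $within ) = locate( 5 * 2**30 + 42 );
    print "chunk $chunk, offset $within\n";    # chunk 5, offset 42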
Also, right now there is no mechanism to automatically grow the logical file, although there is a manual mechanism to append physical files to the big logical file.
Anyway, if you think this module could work for you, let me know. I'd be happy to hear your feedback and suggestions.
halley: "Lastly, if you put [] around your http: link, it would be rendered directly clickable."
Thanks, done.
-- Marvin Humphrey