in reply to Re^2: Efficient way to handle huge number of records?
in thread Efficient way to handle huge number of records?

Hm. You're stretching several boundaries beyond their limits there:

32-bit memory mapping supports either 4GB of address space without PAE or 64GB of physical memory with PAE. But that does not necessarily tell you how much Linux supports with/without PAE.

Linux also introduces constraints on total physical memory based on interactions with the way it manages kernel virtual memory. That leads to at least four different levels of memory support based on choices made during kernel build.

The lowest level is 896MB without PAE.
The next level is about 3.25GB (BIOS-limited) without PAE.
The next level is, I think, about 16GB, with PAE.

The highest level, I think, is the full 64GB with PAE, plus an ugly kludge in kernel virtual memory (a bad idea in my view; use 64-bit instead for that much RAM).
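For what it's worth, the arithmetic behind those figures can be sketched as follows (illustrative only; the 896MB line assumes the classic x86 3GB/1GB user/kernel address-space split):

```python
# Quick arithmetic behind the limits above (sizes in bytes).
GB = 2**30
MB = 2**20

plain_32bit = 2**32   # 32-bit addresses -> 4GB of address space
pae = 2**36           # PAE widens physical addresses to 36 bits -> 64GB

# The 896MB figure is the classic x86 "lowmem" limit: the kernel gets the
# top 1GB of each 4GB address space, minus ~128MB reserved for vmalloc
# and fixed mappings.
lowmem = 1024 * MB - 128 * MB

print(plain_32bit // GB, pae // GB, lowmem // MB)  # 4 64 896
```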

Win32 also supports (and I believe was the first to support) Physical Address Extension (PAE). It can also extend the default 2GB user address space to 3GB per process. But just like Linux, these limits are extended through a series of kludges that have drawbacks as well as benefits.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^4: Efficient way to handle huge number of records?
by Marshall (Canon) on Dec 11, 2011 at 12:30 UTC
    Yes, even 32-bit Windows XP can go to 3GB for a user process, but there can be problems if that feature is enabled.

    This appears to be a job for a DB, if many searches will be performed after the DB is "built" (initialized and indexed).

    From the problem statement, I think that SQLite will do the job just fine. If it ever doesn't, the same SQL will work on another DB.
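    A minimal sketch of that build-once-then-query pattern, shown with Python's built-in sqlite3 bindings for brevity (the record layout is invented; the same SQL works from Perl via DBI/DBD::SQLite):

```python
# Sketch: build an indexed SQLite table once, then do fast keyed lookups.
# Uses Python's standard sqlite3 module; the (id, sequence) layout is
# hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, sequence TEXT)")

# Bulk-load inside one transaction -- far faster than one commit per row
# when initializing a large number of records.
rows = [("rec%06d" % i, "ACGT" * 10) for i in range(100_000)]
with conn:
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

# Once the DB is "built", each search is an indexed lookup, not a scan.
row = conn.execute(
    "SELECT sequence FROM records WHERE id = ?", ("rec000042",)
).fetchone()
```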

      This appears to be a job for a DB, if ...

      See Item 3. The "if" is crucial.

      Personally, I can see no logic at all in running a 32-bit OS on 64-bit hardware.

      32-bit Perl on a 64-bit OS has one advantage -- at least in the Windows world -- more XS modules build successfully. But even on Windows, the stuff that doesn't build tends to be either abandonware or weird, esoteric stuff like POE and Coro, which will either never work on Windows, or only work despite themselves if they do.

      But for the most part, a 64-bit build of Perl on a 64-bit OS with 8/16/32/64GB of RAM just makes doing anything involving the huge datasets that typify genomic work so much easier.

      When you can pick up 8GB of RAM for £29, it makes no sense to try to squeeze such datasets through 2 or 3GB memory pools.



        I think this XS module build problem on 64-bit machines is going to get solved. It is inevitable. Evolve or die. Like all evolutionary things, it takes some time.

        If the OP's data fits into RAM, then why the heck not? I have no problem with that. A hash table in RAM is gonna beat any kind of DB hands down!
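        A sketch of that in-RAM approach, with a Python dict standing in for a Perl hash (key and value formats are invented for illustration):

```python
# Sketch: when the whole dataset fits in RAM, a hash lookup is a single
# probe -- no SQL parsing, no disk I/O. Keys/values are hypothetical.
records = {"rec%06d" % i: "ACGT" * 10 for i in range(100_000)}

def lookup(rec_id):
    # Constant-time average-case access; returns None for a missing key.
    return records.get(rec_id)

hit = lookup("rec000042")
miss = lookup("no-such-id")
```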

Re^4: Efficient way to handle huge number of records?
by flexvault (Monsignor) on Dec 11, 2011 at 18:14 UTC

    BrowserUk

    I don't have any real experience with 32-bit Linux machines with more than 2GB of memory, so your knowledge in this area is gospel. The specs said it could, but I didn't realize it was a hack.

    I do have more experience with 32/64-bit Unix machines, and found that most 64-bit applications (including Perl) required more than twice the real memory of their 32-bit counterparts. Now, I haven't checked this in quite a while, so I'll have to revisit it. It could have been the 64-bit version of the compiler or something else. I will be installing an IBM POWER7 p-series p740 with 128GB in January. I will try some benchmarks and let you know. POWER7 is 64-bit only.

    Thank you

    "Well done is better than well said." - Benjamin Franklin

      I will be installing an IBM power 7 p-series p740 with 128GB in January.

      12 (or 16) cores, 48 (or 64) threads, 3.7 (or 3.55) GHz. Drool! Slobber! :)



        BrowserUk,

        Actually it's an 8-core, 32-thread, 3.7GHz machine. But it's not for me. I have a few IBM hardware resellers that use my services to install p-series products. This one is going into a company to upgrade an Oracle installation running on a p640. But it's the latest model, announced in October (which doubled everything). I agree with you, though -- wish it was for me :-)

        I wouldn't even mind if Perl were 5 times as big -- who cares! But I will be able to do some tests on it.

        Anything you want to test?

        Thank you

        "Well done is better than well said." - Benjamin Franklin