Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

This is Off Topic so I apologize up front, yet this is one of the best resources I know for solid technical answers. I have a large database, just over 350GB, which we process with Perl DBI against an Oracle backend. I want to improve I/O performance with the purchase of a new server. Right now we mostly run batch jobs.

I know software, but I'm weak on the hardware side. What are the important hardware factors to consider? Don't let cost be a big limit on the suggestions; my budget is just under 40k.

Are there any great forums for this type of question?

  • Comment on OT: What Hardware is important for large I/O bound processes

Replies are listed 'Best First'.
Re: OT: What Hardware is important for large I/O bound processes
by wazoox (Prior) on Jul 28, 2005 at 17:41 UTC

    Databases are very I/O intensive. 350 GB is really small by today's hard-drive standards (there are 250 and 400GB drives available). A NAS (network-attached storage) would certainly perform badly, unless you use a very high-end one (NetApp FAS900).

    If your need is only "moderate storage capacity, extreme I/O power, security", I'd go for a fibre-channel enclosure with FC or SCSI drives (NOT SATA drives, like the low-end SANs offer!) and I'd configure it as RAID-10 for high performance and security. For instance, 6x150GB 15,000 RPM drives will be fine, very (very, very) fast, and expandable.
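    Back-of-envelope, the usable capacity of a RAID-10 set like that works out easily from the drive count; a minimal Perl sketch (the drive figures are just the example numbers above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# RAID-10: drives are mirrored in pairs, then striped across the pairs.
# Usable capacity is half the raw capacity; reads can be served by
# either side of each mirror, while writes must hit both.
sub raid10_usable_gb {
    my ($drives, $gb_per_drive) = @_;
    die "RAID-10 needs an even number of drives" if $drives % 2;
    return ($drives / 2) * $gb_per_drive;
}

my $usable = raid10_usable_gb(6, 150);
print "6 x 150GB in RAID-10 => ${usable}GB usable\n";   # 450GB
```

    So six 150GB spindles comfortably cover a 350GB schema while spreading the I/O across three stripe members.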

    Be very careful: most vendors today offer low-end FC-to-SATA RAID enclosures (like the Apple Xserve RAID, which Oracle is buying by the truckload). They're really great for file storage but not that great for databases unless you fill them to the throat with drives (14) and cache (2GB). But a filled-up Xserve RAID holds 6 TB, not 350GB...

    See what's the best bang for the buck. Sure, an Xserve RAID isn't much more expensive than a small FC enclosure, and it's much bigger, BUT it's far slower. However, it may be enough for you (we don't know where you're starting from; perhaps you're using 5-year-old disks with very low performance by today's standards?) and will provide HUGE storage capacity.

      Thanks for the great detailed suggestions! The 350GB is just the largest database schema. We have a few schemas; in raw disk space we have around 4-6 TB on SANs, individual disks and a NAS (12 GB cache), but the hardware is old.

      From a purely I/O-throughput perspective, which is better: a single powerful database server with 8 SCSI drives, or a new low-end server with PCI-X and a powerful SAN?

      Again thanks for the detailed tips!

        Well, get the best hardware money can buy and it should be OK :)

        OK, from an ROI perspective, a low-end server like a Supermicro or Dell with dual Xeons, an FC or SCSI board, and an external drive enclosure with FC or SCSI drives is probably "good enough" for most database work (one of my customers uses a similar machine with a 9TB database).

        In case you need more storage, the Apple Xserve RAID is great, but as I mentioned, be prepared to fill it if you want to do serious database work. HP, EMC and others have similar products, but I haven't tried them so I can't really compare.

Re: OT: What Hardware is important for large I/O bound processes
by astroboy (Chaplain) on Jul 28, 2005 at 19:05 UTC

    The problem with SANs in many organisations is that you just get allocated space in the larger scheme of things. Depending on the architecture, it can be hard to specify exactly where your data goes, so you don't have the chance to optimise its layout. You want to spread your data across as many disks as possible in order to parallelise the I/O, so I'd go for local drives because you have complete control over them.

    The sad thing is that disks are getting bigger, but it's better to have lots of smaller ones rather than a few (or even one) big one(s). You'd either stripe at the hardware level or lay your data out manually. Generally you'll need to know your data hotspots, which you or your DBA can find through the v$ tables, or via Enterprise Manager.
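    For example, once you've pulled per-datafile physical read/write counts out of v$filestat (via DBI or Enterprise Manager), ranking the hotspots is trivial; a hypothetical sketch, with the file names and counts invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rank datafiles by total physical I/O, hottest first. The input is a
# hash of datafile name => I/O count, as you might build from
# v$filestat (phyrds + phywrts) fetched over DBI; these particular
# names and numbers are made up.
sub rank_hotspots {
    my (%io_by_file) = @_;
    return sort { $io_by_file{$b} <=> $io_by_file{$a} } keys %io_by_file;
}

my %io = (
    'users01.dbf'  => 120_000,
    'orders01.dbf' => 2_450_000,
    'index01.dbf'  => 890_000,
);
my @hot = rank_hotspots(%io);
print "Hottest file: $hot[0]\n";   # orders01.dbf
```

    A skewed list like that is exactly the signal that one or two tablespaces deserve their own spindles.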

    The other thing to consider is that Oracle has partitioned tables, so you can spread the hotspots around the disks, because in most cases there are a couple of tables that bear most of the I/O brunt.

    Finally, most I/O bottlenecks can be improved by closely examining your database schema, then tuning your SQL, and only then looking at your hardware. Hardware considerations come last. Unlike coding, premature optimisation through db design is absolutely essential, as it's almost impossible to refactor the design once it's gone live. Also use the v$ tables to identify your worst-performing SQL I/O-wise; it can often be tuned through judicious choice of indexes, optimizer hints etc.

    Finally, you don't need a super-grunty server, just as many disks as you can afford!

      Finally, you don't need a super-grunty server, just as many disks as you can afford!

      Amen to that. This would be why most big-iron systems ship with only a very small number of surprisingly lowly-clocked CPUs. I heard of one system managing a 0.5TB *active* database (i.e., not just a warehouse) with only 3 processors clocked at 150MHz.

      Of course, you would end up with a lot of system time without a decent I/O controller. So it's best to stick with the tried and tested solutions, which means high-end SCSI or FC-AL.

      And whatever you do, don't use any RAID level other than RAID 1, or maybe 1+0.
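      The usual argument against parity RAID for write-heavy databases is the small-write penalty; a rough Perl sketch using the standard rule-of-thumb penalties (2 physical I/Os per logical write for RAID 1/10, 4 for RAID 5 — the spindle counts and IOPS here are assumed example figures):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Back-of-envelope effective write IOPS for an array, given raw
# per-spindle IOPS and the RAID write penalty (physical I/Os per
# logical write): RAID-0 = 1, RAID-1/10 = 2, RAID-5 = 4
# (read old data + read old parity + write data + write parity).
sub effective_write_iops {
    my ($spindles, $iops_per_spindle, $write_penalty) = @_;
    return int($spindles * $iops_per_spindle / $write_penalty);
}

my $raid10 = effective_write_iops(8, 180, 2);
my $raid5  = effective_write_iops(8, 180, 4);
printf "8 spindles: RAID-10 ~%d write IOPS, RAID-5 ~%d\n", $raid10, $raid5;
# RAID-10 ~720, RAID-5 ~360
```

      Same spindles, half the write throughput — which is why mirroring wins for OLTP-style loads.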

      $h=$ENV{HOME};my@q=split/\n\n/,`cat $h/.quotes`;$s="$h/." ."signature";$t=`cat $s`;print$t,"\n",$q[rand($#q)],"\n";
      I agree completely with this; the hardware part is the final piece of the puzzle.

      We've tuned the SQL and other processes as much as possible. The performance we're getting matches our collective experience. It still impresses me how changes to the SQL can cut processes from days to a few minutes.

      We're not finding more places to tune unless we partition the tables, which will be an interesting experiment. Your comment about SANs in organizations struck a chord with me.

      The standard process at the company is to take a SAN and format it to work like a single disk or LUN. The IS group claims there's no benefit to keeping things on individual disks since the SAN's cache will handle it. I've long had doubts about this. Do the v$ tables show the hotspots even on a single SAN disk?

        Do the v$ tables show the hotspots even on a single SAN disk?

        Not really. Oracle has no idea about the disk abstraction behind the scenes, and that is the problem: how do you match an Oracle datafile to the disks in the SAN?

        One of my mates had a huge SAN performance issue at a large site in the UK. They had all the SAN vendor's leading experts working on the problem (they even flew in the guru from the US), and everyone was scratching their heads. As a test they set the system up on a single Linux-based PC with dedicated disks. Under load tests it flew. Now I'm not saying that SANs are no good, but they add a huge level of complexity, and you need to be very sharp in how you architect your solution.

Re: OT: What Hardware is important for large I/O bound processes
by NetWallah (Canon) on Jul 28, 2005 at 16:44 UTC
    Sounds like it is time to get a SAN or NAS like the Entry-level EMC CLARiiON AX.

    No - I'm not a sales-droid. SANs make sense when large amounts of fast storage are needed - they make backups easier, and replicas possible.

         "Income tax returns are the most imaginative fiction being written today." -- Herman Wouk

Re: OT: What Hardware is important for large I/O bound processes
by Anonymous Monk on Jul 29, 2005 at 12:18 UTC
    There are a lot of things to consider. What's important is how the data is being used, and how often. More reads than writes? Large sequential accesses, or a gazillion short reads/writes all over the place? What's your backup solution? Do you have a need for business copies? And if so, do you need to be able to split off a mirror (almost) instantaneously? How many boxes need to see the disks that form the database? That is, does Oracle run in a cluster, and if so, of how many nodes?

    Things that will influence your I/O performance:

    • FC vs. SCSI.
    • If you have FC, the fabric and the switches involved.
    • Number of controllers that can access the data.
    • Other traffic going over the same controllers.
    • Your storage box (from just a bunch of disks to large storage devices like an EVA or an XP).
    • Size of all the caches and buffers involved.
    • Number of spindles.
    • RAID settings.
    • Setup of your hot/warm spares. (This, together with the RAID settings may have an extra performance impact when one or more disks have failed)
    • Size of the disk and its rotation speed (RPM).
    • Synchronous or asynchronous access.
    • LVM settings.
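
    To put numbers on the spindle-count and RPM items: a single spindle's random-I/O ceiling can be estimated from its rotation speed and average seek time. A rough Perl sketch — the seek times below are assumed typical figures for drives of this era, not measurements from this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough random-IOPS ceiling for one spindle: each random I/O costs an
# average seek plus (on average) half a rotation. Seek times here are
# assumed typical values, not measured ones.
sub spindle_iops {
    my ($rpm, $avg_seek_ms) = @_;
    my $half_rotation_ms = (60_000 / $rpm) / 2;   # ms per half revolution
    return int(1000 / ($avg_seek_ms + $half_rotation_ms));
}

printf "15k RPM, 3.5ms seek: ~%d IOPS/spindle\n", spindle_iops(15_000, 3.5);
printf "7.2k RPM, 8.5ms seek: ~%d IOPS/spindle\n", spindle_iops(7_200, 8.5);
```

    The gap (roughly 2x per spindle) is why the advice above keeps coming back to "more, faster spindles" rather than bigger disks.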
Re: OT: What Hardware is important for large I/O bound processes (Thanks!)
by Anonymous Monk on Jul 28, 2005 at 23:12 UTC
    Thanks everyone! This was incredibly helpful.