in reply to Re: Speeding up Postgres INSERTs
in thread Speeding up Postgres INSERTs
Re^3: Speeding up Postgres INSERTs
by JavaFan (Canon) on Jun 18, 2010 at 14:59 UTC
by punkish (Priest) on Jun 18, 2010 at 15:18 UTC
JavaFan> 3300 insertions/second comes to 30 seconds for 100K.
JavaFan> I find 30 fairly large for "several".

Indeed. When I asked the original question, I didn't have the correct number. Now I do, for one instance of the job.

JavaFan> But is this number for 3300 separate queries, or a single
JavaFan> query with 3300 rows?

I am not sure what you mean by "a single query with 3300 rows." I mean $sth->execute() is called for each row, and a $dbh->commit() is called once every 100_000 rows. In the end, I get i rows inserted in j wallclock seconds (via Benchmark), and i / j is approximately 3300 (see the sketch below).

Other answers --
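For concreteness, here is a minimal sketch of the loop just described. The Pg connection string, table name, columns, and stand-in data are all hypothetical; none of them appear in the original post.

    use strict;
    use warnings;
    use DBI;
    use Benchmark;

    # Hypothetical connection, table, and columns; the original
    # post does not show them.
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password',
        { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare('INSERT INTO mytable (a, b) VALUES (?, ?)');

    my @rows = ([1, 'x'], [2, 'y']);   # stand-in data; the real source is unknown

    my $t0 = Benchmark->new;
    my $i  = 0;
    for my $row (@rows) {
        $sth->execute(@$row);                 # one execute() per row
        $dbh->commit unless ++$i % 100_000;   # commit once every 100_000 rows
    }
    $dbh->commit;                             # flush the final partial batch

    my $td = timediff(Benchmark->new, $t0);
    printf "%d rows in %s\n", $i, timestr($td);   # i rows in j wallclock seconds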
--
when small people start casting long shadows, it is time to go to bed
by JavaFan (Canon) on Jun 18, 2010 at 16:58 UTC
punkish> I am not sure what you mean by "a single query with 3300 rows."
punkish> I mean $sth->execute() is called for each row, and a
punkish> $dbh->commit() is called once every 100_000 rows.

As I explained earlier, for many databases there can be a significant difference between an insert statement that inserts a single row, with execute() called once per row, and a single statement that inserts thousands† of rows in one shot. Apparently, you haven't tried that variant.

As for the other questions: they are important if you need maximum speed, because they determine where your bottleneck is. Disks? The channels to your disks? Memory? CPU? Giving me the answers is useless; you'd have to consider it all yourself. But this goes way beyond a web-based Perl forum.

punkish> If it is rewriting the entire table everytime then it is a very silly db

Says you. There are storage engines that keep rows in primary key order. That's not silly: it makes searching on primary keys fast, and fetching ranges fast as well. It depends entirely on the application how one wants a database tuned; it often makes sense to tune for fast retrieval, even if that makes inserts slower. And 'rewriting' doesn't mean the entire table is written for each row: a page will get reordered, and a full page may be split into two.

punkish> How "hot" is the table? Don't know what that means.

How much of the table is cached somewhere? In-process pages, filesystem buffers (assuming you have the additional filesystem overhead), disk controller caches, etc.

punkish> What's the physical layout of where the data is written to? Don't know what that means.

Are you writing to a single disk? Multiple disks? A SAN? Mirrored disks? RAID, and if so, which level? Are you striping? How is your storage connected? Copper? Fibre Channel? If you have a 10 TB database, this matters. A lot. Especially since you have insignificant memory compared to the size of the database. You may have to try several setups to determine what works for you.

You said disk space is cheap. Sure, raw disk space is cheap. Getting a 10 TB database to perform well may not be so cheap. There is a lot more to a database than lots of disk space and a bunch of SQL statements.

†The more the better. How many rows you can fit in a single statement is usually limited by the maximum packet size.
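A sketch of the multi-row variant described above, reusing the hypothetical table, columns, and stand-in data from the sketch in the parent post; the batch size is an assumption to be tuned against whatever statement-size limit the server imposes.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'password',
        { AutoCommit => 0, RaiseError => 1 });

    my $batch_size = 1_000;   # an assumption; bounded by the maximum
                              # statement/packet size the server accepts

    # Build one INSERT carrying $batch_size placeholder tuples:
    #   INSERT INTO mytable (a, b) VALUES (?, ?), (?, ?), ...
    my $tuples = join ',', ('(?, ?)') x $batch_size;
    my $sth = $dbh->prepare("INSERT INTO mytable (a, b) VALUES $tuples");

    my @rows = map { [$_, "row$_"] } 1 .. 5_000;   # stand-in data

    my @buffer;
    for my $row (@rows) {
        push @buffer, @$row;            # flatten the (a, b) pairs
        if (@buffer == 2 * $batch_size) {
            $sth->execute(@buffer);     # thousands of rows in one shot
            @buffer = ();
        }
    }
    if (@buffer) {                      # leftover rows get a one-off statement
        my $last = join ',', ('(?, ?)') x (@buffer / 2);
        $dbh->do("INSERT INTO mytable (a, b) VALUES $last", undef, @buffer);
    }
    $dbh->commit;

Each execute() now carries $batch_size rows, so the per-statement round-trip and parsing overhead is paid once per batch rather than once per row; whether that wins, and by how much, depends on the database and driver.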