v.t. To read a large data file entirely into core before working on it. This may be contrasted with the strategy of reading a small piece at a time, processing it, and then reading the next piece.
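In Perl the two strategies look something like this (a minimal sketch; the file name and the process() routine are just placeholders):

    # Slurping: the whole file is read into memory in one go
    my $whole_file = do {
        local $/;                          # undefine the input record separator
        open my $fh, '<', 'data.txt' or die "open: $!";
        <$fh>;
    };

    # Piecewise: only one line is held in memory at a time
    open my $in, '<', 'data.txt' or die "open: $!";
    while (my $line = <$in>) {
        process($line);                    # placeholder for the real work
    }
    close $in;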
I am writing this meditation because of a recent experience with some application code that slurped an entire 3,700,000 rows' worth of database objects. This caused the database server to crash, having been denied additional memory by the operating system.

The database crash resulted in overnight callouts for a week (to me, as I was on call). Once the database was restarted, the application was able to continue running - in this case because the DB had lost and rebuilt its cache, and so had a smaller memory footprint.
We realised that the code just wasn't scalable, especially as the number of objects in the source database only ever grows, and is already at 3.7 million. The offending logic looked roughly like this:

    select * from foo into linked list of object pointers
    foreach object pointer
        Retrieve the object
        Write its contents to a temporary file in .csv format
    fork a bcp command to load the data into Sybase
The problem was compounded by the locking scheme - as the source database did not know what was going to happen to the objects, it took out a lock on each one - 3.7 million locks! This is why the DB server was crashing instead of the application segfaulting.
The solution we arrived at was to use a database cursor and retrieve the objects in batches. We also specified no locking when retrieving the objects - hence no lock problem. Finally, the objects were properly released and garbage-collected (this was C++) before the next batch was retrieved.
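The real fix was in C++ talking to Sybase, but a minimal Perl/DBI sketch of the same idea might look like this (the table foo comes from the pseudocode above; the column names, batch size, isolation setting and write_batch_to_csv() are assumptions for illustration):

    use DBI;

    my ($user, $pass) = ('someuser', 'secret');
    my $dbh = DBI->connect('dbi:Sybase:server=SOURCE', $user, $pass,
                           { RaiseError => 1 });

    # Read uncommitted (Sybase isolation level 0), so the select takes
    # no locks on the 3.7 million source rows.
    $dbh->do('set transaction isolation level 0');

    my $sth = $dbh->prepare('select id, payload from foo');
    $sth->execute;

    my $batch_size = 10_000;
    my @batch;
    while (my $row = $sth->fetchrow_arrayref) {
        push @batch, [ @$row ];            # copy - fetchrow_arrayref reuses its buffer
        if (@batch >= $batch_size) {
            write_batch_to_csv(\@batch);   # append this batch to the temporary .csv
            @batch = ();                   # release the batch before fetching more
        }
    }
    write_batch_to_csv(\@batch) if @batch;

    # Finally, fork a bcp command to load the .csv into Sybase, as before.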
The golden rule is always to think about how many rows you are going to get back from your query. If you are comfortable holding all of those rows in memory, then by all means slurp. For very large tables, you are much better off using fetchrow to retrieve the data a row at a time (I am not aware of a mechanism for retrieving multiple rows in batches - but that would be nice).
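In DBI terms the choice might look like this (a sketch; the 10,000-row threshold and the process() routine are arbitrary):

    my ($count) = $dbh->selectrow_array('select count(*) from foo');

    if ($count < 10_000) {
        # Small enough to hold in memory - slurp away
        my $rows = $dbh->selectall_arrayref('select id, payload from foo');
        process($_) for @$rows;
    }
    else {
        # Too big to slurp - fetch one row at a time
        my $sth = $dbh->prepare('select id, payload from foo');
        $sth->execute;
        while (my $row = $sth->fetchrow_arrayref) {
            process($row);
        }
    }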
Also consider how much work is being done by the database and how much by the application. Where possible, push the work into joins and where clauses - that way the database server gets to do, and to optimise, most of it.
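For example (table and column names invented), rather than slurping two whole tables and matching them up in the application, let the server do the join and the filtering and return only the rows you actually need:

    # Work done in the application: both tables come back in full (bad)
    my $orders    = $dbh->selectall_arrayref('select * from orders');
    my $customers = $dbh->selectall_arrayref('select * from customers');
    # ... nested loops in Perl to match customer_id and pick out recent orders ...

    # Work done by the database: the join and where clause run on the server (good)
    my $sth = $dbh->prepare(q{
        select c.name, o.order_date, o.total
        from   orders o, customers c
        where  o.customer_id = c.customer_id
        and    o.order_date >= ?
    });
    $sth->execute('2004-04-01');
    while (my $row = $sth->fetchrow_arrayref) {
        # only the interesting rows arrive, already joined
    }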
Once again, if the file is small, slurping is OK.
Slurping also causes problems in GUI applications written with Perl/Tk. Application code runs inside callbacks, and it is desirable for every callback to return as quickly as possible; delays here result in a noticeable degradation in response time and usability.
Needless to say, Tk applications need to read files. For this purpose there is Tk::fileevent, which arranges for a callback to be called whenever a file handle becomes readable or writable. This way you avoid slurping the file; instead you read one line from inside the fileevent callback, which is retriggered if there is more text to read in the file.
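A minimal sketch of that approach (the file name and the Text widget are just for illustration; fileevent is more commonly used with pipes and sockets, but the pattern is the same):

    use Tk;

    my $mw   = MainWindow->new;
    my $text = $mw->Text->pack;

    open my $fh, '<', 'big_file.log' or die "open: $!";

    # Each time $fh is readable, read just one line and return; the
    # callback is retriggered while there is more text to read.
    $mw->fileevent($fh, 'readable' => sub {
        my $line = <$fh>;
        if (defined $line) {
            $text->insert('end', $line);
        }
        else {
            $mw->fileevent($fh, 'readable' => '');   # cancel the callback at EOF
            close $fh;
        }
    });

    MainLoop;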
When it comes to executing operating system commands, Tk::IO (wrongly named in my opinion) can be used to manage the forking, and the capture of the output.
--
I'm Not Just Another Perl Hacker