
After recently getting involved in the always-new-but-ancient recurring thread about compiled languages and efficiency, I thought I'd share an example of how the first-pass guess at 'efficient programming' often isn't.

I have a database (MySQL) with five tables of up to 1.2 million rows each. The entire db is just over 1.2 GB on the filesystem. This particular mule has MySQL, Apache, and Mozilla all running on the same FreeBSD-4 platform, which has 256 MB of RAM. The deployment system will have more headroom, but it's still important to be careful because swapping out the httpd or mysqld processes would be unacceptable.

The goal is to build (as a background task) a file which will contain the select box options for dropdown menus for the various tables and web pages.

My first approach was to build a query that would return an array reference including each of the ten fields, and then process them separately. MySQL crashed on the second table, telling me it needed 512MB of RAM to process the query.

Okay, fine. My next pass at a solution had me selecting one field at a time, but it still crashed about halfway through the third table.

top is your friend on a BSD system, and my good friend showed me that both my swap usage and my Perl process size were growing.

My next pass had me using system() to run a separate process for each table, but this still slowed to a crawl when it got near the physical RAM limit. My program was spending most of its time in the swread state, pushing pages back and forth to swap space.

My final pass has me separating each MySQL query into a separate perl process spawned by system(). (Yes, I could have fork()ed, but this is easier and more maintainable.) This seems counterintuitive, but it actually is much easier on the system, and the overhead of process management is negligible in the big picture.
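For anyone who wants to see the shape of that last approach, here is a minimal sketch, not my actual script. The table and field names are made up, and the worker is an inline stand-in passed to perl -e so the example runs without a database; a real worker would be its own script that connects through DBI, runs one SELECT, and appends to the options file.

```perl
use strict;
use warnings;

# Hypothetical table and field names; the post does not list the real ones.
my %fields_for = (
    customers => [qw(region status)],
    orders    => [qw(carrier)],
);

# Stand-in for the per-query worker (an assumption for illustration).
# The list built here is a placeholder for the rows one query would return.
my $worker_code = <<'END_WORKER';
my ($table, $field) = @ARGV;
my @rows = map { "$table.$field value $_" } 1 .. 1000;
print scalar(@rows), " options for $table.$field\n";
END_WORKER

# Run one query in a fresh perl process. Whatever memory the child
# allocates goes back to the operating system when the child exits.
sub run_query_in_child {
    my ($table, $field) = @_;
    # List-form system() bypasses the shell; $^X is the running perl binary.
    my $status = system($^X, '-e', $worker_code, $table, $field);
    return $status == 0;
}

# The parent only loops and spawns, so its own footprint stays small.
for my $table (sort keys %fields_for) {
    for my $field (@{ $fields_for{$table} }) {
        run_query_in_child($table, $field)
            or warn "worker for $table.$field failed (status $?)\n";
    }
}
```

The parent never holds a result set itself, so the peak memory at any moment is roughly one query's worth plus a small driver process.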

Why did my memory usage go over the top? Even when you undef variables, Perl generally keeps the freed memory around for its own reuse instead of returning it to the operating system, so the process does not shrink until it terminates. Thus, a solution with many small queries and processes runs much faster and is much more "ecologically" sound than a solution which would be faster in a "perfect" world with infinite RAM. :D

In reply to The Long Way 'Round... by samizdat
