Re^2: Perl Objects, Internal Representation -- How??

"Why" is simple and should be obvious-- I have an application that uses a lot of object instances and I am looking to optimize it and improve its efficiency in terms of both memory and execution speed.

However, I can and will be happy to elaborate on my application a bit...

I have been tasked to write (and in some instances, re-write existing) programs to process very, very large log files from across a number of systems and derive detailed "stats" (info) from the logs. These systems are in heavy use 24/7/365. I have examined the problems from a number of angles and have developed a number of strategies for handling the job. The most obvious and useful one is to crunch the stuff offline on another unencumbered machine.

One of the older programs (which I didn't write) from a much older system often bumps up against the 2GB limit built into its perl binary. Unfortunately, I am not permitted to compile a new binary, nor do I have access to a binary with a larger limit. The system has reached "end-of-life" by every conceivable measure of the term and is on "ICU life support"; we're just hoping it can make it through the year without problems... so the likelihood of actually fixing that one is somewhere between positive zero and negative zero...

In the new initiative... in my earlier attempts, I sometimes ran up against the 2GB limit with the new programs too. However, through careful (re-)consideration of the data and by limiting myself to only the barest essential data that I need from each log entry, I have been able to cut back considerably on the in-memory requirements and haven't hit the limit in a while.

One of the things that complicates my efforts is that I have to combine logs from multiple systems (i.e. a sequential stream) in order to develop complete records. And there are multiple systems (wide) at each point in the path. So in order to follow the trail for a particular item, I have to follow the chain of systems, whichever ones they may be. And that ultimately takes quite a bit of crunch time to sift through all the data.

In the end-- all this will be easier. I am developing a database into which we will ultimately just write the data in the first place, and then it should mostly just "fall out" the right way when queried. Until then, however, management has deemed it important (nay, critical :) that I crunch the data, develop the stats, and generate reports __before__ I work on putting it into the database (where, ironically, generating the reports and such would become virtually trivial)...

...Did I mention how much I _love_ my job??? :)

I am nearing the end of this first phase. Things are working out pretty well. But I have been curious of late just how well perl stores objects-- how effectively and efficiently. What optimizations does it perform? Are there any attempts to reduce redundancy and overlap in the keys (since that could add up to a blivet of memory all by itself!)? And so on-- what else can I do to squeeze out space and cycles?

I have also been recently reading and schooling myself in the various manners in which one could create objects-- arrays, scalars, etc. and have read a bit about inside-out objects, but don't see how they really reduce the cost of memory.
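
For concreteness, here is a minimal sketch of one of those alternatives, an array-based object. The class name `LogEntry::Array` and its three fields are hypothetical, chosen only for illustration. Field positions are compile-time constants, so each instance is a plain array rather than a hash with its own bucket and entry structures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class, for illustration only: an array-based object.
package LogEntry::Array;

# Field positions are named constants, so each instance is a plain
# array and carries no per-object hash structure.
use constant { HOST => 0, PORT => 1, STAMP => 2 };

sub new {
    my ($class, %args) = @_;
    my $self = [];
    # Copy the named arguments into their fixed array slots.
    @{$self}[HOST, PORT, STAMP] = @args{qw(host port stamp)};
    return bless $self, $class;
}

# Read-only accessors: index straight into the underlying array.
sub host  { $_[0][HOST]  }
sub port  { $_[0][PORT]  }
sub stamp { $_[0][STAMP] }

package main;

my $entry = LogEntry::Array->new(
    host  => '10.0.0.1',
    port  => 8080,
    stamp => 1158461640,
);

printf "%s:%d at %d\n", $entry->host, $entry->port, $entry->stamp;
```

The trade-off is readability and flexibility: fields cannot be added at run time by simply assigning to a new key, and subclasses have to agree on the index layout.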

All-in-all, I very much like Perl's ability to do what I want-- to make the simple things simple and the hard things possible... (I think I've heard that somewhere... :) But there are times that I wish it had a mechanism to permit hard-typing and struct-based objects.

Re^3: Perl Objects, Internal Representation -- How?? by chromatic (Archbishop) on Sep 17, 2006 at 02:54 UTC
Without knowing anything about your application beyond what you've posted here (and I admit, I skimmed a bit), I would rather optimize the handling of multiple "very, very large log files" than worry about object representation. This sounds more like a working set problem to me. If you can do line- or chunk-at-a-time preprocessing into a small database (I like SQLite), you can likely reduce your working set size and forget almost everything about needing to manage it. I've used that successfully in what I believe to be a similar situation.
Re^3: Perl Objects, Internal Representation -- How?? by BrowserUk (Patriarch) on Sep 18, 2006 at 06:16 UTC
> "Why" is simple and should be obvious

You think?

> I am looking to optimize it and improve its efficiency in terms of both memory and execution speed.

In general, Perl tends to trade memory for speed, so how to optimise for both is rarely obvious and never easy.

> But there are times that I wish it had a mechanism to permit hard-typing and struct-based objects.

Using a scalar-ref based object implementation, where the referenced string is a pack'd representation of the data, can reduce memory consumption by 90% or so. For example, a typical log file line might contain a couple of IP address/port number pairs; a datetime stamp; a protocol string; a process name; and another couple of short text fields. Stored as hash-based objects, with each field named and the string stored as the value, 1e6 objects consume ~1GB of RAM. With the same data pack'd into a single string and its reference blessed, the storage requirement drops to < 100MB.

But you pay a penalty for unpacking the data when accessing it. Done right, the penalty doesn't have to be huge, but there is a penalty. Nothing's for free.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Perl Objects, Internal Representation -- How?? by Anonymous Monk on Sep 21, 2006 at 08:59 UTC
> I am looking to optimize it and improve its efficiency in terms of both memory and execution speed.

If your goal is to write a program that is optimized for both speed and memory, Perl is a horrible choice of language. There are a million reasons to choose Perl to program in, but it is seldom the best choice if execution speed needs to be optimized (although usually its execution speed is good enough). And it's never the appropriate choice when it comes to optimizing the memory footprint. Compared to, say, C, Perl is slow. That has nothing to do with compiled vs interpreted; it all has to do with the flexibility Perl gives you, and flexibility comes with a price. And while perl is skillfully tweaked to do as many optimizations as possible, it usually does so by requesting more memory from the OS.