bratwiz has asked for the wisdom of the Perl Monks concerning the following question:
Re: Perl Objects, Internal Representation -- How??
by jimt (Chaplain) on Sep 15, 2006 at 13:14 UTC
Boy, this is a huge can of worms. The short answer is "your mileage may vary." Perl, in one of the finest (I think) examples of giving you ridiculous power and then forcing you to deal with it, allows any arbitrary reference to be an object: scalarrefs, coderefs, arrayrefs, hashrefs, globrefs, whatever. All sorts of fancy things. But it's up to you, the programmer, to determine what data is stored where, and how much.

Each object is self contained. So, for the simple example of a hashref, it behaves like a normal hashref. If you've set a key in a particular object, then the key and value are stored in that object; they don't appear in any other object, and keys that you haven't set don't appear at all. Note - I don't know enough about the internals to know if perl does things like optimize constant strings ("some_key") so they only appear once in memory. One of the more low-level guys may be able to comment.

Methods are usually not stored on a per-object basis, but they can be if you want (using closures or simple coderefs, for example). Typically they're defined once in the class, and all objects of the class (or subclasses!) just point to that one method. But if you want to hang the method off your object, you can. This is something I'd file under "do it if you need it and know what you're doing."

Arrays have trickier memory requirements, since Perl's arrays aren't sparse (unless they've changed out from under me and I missed the announcement). So if you have 100 "slots" in your array-based object and populate slots 1 and 100, then slots 2-99 are allocated as well (in some capacity; again, I don't know the details of the internals). They're empty, but the slots are still there. I wrote up some other issues over at Problems I've had with array based objects.

And then there are hipper, newer things, like inside-out objects, which use scalarrefs to key into hash tables in the class to store the attributes. So each object has a scalar ref as the object itself, but all sorts of additional data stored in class-level hashes.

I guess you can just remember that if you stick it in the object, it's stored in the object. If you don't, it's not. But the devil's in the details.
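A minimal sketch contrasting the two layouts described above: a plain blessed hashref, where each object carries its own keys and values, versus an inside-out object, where the blessed scalar ref's address keys into class-level hashes. The class and field names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Conventional hash-based class: each object stores its own keys/values.
package Point::Hash;
sub new {
    my ( $class, %args ) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}
sub x { $_[0]{x} }
sub y { $_[0]{y} }

# Inside-out class: the object is a blessed scalar ref; the attributes
# live in class-level hashes, keyed by the reference's address.
package Point::InsideOut;
my ( %x_of, %y_of );
sub new {
    my ( $class, %args ) = @_;
    my $self = bless \my $scalar, $class;
    $x_of{ refaddr $self } = $args{x};
    $y_of{ refaddr $self } = $args{y};
    return $self;
}
sub x { $x_of{ refaddr $_[0] } }
sub y { $y_of{ refaddr $_[0] } }
sub DESTROY {    # clean up the class-level storage when an object dies
    my $self = shift;
    delete $x_of{ refaddr $self };
    delete $y_of{ refaddr $self };
}

package main;
my $p = Point::Hash->new( x => 3, y => 4 );
my $q = Point::InsideOut->new( x => 5, y => 6 );
print $p->x, " ", $q->x, "\n";    # 3 5
```

Note that in the inside-out version nothing at all is stored "in" the object; peeking at `$$q` shows an undefined scalar, which is the point, since nothing about the attributes leaks through the reference.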
|
Re: Perl Objects, Internal Representation -- How??
by BrowserUk (Patriarch) on Sep 15, 2006 at 13:44 UTC
-- How?? Perhaps a better question would be "-- Why?" :) As in: why do you need to know this information?

The answers very much depend upon how you decide to construct your classes, and how you decide to construct your classes depends upon the choices you make. Some of those choices could be influenced by the design requirements of your classes. Re: Making an Object faster? may shed a little light on some of the possibilities--though it is due for an update.

Beyond that, it would possibly be more fruitful for you to post an outline of your requirements; you may then get better information regarding the trade-offs. And whatever implementation you use, there will always be trade-offs.

Some people criticise Perl's DIY object construction as too flexible, or too daunting, but remember that if Perl opted for a single mechanism, we programmers would be stuck with whatever choice was made. It may be that one choice could be made that would satisfy 80% or more of people's needs, but the other 20% would be stuck with that decision. As it is, the programmer can select the class implementation that satisfies their specific requirements, even if this means trading (say) less imposed "security" for a lighter footprint and faster execution. Conversely, in those environments where security is paramount, you can go the other way and accept slower performance in favour of having a shotgun on the door to keep out intruders.

We'd all like it if there were a single mechanism that ran so fast we couldn't measure it; used no memory; trapped all attempts to subvert the defined interface; and allowed for single-, multi- and trait-style inheritance; but you'd still be making trade-offs somewhere.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by bratwiz (Sexton) on Sep 16, 2006 at 20:43 UTC
"Why" is simple and should be obvious-- I have an application that uses a lot of object instances and I am looking to optimize it and improve its efficiency in terms of both memory and execution speed. However, I can and will be happy to elaborate on my application a bit... I have been tasked to write (and in some instances, re-write existing) programs to process very, very large log files from across a number of systems and derive detailed "stats" (info) from the logs. These systems are in heavy use 24/7/365. I have examined the problems from a number of angles and have developed a number of strategies for handling the job. The most obvious and useful one is to crunch the stuff offline on another unencumbered machine. One of the older programs (which I didn't write) from a much older system often has problems bumping up against the 2GB limit built into the perl binary. Unfortunately I am not permitted to compile up a new binary, nor do I have access to a binary with a larger limit. And the system has reached "end-of-life" in every possible, conceivable measure of the term, and is on "ICU life support" and we're just hoping it can make out the year without problems... so the likelyhood of actually fixing that one is somewhere between positive zero and negative zero... In the new initiative... in my earlier attempts, I sometimes ran up against the 2GB limit with the new programs too. However through careful (re-)consideration of the data and limiting myself to only the barest essential data that I need from each log entry, I have been able to cut back considerably on the in-memory requirements and haven't hit the limit in awhile. One of the things that compounds my efforts is that I have to combine logs from multiple systems (i.e. a sequential stream) in order to develop complete records. And there are multiple systems (wide) at each point in the path. 
So in order to follow a trail for a particular item, I have to follow the chain of systems, whichever ones they may be. And that ultimately takes quite a bit of crunch time to sift through all the data. In the end-- all this will be easier. I am developing a database that we will ultimately use to just write the data into in the first place and then it should mostly just "fall out" the right way when queried. Until then however, management has deemed it important (nay, critical :) that I crunch the data, develop the stats, and generate reports __before__ I work on putting it into the database (where, ironically, generating the reports and such would become virtually trivial... ...Did I mention how much I _love_ my job??? :) I am nearing the end of this first phase. Things are working out pretty well. But I have been curious of late just how well perl stores objects-- how effectively and efficiently. What optimizations it performs. Are there any attempts to cut down and reduce redundancy and overlap in the keys (since that could add up to a blivet of memory all by itself!) And etc-- what else I can do to squeeze out space and cycles. I have also been recently reading and schooling myself in the various manners in which one could create objects-- arrays, scalars, etc. and have read a bit about inside-out objects, but don't see how they really reduce the cost of memory. All-in-all, I very much like Perl's ability to do what I want-- to make the simple things simple and the hard things possible... (I think I've heard that somewhere... :) But there are times that I wish it had a mechanism to permit hard-typing and struct-based objects. | [reply] |
by chromatic (Archbishop) on Sep 17, 2006 at 02:54 UTC
Without knowing anything about your application beyond what you've posted here (and I admit, I skimmed a bit), I would rather optimize the handling of multiple "very, very large log files" than worry about object representation. This sounds more like a working set problem to me. If you can do line- or chunk-at-a-time preprocessing into a small database (I like SQLite), you can likely reduce your working set size and forget almost everything about needing to manage it. I've used that successfully in what I believe to be a similar situation.
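A sketch of the line-at-a-time preprocessing suggested above, assuming DBI and DBD::SQLite are available. The log format, table layout, and sample data are all invented for illustration; real code would split on whatever the actual log format is.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# An in-memory database for the demo; a real run would use a file, e.g.
# 'dbi:SQLite:dbname=logs.db', so later report queries can reuse it.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 0 } );

$dbh->do('CREATE TABLE entries ( ts INTEGER, host TEXT, msg TEXT )');
my $ins = $dbh->prepare('INSERT INTO entries (ts, host, msg) VALUES (?, ?, ?)');

# Hypothetical log lines: "epoch host rest-of-message".
my @log_lines = (
    '1158300000 web01 GET /index.html 200',
    '1158300001 web02 GET /about.html 404',
    '1158300002 web01 POST /login 302',
);

# One record at a time: the working set never exceeds a single line,
# no matter how large the input file is.
for my $line (@log_lines) {
    my ( $ts, $host, $msg ) = split ' ', $line, 3;
    $ins->execute( $ts, $host, $msg );
}
$dbh->commit;    # one transaction around the bulk insert is much faster

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM entries');
print "loaded $count entries\n";    # loaded 3 entries
```

With the data in SQLite, the cross-system "follow the chain" correlation becomes a join or an ORDER BY rather than an in-memory merge, which is where the working-set savings come from.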
by BrowserUk (Patriarch) on Sep 18, 2006 at 06:16 UTC
"Why" is simple and should be obvious You think? I am looking to optimize it and improve its efficiency in terms of both memory and execution speed. In general, Perl tends to trade memory for speed, so how to optimise for both is rarely obvious and never easy. But there are times that I wish it had a mechanism to permit hard-typing and struct-based objects. Using a scalar-ref based object implementation, where the referenced string is a pack'd representation of the data can reduce memory consumption by 90% or so. For example, a typical log file line might contain a couple of ip address/port numbers; a datetime stamp; a protocol string; a process name and another couple of short text fields. Stored as hash-based objects, each field named and the string stored as the value, 1e6 objects consumes ~1GB of ram. The same data pack'd to a single string, and it's reference blessed, the storage requirement drops to < 100MB. But, you pay a penalty for unpacking the data when accessing it. Done right, the penalty doesn't have to be huge, but there is a penalty. Nothing's for free.
by Anonymous Monk on Sep 21, 2006 at 08:59 UTC
If your goal is to write a program that is optimized for both speed and memory, Perl is a horrible choice of language. There are a million reasons to choose Perl to program in, but it is seldom the best choice if execution speed needs to be optimized (although usually its execution speed is good enough), and it's never the appropriate choice when it comes to optimizing the memory footprint. Compared to, say, C, Perl is slow (but that has nothing to do with compiled vs interpreted--it has everything to do with the flexibility Perl gives you, and flexibility comes with a price). And while perl is skillfully tweaked to do as many optimizations as possible, it usually does so by requesting more memory from the OS.
Re: Perl Objects, Internal Representation -- How??
by jdtoronto (Prior) on Sep 15, 2006 at 13:49 UTC
The big issue to deal with is that there are just SO many ways to create and manage objects in Perl. My rather limited investigation finds that blessed hashes behave somewhat like ordinary hashes--they have all the same autovivification behaviour.

What I did was to write a simple piece of code that watched memory usage as I instantiated the same class a number of times. As expected, the memory usage is directly related to the size of the hash involved and has basically nothing to do with the code--although, as pointed out before, this need not always be the case; it's up to you. Hashes with dissimilar numbers of keys seem to populate with different memory usage.

When I did the same for the class re-written as an 'inside-out' object (per TheDamian in Perl Best Practices), the memory usage was a little higher for each instance, and the benchmark times were a little higher, but not so much as to be prohibitive. I won't publish the numbers because I am still very much a neophyte at the OOP game and I am sure my methodology is quite likely flawed. That being said, it is a journey that needs to be made so that I understand this stuff better.

My guides have been "Object Oriented Perl" by TheDamian as well as his "Perl Best Practices". "Higher Order Perl" by Dominus has some material, as does Chapter 5 of "Perl Hacks" by TheDamian, chromatic and Ovid. There is also a lot here in the Monastery, especially in postings by AbigailII.

Happy hunting, jdtoronto
by xdg (Monsignor) on Sep 15, 2006 at 17:23 UTC
You might be interested in these OO benchmarks, or in Anti-inside-out-object-ism, for a comparison of some of the leading issues. jdhedden has worked very hard to optimize Object::InsideOut for speed, particularly in the default array-based mode. (By contrast, I've optimized Class::InsideOut for simplicity, and TheDamian has optimized Class::Std for complex class hierarchy management.)

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Re: Perl Objects, Internal Representation -- How??
by perrin (Chancellor) on Sep 15, 2006 at 15:08 UTC
> Is a perl object stored in its entirety more than once (data, methods and all)?

Just the data. The methods go in the symbol table for the class.

> What about hash keys--are they duplicated for each object of the same type?

Yes.

> What about hashes (objects of the same type) that have asymmetrical numbers of keys--i.e., not fully populated?

Doesn't matter. Nothing is shared between objects of the same type except the fact that they are both marked as belonging to the same class. Other than that, they don't even have to be the same data type.

> What about objects based on hashes versus arrays?

It doesn't matter. All the answers are the same.
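A small illustration of the first point, that the data lives in each object while the methods live once in the class's symbol table. The class name and fields are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Counter;
sub new  { my ( $class, $n ) = @_; return bless { n => $n }, $class }
sub bump { $_[0]{n}++; return $_[0] }
sub n    { $_[0]{n} }

package main;

# Each object is its own hash: bumping one does not touch the other.
my $c1 = Counter->new(1);
my $c2 = Counter->new(10);
$c1->bump;
print $c1->n, " ", $c2->n, "\n";    # 2 10

# But both objects resolve 'bump' to the very same code ref,
# stored once in the Counter:: symbol table.
print $c1->can('bump') == $c2->can('bump') ? "shared\n" : "separate\n";   # shared
```

`can` returns the code ref a method call would dispatch to, so comparing the two (references numify to their addresses) shows the method itself is never duplicated per object.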
by dave_the_m (Monsignor) on Sep 15, 2006 at 18:07 UTC
> What about hash keys--are they duplicated for each object of the same type?

Er, no. Hash keys are generally shared across all hashes, as the following shows. Using the same key for lots of objects uses less memory than a different key for each one:
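A reconstruction sketch of the kind of demonstration described (not the original code, which is not preserved here). It compares process growth for many hashes sharing one key against many hashes with unique keys; it is Linux-specific (reads VmRSS from /proc) and the counts are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Resident set size of this process, in KB (Linux only).
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or die "needs Linux /proc: $!";
    while (<$fh>) { return $1 if /^VmRSS:\s*(\d+)/ }
    die "VmRSS not found";
}

my $before = rss_kb();

# 100_000 hashes all using the same key: the key string is stored once,
# in perl's shared string table.
my @shared = map { +{ common_key => $_ } } 1 .. 100_000;
my $mid = rss_kb();

# 100_000 hashes each with a distinct key: every distinct key string
# gets its own entry in the shared string table.
my @unique = map { +{ "key_$_" => $_ } } 1 .. 100_000;
my $after = rss_kb();

printf "shared keys: %d KB, unique keys: %d KB\n",
    $mid - $before, $after - $mid;
```

On a typical run the unique-key batch costs noticeably more, which is the point being made: reusing field names across objects is effectively free, so hash-based objects don't pay per-object for their key strings.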
Dave.
by perrin (Chancellor) on Sep 15, 2006 at 18:30 UTC