Re: Hitting memory limit? (ActiveState Perl)
by davido (Cardinal) on Jan 22, 2004 at 05:25 UTC
It's possible that you're encountering the practical limits of using flat files and in-memory hashes. Perhaps it's time for a database approach. With a proper (and simple) database and some redesign of how you're doing things, you would never need to hold all users in memory at the same time, and wouldn't have to worry about running out of memory, even as the user list grows.
Without seeing actual code, it's difficult to give any more specific tips, but in general, when you start running out of memory it's time to reconsider your design philosophy. Often this means taking that step into the world of databases.
The DBI module, and DBD::SQLite can help to ease the pain of such a transition.
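To make that concrete, here is a minimal sketch (the database file, table, columns, and username are invented for illustration; substitute your own schema):

use strict;
use warnings;
use DBI;

# Hypothetical database file and table -- adjust to your own data.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=users.db', '', '',
    { RaiseError => 1 } );

# Pull back only the row you need, instead of holding every user in a hash.
my ( $hits ) = $dbh->selectrow_array(
    'SELECT hits FROM users WHERE username = ?',
    undef, 'fredbloggs',
);

print defined $hits ? "fredbloggs: $hits hits\n" : "no such user\n";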
Update: Corrected typo where DBD::SQLite was referred to as DBM::SQLite
In terms of moving to a database, but still staying within Perl, are the aforementioned DBI and DBD::SQLite the "best" options? Are there others that I should look at as well, to make sure that I'm using the interface/database that will work best for my particular use?
A database solution allows for fast "random" read/write access to data. This can assist you in overcoming the need to build large in-memory hashes. It may be only one solution, but it's a powerful and highly expandable one.
Your logfiles could even be written to databases, though that's probably less important than being careful not to slurp too much in at a time from your logfiles as you parse them. The key is to parse, and write out to the database, without building up gigantic hashes of usernames and other data.
To this end, DBI and DBD::SQLite are very good options. They will require that you learn some SQL. But DBD::SQLite is such a powerful (albeit lightweight) database that you are unlikely to need anything heavier (such as Oracle) for a long time. DBD::SQLite is the database driver and the database engine wrapped into one module. DBI.pm is the "Database Interface", which allows you to write code in a database-independent way, rather than getting down to the nitty-gritty of dealing directly with a low-level database driver. You use the two together: DBI as the API to DBD::SQLite. I do consider them to be very good options if you're looking for ways to scale your project beyond its current capacity.
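A rough sketch of the kind of loop I mean (the log format, filename, and column names are all invented here, and it assumes a fresh database file):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=usage.db', '', '',
    { RaiseError => 1, AutoCommit => 0 } );   # AutoCommit off: one big transaction

$dbh->do( 'CREATE TABLE hits ( username TEXT, bytes INTEGER )' );

my $insert = $dbh->prepare(
    'INSERT INTO hits ( username, bytes ) VALUES ( ?, ? )'
);

open my $log, '<', 'access.log' or die $!;
while ( my $line = <$log> ) {
    chomp $line;
    my ( $user, $bytes ) = split /\t/, $line;   # hypothetical tab-separated format
    $insert->execute( $user, $bytes );          # nothing accumulates in Perl's memory
}
$dbh->commit;

# Let the database do the tallying, rather than a giant in-memory hash:
my $totals = $dbh->selectall_arrayref(
    'SELECT username, SUM(bytes) FROM hits GROUP BY username'
);
printf "%s => %d\n", @$_ for @$totals;

Doing the inserts inside a single transaction (AutoCommit off, one commit at the end) matters a great deal for speed with SQLite.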
Re: Hitting memory limit? (ActiveState Perl)
by BrowserUk (Patriarch) on Jan 22, 2004 at 06:56 UTC
You've given minimal information about your script, but it sounds like you're hitting the swapper.
You neglect to mention how much memory is in use when the slow-down occurs, which is probably as important as the cpu utilisation. The "Physical Memory" and "Commit Charge" figures from the Performance tab of the Task Manager are your best indicators.
Alternatively, if you have Win32::API::Prototype installed, you could use this script to log memory use alongside your ticker, and get an indication of the memory used -v- performance.
#! perl -slw
use strict;
use Win32::API::Prototype;
my $memoryStatus = AllocMemory( 32 );
ApiLink( 'Kernel32', 'GlobalMemoryStatus', 'P', 'V' );
GlobalMemoryStatus( $memoryStatus );
my @keys = qw[
length PercentageLoad
TotalPhysical AvailablePhysical
TotalPaged AvailablePaged
TotalVirtualMemory AvailableVirtualMemory
];
my %mStatus;
@mStatus{ @keys } = unpack 'V8', $memoryStatus;
printf "%30s : %.2f$/",
    $_, $mStatus{ $_ } / 2**20
    for @keys[ 1 .. $#keys ];
__END__
P:\test>gMem
PercentageLoad : 0.00
TotalPhysical : 510.98
AvailablePhysical : 360.27
TotalPaged : 1250.05
AvailablePaged : 908.16
TotalVirtualMemory : 2047.88
AvailableVirtualMemory : 2018.61
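If you'd rather fold the check into your existing loop than run it separately, something along these lines (an untested sketch; $count stands in for whatever record counter you already have) would print the figure every few thousand records:

use strict;
use Win32::API::Prototype;

ApiLink( 'Kernel32', 'GlobalMemoryStatus', 'P', 'V' );
my $memoryStatus = AllocMemory( 32 );        # reuse the same buffer on every call

# Returns available physical memory in MB.
sub availPhysMB {
    GlobalMemoryStatus( $memoryStatus );
    return ( unpack 'V8', $memoryStatus )[ 3 ] / 2**20;   # 4th dword: available physical
}

# ...and inside your parsing loop, something like:
#   printf "\rrecord %d : %6.2f MB free", $count, availPhysMB()
#       unless $count % 10_000;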
On one test box (this is my laptop, I took a copy home with me):
Physical Memory (K)
-----
Total: 261560
Available: 7000 (and dropping)
System Cache: 58088 (and climbing)
Commit Charge (K)
-----
Total: 381188 (pretty static)
Limit: 630652 (very static)
Peak: 395832 (very static)
Kernel Memory (K)
-----
Total: 43932 (very static)
Paged: 35876 (very static)
Nonpaged: 8056 (very static)
I watched it until Physical Memory:Available got all the way down to ~3000. At that point Physical Memory:Available jumped to ~20000, and at the same moment Physical Memory:System Cache jumped by ~1000. Then they both started dropping/climbing as before.
I will run with your Win32::API::Prototype script and post results at the first available opportunity (Friday, likely).
Your Total Commit Charge far exceeds your Total Physical Memory--by about 50%. You've been into swapping for a considerable time.
Although the TCC is fairly static, suggesting that the hash isn't growing much, each time you access a key within the existing hash it's quite possible that perl would have to cycle through the entire hash structure to locate the next value, which in turn could mean the OS having to swap a huge portion of the process' image.
You probably had trouble hearing your mp3 over the sound of the disc thrashing--your neighbour's probably had the same problem :)
Perl's hashes are pretty memory hungry, and if you are nesting them, filling memory doesn't take too much effort. I induced this machine, with 512MB of RAM, into swapping in 80 seconds with a hash containing a little under 6_000_000 keys with empty (undef) values. If your values are only simple scalars you'll get there much quicker. If they are themselves hashes or arrays, much quicker still.
sub rndStr{ join '', @_[ map{ rand @_ } 0 .. shift ] };
$| = 1;    # autoflush, so the progress ticker displays immediately
( $_ % 1000 or printf( "\r$_ : %s ", times ) )
and $h{ rndStr 8, 'a'..'z', 'A' .. 'Z', 0 .. 9 } = undef
for 0 .. 10_000_000;
5858000 : 80.703
You could find out how many keys it takes to induce swapping by disabling swapping and logging scalar keys %your_main_hash periodically as you fill it. If the last number displayed before you get perl's "Out of memory" error is not too far short of your expected final size, then it might be worth seeing how you could use less memory. If it is a long way short, then that effort is almost certainly not worth it.
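In other words, something like this inside whatever loop fills the hash (%your_main_hash and the interval are placeholders, and $. assumes you're reading line by line):

print STDERR scalar( keys %your_main_hash ), " keys after $. lines\n"
    unless $. % 100_000;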
There are some alternatives to hashes which can save memory depending on the nature of your keys and what use you are making of the hash (see A (memory) poor man's hash lookup table for one possible partial solution), but if you are regularly dealing with these volumes of data, a DB of some form is probably your easiest option.
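Very roughly, the flavour of that approach (this is not the code from that node, just an illustration of the idea): if all you need is an existence test, and your keys never contain the separator character, one long string plus index() can stand in for a hash at a fraction of the memory, at the cost of much slower (linear-scan) lookups.

my $seen = ':';

sub add_key { $seen .= "$_[0]:" }                 # each key costs length+1 bytes
sub has_key { index( $seen, ":$_[0]:" ) >= 0 }    # linear scan of the string

add_key( 'fredbloggs' );
print "found\n" if has_key( 'fredbloggs' );   # true
print "found\n" if has_key( 'fred'       );   # false -- no partial matches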
Re: Hitting memory limit? (ActiveState Perl)
by Vautrin (Hermit) on Jan 22, 2004 at 12:58 UTC
You're processing 2GB of data, and it seems like you're storing whatever you're looking for in memory. Even assuming you only keep 1/10th of the data you process, you're still looking at 200MB+ in memory. Does your computer have that much free?
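If you want a hard number rather than an estimate, Devel::Size will tell you what a structure really costs -- Perl hashes carry a lot of overhead on top of the raw data, so it is usually more than you expect. A quick sketch (%tally is just a stand-in for your hash):

use strict;
use warnings;
use Devel::Size qw( total_size );

my %tally;
$tally{ "user$_" } = $_ for 1 .. 100_000;    # dummy data

printf "100_000 keys => %.1f MB\n", total_size( \%tally ) / 2**20;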
Dan
The initial test computer has 1GB RAM. The additional test computer (my laptop) has 256MB. Curiously, both start thrashing at approximately the same point in processing.
I'm definitely going to follow up on the previous suggestions to try and narrow down the scope of the problem, and reduce the size of the overall data collection to make sure the basic script structure isn't a problem.
Re: Hitting memory limit? (ActiveState Perl)
by neuroball (Pilgrim) on Jan 22, 2004 at 04:52 UTC
You gave a nice description of how the computer reacts to your script. One problem though: to help you, we need to take a look at your code.
Please edit your node and add the script between <code> tags, so that the monks might help you with their knowledge, not their ability to guess what might be wrong, or what might be rewritten more efficiently.
/oliver/
I hesitated to post code before because it's several hundred thousand lines (well, that and some of it I wrote on no sleep and I know blows chunks, aesthetics-wise ;). Would the preferred Monk-friendly approach be:
(a) Post a URL
(b) Post a snippet showing just the loop control and what I think are relevant portions (which may not, in fact, be relevant)
(c) Something else entirely
I am, as soon as I have a system available to me, testing on a Linux box as well for comparison purposes. I will also try paring down the tallying piece and re-testing to confirm that the problem does not occur unless the hash is growing, and then add bits back in a bit at a time to see whether I can identify anything specifically problematic.