Re^2: Constructing a hash - why isn't my regex matching anything

Re^3: Constructing a hash - why isn't my regex matching anything by Corion (Patriarch) on Dec 19, 2010 at 10:49 UTC
Yes. Use less memory. Look over your program for places where you are needlessly wasting memory. Maybe you are reading a complete file into memory instead of processing it line by line. Maybe you are doing something else that wastes memory. Even so, 125000 lines is not much, so most likely you are doing something that wastes a lot of memory.
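A minimal sketch of the difference between slurping a file and processing it line by line, as described above. The sample file and the helper names `count_slurped` and `count_streaming` are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line $_\n" for 1 .. 1000;
close $tmp_fh or die "could not close $tmp_name: $!";

# Slurping: every line of the file is held in memory at once.
sub count_slurped {
    my ($path) = @_;
    open my $fh, '<', $path or die "could not open $path: $!";
    my @lines = <$fh>;          # list context reads the whole file
    close $fh;
    return scalar @lines;
}

# Streaming: only the current line is held in memory.
sub count_streaming {
    my ($path) = @_;
    open my $fh, '<', $path or die "could not open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {  # scalar context reads one line per pass
        $count++;
    }
    close $fh;
    return $count;
}

print "slurped:   ", count_slurped($tmp_name), "\n";
print "streaming: ", count_streaming($tmp_name), "\n";
```

Both subs return the same count; the difference is only in how much of the file is resident at any moment, which matters as the input grows.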
Re^4: Constructing a hash - why isn't my regex matching anything by perl_mystery (Beadle) on Dec 19, 2010 at 10:56 UTC
I did look at my code (below). I am hardly doing anything other than constructing the hash:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %hash;

open my $fh, '<', $ARGV[0] or die "could not open '$ARGV[0]': $!";
while (my $line = <$fh>) {
    #print "line:$line\n";
    my ($key) = $line =~ /;([^;]+)\s-\s/;
    #print "KEY:$key\n";
    my ($value) = $line =~ /\.\\(.*)-\d+\;/;
    #print "VALUE:$value\n";

    # Skip lines where either pattern failed to match.
    next unless defined $key && defined $value;

    if (!($hash{$key})) {
        $hash{$key} = $value;
    }
}
close $fh;

# Write the dump to the output file rather than to STDOUT.
open my $out, '>', "hash_flf.txt" or die "could not open 'hash_flf.txt': $!";
print {$out} Dumper(\%hash);
close $out;
```
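To see what the two patterns in the script capture, here is a small sketch run against a single line. The input format is not shown anywhere in the thread, so the sample line is purely an assumption, built from the key and value shapes that appear in the test case in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line: the real file format is not shown in the thread.
my $line = '1;//programfiles/documents/data/lookup/script_auth_pap.h'
         . ' - .\root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h-12;';

# Same patterns as in the posted script.
my ($key)   = $line =~ /;([^;]+)\s-\s/;
my ($value) = $line =~ /\.\\(.*)-\d+\;/;

# Either capture is undef when its pattern fails, so check before using them.
if (defined $key && defined $value) {
    print "KEY:   $key\n";
    print "VALUE: $value\n";
}
else {
    print "no match\n";
}
```

If the real lines differ from this assumed shape, one or both captures come back undefined, which is why the check before using them is worth having.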
Re^5: Constructing a hash - why isn't my regex matching anything by Anonyrnous Monk (Hermit) on Dec 19, 2010 at 11:36 UTC
How much memory do you have available? The following test case on my system uses 34 MB (according to top) after having filled the hash, and a total of 110 MB after having created the dump string.

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $key_ = '//programfiles/documents/data/lookup/script_auth_pap.h';
my $val_ = 'root\edit\perl\scripts\scripths\sec\inc\script_auth_pap.h';
my $c = 0;

my %hash;

for (1..125000) {
    my $key = "$key_$c";
    my $val = "$val_$c";
    $c++;
    $hash{$key} = $val;
}

<>; # 34 MB

my $dump = Dumper \%hash;

<>; # 110 MB
```
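Since most of the growth in the test case above comes from holding the whole dump as one string, one possible way to avoid it is to write the pairs out one at a time. This is only a sketch: `write_pairs` is a hypothetical helper, the tab-separated format is an assumption (it presumes keys and values contain no tabs or newlines), and it is not a drop-in replacement for Data::Dumper output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write each pair as "key<TAB>value" without building one large dump string.
# write_pairs is a hypothetical helper name.
sub write_pairs {
    my ($hash_ref, $out) = @_;
    my $written = 0;
    while (my ($key, $value) = each %$hash_ref) {
        print {$out} "$key\t$value\n";
        $written++;
    }
    return $written;
}

# Small stand-in for the 125,000-entry hash from the test case.
my %hash = map { ("key$_" => "val$_") } 1 .. 1000;

# Writing to an in-memory filehandle keeps the sketch self-contained;
# a real script would open a file on disk instead.
my $buffer = '';
open my $out, '>', \$buffer or die "could not open in-memory handle: $!";
my $written = write_pairs(\%hash, $out);
close $out or die "could not close handle: $!";

print "wrote $written pairs\n";
```

Using `each` walks the hash without first copying all keys into a list, so the only extra memory needed at any point is one key and one value.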
Re^3: Constructing a hash - why isn't my regex matching anything by ELISHEVA (Prior) on Dec 19, 2010 at 11:24 UTC
The hash alone takes up between 12 and 13 megs (125,000 * 100 chars per key-value pair), but 13 megs isn't a great deal of memory on most machines these days. What sort of machine are you on? Are you by any chance running this script on a server or virtual machine with some sort of artificial per-process memory cap? Another possibility: how do you construct the file that you are extracting keys and values from? Earlier you posted a question about recursive extraction of file names. Is this part of the same script? Perhaps earlier or later in your script (above or below this loop) you have some leftover code that slurps in a very large file all at once? Or perhaps your recursion, rather than this loop, is eating up all of the memory?
Re^4: Constructing a hash - why isn't my regex matching anything by Anonymous Monk on Dec 19, 2010 at 11:40 UTC
I think it takes more than that :) The numbers are in Kbytes, and the memory usage nearly doubles due to Data::Dumper, from 78 MB to 142 MB.
Re^5: Constructing a hash - why isn't my regex matching anything by ELISHEVA (Prior) on Dec 19, 2010 at 12:11 UTC
Well, I'll be.... Any idea where all that extra memory usage is coming from (beyond the 78M for Data::Dumper)? That's a lot of extra space for 13M of actual data. Based on a conversation in the CB, hash buckets only account for about half a meg extra, not 60M (or 20M as per another tester in a reply further up).

Update: A quick check on my machine comes up with 26M for storing key-value pairs in an array, and 34M for storing them in a hash:

```
key-value pair: 112 bytes
total data for 125,000 key-value pairs: 13.25M
virtual memory usage for array built via push @aData, $k, $v: 26M
virtual memory usage for hash built via $hData{$k} = $v: 34M
```

The test script is below.
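The script attached to this post is not reproduced in the thread. As a rough stand-in, the sketch below shows one way such a comparison could be set up. Everything in it is an assumption: the helper names `vm_size_kb` and `build`, the padded 56-character keys and values (chosen so each pair totals 112 bytes), and reading `VmSize` from `/proc/self/status`, which only works on Linux.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Virtual memory size of this process in kB, read from /proc (Linux only).
# Returns undef on systems without /proc/self/status.
sub vm_size_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
    }
    return;
}

# Build either a flat array (k, v, k, v, ...) or a hash of the same pairs.
# Each key and each value is 56 characters, so one pair is 112 bytes of data.
sub build {
    my ($mode, $pairs) = @_;
    my (@aData, %hData);
    for my $i (1 .. $pairs) {
        my $k = sprintf '%s%06d', 'k' x 50, $i;
        my $v = sprintf '%s%06d', 'v' x 50, $i;
        if ($mode eq 'array') { push @aData, $k, $v }
        else                  { $hData{$k} = $v }
    }
    return $mode eq 'array' ? \@aData : \%hData;
}

# Run once per mode ("array" or "hash") so the two figures do not add up
# inside a single process.
my $mode   = @ARGV ? $ARGV[0] : 'hash';
my $before = vm_size_kb();
my $data   = build($mode, 125_000);
my $after  = vm_size_kb();

if (defined $before && defined $after) {
    printf "%s: VmSize grew by about %.1f MB\n", $mode, ($after - $before) / 1024;
}
else {
    print "$mode: built, but /proc/self/status is not available here\n";
}
```

Running it once with `array` and once with `hash` as the argument gives two separately measured figures. The exact numbers depend on the Perl build and platform, so they should not be expected to match the ones quoted above.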
Re^6: Constructing a hash - why isn't my regex matching anything by Anonyrnous Monk (Hermit) on Dec 19, 2010 at 12:59 UTC
Re^6: Constructing a hash - why isn't my regex matching anything by Anonymous Monk on Dec 19, 2010 at 12:57 UTC