Re: Alternate for "open"

Hey there, thanks for posting that. You still haven't posted the relevant code portion, so we're still guessing here.

(If your code is "too complicated" for you to be able to copy and paste 10 or 20 lines that contain the functionality you suspect for the memory hogging, then that's an indication that you should simplify the structure of the code).

I'm not a sysadmin but it seems to me that if your 64GB RAM server is running at 75% memory usage before you start, it's working rather hard. (On the other hand 25% of 64Gb is still a lot and you should be able to process any number of text files if you code it right.) Currently my busiest server which runs three different daemons forking processes, scraping websites, loading and processing data, and pushing the data to external apis, is only using about 6GB RAM.

So looking at the rest of the data you posted, it seems more likely than ever that you need a database for your data. Basically your hash is a database, but it's not up to the task.

If you have 3800 files, and if you need all the data from all the files to accomplish your task, and if reading and temporarily processing a file can add 4Mb RAM usage, you really need to use a database.

You probably already have a RDBMS on the server, but I'd start with SQLite anyway for its light memory footprint and ease of use.

It's very unlikely that you need to hold all the data in memory to do your work, and there are many good reasons why you shouldn't. For example, when you create a hash (or any data structure) in Perl, that memory is not released until the variable goes out of scope. So if you are creating a global hash to store the file data, and then reading from the hash once (for example, to add to an aggregate hash), and then not using the hash again, you're not getting that memory back before the program finishes. This can cause you to run out of memory quickly. So if your program looks anything like:

#!/usr/bin/perl
use strict;
use warnings;
my %h1;
my %h2; #etc

# a bunch of code here to setup and prepare,
# maybe find the list of files 

foreach my $file ( @files ) {
    # open ...
    # split ...
    $h1{ $file } = $splitted[1];
    $h2{ $file } = $splitted[2];
    # ... and so on
    
    my $res1 = my_func( %h1 ); # don't do this
    my $res2 = my_func( %h2 );
    # ...

    if ( some condition ) {
        # do something with $res1 and $res2
    }
    # continue, 
    # lots of code
    # working hard
    # but never using the hashes again
}
[download]

... then you are allocating memory to the hashes before you need to (though they will be empty) and keeping the memory allocated long after the usefulness of the hash has expired.

So declare and use your variables in the smallest possible scope.

Also don't pass around actual data structures (as I do in the above example, to show bad practise), because then Perl has to make another copy of the data. Pass references to them, ie use:

my $res = my_func( \%hash );
# not: my_func( %hash );
[download]

To take that even further, consider off-loading the work of processing the files to another program, so that all system resources are released when the processing is done.

If the above tips seem random and maybe unrelated, it's because you haven't shared the code, so I'm just throwing out random and maybe unrelated tips.

The way forward always starts with a minimal test.

Comment on Re: Alternate for "open" Select or Download Code

Replies are listed 'Best First'.
Re^2: Alternate for "open" by ravi45722 (Pilgrim) on Nov 18, 2015 at 04:24 UTC
Sorry, I tested it on the local server. I cant made all my experiments in the main server. So, forget about the starting memory usage & think about the hash size its growing. As you told i am posting my peice of code which i am using for reading the files. 1) I am not passing any hashes to functions. 2) I cant release the scope of that hash because I need it to write an excel at the end of the code foreach $file (@cdr_list) { chomp $file; my $memgthy = `free -m \| awk 'NR==2{printf " %.2f%",\$3*100/\$2 }' +`; print "Memory Usage : ",$memgthy,$/; open (FP,"$file") or die "Could not open $file\n"; $first=1; while ($line=<FP>){ chomp $line; if ($first==1){ ($sgsn_id,$x,$time,$x)=split(/\,/,$line); push(@sgsn_list,$sgsn_id) unless $seen_sgs{$sgsn_id}++; $cdr_date=substr($time,0,10); $single_day=substr($time,0,8); push (@date_list,$cdr_date) unless $seen_cdr{$cdr_date}++; push (@single_day_list,$single_day) unless $seen{$single_d +ay}++; $first++; } else{ #FAIL_IN_RAU_MME_TO_2GSGSN,0; $cmp=substr(reverse($line),0,1); if ($cmp eq ";") { ($var,$value,$x)=split(/\,/,$line); push (@variable_list,$var) unless $seen_var{$var}++; $ass_val{$var}=$value; $pap_data{$cdr_date}{$sgsn_id}{$pap_id}=0; if ($mcc ne "" && length($mcc)<4){ $mcc_data{$cdr_date}{$sgsn_id}{$mcc}=0; } if ($mnc ne ""){ $mnc_data{$cdr_date}{$sgsn_id}{$mcc}{$mnc}=0; } if ($rac ne ""){ $rac_data{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac}=0 +; } if ($nsei ne ""){ $nsei_data{$cdr_date}{$periodic_duration}{$sgsn_id +}{$pap_id}{$nsei}=0; } if ($nsvci ne ""){ $nsvci_data{$cdr_date}{$periodic_duration}{$sgsn_i +d}{$pap_id}{$nsei}{$nsvci}=0; } foreach $var (@variable_list){ $value=$ass_val{$var}; if ($var eq "RTT_DUR_ATTACH_MIN" \|\| $var eq "RTT_D +UR_ATTACH_MAX" \|\| $var eq "PEAK_GB_PDP_Cont" \|\| $var eq "PEAK_ATTACH_ +GB_USERS" \|\| $var eq "PEAK_ACTIVE_SUBS_PER_PAPU" \|\| $var eq "PEAK_ACT +IVE_GB_PDP_CONTEXTS" \|\| $var eq "DUR_MO_PDP_MOD_MIN" \|\| $var eq "DUR_ +MO_PDP_MOD_MAX" \|\| $var eq "PEAK_ATTACH_IU_USERS" \|\| $var eq "PEAK_AC +TIVE_IU_PDP_CONTEXTS" \|\| $var eq "PEAK_IU_PDP_CONT" \|\| $var eq "PEAK_ +LOAD_RATE_OF_OBJECT" \|\| $var eq "PEAK_GB_PDP_CONT"){ $sgsn_name{$single_day}{$sgsn_id}{$var}{$value +}=0; $sgsn_val{$cdr_date}{$sgsn_id}{$var}{$value}=0 +; $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var}{$ +value}=0; $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var +}{$value}=0; $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac +}{$var}{$value}=0; } else { if ($mcc == 405){ $circle_val{$single_day}{$mnc}{$var}=$valu +e + $circle_val{$single_day}{$mnc}{$var}; } $sgsn_name{$single_day}{$sgsn_id}{$var}=$value + + $sgsn_name{$single_day}{$sgsn_id}{$var}; $sgsn_val{$cdr_date}{$sgsn_id}{$var}=$value + +$sgsn_val{$cdr_date}{$sgsn_id}{$var}; $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var}=$ +value + $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var}; $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var +}=$value + $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var}; $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac +}{$var}=$value + $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac}{$var +}; if ($var eq "IP_NSVC_PASSED_DATA_IN_BYTES"){ $nsvc_val{$cdr_date}{$periodic_duration}{$ +sgsn_id}{$pap_id}{$nsei}{$nsvci}{$var}=$value + $nsvc_val{$cdr_date}{ +$periodic_duration}{$sgsn_id}{$pap_id}{$nsei}{$nsvci}{$var}; } } } @variable_list=(); %seen_var=(); %ass_val=(); $pap_id=""; $mcc=""; $mnc=""; $rac=""; $nsei=""; $nsvci=""; } else { ($var,$value,$x)=split(/\,/,$line); if ($var eq "PAPU_INDEX"){ $pap_id=$value; }elsif ($var eq "MCC" \|\| $var eq "IU_RA_MCC"){ $mcc=$value; }elsif ($var eq "MNC" \|\| $var eq "IU_RA_MNC"){ $mnc=$value; }elsif ($var eq "RAC"){ $rac=$value; }elsif ($var eq "NSEI"){ $nsei=$value; }elsif ($var eq "NSVCI"){ $nsvci=$value; }elsif ($var eq "PERIOD_DURATION"){ $periodic_duration=$value; push (@periodic_list,$periodic_duration) unless $s +een_dur{$periodic_duration}++; }elsif ($var eq "OBJECT_NAME"){ $object_name=$value; $object_data{$cdr_date}{$sgsn_id}{$object_name}=0; }elsif ($var eq "OBJECT_INDEX"){ $object_index=$value; $objectindex_data{$cdr_date}{$sgsn_id}{$object_nam +e}{$object_index}=0; }elsif ($var eq "PEAK_LOAD_RATE_OF_OBJECT"){ $peak_load_data{$cdr_date}{$sgsn_id}{$object_name} +{$object_index}{$var}{$value}=0; }elsif ($var eq "AVE_LOAD_RATE_SUM" \|\| $var eq "AVE_LO +AD_RATE_DEN"){ $peak_load_data{$cdr_date}{$sgsn_id}{$object_name} +{$object_index}{$var}= $peak_load_data{$cdr_date}{$sgsn_id}{$object_n +ame}{$object_index}{$var} + $value; } else { push (@variable_list,$var) unless $seen_var{$var}+ ++; $ass_val{$var}=$value; } } } } close (FP); FP->flush(); } my $total_size = total_size(\%val); print "\%val :",$total_size,$/; my $total_size = total_size(\%seen_cdr); print "\%seen_cdr :",$total_size,$/; my $total_size = total_size(\%seen_var); print "\%seen_var :",$total_size,$/; my $total_size = total_size(\%pap_data); print "\%pap_data :",$total_size,$/; my $total_size = total_size(\%mcc_data); print "%mcc_data :",$total_size,$/; my $total_size = total_size(\%mnc_data); print "\%mnc_data :",$total_size,$/; my $total_size = total_size(\%sgsn_val); print "\%sgsn_val :",$total_size,$/; my $total_size = total_size(\%sgsn_name); print "\%sgsn_name :",$total_size,$/; my $total_size = total_size(\%pap_val); print "\%pap_val :",$total_size,$/; my $total_size = total_size(\%mcc_val); print "\%mcc_val :",$total_size,$/; my $total_size = total_size(\%rac_val); print "\%rac_val :",$total_size,$/; my $total_size = total_size(\%nsei_data); print "\%nsei_data :",$total_size,$/; my $total_size = total_size(\%nsvci_data); print "\%nsvci_data :",$total_size,$/; my $total_size = total_size(\%nsvc_val); print "\%nsvc_val :",$total_size,$/; my $t1 = Benchmark->new; my $td = timediff($t1, $t0); print "the code took:",timestr($td),"\n"; [download]	[reply] [d/l]

Replies are listed 'Best First'.

Re^2: Alternate for "open"
by ravi45722 (Pilgrim) on Nov 18, 2015 at 04:24 UTC

Sorry, I tested it on the local server. I cant made all my experiments in the main server. So, forget about the starting memory usage & think about the hash size its growing. As you told i am posting my peice of code which i am using for reading the files.

1) I am not passing any hashes to functions. 2) I cant release the scope of that hash because I need it to write an excel at the end of the code

 foreach $file (@cdr_list)
{
    chomp $file;
    my $memgthy = `free -m | awk 'NR==2{printf " %.2f%",\$3*100/\$2 }'
+`;
    print "Memory Usage : ",$memgthy,$/;
    open (FP,"$file") or die "Could not open $file\n";
    $first=1;
    while ($line=<FP>){
        chomp $line;
        if ($first==1){
            ($sgsn_id,$x,$time,$x)=split(/\,/,$line);
            push(@sgsn_list,$sgsn_id) unless $seen_sgs{$sgsn_id}++;
            $cdr_date=substr($time,0,10);
            $single_day=substr($time,0,8);
            push (@date_list,$cdr_date) unless $seen_cdr{$cdr_date}++;
            push (@single_day_list,$single_day) unless $seen{$single_d
+ay}++;
            $first++;
        }
        else{
            #FAIL_IN_RAU_MME_TO_2GSGSN,0;
            $cmp=substr(reverse($line),0,1);
            if ($cmp eq ";") 
            {
                ($var,$value,$x)=split(/\,/,$line);
                push (@variable_list,$var) unless $seen_var{$var}++;
                $ass_val{$var}=$value;
                $pap_data{$cdr_date}{$sgsn_id}{$pap_id}=0;
                if ($mcc ne "" && length($mcc)<4){
                    $mcc_data{$cdr_date}{$sgsn_id}{$mcc}=0;
                }
                if ($mnc ne ""){
                    $mnc_data{$cdr_date}{$sgsn_id}{$mcc}{$mnc}=0;
                }
                if ($rac ne ""){
                    $rac_data{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac}=0
+;
                }
                if ($nsei ne ""){
                    $nsei_data{$cdr_date}{$periodic_duration}{$sgsn_id
+}{$pap_id}{$nsei}=0;
                }
                if ($nsvci ne ""){
                    $nsvci_data{$cdr_date}{$periodic_duration}{$sgsn_i
+d}{$pap_id}{$nsei}{$nsvci}=0;
                }
                foreach $var (@variable_list){    
                    $value=$ass_val{$var};
                    if ($var eq "RTT_DUR_ATTACH_MIN" || $var eq "RTT_D
+UR_ATTACH_MAX" || $var eq "PEAK_GB_PDP_Cont" || $var eq "PEAK_ATTACH_
+GB_USERS" || $var eq "PEAK_ACTIVE_SUBS_PER_PAPU" || $var eq "PEAK_ACT
+IVE_GB_PDP_CONTEXTS" || $var eq "DUR_MO_PDP_MOD_MIN" || $var eq "DUR_
+MO_PDP_MOD_MAX" || $var eq "PEAK_ATTACH_IU_USERS" || $var eq "PEAK_AC
+TIVE_IU_PDP_CONTEXTS" || $var eq "PEAK_IU_PDP_CONT" || $var eq "PEAK_
+LOAD_RATE_OF_OBJECT" || $var eq "PEAK_GB_PDP_CONT"){
                        $sgsn_name{$single_day}{$sgsn_id}{$var}{$value
+}=0;
                        $sgsn_val{$cdr_date}{$sgsn_id}{$var}{$value}=0
+;
                        $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var}{$
+value}=0;
                        $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var
+}{$value}=0;
                        $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac
+}{$var}{$value}=0;
                    }
                    else {
                        if ($mcc == 405){
                            $circle_val{$single_day}{$mnc}{$var}=$valu
+e + $circle_val{$single_day}{$mnc}{$var};
                        }
                        $sgsn_name{$single_day}{$sgsn_id}{$var}=$value
+ + $sgsn_name{$single_day}{$sgsn_id}{$var};
                        $sgsn_val{$cdr_date}{$sgsn_id}{$var}=$value + 
+$sgsn_val{$cdr_date}{$sgsn_id}{$var};
                        $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var}=$
+value + $pap_val{$cdr_date}{$sgsn_id}{$pap_id}{$var};
                        $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var
+}=$value + $mcc_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$var};
                        $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac
+}{$var}=$value + $rac_val{$cdr_date}{$sgsn_id}{$mcc}{$mnc}{$rac}{$var
+};
                        if ($var eq "IP_NSVC_PASSED_DATA_IN_BYTES"){
                            $nsvc_val{$cdr_date}{$periodic_duration}{$
+sgsn_id}{$pap_id}{$nsei}{$nsvci}{$var}=$value + $nsvc_val{$cdr_date}{
+$periodic_duration}{$sgsn_id}{$pap_id}{$nsei}{$nsvci}{$var};
                        }
                    }
                }
                @variable_list=();
                %seen_var=();
                %ass_val=();
                $pap_id="";
                $mcc="";
                $mnc="";
                $rac="";
                $nsei="";
                $nsvci="";
            }
            else {
                ($var,$value,$x)=split(/\,/,$line);
                
                if ($var eq "PAPU_INDEX"){
                    $pap_id=$value;
                }elsif ($var eq "MCC" || $var eq "IU_RA_MCC"){
                    $mcc=$value;
                }elsif ($var eq "MNC" || $var eq "IU_RA_MNC"){
                    $mnc=$value;
                }elsif ($var eq "RAC"){
                    $rac=$value;
                }elsif ($var eq "NSEI"){
                    $nsei=$value;
                }elsif ($var eq "NSVCI"){
                    $nsvci=$value;
                }elsif ($var eq "PERIOD_DURATION"){
                    $periodic_duration=$value;
                    push (@periodic_list,$periodic_duration) unless $s
+een_dur{$periodic_duration}++;
                }elsif ($var eq "OBJECT_NAME"){
                    $object_name=$value;
                    $object_data{$cdr_date}{$sgsn_id}{$object_name}=0;
                }elsif ($var eq "OBJECT_INDEX"){
                    $object_index=$value;
                    $objectindex_data{$cdr_date}{$sgsn_id}{$object_nam
+e}{$object_index}=0;
                }elsif ($var eq "PEAK_LOAD_RATE_OF_OBJECT"){
                    $peak_load_data{$cdr_date}{$sgsn_id}{$object_name}
+{$object_index}{$var}{$value}=0;
                }elsif ($var eq "AVE_LOAD_RATE_SUM" || $var eq "AVE_LO
+AD_RATE_DEN"){
                    $peak_load_data{$cdr_date}{$sgsn_id}{$object_name}
+{$object_index}{$var}= $peak_load_data{$cdr_date}{$sgsn_id}{$object_n
+ame}{$object_index}{$var} + $value;
                } else {
                    push (@variable_list,$var) unless $seen_var{$var}+
++;
                    $ass_val{$var}=$value;
                }
            }
        }
    }
    close (FP);
    FP->flush();
}

my $total_size = total_size(\%val);
print "\%val :",$total_size,$/;
my $total_size = total_size(\%seen_cdr);
print "\%seen_cdr :",$total_size,$/;
my $total_size = total_size(\%seen_var);
print "\%seen_var :",$total_size,$/;
my $total_size = total_size(\%pap_data);
print "\%pap_data :",$total_size,$/;
my $total_size = total_size(\%mcc_data);
print "%mcc_data :",$total_size,$/;
my $total_size = total_size(\%mnc_data);
print "\%mnc_data :",$total_size,$/;
my $total_size = total_size(\%sgsn_val);
print "\%sgsn_val :",$total_size,$/;
my $total_size = total_size(\%sgsn_name);
print "\%sgsn_name :",$total_size,$/;
my $total_size = total_size(\%pap_val);
print "\%pap_val :",$total_size,$/;
my $total_size = total_size(\%mcc_val);
print "\%mcc_val :",$total_size,$/;
my $total_size = total_size(\%rac_val);
print "\%rac_val :",$total_size,$/;
my $total_size = total_size(\%nsei_data);
print "\%nsei_data :",$total_size,$/;
my $total_size = total_size(\%nsvci_data);
print "\%nsvci_data :",$total_size,$/;
my $total_size = total_size(\%nsvc_val);
print "\%nsvc_val :",$total_size,$/;


my $t1 = Benchmark->new;
my $td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";
[download]

[reply]
[d/l]