Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl script which takes two input files, of size 16gb & 800mb, and creates an output file. It builds 4 hashes to store data from the 800mb file, then compares and retrieves data from the 16gb file, and finally creates a report. It is working pretty well, but it gets stuck at random while creating the final report of around 500mb. I have no idea why it is getting stuck. It doesn't seem to be a script bug, as the 500mb of report data is generated perfectly. Please help me. I'm not pasting the code here as it is too big to post; please let me know if you need further info. Thanks.

:) sorry for the trouble..I am pasting my code here...please help.

#!/usr/bin/perl
use Data::Dumper;
my $lookup;
$lookup; ##Maharashtra to MH & Mumbai to MUM
open(FH,"<",$ARGV[0]);
open(AH,"<",$ARGV[1]);
open(OUT,">",$ARGV[2]);
open(LOG,">",$ARGV[3]);
my $SubsSize= -s $ARGV[0];
my $AccuSize= -s $ARGV[1];
my $SubsCount=0;
my $AccuCount=0;
my $finalCount=0;
my $AccUsg;
my $AccUsg1;
my $AccUsg2;
my $AccUsg3;
while(<AH>) {
    chomp;
    my $line=$_;
    $AccuCount++;
    my $MobileNumber;
    if( $line =~ /subscriberId:(\w+)\(\"(\d+)\"\)/ ) {
        $MobileNumber=$2;
    }
    my $plan=$line;
    $plan =~s/\\//g;
    my @AccVolume;
    if ( $plan =~ /usageControlAccum:(\w+)\(\"(.*)\"\)/ ) {
        my $p=$2;
        $p=~s/:\{/ => {/g;
        $p=~s/:\[/ => [/g;
        $p=~s/\"/\'/g;
        $p=~s/\':/\'=>/g;
        $p=~s/\}n/\}/g;
        #print $p,"\n";
        my $e=eval($p);
        if ( @$ ) {
            push (@AccVolume,"error");
        }
        else {
            foreach my $value ( @{$e->{'reportingGroups'}} ) {
                if ( exists ( $value->{'absoluteAccumulated'}->{'counters'} ) ) {
                    $AccUsg->{$MobileNumber}->{$value->{'subscriberGroupName'}}=$value->{'absoluteAccumulated'}->{'counters'}->[0]->{'bidirVolume'};
                    $AccUsg1->{$MobileNumber}->{$value->{'subscriberGroupName'}}=$value->{'absoluteAccumulated'}->{'counters'}->[0]->{'name'};
                    $AccUsg2->{$MobileNumber}->{$value->{'subscriberGroupName'}}=$value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'};
                    $AccUsg3->{$MobileNumber}->{$value->{'subscriberGroupName'}}->{'time'}=$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'};
                    $AccUsg3->{$MobileNumber}->{$value->{'subscriberGroupName'}}->{'volume'}=$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'};
                }
                elsif ( exists ( $value->{'absoluteAccumulated'}->{'bidirVolume'} ) ) {
                    $AccUsg->{$MobileNumber}->{$value->{'subscriberGroupName'}}=$value->{'absoluteAccumulated'}->{'bidirVolume'};
                }
            }
        }
    }
}
close(AH);
print OUT "MSISDN,IMEI,Circle,DeviceType,OPTIN,PlanType,counterName,expdate,prevTime,prevVol,PACKID1;priority;startdate;enddate;AccumulateUsage|PACKID2;priority;startdate;enddate;AccumulateUsage|PACKID3;priority;startdate;enddate;AccumulateUsage|PACKID4;priority;startdate;enddate;AccumulateUsage|\n";
while(<FH>) {
    chomp;
    my $line=$_;
    $SubsCount++;
    $msisdn;
    my $IMEI;
    my $Circle;
    my $DeviceType;
    my $OPTIN;
    my $PlanType;
    my $familyId;
    my $trafficIds;
    if($line=~/userId:S(\d+)\(\"(\w+)\"\)/) {
        $msisdn = $2;
    }
    if ( $line=~/operatorInfo:A(\d+)\[(.*?)\]/ ) {
        my $opcinfo=$2;
        if( $opcinfo =~ /ix(\d+):S(\d+)\(\"imei:(\w*)\"\)/ ) {
            $IMEI=$3;
        }
        if( $opcinfo =~ /ix(\d+):S(\d+)\(\"OptInState:(\w*)\"\)/ ) {
            $OPTIN=$3;
        }
        if( $opcinfo =~ /ix(\d+):S(\d+)\(\"CircleId:(\w*)\"\)/ ) {
            $Circle=$3;
            if( exists ( $lookup->{$Circle} ) ) {
                $Circle=$lookup->{$Circle};
            }
        }
        if( $opcinfo =~ /ix(\d+):S(\d+)\(\"DevType:(\w*)\"\)/ ) {
            $DeviceType=$3;
        }
        if( $opcinfo =~ /ix(\d+):S(\d+)\(\"PlanType:(\w*)\"\)/ ) {
            $PlanType=$3;
        }
    }
    my @ValidPlan;
    if($line=~/groups:A(\d+)\[(.*?)\]/) {
        my $plans=$2;
        my @AllPlans = split('ix\d+:S\d+\("',$plans);
        foreach my $p ( @AllPlans ) {
            $p =~ s/\"\)//g;
            if ($p eq "" ) {
                next;
            }
            if ( $p =~ /(\w+):(\d+)[:]?(.*)/) {
                $planname=$1;
                my $priority=$2;
                my $expdate=$3;
                $expdate =~ s/,/\;/g;
                if( exists ( $AccUsg->{$msisdn}->{$planname} ) ) {
                    if ( $expdate eq "") {
                        push(@ValidPlan,"$planname;$priority;;;$AccUsg->{$msisdn}->{$planname}");
                    }
                    elsif ( length($expdate) > 19 ) {
                        push(@ValidPlan,"$planname;$priority;$expdate;$AccUsg->{$msisdn}->{$planname}");
                    }
                    else {
                        push(@ValidPlan,"$planname;$priority;$expdate;;$AccUsg->{$msisdn}->{$planname}");
                    }
                }
                else {
                    if( $expdate eq "") {
                        push(@ValidPlan,"$planname;$priority;;;");
                    }
                    else {
                        if( length($expdate) > 19 ) {
                            push(@ValidPlan,"$planname;$priority;$expdate;");
                        }
                        else {
                            push(@ValidPlan,"$planname;$priority;$expdate;;");
                        }
                    }
                }
            }
        }
    }
    if ( $line=~/familyId:S(\d+)\(\"(.*?)\"\)/ ) {
        $familyId=$2;
    }
    if ( $line=~/trafficIds:A(\d+)\[(.*?)\]/ ) {
        $trafficIds=$2;
    }
    my @y = sort { ($b =~ /(.*?);(\d+);(.*?)/)[1] <=> ($a =~ /(.*?);(\d+);(.*?)/)[1] } @ValidPlan; #ankur
    my $printPlan=join("|",@y);
    $finalCount++;
    print OUT "$msisdn,$IMEI,$Circle,$DeviceType,$OPTIN,$PlanType,$AccUsg1->{$msisdn}->{$planname},$AccUsg2->{$msisdn}->{$planname},$AccUsg3->{$msisdn}->{$planname}->{'time'},$AccUsg3->{$msisdn}->{$planname}->{'volume'},$printPlan\n";
close(FH);
close(OUT);

Replies are listed 'Best First'.
Re: Perl script is getting stuck for no reason
by marto (Cardinal) on Jul 31, 2014 at 10:51 UTC
Re: Perl script is getting stuck for no reason
by davido (Cardinal) on Jul 31, 2014 at 17:17 UTC

    "Perl script is getting stuck for no reason"

    There's always a reason. An apparent lack of reason is an indication of inadequate investigation.

    "...it is getting stuck at random..."

    That is almost impossible; computers are useful because they are deterministic. Of course you're probably aware that computers don't behave randomly, or you wouldn't be asking for assistance with the behavior you're seeing. But the sooner you abolish all notion that something is happening randomly, the sooner you'll be on your way to diagnosing the problem yourself.

    The code you posted doesn't compile. But you're not asking about why it doesn't compile, you're asking why your code seems to hang. So that tells me that the code you provided for us to look at is not exactly the code you're running. That makes it difficult for us to know what's wrong.

    Fixing the right curly bracket, I then get three warnings. Have you investigated why you're getting warnings? Are you getting warnings, or is that something that only happens in the code you provided us?

    We can't debug this for you; we don't have the data, or the real code. But let me offer these suggestions, which with a little diligence, will probably give you an answer. For what it's worth, these are steps I would probably take.

    • Add "warn" statements that tell you where you are within your code, and where you are within loops.
    • Run this with "top" going in another terminal. Watch for memory growth. The "warn" statements previously mentioned will help you to see what parts of your script are consuming memory, as you follow the runtime with top.
    • Test with a much smaller subset of your data sets. This will allow for quicker iterations of the debug/test cycle.
    • Even with your much smaller subset, don't be satisfied with any substantial growth in memory usage as the script progresses. Keep in mind that you are dealing with very large files. If it becomes too costly to hold intermediate data in memory, put intermediate results into files or a database, and process those files rather than trying to hold the entire computation in memory at once.
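As a rough illustration of the "warn" suggestion in the list above, here is a minimal sketch. The helper name progress_message, the label text, and the tiny in-line data set are all made up for the example; in the real script the loop would be the existing while(<AH>) or while(<FH>) loop, and the hash would be one of its lookup hashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a progress line every $every records,
# or undef when it is not yet time to report.
sub progress_message {
    my ($label, $count, $every, $store) = @_;
    return if $count % $every;
    my $keys = $store ? scalar keys %$store : 0;
    return "$label: $count lines read, $keys keys in memory\n";
}

my %seen;       # stands in for one of the script's lookup hashes
my $count = 0;

# Stands in for reading lines from a file handle.
for my $line ("a:1", "b:2", "c:3", "d:4") {
    $count++;
    my ($key, $value) = split /:/, $line;
    $seen{$key} = $value;

    # warn writes to STDERR, so the report file is not polluted.
    my $msg = progress_message('accum pass', $count, 2, \%seen);
    warn $msg if defined $msg;
}
```

With real data a much larger interval (say every 100,000 lines) is probably more sensible. Watching these messages next to top shows which loop is running when memory grows or output stalls.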

    If it seems like I'm talking a lot about memory, it's because without a running script and small test data set, I can only go with instinct, and my instinct is that when I see some data structures persisting throughout your script, and read that you're dealing with 800MB, 16GB, and 500MB files, and when I read that your complaint is the script grinding to a halt, you are consuming too much memory and bringing your system to its knees.
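One possible way to follow the advice about keeping intermediate data on disk is to tie a hash to a DBM file, sketched below with the core SDBM_File module. This is only an assumption about what might fit: SDBM limits each key/value pair to roughly 1 KB, the nested hashes from the posted script would have to be flattened into single string keys, and the phone number, plan name, and file name here are invented sample values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # core module; small key/value pairs only
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory for this demo
my $file = "$dir/accusg";           # SDBM creates accusg.pag and accusg.dir

# The hash now lives in the DBM file rather than in process memory.
tie(my %usage, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

# Flatten the two-level key (number, plan name) into one string.
my $key = join '|', '9999999999', 'PLAN_A';
$usage{$key} = 12345;

my $fetched   = $usage{$key};
my $has_other = exists $usage{'0000000000|PLAN_B'} ? 1 : 0;
print "fetched=$fetched has_other=$has_other\n";

untie %usage;
```

Disk-backed lookups are slower than an in-memory hash, so this trades speed for a bounded memory footprint. A different DBM module or a real database may suit data of this size better.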


    Dave

Re: Perl script is getting stuck for no reason
by AppleFritter (Vicar) on Jul 31, 2014 at 11:12 UTC

    So, to sum it up, code that we don't know is operating on data we haven't seen and failing in an unspecified way. I'm sure I'm not the only monk shrugging helplessly now, thinking "how exactly do you think we're supposed to help you?"

    Please give us something to work with. If your code's too large to post, whittle it down to a simple test case that exhibits the problematic behavior. If your input files are too large to post (or confidential), provide some simpler ones that show the problem.

    Attempting to do this may be instructive in itself: perhaps while simplifying, you'll realize what the problem is. Or perhaps you'll cross a threshold below which it'll work, which may provide clues as to what's going wrong (and why, and how to fix it). Even if you can successfully create a small, postable test case, perhaps it'll be simple enough that you'll see the issue yourself. If not, at least the monks will have something to work with.
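A small sketch of one way to cut an input file down for such a test case: copy only its first few lines to a sample file. The helper name copy_head and the file names are hypothetical, and the demo builds its own ten-line "big" file so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy the first $limit lines of $in to $out
# and return how many lines were copied.
sub copy_head {
    my ($in, $out, $limit) = @_;
    open(my $src, '<', $in)  or die "Cannot read $in: $!";
    open(my $dst, '>', $out) or die "Cannot write $out: $!";
    my $copied = 0;
    while (my $line = <$src>) {
        last if $copied >= $limit;
        print {$dst} $line;
        $copied++;
    }
    close($src);
    close($dst) or die "Cannot close $out: $!";
    return $copied;
}

# Demo data standing in for a large input file.
my $dir = tempdir(CLEANUP => 1);
my $big = "$dir/big.txt";
open(my $fh, '>', $big) or die "Cannot write $big: $!";
print {$fh} "record $_\n" for 1 .. 10;
close($fh) or die "Cannot close $big: $!";

my $n = copy_head($big, "$dir/sample.txt", 3);
print "copied $n lines\n";
```

Taking the head of both input files only gives a useful test if the same subscribers happen to appear in both samples; if they do not, the smaller file may need to be filtered by the numbers found in the other one.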

    Some general tips and links for asking questions, copied from my homenode:

Re: Perl script is getting stuck for no reason
by Discipulus (Canon) on Jul 31, 2014 at 10:57 UTC
    How (Not) To Ask A Question

    We refer to many monks around here as 'wizards', but they are not truly wizards. What do you expect? 'Add $|++; after line 2048'?
    sorry if it sounds polemic...

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Perl script is getting stuck for no reason (swap)
by tye (Sage) on Jul 31, 2014 at 16:58 UTC

    I suspect you mean "hang" in the sense of "doesn't seem to be producing any output even after a long time" as opposed to a more precise use of "hang" for "stops consuming CPU but doesn't exit".

    The most likely explanation is that the performance of your script is not linear so the much bigger file just takes significantly longer to process. The most likely explanation for seriously non-linear performance given some of what you said is that your process ends up using more memory than your computer can efficiently provide it.

    But such possibilities are easy for you to investigate, requiring, at most, some internet searching and minor research to learn a couple of commands and how to interpret part of their output (such as top and ps).

    Of course, maybe you have already investigated those possibilities and eliminated them. But I can't tell that based on what little information you provided.

    Identifying whether or not the problem is just "using too much memory" is a good step to do before diving into the search for a more specific source of the problem. Hence, I didn't dive into the code you have now added to your posting.

    Good luck.

    BTW, further updates to your question are better done as replies rather than by modifying the original question. The site does not provide a way for people to notice that you've made an update (while there are several heavily used ways to notice new replies). Also, updates tend to make the thread confusing.

    - tye        

Re: Perl script is getting stuck for no reason
by zentara (Cardinal) on Jul 31, 2014 at 11:59 UTC
    Since you are new, I'll give you a place to start. Run your script, with the 16 gig input file, through strace, and see where it hangs. Google for "debugging with strace" if you need to see how to use it.

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Perl script is getting stuck for no reason
by salva (Canon) on Jul 31, 2014 at 11:02 UTC
    try...
    use Acme::Tasseography;
    my $at = Acme::Tasseography->new(<<ISSUE);
    oooh... 2 inputs files
    oooh... of size 16gb & 800mb
    oooh... 4 hashes
    ooohh...
    oooooohhhh... around 500mb
    oooooooooohhhh... it stucks
    ISSUE
    say $at->tell_fortune;
    On my computer it says
    Look at line 100...
    Though, running it a second time it says
    hire a consultant...
Post the data
by dolmen (Beadle) on Jul 31, 2014 at 12:18 UTC
    Please post the data too, as it looks very interesting.