Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a Perl script which processes a huge amount of data (2 input files of 17 GB and 3 GB). The script stores and fetches data from an SQLite database and Perl hashes. It runs for more than an hour, and during that time CPU utilisation stays at 98 to 100%. Is that bad? It is the only process running on my Linux server at that time. Please let me know whether I need to take care of this or whether it is OK. Please let me know if you need further information on my script. Thanks.

  • Comment on Is 100% CPU utilisation during a procees is aproblem?

Replies are listed 'Best First'.
Re: Is 100% CPU utilisation during a procees is aproblem?
by BrowserUk (Patriarch) on Nov 24, 2014 at 11:39 UTC

    The biggest file I had handy is 1GB; processing it with wc -l big.dat took 15 seconds and used ~5% CPU.

    Based on that, just reading your 17GB file should take ~4 1/2 mins and the small one ~3/4 min, and CPU usage of 5%-10% is typical for an IO-bound program.
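
    The estimate above is plain proportional scaling from the 1GB measurement. A minimal sketch of that arithmetic, assuming read time grows linearly with file size (the sub name estimate_seconds is made up for illustration):

    ```perl
    use strict;
    use warnings;

    # Measured above: a 1 GB file took about 15 seconds with `wc -l`.
    my $secs_per_gb = 15;

    # Hypothetical helper: scale the measurement linearly with file size.
    sub estimate_seconds {
        my ($size_gb) = @_;
        return $size_gb * $secs_per_gb;
    }

    for my $size_gb ( 17, 3 ) {
        my $secs = estimate_seconds($size_gb);
        printf "%2d GB: ~%d s (~%.2f min)\n", $size_gb, $secs, $secs / 60;
    }
    ```

    That gives about 255 s and 45 s, in line with the rough figures quoted, and far short of an hour.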

    My guess is that you're using more memory than your machine has available and have therefore moved into thrashing.

    Whether this is a problem depends upon your urgency and expectations.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Is 100% CPU utilisation during a procees is aproblem?
by marto (Cardinal) on Nov 24, 2014 at 11:12 UTC
Re: Is 100% CPU utilisation during a procees is aproblem?
by Anonymous Monk on Nov 24, 2014 at 11:15 UTC
    Is 100% CPU utilisation during a procees is aproblem?

    I don't know, is it? Is your server overheating? Is the process hindering your server from working properly?

    You haven't shown any code, so we don't know whether the run time and CPU utilization are normal for the tasks your script is performing or whether it could be optimized. You might want to try profiling your code with something like Devel::NYTProf to see if there are spots in the code worth optimizing.

    Lastly, try

    use less 'CPU';

    (just kidding)

      Hi all, thanks for the replies. Below is the first part of my code, where the script creates an SQLite database, manipulates the data from the 3 GB file and stores it in the database. As soon as I start the script, CPU usage reaches 99-100%, so this part has the problem.

      #!/usr/bin/perl
      use Data::Dumper;
      use DBI;
      use File::Basename;
      #use warnings;

      open(FH,"<",$ARGV[0]);
      open(AH,"<",$ARGV[1]);
      open(OUT,">",$ARGV[2]);
      open(LOG,">",$ARGV[3]);

      ######################################################################
      #Connecting to database
      my $dbExt = "db";
      my $driver = "SQLite";
      my $file = basename($ARGV[1]);
      my @fileName = split(/\./, $file);
      $file = $fileName[0].".$dbExt";
      my $dbFileName = dirname($ARGV[1])."/". $file;
      if(-e $dbFileName) {
          unlink ($dbFileName);
      }
      my $database = "$dbFileName";
      my $dsn = "DBI:$driver:dbname=$database";
      my $userid = "";
      my $password = "";
      my $dbh = DBI->connect($dsn, $userid, $password, { RaiseError => 1, AutoCommit => 0 })
          or die $DBI::errstr;

      my $stmt = qq(CREATE TABLE ACCU_USAGE
          (MOBILE VARCHAR2(50),
           PLANNAME VARCHAR2(50),
           PLANUSAGE CHAR(50),
           COUNTER CHAR(20),
           STATUS VARCHAR2(20),
           EXPIRY_DATE CHAR(20),
           PREEXP_TIME CHAR(20),
           PREEXP_VOLUME CHAR(20)); );
      my $rv = $dbh->do($stmt);
      if($rv < 0){
          unlink($dbFileName) if(-e $dbFileName);
          $dbh->disconnect();
          print LOG $DBI::errstr;
          exit(1);
      }
      ######################################################################

      my $SubsSize= -s $ARGV[0];
      my $AccuSize= -s $ARGV[1];
      my $SubsCount=0;
      my $AccuCount=0;
      my $finalCount=0;
      #my $AccUsg;
      my $rowCount = 0;

      while(<AH>) {
          chomp;
          my $line=$_;
          $AccuCount++;
          my $MobileNumber;
          if( $line =~ /subscriberId:(\w+)\(\"(\d+)\"\)/ ) {
              $MobileNumber=$2;
          } else {
              $MobileNumber="''";
          }
          my $plan=$line;
          $plan =~s/\\//g;
          my @AccVolume;
          if ( $plan =~ /usageControlAccum:(\w+)\(\"(.*)\"\)/ ) {
              my $p=$2;
              $p=~s/:\{/ => {/g;
              $p=~s/:\[/ => [/g;
              $p=~s/\"/\'/g;
              $p=~s/\':/\'=>/g;
              $p=~s/\}n/\}/g;
              #print $p,"\n";
              my $e=eval($p);
              if ( $@ ) {    # eval error ($@, not @$ as originally posted)
                  push (@AccVolume,"error");
              } else {
                  #print Dumper($e);
                  foreach my $value ( @{$e->{'reportingGroups'}} ) {
                      if ( exists ( $value->{'absoluteAccumulated'}->{'counters'} ) ) {
                          $stmt = qq(INSERT INTO ACCU_USAGE VALUES (
                              $MobileNumber,
                              \'$value->{'subscriberGroupName'}\',
                              $value->{'absoluteAccumulated'}->{'counters'}->[0]->{'bidirVolume'},
                              \'$value->{'absoluteAccumulated'}->{'counters'}->[0]->{'name'}\',
                              \'$value->{'selected'}\',
                              \'$value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'}\',
                              \'$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'}\',
                              \'$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'}\' ););
                      } elsif ( exists ( $value->{'absoluteAccumulated'}->{'bidirVolume'} ) ) {
                          $stmt = qq(INSERT INTO ACCU_USAGE VALUES (
                              $MobileNumber,
                              \'$value->{'subscriberGroupName'}\',
                              $value->{'absoluteAccumulated'}->{'bidirVolume'},
                              \'$value->{'absoluteAccumulated'}->{'name'}\',
                              \'$value->{'selected'}\',
                              \'$value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'}\',
                              \'$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'}\',
                              \'$value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'}\' ););
                      }
                      ######################################################################
                      $rv = $dbh->do($stmt) or die $DBI::errstr;
                      if($rv < 0) {
                          print LOG "Failed to insert $stmt query. Exiting...\n";
                          unlink($dbFileName) if(-e $dbFileName);
                          print LOG $DBI::errstr;
                          $dbh->disconnect();
                          exit(1);
                      } else {
                          $rowCount++;
                          if($rowCount == 5000) {
                              $dbh->commit();
                              $rowCount = 0;
                          }
                      }
                      ######################################################################
                  }
              }
          }
      }
      close(AH);
      ######################################################################
      $dbh->commit();

      Here AH is my 3 GB file. The script processes it line by line and builds a hash that holds the values to be inserted into the database.

        By far the greatest amount of time is being spent evaling the data structure into existence:

        my $e = eval( $p );

        There's not a lot you can do about that. Writing your own parser would be complicated, error prone and almost certainly eons slower.
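
        To make that cost concrete: each record is rewritten by a chain of substitutions into Perl source text, which Perl then has to compile and run. A minimal sketch of that step on a made-up payload (the real file format has not been posted, so the sample string and its field values are assumptions):

        ```perl
        use strict;
        use warnings;

        # Hypothetical payload, standing in for what the usageControlAccum regex
        # captures once the backslashes are stripped. The real data may differ.
        my $p = '{"reportingGroups":[{"subscriberGroupName":"PlanA",'
              . '"selected":"yes","absoluteAccumulated":{"bidirVolume":1024}}]}';

        # The substitutions from the posted script that apply to this sample:
        # they turn the JSON-like text into Perl hash/array syntax.
        $p =~ s/:\{/ => {/g;
        $p =~ s/:\[/ => [/g;
        $p =~ s/"/'/g;
        $p =~ s/':/'=>/g;

        # String eval compiles and runs $p as Perl code, once per record.
        my $e = eval $p;
        die "eval failed: $@" if $@;

        for my $group ( @{ $e->{reportingGroups} } ) {
            print "$group->{subscriberGroupName}: ",
                  "$group->{absoluteAccumulated}{bidirVolume}\n";
        }
        ```

        Every record pays for the substitutions plus a full compile-and-run of the generated text, which fits with this step dominating the run time.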

        The second largest amount of time is being spent building and executing those two complicated SQL statements:

        $stmt = qq(INSERT INTO ACCU_USAGE VALUES ($Mobil ...
        ...
        $stmt = qq(INSERT INTO ACCU_USAGE VALUES ($MobileNumber, ...
        ...
        $rv = $dbh->do($stmt) or die $DBI::errstr;
        ...

        Now that you can do something about. By pre-preparing the statement using placeholders:

        my $sql = $dbh->prepare( q[
            INSERT INTO ACCU_USAGE VALUES( ?, ?, ?, ?, ?, ?, ?, ? )
        ] ) or die $DBI::errstr;

        and then binding the vars when executing the statement:

        $rv = $sql->execute(
            $MobileNumber,
            $value->{'subscriberGroupName'},
            $value->{'absoluteAccumulated'}->{'counters'}->[0]->{'bidirVolume'},
            $value->{'absoluteAccumulated'}->{'counters'}->[0]->{'name'},
            $value->{'selected'},
            $value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'},
            $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'},
            $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'}
        ) or die $DBI::errstr;

        Not only will you give the DB engine the best chance of performing the inserts optimally; you can also make your code look a lot cleaner and simpler to maintain. It won't reduce your cpu usage; but it should result in your program finishing more quickly.
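
        A stripped-down sketch of the prepare-once, execute-many pattern, assuming DBD::SQLite is installed (as it must be for the posted script). The demo table and its two columns are made up for illustration and are not the real ACCU_USAGE schema:

        ```perl
        use strict;
        use warnings;
        use DBI;

        # In-memory SQLite database so the sketch leaves no file behind.
        my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
            { RaiseError => 1, AutoCommit => 0 } );

        # Hypothetical two-column table, much smaller than ACCU_USAGE.
        $dbh->do('CREATE TABLE demo (mobile TEXT, planname TEXT)');

        # Prepare once, outside the loop ...
        my $sth = $dbh->prepare('INSERT INTO demo VALUES ( ?, ? )');

        # ... then execute once per row. Values are bound, not interpolated,
        # so a quote character inside a value cannot break the SQL.
        my @rows = ( [ '9990001', 'PlanA' ], [ '9990002', q{Plan'B} ] );
        $sth->execute(@$_) for @rows;
        $dbh->commit;

        my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM demo');
        my ($plan)  = $dbh->selectrow_array(
            'SELECT planname FROM demo WHERE mobile = ?', undef, '9990002' );
        print "inserted $count rows; second plan is $plan\n";

        $dbh->disconnect;
        ```

        Binding also removes the need for the hand-written \' quoting in the original INSERT strings.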

        I hope you'll agree this is much nicer to look at and understand:

        #!/usr/bin/perl
        use Data::Dumper;
        use DBI;
        use File::Basename;
        #use warnings;

        @ARGV = qw[ NUL junk.dat CON CON ]; ## crude hack for local testing.

        open( FH,  "<", $ARGV[0] );
        open( AH,  "<", $ARGV[1] );
        open( OUT, ">", $ARGV[2] );
        open( LOG, ">", $ARGV[3] );

        #Connecting to database
        my $dbExt = "db";
        my $driver = "SQLite";
        #my $file = basename($ARGV[1]);
        #my @fileName = split(/\./, $file);
        #$file = $fileName[0] . ".$dbExt";
        my $dbFileName = 'junk.db'; #dirname( $ARGV[1] ) ."/" . $file;
        if(-e $dbFileName) {
            unlink ($dbFileName);
        }
        my $database = "$dbFileName";
        my $dsn = "DBI:$driver:dbname=$database";
        my $userid = "";
        my $password = "";
        my $dbh = DBI->connect($dsn, $userid, $password, { RaiseError => 1, AutoCommit => 0 })
            or die $DBI::errstr;

        my $stmt = <<EOS;
        CREATE TABLE ACCU_USAGE (
            MOBILE VARCHAR2(50),
            PLANNAME VARCHAR2(50),
            PLANUSAGE CHAR(50),
            COUNTER CHAR(20),
            STATUS VARCHAR2(20),
            EXPIRY_DATE CHAR(20),
            PREEXP_TIME CHAR(20),
            PREEXP_VOLUME CHAR(20)
        );
        EOS

        my $rv = $dbh->do($stmt);
        if($rv < 0){
            unlink($dbFileName) if(-e $dbFileName);
            $dbh->disconnect();
            print LOG $DBI::errstr;
            exit(1);
        }

        my $sql = $dbh->prepare( q[
            INSERT INTO ACCU_USAGE VALUES( ?, ?, ?, ?, ?, ?, ?, ? )
        ] ) or die $DBI::errstr;

        my $SubsSize= -s $ARGV[0];
        my $AccuSize= -s $ARGV[1];
        my $SubsCount=0;
        my $AccuCount=0;
        my $finalCount=0;
        my $rowCount = 0;

        while( <AH> ) {
            chomp;
            my $line = $_;
            $AccuCount++;
            my $MobileNumber;
            if( $line =~ /subscriberId:(\w+)\(\"(\d+)\"\)/ ) {
                $MobileNumber = $2;
            } else {
                $MobileNumber = "''";
            }
            my $plan = $line;
            $plan =~ s/\\//g;
            my @AccVolume;
            if( $plan =~ /usageControlAccum:(\w+)\(\"(.*)\"\)/ ) {
                my $p = $2;
                $p =~ s/:\{/ => {/g;
                $p =~ s/:\[/ => [/g;
                $p =~ s/\"/\'/g;
                $p =~ s/\':/\'=>/g;
                $p =~ s/\}n/\}/g;
                # print $p,"\n";
                my $e = eval( $p );
                if ( $@ ) {    # eval error ($@, not @$ as originally posted)
                    push (@AccVolume,"error");
                } else {
                    #print Dumper($e);
                    foreach my $value ( @{$e->{'reportingGroups'}} ) {
                        if ( exists ( $value->{'absoluteAccumulated'}->{'counters'} ) ) {
                            $rv = $sql->execute(
                                $MobileNumber,
                                $value->{'subscriberGroupName'},
                                $value->{'absoluteAccumulated'}->{'counters'}->[0]->{'bidirVolume'},
                                $value->{'absoluteAccumulated'}->{'counters'}->[0]->{'name'},
                                $value->{'selected'},
                                $value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'},
                                $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'},
                                $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'}
                            ) or die $DBI::errstr;
                        } elsif ( exists ( $value->{'absoluteAccumulated'}->{'bidirVolume'} ) ) {
                            $rv = $sql->execute(
                                $MobileNumber,
                                $value->{'subscriberGroupName'},
                                $value->{'absoluteAccumulated'}->{'bidirVolume'},
                                $value->{'absoluteAccumulated'}->{'name'},
                                $value->{'selected'},
                                $value->{'absoluteAccumulated'}->{'expiryDate'}->{'volume'},
                                $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'time'},
                                $value->{'absoluteAccumulated'}->{'previousExpiryDate'}->{'volume'}
                            ) or die $DBI::errstr;
                        }
                        if($rv < 0) {
                            print LOG "Failed to insert $stmt query. Exiting...\n";
                            unlink($dbFileName) if(-e $dbFileName);
                            print LOG $DBI::errstr;
                            $dbh->disconnect();
                            exit(1);
                        } else {
                            $rowCount++;
                            if($rowCount == 5000) {
                                $dbh->commit();
                                $rowCount = 0;
                            }
                        }
                    }
                }
            }
        }
        close(AH);
        $dbh->commit();

        Can you post a small sample of the file you are processing please ?