in reply to Re^3: <speed a major issue with below code also loosing data while writing in files>
in thread <speed a major issue with below code also loosing data while writing in files>

Here is the idea of what I am trying to do. File name: abc_def_hij.csv. Data on any line, comma separated: aaa,abc,aaa,def,aaa,hij…….. etc. Data on another line, comma separated: xxx,abc,xxx,def,xxx,hij..... etc. Another source file may contain comma-separated data such as bbb,abc,bbb,def,bbb,hij .... etc., so the data from that other file should get appended to the abc_def_hij.csv file. I will be grateful if you can solve this. Another thing: if I exclude strict and warnings, everything works fine, but it takes approximately 5 minutes to read 1 million records and create such files from them.
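A minimal sketch of the grouping described above, not the original script. It assumes the file name is built from the second, fourth and sixth fields of each line, as the abc_def_hij.csv example suggests. The names `target_name` and `spool_lines` are hypothetical, and the sketch keeps one open handle per target file rather than reopening the file for every record.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: the target file name comes from fields 2, 4 and 6
# (indexes 1, 3, 5), e.g. aaa,abc,aaa,def,aaa,hij -> abc_def_hij.csv.
sub target_name {
    my ($line) = @_;
    my @fields = split /,/, $line;
    return join('_', @fields[1, 3, 5]) . '.csv';
}

# Append each line to the file named by its key fields.
# One handle per target file is opened in append mode and reused.
sub spool_lines {
    my ($dir, @lines) = @_;
    my %handle_for;
    for my $line (@lines) {
        chomp $line;
        my $path = File::Spec->catfile($dir, target_name($line));
        unless ($handle_for{$path}) {
            open my $fh, '>>', $path
                or die "Cannot append to $path: $!";
            $handle_for{$path} = $fh;
        }
        print { $handle_for{$path} } "$line\n";
    }
    for my $path (keys %handle_for) {
        close $handle_for{$path}
            or die "Cannot close $path: $!";
    }
    return;
}

# Demo in a temporary directory: two calls stand in for two source files.
my $demo_dir = tempdir(CLEANUP => 1);
spool_lines($demo_dir, "aaa,abc,aaa,def,aaa,hij\n", "xxx,abc,xxx,def,xxx,hij\n");
spool_lines($demo_dir, "bbb,abc,bbb,def,bbb,hij\n");
print "wrote to ", File::Spec->catfile($demo_dir, 'abc_def_hij.csv'), "\n";
```

Because the files are opened in append mode, lines from a later source file land after the lines already written. With a very large number of distinct keys the per-process limit on open files could be reached; the sketch does not handle that case.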

Re^5: <speed a major issue with below code also loosing data while writing in files>
by bluescreen (Friar) on Jul 26, 2011 at 18:37 UTC

    Can you post your new code?

    Also, you said it takes 5 minutes to process 1M records, so what is your target time?

      Dear Monk, believe me, the code is the same; not a single line has changed. What I posted earlier is all there is. Somehow it runs fine without strict and warnings. Speed is the major concern: I expect it to finish within 1 minute, as my C code takes approximately 2 minutes to process that big file. Please help me with this. Below is the code:

          use POSIX;

          $num_args = $#ARGV + 1;
          if ($num_args != 1) {
              print "\nUsage: Spool.pl Require two argumrnts \n";
              exit;
          }

          my @SMSFileList = `ls | grep CSV`;
          chop(@SMSFileList);
          my $FileCount = 0;
          my @lines;
          my ($lines,$CDR,$CDR1,$uniquekey,$uniquekey1,$file_name,$file_name1,
              $CircleGroupHandle,$CircleGroupHandle1,$uniq1,$uniq2,$fh,$fh1);
          my $targetdir=$ARGV[0];

          foreach my $SMSFileName (@SMSFileList) {
              $FileCount++;
              sysopen (SOURCE_SMS_FILE,"$SMSFileName",O_RDONLY) or die "Error opening $SMSFileName";
              my %myHash;
              my %myHash1;
              while(<SOURCE_SMS_FILE>) {
                  if ($_ =~ m/^,/) {
                      my @lines= split(",",$_);
                      chop($_);
                      if ($lines[68] =~ /^I$/) {
                          $uniquekey= $lines[17].$lines[68].$lines[21].$lines[44].substr($lines[27],0,8);
                          #actual content of file having filename also#
                          $CDR = substr($lines[27],0,8).','.$lines[28].','.$lines[36].','.$lines[23].','.$lines[24]
                               .','.$lines[91].','.$lines[92].','.$lines[101].','.$lines[15].','.$lines[18]
                               .','.$lines[75].','.$lines[21].','.$lines[44].','.$lines[14]
                               .','.substr($lines[69],0,8).','.$lines[1].','.$lines[13].','.$lines[50];
                          #filename from content#
                          $file_name = $lines[17].'_'.$lines[68].'_'.$lines[21].'_'.$lines[44].'_'.substr($lines[27],0,8);
                          $myHash{$uniquekey}++;
                      }
                      if ($lines[68] =~ /^O$/) {
                          $uniquekey1= $lines[17].$lines[68].$lines[22].$lines[45].substr($lines[27],0,8);
                          $CDR1 = substr($lines[27],0,8).','.$lines[28].','.$lines[37].','.$lines[23].','.$lines[24]
                                .','.$lines[91].','.$lines[92].','.$lines[101].','.$lines[16].','.$lines[19]
                                .','.$lines[75].','.$lines[22].','.$lines[45].','.$lines[14]
                                .','.substr($lines[69],0,8).','.$lines[1].','.$lines[13].','.$lines[50];
                          $file_name1 = $lines[17].'_'.$lines[68].'_'.$lines[22].'_'.$lines[45].'_'.substr($lines[27],0,8);
                          $myHash1{$uniquekey1}++;
                      }
                  }
                  foreach $key (keys %myHash) {
                      $uniq1=$file_name;
                      sysopen($CircleGroupHandle,"$ARGV[0]/$uniq1.csv",O_RDWR|O_APPEND|O_CREAT)or die "Error writing to $GroupSMSFileName";
                      print $CircleGroupHandle "$CDR\n";
                  }
                  %myHash = ();
                  %$uniquekey = ();
                  $uniquekey = {};
                  foreach $key1 (keys %myHash1) {
                      $uniq2=$file_name1;
                      sysopen($CircleGroupHandle1,"$ARGV[0]/$uniq2.csv",O_RDWR|O_APPEND|O_CREAT)or die "Error writing to $GroupSMSFileName";
                      print $CircleGroupHandle1 "$CDR1\n";
                  }
                  %myHash1 = ();
                  %$uniquekey1 = ();
                  $uniquekey1 = {};
              }
              close(SOURCE_SMS_FILE);
              close($CircleGroupHandle) or die "Error closing to $CircleGroupHandle";
              close($CircleGroupHandle1) or die "Error closing to $CircleGroupHandle";
          }

        First of all, when you post code, wrap it in <code></code> tags; this is covered in Writeup Formatting Tips.

        Why don't you apply the changes that I proposed and measure the time again? Those changes should speed up the running time.
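To compare runs before and after a change, the elapsed wall-clock time can be measured from inside the script. A minimal sketch using the core Time::HiRes module follows. The helper `time_it` and the dummy workload are hypothetical stand-ins for the real spooling run, not part of the posted code.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock seconds.
sub time_it {
    my ($code) = @_;
    my $start = [gettimeofday];
    $code->();
    return tv_interval($start);
}

# Hypothetical workload: split 100,000 small comma-separated records.
my $elapsed = time_it(sub {
    my $field_total = 0;
    for my $n (1 .. 100_000) {
        my @fields = split /,/, "aaa,abc,aaa,def,aaa,$n";
        $field_total += @fields;
    }
});
printf "elapsed: %.3f s\n", $elapsed;
```

Timing the same input file with the old and the new version of the script gives a direct comparison; running each a few times helps smooth out disk-cache effects.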

        It's unlikely that any dynamic language (Perl, Python, Ruby) will reach the speed of a C program, so if your C program is optimized and 2 minutes is its mark, don't expect Perl to beat that.