in reply to Re^3: Faster replacement of sed commands
in thread Faster replacement of sed commands..

Thank you very much for the explanation; I appreciate your efforts. The script is working great for me. I have updated it a little to fit my requirements:

#!/usr/bin/perl
use Data::Dumper;
use File::Copy;
use feature qw{ say };

my %replacementLU = (
    QOS_PROFILE_ID => q{x1},
    CHARGING_PROFILE_ID => q{x2},
    CONTENT_FILTERING_PROFILE_ID => q{x3},
    SUBSCRIBERID => q{x4},
    RECORD_LENGTH => q{x5},
    RECORD_TYPE => q{x6},
    EVENT_ID => q{x7},
    EVENT_RESULT => q{x8},
    CAUSE_PROTOCOL => q{x9},
    DEFAULT_BEARER_ID => q{x0},
    ARP_PRIORITY_LEVEL => q{y1},
    ARP_CAPABILITY => q{y2},
    ARP_VULNERABILITY => q{y3},
    BEARER_CONTROL_MODE => q{y4},
    TRACKING_AREA_CODE => q{y5},
    ROUTING_AREA_CODE => q{y7},
    SERVICE_AREA_CODE => q{y8},
    SYSTEM_IDENTIFIER => q{y9},
    NETWORK_IDENTIFIER => q{y0},
    GX_RAR_RAA_TRANSACTION => q{TRAR},
    GX_CCR_CCA_TRANSACTION => q{TCCA},
    QUOTA_GRANTED => q{TQG},
    QOS_ASSIGNED_TO_DEFAULT_BEARER => q{TQA},
    RULE_INSTALLED => q{TRI},
    RULE_REMOVED => q{TRR},
);

my $subsRE = do {
    local $" = q{|};
    qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};
};

my $counter = 0;
open (WFH1, ">", "counter.txt");
for( ; ; ) {
    my $fileexists= -e "text.out";
    if ($fileexists ne "1") {
        `touch text.out`;
        foreach my $file (</data/admin/scripts/SapcmedadpebM/test/*csv>) {
            chomp;
            $abc1=`find $file -mmin +10`;
            chomp($abc1);
            if ($abc1 eq "") {
                next;
            }
            print " file $abc1 \n";
            $dd=`date`;
            print "$dd\n";
            `perl -i -pe 's/[^[:ascii:]]//g; tr/\015//d' $abc1`;
            print "junk character removed\n";
            open (FH, "$abc1");
            open (WFH, ">", "abc1.op");
            while (<FH>) {
                $_ =~ s{$subsRE}{$replacementLU{ $1 }}g;
                print WFH $_;
            }
            #`sed -i 's/QOS_PROFILE_ID/x1/g;s/CHARGING_PROFILE_ID/x2/g;s/CONTENT_FILTERING_PROFILE_ID/x3/g;s/SUBSCRIBERID/x4/g;s/RECORD_LENGTH/x5/g;s/RECORD_TYPE/x6/g;s/EVENT_ID/x7/g;s/EVENT_RESULT/x8/g;s/CAUSE_PROTOCOL/x9/g;s/DEFAULT_BEARER_ID/x0/g;s/ARP_PRIORITY_LEVEL/y1/g;s/ARP_CAPABILITY/y2/g;s/ARP_VULNERABILITY/y3/g;s/BEARER_CONTROL_MODE/y4/g;s/TRACKING_AREA_CODE/y5/g;s/ROUTING_AREA_CODE/y7/g;s/SERVICE_AREA_CODE/y8/g;s/SYSTEM_IDENTIFIER/y9/g;s/NETWORK_IDENTIFIER/y0/g' $abc1`;
            #`sed -i 's/GX_RAR_RAA_TRANSACTION/TRAR/g;s/GX_CCR_CCA_TRANSACTION/TCCA/g;s/QUOTA_GRANTED/TQG/g;s/QOS_ASSIGNED_TO_DEFAULT_BEARER/TQA/g;s/RULE_INSTALLED/TRI/g;s/RULE_REMOVED/TRR/g' $abc1`;
            move("abc1.op","./$abc1");
            $counter++;
            print WFH1 $counter;
        }
        unlink "text.out";
    }
    else {
        print "Exiting \n";
        exit;
    }
    sleep(100);
}

Now, when I run this script on a 400 MB input file (which is the required size), it gets stuck after generating 300 MB of output data. It doesn't throw an error and it doesn't fail; it just stops generating output. I then tried a 200 MB file, and it got stuck again at 160 MB. Without any error message I am not able to find the root cause. Could it be a memory issue? Could you please suggest anything that might help us get this resolved? Please let me know if you need more info. Thanks.

Re^5: Faster replacement of sed commands
by Athanasius (Archbishop) on Aug 25, 2014 at 15:02 UTC

    I think you should follow the advice given by McA above:

    ...you're using the backtick operator very often which creates a subprocess doing the shell command. This is expensive. You can do all the tasks directly in Perl reducing the amount of subprocess creations.
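    As a rough illustration of that advice, the `find ... -mmin +10`, `touch` and `date` calls in the posted script could be replaced by Perl built-ins along the following lines. This is only a sketch: the helper names `older_than_minutes`, `touch_file` and `timestamp` are made up for this example, and the age test only approximates what `find -mmin` does (it ignores find's rounding of minutes).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path was last modified more than
# $minutes minutes ago (roughly what `find $path -mmin +N` tests).
sub older_than_minutes {
    my ($path, $minutes) = @_;
    my $mtime = (stat $path)[9];          # element 9 of stat() is the mtime
    return 0 unless defined $mtime;       # missing or unreadable file
    my $age_seconds = time() - $mtime;
    return $age_seconds > $minutes * 60 ? 1 : 0;
}

# Hypothetical helper: create an empty file if it does not exist yet.
# Unlike the touch command, this does not update the timestamp of a
# file that already exists.
sub touch_file {
    my ($path) = @_;
    open my $fh, '>>', $path or die "Cannot create $path: $!";
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: human-readable timestamp, similar to `date`.
sub timestamp {
    return scalar localtime;
}
```

    With helpers like these, the loop could skip a file with `next unless older_than_minutes($file, 10);` and print `timestamp()` without starting any subprocess.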

    In particular, this line:

    `perl -i -pe 's/[^[:ascii:]]//g; tr/\015//d' $abc1`;

    creates a new shell subprocess with its own copy of the Perl interpreter on every loop iteration! Look at the answers already given by aitap and pvaldes, below, for advice on how to re-write your code in pure Perl, without using the backtick operator. There is no point in looking at other optimisations until you’ve removed the overhead of all those unnecessary shell subprocesses.
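    One possible shape for this, offered as a sketch rather than a drop-in fix: the clean-up done by the `perl -i -pe` one-liner can be folded into the same read loop that performs the substitutions, so each file is read and written once. The helper names `clean_line` and `process_file` and the `.tmp` suffix are assumptions made for this example, and the lookup table is cut down to three of the entries from the posted script.

```perl
use strict;
use warnings;

# Shortened copy of the lookup table from the posted script.
my %replacementLU = (
    QOS_PROFILE_ID => 'x1',
    EVENT_ID       => 'x7',
    EVENT_RESULT   => 'x8',
);

# One alternation of all keys, matched only as whole words.
my $alternation = join '|', map { quotemeta } keys %replacementLU;
my $subsRE      = qr/\b($alternation)\b/;

# Hypothetical helper: apply the clean-up and the replacements to one line.
sub clean_line {
    my ($line) = @_;
    $line =~ s/[^[:ascii:]]//g;               # drop non-ASCII characters
    $line =~ tr/\015//d;                      # drop carriage returns
    $line =~ s/$subsRE/$replacementLU{$1}/g;  # table-driven replacement
    return $line;
}

# Hypothetical helper: rewrite a file in a single pass via a temporary file.
sub process_file {
    my ($in_path) = @_;
    my $tmp_path = "$in_path.tmp";            # assumed temporary name
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $tmp_path or die "Cannot write $tmp_path: $!";
    while (my $line = <$in>) {
        print {$out} clean_line($line) or die "Write to $tmp_path failed: $!";
    }
    close $in;
    close $out or die "Cannot close $tmp_path: $!";  # flush before renaming
    rename $tmp_path, $in_path or die "Cannot rename $tmp_path: $!";
    return;
}
```

    Checking the return values of `open`, `print` and `close` means a failed write is reported instead of passing silently, and closing the output handle before the rename ensures buffered data has been written to disk.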

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,