Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

hi..I am a perl beginner..i have this perl script which takes an input file and apply two sed commands..

#!/usr/bin/perl use Data::Dumper; use File::Copy; for( ; ; ) { my $fileexists= -e "text.out"; if ($fileexists ne "1") { `touch text.out`; foreach my $file (</data/admin/scripts/SapcmedadpebM/xdecoder/decodedi +r/MHSAPC/*csv>) { chomp; $abc1=`find $file -mmin +10`; chomp($abc1); if ($abc1 eq "") { next; } print " file $abc1 \n"; $dd=`date`; print "$dd\n"; `perl -i -pe 's/[^[:ascii:]]//g; tr/\015//d' $abc1`; print "junk character removed\n"; `sed -i 's/QOS_PROFILE_ID/x1/g;s/CHARGING_PROFILE_ID/x2/g;s/CONTENT_FI +LTERING_PROFILE_ID/x3/g;s/SUBSCRIBERID/x4/g;s/RECORD_LENGTH/x5/g;s/RE +CORD_TYPE/x6/g;s/EVENT_ID/x7/g;s/EVENT_RESULT/x8/g;s/CAUSE_PROTOCOL/x +9/g;s/DEFAULT_BEARER_ID/x0/g;s/ARP_PRIORITY_LEVEL/y1/g;s/ARP_CAPABILI +TY/y2/g;s/ARP_VULNERABILITY/y3/g;s/BEARER_CONTROL_MODE/y4/g;s/TRACKIN +G_AREA_CODE/y5/g;s/ROUTING_AREA_CODE/y7/g;s/SERVICE_AREA_CODE/y8/g;s/ +SYSTEM_IDENTIFIER/y9/g;s/NETWORK_IDENTIFIER/y0/g' $abc1`; `sed -i 's/GX_RAR_RAA_TRANSACTION/TRAR/g;s/GX_CCR_CCA_TRANSACTION/TCCA +/g;s/QUOTA_GRANTED/TQG/g;s/QOS_ASSIGNED_TO_DEFAULT_BEARER/TQA/g;s/RUL +E_INSTALLED/TRI/g;s/RULE_REMOVED/TRR/g' $abc1`; move($abc1,"/data/admin/scripts/SapcmedadpebM/xdecoder/decodedir/compl +eat/"); #`mv -i $abc1 /data/admin/scripts/SapcmedadpebM/xdecoder/decodedir/com +pleat/`; } unlink "text.out"; } else { print "Exiting \n"; exit; } sleep(100); }

Now the problem is that above two sed commands are taking more time than required...so could you please help me find the better and faster replacement of tese commands? please let me know if you need further info. thanks...

Replies are listed 'Best First'.
Re: Faster replacement of sed commands
by Athanasius (Archbishop) on Aug 19, 2014 at 07:42 UTC

    To expand a little on McA’s answer, translation of:

    `sed -i 's/QOS_PROFILE_ID/x1/g;s/CHARGING_PROFILE_ID/x2/g;s/CONTENT_FI +LTERING_PROFILE_ID/x3/g;...' $abc1`;

    to Perl is straightforward:

    $abc1 =~ s/QOS_PROFILE_ID/x1/g; $abc1 =~ s/CHARGING_PROFILE_ID/x2/g; $abc1 =~ s/CONTENT_FILTERING_PROFILE_ID/x3/g; ...

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Faster replacement of sed commands..
by McA (Priest) on Aug 19, 2014 at 07:21 UTC

    Hi,

    you're using Perl in a very "shellish" way. What I mean is you're using the backtick operator very often which creates a subprocess doing the shell command. This is expensive. You can do all the tasks directly in Perl reducing the amount of subprocess creations.

    In your case: When you know that the two sed commands are what costs most of the time: Why have you two seperate calls to sed? In every sed command you have multiple substitute calls. Why don't you put all of them in one sed call?

    By the way: What is the sleep(100) good for besides waisting time?

    Regards
    McA

Re: Faster replacement of sed commands..
by johngg (Canon) on Aug 19, 2014 at 09:36 UTC

    I think it would be much clearer to use a lookup table (stored in a hash), construct a single capturing regex from the keys of the hash and do a single global substitution operation. It would also be clearer for you and those who come after if you employ a consistent code indentation scheme.

    use strict; use warnings; use feature qw{ say }; my %replacementLU = ( QOS_PROFILE_ID => q{x1}, CHARGING_PROFILE_ID => q{x2}, CONTENT_FILTERING_PROFILE_ID => q{x3}, SUBSCRIBERID => q{x4}, RECORD_LENGTH => q{x5}, RECORD_TYPE => q{x6}, EVENT_ID => q{x7}, EVENT_RESULT => q{x8}, CAUSE_PROTOCOL => q{x9}, DEFAULT_BEARER_ID => q{x0}, ARP_PRIORITY_LEVEL => q{y1}, ARP_CAPABILITY => q{y2}, ARP_VULNERABILITY => q{y3}, BEARER_CONTROL_MODE => q{y4}, TRACKING_AREA_CODE => q{y5}, ROUTING_AREA_CODE => q{y7}, SERVICE_AREA_CODE => q{y8}, SYSTEM_IDENTIFIER => q{y9}, NETWORK_IDENTIFIER => q{y0}, GX_RAR_RAA_TRANSACTION => q{TRAR}, GX_CCR_CCA_TRANSACTION => q{TCCA}, QUOTA_GRANTED => q{TQG}, QOS_ASSIGNED_TO_DEFAULT_BEARER => q{TQA}, RULE_INSTALLED => q{TRI}, RULE_REMOVED => q{TRR}, ); my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; }; my $text = q{blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl}; say $text; $text =~ s{$subsRE}{$replacementLU{ $1 }}g; say $text;

    The output.

    blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl blurfl,y3,17,blargle,x7,4,glogl

    I hope this is helpful.

    Cheers,

    JohnGG

      Hi...it worked for me..thanks a lot..but could you do me a little more favor? not able to understand below part of your code..

      my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; };

      please help me understand this...appreciate your efforts...

        The solution by johngg has the elegance that makes Perl such a great language. To understand it, let’s work backwards:

        $text =~ s{$subsRE}{$replacementLU{ $1 }}g;

        This is where all the actual substitutions take place. The first part, {$subsRE}, looks for keyword matches (see below), and the /g modifier keeps looking until no more matches can be found. For each match found, the keyword, referenced by $1, is used as a hash key in the lookup table %replacementLU, and the value corresponding to that key is used for the substitution. So, for example, ARP_VULNERABILITY is replaced by y3.

        OK, you knew all that, but where does $subsRE fit in? Let’s print it out to see what it looks like:

        (?^:(?x) \b ( CONTENT_FILTERING_PROFILE_ID|QUOTA_GRANTED|ARP_VULNERABI +LITY|NETWORK_IDENTIFIER|ARP_PRIORITY_LEVEL|DEFAULT_BEARER_ID|EVENT_RE +SULT|EVENT_ID|QOS_PROFILE_ID|ARP_CAPABILITY|SYSTEM_IDENTIFIER|TRACKIN +G_AREA_CODE|GX_RAR_RAA_TRANSACTION|SERVICE_AREA_CODE|RECORD_TYPE|RECO +RD_LENGTH|CHARGING_PROFILE_ID|QOS_ASSIGNED_TO_DEFAULT_BEARER|GX_CCR_C +CA_TRANSACTION|RULE_REMOVED|BEARER_CONTROL_MODE|ROUTING_AREA_CODE|RUL +E_INSTALLED|CAUSE_PROTOCOL|SUBSCRIBERID ) \b)

        As you can see, this says: match any one of the keywords provided it is preceded and followed by a word boundary (\b). The character | separating the keywords is the metacharacter for alternation; for example, A|B|C means: match either A, or B, or C. (See “Metacharacters” in perlre#Regular-Expressions.) Note the capturing parentheses: if any of the keywords is matched, it is captured into the next available capture variable (which in this case is $1).

        OK, so where did this monster $subsRE come from? It would be no fun constructing this by hand, so johngg harnessed Perl to do the work. Note that qr// is a the Perl regex quote operator: it converts a string into a regular expression (see perlop#Regexp-Quote-Like-Operators). (?x) is the /x modifier in a different form. The string argument to qr// is constructed by interpolating the keys of the hash %replacementLU into the string. But just saying this:

        qr{(?x) \b ( keys %replacementLU ) \b};

        wouldn’t work because Perl would think you want to match the literal characters keys %replacementLU. Perl will interpolate when it sees a $ (for a scalar) or an @ (for an array), so we need to give Perl a construct like this: @{ ... }. But that says, dereference (something) to get an array. So we need to convert keys %replacementLU (which returns a list) into an array reference, which we do by creating an anonymous array with square brackets. So

        @{ [ keys %replacementLU ] }

        is the Perlish idiom for interpolating the contents of the list returned by the keys function into the string.

        Now all we need is to separate the elements of the list with | (alternation) characters. Normally, when a list is interpolated, the elements are separated by spaces. But actually they’re separated by whatever is the contents of the special variable $", for which a space is the default. By changing it to |, we get the elements of the list separated by alternation characters, which gives us the regex we want.

        johngg could have just said:

        $" = q{|}; my $subsRE = qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};

        but that would leave $" set to |, which might interfere with other parts of the script. It’s better practice to localise any temporary changes made to global variables. The syntax:

        my $subsRE = do { local $" = q{|}; qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b}; };

        uses local to limit the scope of the assignment, and takes advantage of the fact that the do { ... }; syntax (1) provides an enclosing scope for the local $" = q{|}; assignment; and (2) returns the value of its final statement, in this case the regex returned by qr//.

        As Hannibal Smith liked to say: “I love it when a plan comes together.” :-)

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Faster replacement of sed commands..
by aitap (Curate) on Aug 19, 2014 at 10:19 UTC

    For your simple sed commands, Perl equivalent can be produced by replacing sed -i with perl -pi -e. Having done this, we can use O 'Deparse' to get the actual Perl code produced by these switches:

    $ perl -MO=Deparse -pi -e 's/GX_RAR_RAA_TRANSACTION/TRAR/g;s/GX_CCR_CC +A_TRANSACTION/TCCA/g;s/QUOTA_GRANTED/TQG/g;s/QOS_ASSIGNED_TO_DEFAULT_ +BEARER/TQA/g;s/RULE_INSTALLED/TRI/g;s/RULE_REMOVED/TRR/g' BEGIN { $^I = ""; } LINE: while (defined($_ = <ARGV>)) { s/GX_RAR_RAA_TRANSACTION/TRAR/g; s/GX_CCR_CCA_TRANSACTION/TCCA/g; s/QUOTA_GRANTED/TQG/g; s/QOS_ASSIGNED_TO_DEFAULT_BEARER/TQA/g; s/RULE_INSTALLED/TRI/g; s/RULE_REMOVED/TRR/g; } continue { die "-p destination: $!\n" unless print $_; } -e syntax OK
    ARGV is a special filehandle which iterates over files supplied in command line arguments (@ARGV array).

    Thus we can inline all your `perl ...` and `sed ...` into main program (untested):

    We can also put all these substitutions in a hash to make code less repetitive: I omitted the infinite loop and file creation. Do you need a lock file? Run perldoc -q lock to get instructions on proper locking: your current implementation may be prone to race conditions.

    See also: localtime, quotemeta, $^I, local, perlfaq5.

Re: Faster replacement of sed commands..
by pvaldes (Chaplain) on Aug 20, 2014 at 23:23 UTC

    unrelated with the topic, but perhaps useful to make your script a little faster:

    you could want to put the whole "/data/admin/scsripts/SapcmedadpebM/xdecoder/decodedir" thing in a var. The purpose of this is to avoid typing mistakes and to write shorter sentences easier to debug. Something like: (untested).

    my $path = "/very/long/path/here"; # or maybe: opendir(my $path, "/very/long/path/here"); move($abc1,"$path/compleat/"); foreach my $file (<$path/MHSAPC/*csv>){chomp; ... ## read also about +glob

    You probably don't need to use the touch line here, just open the file. Open avoids you later the need for an one-liner inside a perl script, (bad style to me, you are opening two perl processes and the second is not necessary)

    `perl -i -pe 's/[^[:ascii:]]//g; tr/\015//d' $abc1`; ## to avoid

    avoid `find -mmin ...`, use stat instead:

    $mmtime = (stat($file))[9]; if ($mmtime > 600){do something ....} ; ## note that these are seconds, 10 min = 600 sec
    if ($abc1 eq "") { next; }

    No need to spend four lines for this, probably:

    next if ($abc1 eq "");