Faster replacement of sed commands..

Ankur_kuls has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: Faster replacement of sed commands
by Athanasius (Archbishop) on Aug 19, 2014 at 07:42 UTC

To expand a little on McA’s answer, translation of:

`sed -i 's/QOS_PROFILE_ID/x1/g;s/CHARGING_PROFILE_ID/x2/g;s/CONTENT_FI
+LTERING_PROFILE_ID/x3/g;...' $abc1`;
[download]

to Perl is straightforward:

$abc1 =~ s/QOS_PROFILE_ID/x1/g;
$abc1 =~ s/CHARGING_PROFILE_ID/x2/g;
$abc1 =~ s/CONTENT_FILTERING_PROFILE_ID/x3/g;
...
[download]

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: Faster replacement of sed commands..
by McA (Priest) on Aug 19, 2014 at 07:21 UTC

Hi,

you're using Perl in a very "shellish" way. What I mean is you're using the backtick operator very often which creates a subprocess doing the shell command. This is expensive. You can do all the tasks directly in Perl reducing the amount of subprocess creations.

In your case: When you know that the two sed commands are what costs most of the time: Why have you two seperate calls to sed? In every sed command you have multiple substitute calls. Why don't you put all of them in one sed call?

By the way: What is the sleep(100) good for besides waisting time?

Regards
McA

[reply]
[d/l]

Re: Faster replacement of sed commands..
by johngg (Canon) on Aug 19, 2014 at 09:36 UTC

I think it would be much clearer to use a lookup table (stored in a hash), construct a single capturing regex from the keys of the hash and do a single global substitution operation. It would also be clearer for you and those who come after if you employ a consistent code indentation scheme.

use strict;
use warnings;

use feature qw{ say };

my %replacementLU = (
   QOS_PROFILE_ID                 => q{x1},
   CHARGING_PROFILE_ID            => q{x2},
   CONTENT_FILTERING_PROFILE_ID   => q{x3},
   SUBSCRIBERID                   => q{x4},
   RECORD_LENGTH                  => q{x5},
   RECORD_TYPE                    => q{x6},
   EVENT_ID                       => q{x7},
   EVENT_RESULT                   => q{x8},
   CAUSE_PROTOCOL                 => q{x9},
   DEFAULT_BEARER_ID              => q{x0},
   ARP_PRIORITY_LEVEL             => q{y1},
   ARP_CAPABILITY                 => q{y2},
   ARP_VULNERABILITY              => q{y3},
   BEARER_CONTROL_MODE            => q{y4},
   TRACKING_AREA_CODE             => q{y5},
   ROUTING_AREA_CODE              => q{y7},
   SERVICE_AREA_CODE              => q{y8},
   SYSTEM_IDENTIFIER              => q{y9},
   NETWORK_IDENTIFIER             => q{y0},
   GX_RAR_RAA_TRANSACTION         => q{TRAR},
   GX_CCR_CCA_TRANSACTION         => q{TCCA},
   QUOTA_GRANTED                  => q{TQG},
   QOS_ASSIGNED_TO_DEFAULT_BEARER => q{TQA},
   RULE_INSTALLED                 => q{TRI},
   RULE_REMOVED                   => q{TRR},
   );

my $subsRE = do {
   local $" = q{|};
   qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};
   };

my $text = q{blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl};
say $text;

$text =~ s{$subsRE}{$replacementLU{ $1 }}g;

say $text;
[download]

The output.

blurfl,ARP_VULNERABILITY,17,blargle,EVENT_ID,4,glogl
blurfl,y3,17,blargle,x7,4,glogl
[download]

I hope this is helpful.

Cheers,

JohnGG

[reply]
[d/l]
[select]

Re^2: Faster replacement of sed commands..

by Anonymous Monk on Aug 20, 2014 at 04:30 UTC

Hi...it worked for me..thanks a lot..but could you do me a little more favor? not able to understand below part of your code..

my $subsRE = do {
   local $" = q{|};
   qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};
   };
[download]

please help me understand this...appreciate your efforts...

[reply]
[d/l]

Re^3: Faster replacement of sed commands

by Athanasius (Archbishop) on Aug 20, 2014 at 09:11 UTC

The solution by johngg has the elegance that makes Perl such a great language. To understand it, let’s work backwards:

$text =~ s{$subsRE}{$replacementLU{ $1 }}g;
[download]

This is where all the actual substitutions take place. The first part, {$subsRE}, looks for keyword matches (see below), and the /g modifier keeps looking until no more matches can be found. For each match found, the keyword, referenced by $1, is used as a hash key in the lookup table %replacementLU, and the value corresponding to that key is used for the substitution. So, for example, ARP_VULNERABILITY is replaced by y3.

OK, you knew all that, but where does $subsRE fit in? Let’s print it out to see what it looks like:

(?^:(?x) \b ( CONTENT_FILTERING_PROFILE_ID|QUOTA_GRANTED|ARP_VULNERABI
+LITY|NETWORK_IDENTIFIER|ARP_PRIORITY_LEVEL|DEFAULT_BEARER_ID|EVENT_RE
+SULT|EVENT_ID|QOS_PROFILE_ID|ARP_CAPABILITY|SYSTEM_IDENTIFIER|TRACKIN
+G_AREA_CODE|GX_RAR_RAA_TRANSACTION|SERVICE_AREA_CODE|RECORD_TYPE|RECO
+RD_LENGTH|CHARGING_PROFILE_ID|QOS_ASSIGNED_TO_DEFAULT_BEARER|GX_CCR_C
+CA_TRANSACTION|RULE_REMOVED|BEARER_CONTROL_MODE|ROUTING_AREA_CODE|RUL
+E_INSTALLED|CAUSE_PROTOCOL|SUBSCRIBERID ) \b)
[download]

As you can see, this says: match any one of the keywords provided it is preceded and followed by a word boundary (\b). The character | separating the keywords is the metacharacter for alternation; for example, A|B|C means: match either A, or B, or C. (See “Metacharacters” in perlre#Regular-Expressions.) Note the capturing parentheses: if any of the keywords is matched, it is captured into the next available capture variable (which in this case is $1).

OK, so where did this monster $subsRE come from? It would be no fun constructing this by hand, so johngg harnessed Perl to do the work. Note that qr// is a the Perl regex quote operator: it converts a string into a regular expression (see perlop#Regexp-Quote-Like-Operators). (?x) is the /x modifier in a different form. The string argument to qr// is constructed by interpolating the keys of the hash %replacementLU into the string. But just saying this:

qr{(?x) \b ( keys %replacementLU ) \b};
[download]

wouldn’t work because Perl would think you want to match the literal characters keys %replacementLU. Perl will interpolate when it sees a $ (for a scalar) or an @ (for an array), so we need to give Perl a construct like this: @{ ... }. But that says, dereference (something) to get an array. So we need to convert keys %replacementLU (which returns a list) into an array reference, which we do by creating an anonymous array with square brackets. So

@{ [ keys %replacementLU ] }
[download]

is the Perlish idiom for interpolating the contents of the list returned by the keys function into the string.

Now all we need is to separate the elements of the list with | (alternation) characters. Normally, when a list is interpolated, the elements are separated by spaces. But actually they’re separated by whatever is the contents of the special variable $", for which a space is the default. By changing it to |, we get the elements of the list separated by alternation characters, which gives us the regex we want.

johngg could have just said:

$" = q{|};
my $subsRE = qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};
[download]

but that would leave $" set to |, which might interfere with other parts of the script. It’s better practice to localise any temporary changes made to global variables. The syntax:

my $subsRE = do {
   local $" = q{|};
   qr{(?x) \b ( @{ [ keys %replacementLU ] } ) \b};
   };
[download]

uses local to limit the scope of the assignment, and takes advantage of the fact that the do { ... }; syntax (1) provides an enclosing scope for the local $" = q{|}; assignment; and (2) returns the value of its final statement, in this case the regex returned by qr//.

As Hannibal Smith liked to say: “I love it when a plan comes together.” :-)

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re^4: Faster replacement of sed commands

by Ankur_kuls (Sexton) on Aug 25, 2014 at 11:38 UTC

Re^5: Faster replacement of sed commands

by Athanasius (Archbishop) on Aug 25, 2014 at 15:02 UTC

Re: Faster replacement of sed commands..
by aitap (Curate) on Aug 19, 2014 at 10:19 UTC

For your simple sed commands, Perl equivalent can be produced by replacing sed -i with perl -pi -e. Having done this, we can use O 'Deparse' to get the actual Perl code produced by these switches:

$ perl -MO=Deparse -pi -e 's/GX_RAR_RAA_TRANSACTION/TRAR/g;s/GX_CCR_CC
+A_TRANSACTION/TCCA/g;s/QUOTA_GRANTED/TQG/g;s/QOS_ASSIGNED_TO_DEFAULT_
+BEARER/TQA/g;s/RULE_INSTALLED/TRI/g;s/RULE_REMOVED/TRR/g'
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
    s/GX_RAR_RAA_TRANSACTION/TRAR/g;
    s/GX_CCR_CCA_TRANSACTION/TCCA/g;
    s/QUOTA_GRANTED/TQG/g;
    s/QOS_ASSIGNED_TO_DEFAULT_BEARER/TQA/g;
    s/RULE_INSTALLED/TRI/g;
    s/RULE_REMOVED/TRR/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK
[download]

ARGV

@ARGV

Thus we can inline all your `perl ...` and `sed ...` into main program (untested):