in reply to Re^2: using system command in regex
in thread using system command in regex

my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]) + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);

Short answer: since you are using uniq -c as the last filter in your pipelines, you are interested in the first field. This field has leading whitespace. From the documentation of split:

As another special case, "split" emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20", but not e.g. "/ /"). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were "/\s+/"; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
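A quick standalone illustration of that special case, using a made-up line shaped like `uniq -c` output:

```perl
# `uniq -c` output: the count arrives with leading whitespace
my $line = "      42 Submit|GSM";

# the awk-like special case strips leading whitespace first,
# so the count really is the first field ...
my @awkish = split ' ', $line;        # ("42", "Submit|GSM")

# ... whereas /\s+/ keeps an empty leading field, which is why
# the original code had to reach for index [1]
my @pattern = split /\s+/, $line;     # ("", "42", "Submit|GSM")

print "$awkish[0] vs $pattern[1]\n";
```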

Long answer: you are running the shell pipeline above, which makes for 11 processes in total, and you are reading each file that matches SMSCDR*$date$hour$minute*.log twice, just to get a sum which perl would happily give you in a less convoluted way, in a single process.

From your code I am guessing that your log files contain a timestamp in the first field, that Submit occurs in the 10th field, and that you want lines which contain GSM or SMPP in the 13th field.
Putting it all together, omitting unnecessary steps and not writing perl as if it were shell:

@ARGV = glob "SMSCDR*$date$hour$minute*.log";
my $SMPP_count;
my $stamp = "$hour:$min:$sec";
while (<>) {
    chomp;
    my @ary = (split /\|/)[0,9,12];
    $SMPP_count++ if $ary[0] eq $stamp
                 and $ary[1] eq "Submit"
                 and $ary[2] =~ /(GSM|SMPP)/;
}
print $SMPP_count;
update: corrected code
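Worth noting when splitting on pipes: `split` treats its first argument as a pattern even when it is given as a string, so an unescaped `'|'` is an empty alternation that splits between every character. A small standalone example:

```perl
my $record = "12:34:56|Submit|GSM";

# WRONG: '|' as a pattern is an empty alternation -> one field per character
my @chars  = split '|',  $record;   # ("1", "2", ":", "3", ...)

# RIGHT: escape the pipe so it is matched literally
my @fields = split /\|/, $record;   # ("12:34:56", "Submit", "GSM")

print scalar(@fields), "\n";
```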
perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Replies are listed 'Best First'.
Re^4: using system command in regex
by ravi45722 (Pilgrim) on Oct 14, 2015 at 05:46 UTC

    Thanks for sharing your knowledge, shmem. I replaced the code you wrote (with some needed modifications) in my program, but it's really dead slow.

     the code took:289 wallclock secs ( 6.43 usr  0.72 sys + 388.95 cusr 58.95 csys = 455.05 CPU)

    This result is for my original program (using grep & cut). But after the modification, in those 300 wallclock secs it only gets through 36 seconds' worth of data. At that rate, to complete the total 10 minutes of data it will take nearly .... (actually, I don't know :-) ). My modified code is:

    my $greatest = 0;
    my $total = 0;
    my @files = glob "SMSCDR*$date$hour$minute*.log";
    foreach my $min ($minute .. $minute+9) {
        foreach my $sec (@seconds) {
            # my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]) + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);
            my $SMPP_count;
            my $stamp = "$hour:$min:$sec";
            foreach my $file (@files) {
                open (FILE, "$file");
                while (<FILE>) {
                    chomp;
                    my @ary = (split /\s|\|/, $_)[3,21,24];
                    $SMPP_count++ if $ary[0] eq $stamp
                                 and $ary[1] eq "Submit"
                                 and $ary[2] =~ /(GSM|SMPP)/;
                }
            }
            if ($SMPP_count > $greatest) { $greatest = $SMPP_count; }
            $total = $total + $SMPP_count;
            print "$hour:$min:$sec", "= $SMPP_count", "\t", $total, $/;
        }
    }
    print $greatest, $/;
    my $t1 = Benchmark->new;
    my $td = timediff($t1, $t0);
    print "the code took:", timestr($td), "\n";

    Note: the result is the same for both programs. You said that reading the files in perl should be faster than grep & cut, but here it isn't working out that way. I don't understand where I am going wrong.

    Update:

    Here, when you use glob it returns three files. Of those, post_paid contains 8278 lines, prepaid contains 23072 lines, and delivery_file contains 80097 lines. So for the first second alone it has to check 111,447 lines, and for the full 10 minutes (600 seconds) the program has to check 66,868,200 lines. Please show me a way to get rid of this.
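For the record, the arithmetic behind those figures (line counts taken from the paragraph above):

```perl
# Sanity-check of the line counts quoted in the post
my @lines_per_file = (8278, 23072, 80097);  # post_paid, prepaid, delivery_file
my $per_pass = 0;
$per_pass += $_ for @lines_per_file;        # lines scanned per second checked
my $ten_minutes = $per_pass * 600;          # 10 minutes = 600 seconds
print "$per_pass lines per pass, $ten_minutes in total\n";
```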

      Of course it is dead slow. You are opening, reading and closing each file 10 * 60 = 600 times to get the sum for each second. You should read each file once and store the sums in a hash, keyed by the timestamp, like so:

      my %SMPP_count;
      foreach my $file (@files) {
          open (FILE, "$file");
          while (<FILE>) {
              next unless /\b(?:GSM|SMPP)\b/;      # skip uninteresting lines early
              chomp;
              my @ary = (split /\s|\|/, $_)[3,21,24];
              my $time = (split ' ', $ary[0])[4];  # first element is the timestamp, right?
              $SMPP_count{$time}++ if $ary[1] eq "Submit"
                                  and $ary[2] =~ /(GSM|SMPP)/;
          }
      }
      # now iterate over the keys of the hash to make up your sums
      for my $time (sort keys %SMPP_count) {
          my $sum = $SMPP_count{$time};
          ...
      }
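Deriving the per-second maximum and the running total (which the earlier loop printed) is then a single pass over that hash. A sketch with made-up sample counts standing in for the real data:

```perl
# sample counts, not real data -- the real hash comes from the file scan
my %SMPP_count = (
    "00:00:01" => 256,
    "00:00:02" => 491,
    "00:00:03" => 382,
);

my ($greatest, $total) = (0, 0);
for my $time (sort keys %SMPP_count) {
    my $sum = $SMPP_count{$time};
    $greatest = $sum if $sum > $greatest;   # track the busiest second
    $total   += $sum;                       # running total
    print "$time = $sum\t$total$/";
}
print "$greatest$/";
```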
      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

        What an idea, pragrammatic.... I have been struggling with this for nearly three days, and finally it is solved. Working great, thank you very much. I want to ask you one more thing. As already mentioned, I have three files (Postpaid, Prepaid, Delivery) for every 10 minutes. After calculating, I am uploading all the values into the DB like this:

        Date        Hour   Mo_resp  MT_resp  AO_resp  Percentage
        10-08-2015  00:00  256      382      36       87%
        10-08-2015  00:10  491      438      12       92%

        (This is a sample; actually my DB contains 38 columns.) Like this I am uploading all the values every 10 minutes. Now my requirement is to add up all MO_resp, all AO_resp, and so on (all columns) which occurred in hour 00, to write an hourly report into an excel sheet. For that I am doing:

        use DBI;
        my $hour_db = DBI->connect(
            "DBI:mysql:database=$db;host=$host;mysql_socket=/opt/lampstack-5.5.27-0/mysql/tmp/mysql.sock",
            "root", "", {'RaiseError' => 1});
        my @column_names = ("MO_resp", "MT_resp", "AO_resp");
        foreach my $column_name (@column_names) {
            my $hour_sth = $hour_db->prepare("Select sum($column_name) from $table_name where Date='$db_date' and Hour like '$hour:%'");
            $hour_sth->execute() or die $DBI::errstr;
            .....
        }

        Like this I am reading each column's sum one by one. But I feel this is not a good method. Can you show me a way???
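One round-trip does suffice: SQL can sum several columns in a single SELECT, and DBI's fetchrow_array returns them all at once. A hedged sketch (the table name `stats` is a stand-in for the real schema; the DBI calls are shown as comments since they need a live handle):

```perl
# Build ONE statement that sums every column of interest,
# instead of preparing one query per column
my @column_names = ("MO_resp", "MT_resp", "AO_resp");
my $select_list  = join ", ", map { "sum($_)" } @column_names;
my $sql = "SELECT $select_list FROM stats WHERE Date = ? AND Hour LIKE ?";

# With a real handle you would then run it once and fetch all sums:
#   my $sth = $hour_db->prepare($sql);
#   $sth->execute($db_date, "$hour:%");         # placeholders, no manual quoting
#   my ($mo, $mt, $ao) = $sth->fetchrow_array;  # one sum per column, in order
print "$sql\n";
```

Placeholders also sidestep the quoting problems of interpolating `$db_date` and `$hour` directly into the SQL string.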

      Thanks for sharing your knowledge shmem.

      Heh. My nick is programmatic. - Please provide sample input. What does your data look like? What are you trying to accomplish? What is your expected output? Just a sum? This whole thread seems to be about an XY Problem.

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

        shmem :-) Programmatic, I already posted that in this thread. Please check it here: 1144697