in reply to Re^2: using system command in regex
in thread using system command in regex

my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]) + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);

Short answer: since you are using uniq -c as the last filter in your pipelines, you are interested in the first field. This field has leading whitespace. From the documentation of split:

As another special case, "split" emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20", but not e.g. "/ /"). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were "/\s+/"; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
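A quick standalone illustration of that special case, using a made-up line shaped like `uniq -c` output:

```perl
# `uniq -c` output: the count arrives with leading whitespace
my $line = "      42 Submit|GSM";

# the awk-like special case strips leading whitespace first,
# so the count really is the first field ...
my @awkish = split ' ', $line;        # ("42", "Submit|GSM")

# ... whereas /\s+/ keeps an empty leading field, which is why
# the original code had to reach for index [1]
my @pattern = split /\s+/, $line;     # ("", "42", "Submit|GSM")

print "$awkish[0] vs $pattern[1]\n";
```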

Long answer: you are running the shell pipeline above, which makes for 11 processes in total, and you are reading each file that matches SMSCDR*$date$hour$minute*.log twice, just to get a sum which perl would happily give you in a less convoluted way, in a single process.

From your code I am guessing that your log files contain a timestamp in the first field, that Submit occurs in the 10th field, and that you want lines which contain GSM or SMPP in the 13th field.
Putting it all together, omitting unnecessary steps and not writing perl as if it were shell:

@ARGV = glob "SMSCDR*$date$hour$minute*.log";
my $SMPP_count;
my $stamp = "$hour:$min:$sec";
while (<>) {
    chomp;
    my @ary = (split /\|/)[0,9,12];
    $SMPP_count++ if $ary[0] eq $stamp
                 and $ary[1] eq "Submit"
                 and $ary[2] =~ /(GSM|SMPP)/;
}
print $SMPP_count;
update: corrected code
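Worth noting when splitting on pipes: `split` treats its first argument as a pattern even when it is given as a string, so an unescaped `'|'` is an empty alternation that splits between every character. A small standalone example:

```perl
my $record = "12:34:56|Submit|GSM";

# WRONG: '|' as a pattern is an empty alternation -> one field per character
my @chars  = split '|',  $record;   # ("1", "2", ":", "3", ...)

# RIGHT: escape the pipe so it is matched literally
my @fields = split /\|/, $record;   # ("12:34:56", "Submit", "GSM")

print scalar(@fields), "\n";
```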
perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Replies are listed 'Best First'.
Re^4: using system command in regex
by ravi45722 (Pilgrim) on Oct 14, 2015 at 05:46 UTC

    Thanks for sharing your knowledge, shmem. I replaced the code you wrote (with some needed modifications) in my program, but it's really dead slow.

     the code took:289 wallclock secs ( 6.43 usr  0.72 sys + 388.95 cusr 58.95 csys = 455.05 CPU)

    This result is for my original program (using grep & cut). But after the modification, in those 300 wallclock secs it only gets through 36 seconds' worth of data. At that rate, to complete the total 10 minutes of data it will take nearly .... (actually, I don't know :-) ). My modified code is:

    my $greatest = 0;
    my $total = 0;
    my @files = glob "SMSCDR*$date$hour$minute*.log";
    foreach my $min ($minute .. $minute+9) {
        foreach my $sec (@seconds) {
            # my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]) + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);
            my $SMPP_count;
            my $stamp = "$hour:$min:$sec";
            foreach my $file (@files) {
                open (FILE, "$file");
                while (<FILE>) {
                    chomp;
                    my @ary = (split /\s|\|/, $_)[3,21,24];
                    $SMPP_count++ if $ary[0] eq $stamp
                                 and $ary[1] eq "Submit"
                                 and $ary[2] =~ /(GSM|SMPP)/;
                }
            }
            if ($SMPP_count > $greatest) { $greatest = $SMPP_count; }
            $total = $total + $SMPP_count;
            print "$hour:$min:$sec", "= $SMPP_count", "\t", $total, $/;
        }
    }
    print $greatest, $/;
    my $t1 = Benchmark->new;
    my $td = timediff($t1, $t0);
    print "the code took:", timestr($td), "\n";

    Note: the result is the same for both programs. You said that reading the files in perl should be faster than grep & cut, but here it isn't working out that way. I don't understand where I am going wrong.

    Update:

    Here, when you use glob it returns three files. Of those, post_paid contains 8278 lines, prepaid contains 23072 lines, and delivery_file contains 80097 lines. So for the first second alone it has to check 111,447 lines, and for the full 10 minutes (600 seconds) the program has to check 66,868,200 lines. Please show me a way to get rid of this.
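For the record, the arithmetic behind those figures (line counts taken from the paragraph above):

```perl
# Sanity-check of the line counts quoted in the post
my @lines_per_file = (8278, 23072, 80097);  # post_paid, prepaid, delivery_file
my $per_pass = 0;
$per_pass += $_ for @lines_per_file;        # lines scanned per second checked
my $ten_minutes = $per_pass * 600;          # 10 minutes = 600 seconds
print "$per_pass lines per pass, $ten_minutes in total\n";
```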

      Of course it is dead slow. You are opening, reading and closing each file 10 * 60 = 600 times to get the sum for each second. You should read each file once and store the sums in a hash, keyed by the timestamp, like so:

      my %SMPP_count;
      foreach my $file (@files) {
          open (FILE, "$file");
          while (<FILE>) {
              next unless /\b(?:GSM|SMPP)\b/;      # skip uninteresting lines early
              chomp;
              my @ary = (split /\s|\|/, $_)[3,21,24];
              my $time = (split ' ', $ary[0])[4];  # first element is the timestamp, right?
              $SMPP_count{$time}++ if $ary[1] eq "Submit"
                                  and $ary[2] =~ /(GSM|SMPP)/;
          }
      }
      # now iterate over the keys of the hash to make up your sums
      for my $time (sort keys %SMPP_count) {
          my $sum = $SMPP_count{$time};
          ...
      }
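Deriving the per-second maximum and the running total (which the earlier loop printed) is then a single pass over that hash. A sketch with made-up sample counts standing in for the real data:

```perl
# sample counts, not real data -- the real hash comes from the file scan
my %SMPP_count = (
    "00:00:01" => 256,
    "00:00:02" => 491,
    "00:00:03" => 382,
);

my ($greatest, $total) = (0, 0);
for my $time (sort keys %SMPP_count) {
    my $sum = $SMPP_count{$time};
    $greatest = $sum if $sum > $greatest;   # track the busiest second
    $total   += $sum;                       # running total
    print "$time = $sum\t$total$/";
}
print "$greatest$/";
```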
      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

        What an idea, pragrammatic.... I have been struggling with this for nearly three days, and finally it is solved. Working great, thank you very much. I want to ask you one more thing. As already mentioned, I have three files (Postpaid, Prepaid, Delivery) for every 10 minutes. After calculating, I am uploading all the values into the DB like this:

        Date        Hour   Mo_resp  MT_resp  AO_resp  Percentage
        10-08-2015  00:00  256      382      36       87%
        10-08-2015  00:10  491      438      12       92%

        (This is a sample; actually my DB contains 38 columns.) Like this I am uploading all the values every 10 minutes. Now my requirement is to add up all MO_resp, all AO_resp, and so on (all columns) which occurred in hour 00, to write an hourly report into an excel sheet. For that I am doing:

        use DBI;
        my $hour_db = DBI->connect(
            "DBI:mysql:database=$db;host=$host;mysql_socket=/opt/lampstack-5.5.27-0/mysql/tmp/mysql.sock",
            "root", "", {'RaiseError' => 1});
        my @column_names = ("MO_resp", "MT_resp", "AO_resp");
        foreach my $column_name (@column_names) {
            my $hour_sth = $hour_db->prepare("Select sum($column_name) from $table_name where Date='$db_date' and Hour like '$hour:%'");
            $hour_sth->execute() or die $DBI::errstr;
            .....
        }

        Like this I am reading each column's sum one by one. But I feel this is not a good method. Can you show me a way???
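One round-trip does suffice: SQL can sum several columns in a single SELECT, and DBI's fetchrow_array returns them all at once. A hedged sketch (the table name `stats` is a stand-in for the real schema; the DBI calls are shown as comments since they need a live handle):

```perl
# Build ONE statement that sums every column of interest,
# instead of preparing one query per column
my @column_names = ("MO_resp", "MT_resp", "AO_resp");
my $select_list  = join ", ", map { "sum($_)" } @column_names;
my $sql = "SELECT $select_list FROM stats WHERE Date = ? AND Hour LIKE ?";

# With a real handle you would then run it once and fetch all sums:
#   my $sth = $hour_db->prepare($sql);
#   $sth->execute($db_date, "$hour:%");         # placeholders, no manual quoting
#   my ($mo, $mt, $ao) = $sth->fetchrow_array;  # one sum per column, in order
print "$sql\n";
```

Placeholders also sidestep the quoting problems of interpolating `$db_date` and `$hour` directly into the SQL string.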

      Thanks for sharing your knowledge shmem.

      Heh. My nick is programmatic. - Please provide sample input. What does your data look like? What are you trying to accomplish? What is your expected output? Just a sum? This whole thread seems to be about an XY Problem.

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

        shmem :-) Programmatic, I already posted that in this thread. Please check it here: 1144697