Re^4: using system command in regex

Thanks for sharing your knowledge, shmem. I put the code you wrote (with some needed modifications) into my program, but it's really dead slow.

the code took:289 wallclock secs ( 6.43 usr 0.72 sys + 388.95 cusr 58.95 csys = 455.05 CPU)

That result is for my original program (using grep & cut). After the modification, about 300 wallclock seconds of running gets through only 36 seconds' worth of data. So completing the full 10 minutes of data would take ... (actually, I don't know :-) ). My modified code is:

my $greatest = 0;
my $total    = 0;
my @files    = glob "SMSCDR*$date$hour$minute*.log";

foreach my $min ($minute .. $minute+9)
{

        foreach my $sec (@seconds)
        {
#               my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]) + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR*$date$hour$minute*.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);

                my $SMPP_count = 0;
                my $stamp = "$hour:$min:$sec";
                foreach my $file (@files)
                {
                        open (FILE, '<', $file) or die "Cannot open $file: $!";
                        while(<FILE>)
                        {
                                chomp;
                                my @ary = (split /\s|\|/, $_) [3,21,24];
                                $SMPP_count++ if  $ary[0] eq $stamp and $ary[1] eq "Submit" and $ary[2] =~ /(GSM|SMPP)/;
                        };
                        close (FILE);
                }
                if ($SMPP_count > $greatest)
                {
                        $greatest = $SMPP_count;
                }
                $total = $total + $SMPP_count;
                print "$hour:$min:$sec","= $SMPP_count","\t",$total,$/;
        }
}
print $greatest,$/;

my $t1 = Benchmark->new;
my $td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";

Note: The result is the same for both programs. As you said, reading the files in Perl ought to be faster than grep & cut, but here it is not working out like that. I don't understand what I am missing.

Update:

Here the glob returns three files: post_paid contains 8278 lines, prepaid contains 23072 lines, and delivery_file contains 80097 lines. To compute the count for a single second, the program has to check 111,447 lines. For 10 minutes (600 seconds), it has to check 66,868,200 lines. Please show me a way to avoid this.

Replies are listed 'Best First'.
Re^5: using system command in regex by shmem (Chancellor) on Oct 15, 2015 at 07:51 UTC
Of course it is dead slow. You are opening, reading and closing each file 10 * 60 = 600 times to get the sum for each second. You should read each file once and store the sums in a hash, keyed by the timestamp, like so:

my %SMPP_count;
foreach my $file (@files)
{
        open (FILE, '<', $file) or die "Cannot open $file: $!";
        while(<FILE>)
        {
                next unless /\b(?:GSM|SMPP)\b/; # avoid uninteresting lines
                chomp;
                my @ary = (split /\s|\|/, $_) [3,21,24];
                my $time = $ary[0]; # first element is the timestamp, right?
                $SMPP_count{$time}++ if $ary[1] eq "Submit" and $ary[2] =~ /(GSM|SMPP)/;
        };
        close (FILE);
}
# now iterate over the keys of the hash to make up your sums
for my $time ( sort keys %SMPP_count) {
        my $sum = $SMPP_count{$time};
        ...
}

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
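To make the single-pass idea concrete, here is a minimal, self-contained sketch. It assumes a simplified, hypothetical record layout ("HH:MM:SS|action|interface") held in an in-memory list. The real SMSCDR files have many more fields, and the helper name count_per_second is made up for illustration, so only the counting pattern carries over.

```perl
use strict;
use warnings;

# Hypothetical, simplified records: "HH:MM:SS|action|interface".
# The real log layout differs; this only illustrates the technique.
my @lines = (
    "00:00:01|Submit|GSM",
    "00:00:01|Submit|SMPP",
    "00:00:01|Deliver|GSM",
    "00:00:02|Submit|GSM",
    "00:00:02|Submit|HTTP",
);

# One pass over the records: count matching lines per timestamp.
sub count_per_second {
    my @records = @_;
    my %count;
    for my $line (@records) {
        my ($stamp, $action, $iface) = split /\|/, $line;
        next unless $action eq 'Submit' && $iface =~ /^(?:GSM|SMPP)$/;
        $count{$stamp}++;
    }
    return %count;
}

my %count = count_per_second(@lines);

# Derive the running total and the busiest second from the hash,
# without touching the input again.
my ($greatest, $total) = (0, 0);
for my $stamp (sort keys %count) {
    my $n = $count{$stamp};
    $greatest = $n if $n > $greatest;
    $total += $n;
    print "$stamp = $n\t$total\n";
}
print "$greatest\n";
```

Each record is examined once, so the work grows with the number of lines rather than with lines times seconds. With the sample data above, the busiest second has 2 matching records and the total is 3.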
Re^6: using system command in regex by ravi45722 (Pilgrim) on Oct 15, 2015 at 10:41 UTC
What an idea, programmatic.... I had been struggling with this for nearly three days. Finally I solved it, and it is working great. Thank you very much. I want to ask you one more thing. As I already said, I have three files (Postpaid, Prepaid, Delivery) for every 10 minutes. After calculating, I upload all the values into the DB like this:

Date        Hour   MO_resp  MT_resp  AO_resp  Percentage
10-08-2015  00:00  256      382      36       87%
10-08-2015  00:10  491      438      12       92%

(It's a sample. My DB actually contains 38 columns.) I upload all the values like this every 10 minutes. Now my requirement is to add up all the MO_resp, all the AO_resp, and so on (all columns) that occurred in hour 00, to write an hourly report in an Excel sheet. For that I am doing:

use DBI;
my $hour_db = DBI->connect("DBI:mysql:database=$db;host=$host;mysql_socket=/opt/lampstack-5.5.27-0/mysql/tmp/mysql.sock","root","", {'RaiseError' => 1});
my @column_names = ("MO_resp","MT_resp","AO_resp");
foreach my $column_name (@column_names)
{
        my $hour_sth = $hour_db->prepare("Select sum($column_name) from $table_name where Date='$db_date' and Hour like '$hour:%'");
        $hour_sth->execute() or die $DBI::errstr;
        .....
}

Like this I am reading each column's sum one by one, but I feel this is not a good method. Can you show me a way?
Re^7: using system command in regex by marto (Cardinal) on Oct 15, 2015 at 10:47 UTC
As a side note, please be aware of SQL injection; you have had this pointed out to you a couple of times (Re^4: Problem passing date to SQL, Re^3: After parsing .xls the rows getting emerged). The DBI documentation has a section on using placeholders, and the links you've been given previously have further examples.
Re^7: using system command in regex by shmem (Chancellor) on Oct 15, 2015 at 12:24 UTC
First, I have to second marto in that you should always use placeholders; third (since second is already used), you could aggregate all your columns into one call:

my @column_names = ("MO_resp","MT_resp","AO_resp");
my $sql = "select ".join ",", map { "sum($_)" } @column_names;
# the table name is interpolated: placeholders stand for values, not identifiers
$sql .= " from $table_name where Date = ? and Hour like ?";
my $hour_sth = $hour_db->prepare( $sql );
$hour_sth->execute($db_date, "$hour:%") or die $DBI::errstr;

See join and map. If you are iterating over $db_date and $hour, you should keep the call to $hour_db->prepare outside those loops, so the statement is prepared once and only executed with new values on each iteration.
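As a rough sketch of how the query string could be assembled safely: the helper below is hypothetical (build_hourly_sum_sql is not part of DBI, and the table name cdr_stats is made up). It assumes identifiers consist only of letters, digits and underscores, and rejects anything else, because table and column names cannot be passed as bind values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): builds one SELECT that sums every
# requested column. Identifiers are checked against a conservative pattern
# because placeholders can only carry values, not table or column names.
sub build_hourly_sum_sql {
    my ($table, @columns) = @_;
    for my $name ($table, @columns) {
        die "Unsafe identifier: $name\n"
            unless $name =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/;
    }
    my $sums = join ", ", map { "SUM($_)" } @columns;
    return "SELECT $sums FROM $table WHERE Date = ? AND Hour LIKE ?";
}

# "cdr_stats" is an assumed table name used only for this example.
my $sql = build_hourly_sum_sql("cdr_stats", "MO_resp", "MT_resp", "AO_resp");
print "$sql\n";

# With a connected handle (assumed here to be $hour_db, as in the thread):
#   my $sth = $hour_db->prepare($sql);      # prepare once, outside any loop
#   $sth->execute($db_date, "$hour:%");     # values travel as bind parameters
#   my @sums = $sth->fetchrow_array;        # one row, one sum per column
```

The date and the hour pattern never become part of the SQL text, which is what closes the injection hole, and a single round trip returns all the sums in one row.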
Re^8: using system command in regex by ravi45722 (Pilgrim) on Oct 16, 2015 at 03:59 UTC
Re^9: using system command in regex by shmem (Chancellor) on Oct 16, 2015 at 07:27 UTC
Re^7: using system command in regex by soonix (Chancellor) on Oct 15, 2015 at 14:48 UTC
Plus, you seem to have misunderstood the first sentence of this. "programmatic" is a quality of his nick, see e.g. there :-)
Re^8: using system command in regex by shmem (Chancellor) on Oct 15, 2015 at 15:33 UTC
Re^9: using system command in regex by soonix (Chancellor) on Oct 15, 2015 at 19:28 UTC
Re^5: using system command in regex by shmem (Chancellor) on Oct 14, 2015 at 17:48 UTC
"Thanks for sharing your knowledge shmem."

Heh. My nick is programmatic. Please provide sample input. What does your data look like? What are you trying to accomplish? What is your expected output? Just a sum? This whole thread is about an XY Problem, it would seem.
Re^6: using system command in regex by ravi45722 (Pilgrim) on Oct 15, 2015 at 04:08 UTC
shmem :-) Programmatic, I already posted that in this thread. Please check it here: 1144697