using system command in regex

ravi45722 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: using system command in regex by AppleFritter (Vicar) on Oct 13, 2015 at 10:06 UTC
It's a common mistake to assume that `chomp` takes a value, chomps it, and returns the result, but this isn't the case; `chomp` modifies the expression you feed it in place (what it returns is the number of characters removed). In particular, that expression has to be an lvalue. You'll have to save your command's output in a variable to chomp it, which I think will also make your code more readable. :) You can chomp and assign in the same step, e.g.:

    foreach my $min (20 .. 29) {
        foreach my $sec (@seconds) {
            chomp(my $res = `cut -d "|" -f 1,10,13 SMSCDR$date$hour$minute.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`);
            my $SMPP_count = (split /\s+/, $res)[0];
            print $SMPP_count;
        }
    }

Though for the sake of readability I'd still suggest separating this a bit further and not doing everything in the same step (the command you're invoking is complicated enough, after all).

Meanwhile, this:

    syntax error at second.pl line 24, near ")["

is because you've got the parentheses in the wrong place. You're doing `split(..., ...)[...]`, trying to subscript the `split` call as if it were a list, but it isn't one. However, if you put what `split` returns into a list, it works: `(split ..., ...)[...]`. I've fixed that above.

Finally, you don't need a character class in the regexp you feed to `split`, though depending on what exactly your command returns, you may want to split on any (positive) amount of whitespace (`\s+`), not just one whitespace character (`\s`). I've changed that above as well.
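As a minimal sketch of the two points above (`chomp` working in place, and subscripting a parenthesised `split`), the following runs without the log files. The string in `$output` is a made-up stand-in for the command's output and, as an assumption, carries no leading whitespace; real `uniq -c` output is usually padded, which is what the follow-up question in this thread runs into.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the command output: one count line ending in a newline.
my $output = "3 06:20:00|Submit|SMPP\n";

# chomp modifies its argument in place; its return value is the number
# of characters removed, not the chomped string.
my $removed = chomp(my $res = $output);

# A function call cannot be subscripted directly, but a parenthesised list can.
my $count = (split /\s+/, $res)[0];

print "removed=$removed count=$count\n";
```

Writing `split(/\s+/, $res)[0]` instead reproduces the syntax error near `)[` quoted above.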
Re^2: using system command in regex by ravi45722 (Pilgrim) on Oct 13, 2015 at 11:15 UTC
Thanks for the reply. Now I understand how `chomp` works in my program above. I have rewritten (extended) my program in another way. As you suggested, I used `\s+` instead of `\s`:

    my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR$date$hour$minute.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1])
                   + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR$date$hour$minute.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);

Here the array returned by `split` contains "NULL" in position 0. If you write `(split ..., ...)[0]`, it gives a result like this:

    Use of uninitialized value in int at second.pl line 34.
    06:20:00= 0 0
    Argument "" isn't numeric in int at second.pl line 34.
    Argument "" isn't numeric in int at second.pl line 34.
    06:20:01= 0 0
    Argument "" isn't numeric in int at second.pl line 34.
    Argument "" isn't numeric in int at second.pl line 34.
    06:20:02= 0 0
    Argument "" isn't numeric in int at second.pl line 34.
    Argument "" isn't numeric in int at second.pl line 34.
    06:20:03= 0 0
    Argument "" isn't numeric in int at second.pl line 34.
    Argument "" isn't numeric in int at second.pl line 34.
    06:20:04= 0 0
    Argument "" isn't numeric in int at second.pl line 34.
    Argument "" isn't numeric in int at second.pl line 34.
    06:20:05= 0 0

Why is the first element in the array "NULL"?
Re^3: using system command in regex by shmem (Chancellor) on Oct 13, 2015 at 12:34 UTC
    my $SMPP_count = int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR$date$hour$minute.log |grep "Submit|GSM" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1])
                   + int ((split (/\s+/,`cut -d "|" -f 1,10,13 SMSCDR$date$hour$minute.log |grep "Submit|SMPP" |grep "$hour:$min:$sec" |sort |uniq -c`)) [1]);

Short answer: since you are using `uniq -c` as the last filter in your pipelines, you are interested in the first field. This field has leading whitespace. With `/\s+/` as the pattern, that leading whitespace is itself a separator match at the start of the string, so `split` returns an empty first element and the count lands at index 1. From the documentation of split:

As another special case, "split" emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as `' '` or "\x20", but not e.g. "`/ /`"). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were "/\s+/"; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.

Long answer: you are running

- 1 x perl
- 2 x /bin/sh (at each `qx()` or backtick ``)
- 2 x cut
- 4 x grep
- 2 x sort
- 2 x uniq

which makes for 13 processes in total, and you are reading each file that matches `SMSCDR$date$hour$minute.log` twice, all to get a sum which perl would happily give you in a less convoluted way in just 1 process.

- `cut -d "|" -f 1,10,13` would be `(split '\|', $_)[0,9,12]`
- `grep` and `sort` are perl builtins
- use a hash (see perldata) and use its keys for uniqueness
- you can use the builtin `glob` to expand `SMSCDR$date$hour$minute.log` into a list of filenames

From your code I am guessing that your log files contain a timestamp in the first field, `Submit` occurs in the 10th field, and you want lines which contain `GSM` or `SMPP` in the 13th field.
Putting it all together, omitting unnecessary steps and not writing perl as if it were shell:

    @ARGV = glob "SMSCDR$date$hour$minute.log";
    my $SMPP_count = 0;                  # start at 0 so the final print is defined even without matches
    my $stamp = "$hour:$min:$sec";
    while (<>) {
        chomp;
        my @ary = (split '\|')[0,9,12];  # fields 1, 10 and 13, as with cut -f 1,10,13
        $SMPP_count++ if $ary[0] eq $stamp
                     and $ary[1] eq "Submit"
                     and $ary[2] =~ /(GSM|SMPP)/;
    }
    print $SMPP_count;

update: corrected code

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
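The leading-whitespace behaviour quoted from the `split` documentation can be checked in isolation. In this sketch the sample `$line` is hypothetical; it only assumes that `uniq -c` right-aligns the count, so the line starts with spaces.

```perl
use strict;
use warnings;

# Hypothetical line shaped like `uniq -c` output: padded count, then the record.
my $line = "      3 06:20:00|Submit|SMPP";

my @by_regex = split /\s+/, $line;   # leading whitespace yields an empty first field
my @by_space = split ' ',   $line;   # awk-style: leading whitespace is skipped

printf "regex: [%s] [%s]\n", @by_regex[0, 1];
printf "space: [%s] [%s]\n", @by_space[0, 1];
```

With the regex pattern the count sits at index 1 behind an empty string; with the single-space string it sits at index 0, which is why `int` complained about `""` when index 0 was used together with `/\s+/`.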
Re^4: using system command in regex by ravi45722 (Pilgrim) on Oct 14, 2015 at 05:46 UTC
Re^5: using system command in regex by shmem (Chancellor) on Oct 15, 2015 at 07:51 UTC
Re^5: using system command in regex by shmem (Chancellor) on Oct 14, 2015 at 17:48 UTC
Re^3: using system command in regex by AppleFritter (Vicar) on Oct 13, 2015 at 12:07 UTC
Don't do everything in one huge unreadable line; break it up, and take a look at what the intermediate steps produce, and I'm sure the problem will become much clearer.

FWIW if you're processing log files using cut, grep, sort, uniq etc., you can probably also read and process them in your Perl script instead. Doing that would also be more robust, more readable/maintainable, and (if log formats change) more future-proof.
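One way this could look, broken into inspectable steps, is sketched below. It is not the original poster's program: the `record` helper, the sample data and the 13-field layout (timestamp in field 1, event in field 10, interface in field 13) are assumptions made so the example runs without any log files.

```perl
use strict;
use warnings;

# Hypothetical helper: build a 13-field, pipe-separated record with the
# three interesting fields filled in and placeholders everywhere else.
sub record {
    my ($stamp, $event, $iface) = @_;
    my @f = ('x') x 13;
    @f[0, 9, 12] = ($stamp, $event, $iface);
    return join '|', @f;
}

# Made-up sample data standing in for the lines of a log file.
my @lines = (
    record('06:20:00', 'Submit',  'SMPP'),
    record('06:20:00', 'Submit',  'GSM'),
    record('06:20:00', 'Deliver', 'SMPP'),
    record('06:20:01', 'Submit',  'SMPP'),
);

my %count;
for my $line (@lines) {
    # Step 1: pick out three fields (the job `cut -f 1,10,13` did).
    my ($stamp, $event, $iface) = (split /\|/, $line)[0, 9, 12];

    # Step 2: keep only Submit records (the job `grep` did).
    next unless $event eq 'Submit';

    # Step 3: tally per timestamp and interface (the job `sort | uniq -c` did).
    $count{$stamp}{$iface}++;
}

for my $stamp (sort keys %count) {
    my $gsm  = $count{$stamp}{GSM}  // 0;   # missing combinations count as 0
    my $smpp = $count{$stamp}{SMPP} // 0;
    print "$stamp= $gsm $smpp\n";
}
```

Each intermediate value (`$stamp`, `$event`, `$iface`, `%count`) can be printed on its own while debugging, and a single pass fills the counts for every second, so nothing has to be re-read per timestamp.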
Re^4: using system command in regex by ravi45722 (Pilgrim) on Oct 13, 2015 at 13:09 UTC