Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Better way of doing!

by Xhings (Acolyte)
on Aug 12, 2013 at 09:40 UTC ( [id://1049091]=perlquestion: print w/replies, xml ) Need Help??

Xhings has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am trying to find keyword count from a file as below:
use strict; use warnings; my $i=0; my $j=0; my $k=0; my $l=0; my $m=0; my $n=0; my $o=0; open(my $FILEHANDLE, "<", "Action5.c") or die "cannot open < Action5.c +: $!"; my @eachline = <$FILEHANDLE>; my @patterns = (qr/transaction/, qr/find/, qr/think/, qr/save_param/, +qr/start_sub/, qr/url/, qr/submit/); my $count = @eachline; while(<@eachline>){ if ($_ ~~ $patterns[0]){$i++;} elsif($_ ~~ $patterns[1]){$j++;} elsif($_ ~~ $patterns[2]){$k++;} elsif($_ ~~ $patterns[3]){$l++;} elsif($_ ~~ $patterns[4]){$m++;} elsif($_ ~~ $patterns[5]){$n++;} elsif($_ ~~ $patterns[6]){$o++;} } print "Number of patterns found for".$patterns[0]." is ".$i."\n"; print "Number of patterns found for".$patterns[1]." is ".$j."\n"; print "Number of patterns found for".$patterns[2]." is ".$k."\n"; print "Number of patterns found for".$patterns[3]." is ".$l."\n"; print "Number of patterns found for".$patterns[4]." is ".$m."\n"; print "Number of patterns found for".$patterns[5]." is ".$n."\n"; print "Number of patterns found for".$patterns[6]." is ".$o."\n"; close($FILEHANDLE);
This is a very crude way,I know. Please guide me with a better code. Regards, Xhings

Replies are listed 'Best First'.
Re: Better way of doing!
by kcott (Archbishop) on Aug 12, 2013 at 10:44 UTC

    G'day Xhings,

    That's not a good way to write your code at all. There's no correlation between $i, ..., $o and the elements in @patterns that each is associated with. You have to write separate code for every single one of those. It scales exceptionally poorly: what happens if need to add another keyword? What you really want is a hash whose keys are the keywords and values are the count.

    There's other problems as well in the while loop: you only count the first occurrence of the first keyword in each line; the smartmatch operator (~~) is experimental and unsuitable for production code.

    Here's some skeleton code you can use as a basis for rewriting your script:

    $ perl -Mstrict -Mwarnings -le ' my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url +submit}; my $pattern = join("|" => map { "(?<$_>\\b$_\\b)" } @keywords); my %count; for (@eachline) { while (/$pattern/g) { ++$count{(keys %+)[0]}; } } print "No. of matches for $_ is $count{$_}" for @keywords; ' No. of matches for transaction is 2 No. of matches for find is 2 No. of matches for think is 1 No. of matches for save_param is 1 No. of matches for start_sub is 1 No. of matches for url is 1 No. of matches for submit is 1

    See perlre if you're unfamiliar with named capture groups, i.e. (?<name>pattern).

    Update: Reviewing this code on the following day, I think named capture groups was probably overkill in this situation. Here's a simpler version using ordinary captures:

    $ perl -Mstrict -Mwarnings -le ' my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url +submit}; my $pattern = "(" . join("|" => map { "\\b$_\\b" } @keywords) . ") +"; my %count; for (@eachline) { ++$count{$1} while /$pattern/g; } print "No. of matches for $_ is $count{$_}" for @keywords; ' No. of matches for transaction is 2 No. of matches for find is 2 No. of matches for think is 1 No. of matches for save_param is 1 No. of matches for start_sub is 1 No. of matches for url is 1 No. of matches for submit is 1

    -- Ken

      If this seems a little hard to follow for someone who is new to Perl, here is a version that works very similarly to kcott's code. However, even for those who find his code hard to follow (or maybe especially those who find it hard to follow), it is worth the trouble to understand the way the pattern is built and the way it is used with named captures.
      use strict; use warnings; my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url subm +it}; my %count; for my $line (@eachline) { for my $pattern (@keywords) { $count{$pattern} += ($line =~ /$pattern/g); } } print "No. of matches for $_ is $count{$_}\n" for @keywords;
      naming each word? thats obscene :)
      Thanks ken! I tested the above suggestion by you. Below is the whole code:
      @file = glob "$dir/MyAccount_OneTimePay_BaseLine.usr/Action.c"; my $row = 1; foreach $file (@file) { my $col = 0; open(my $FILEHANDLE, "<", $file) or die "cannot open file $file: $!"; my @eachline = <$FILEHANDLE>; my @keywords = qw{lr_start_transaction web_reg_find lr_think_time web_ +reg_save_param lr_start_sub_transaction web_url web_submit_data}; my $pattern = join("|" => map { "(?<$_>\\b$_\\b)" } @keywords); my %count; for (@eachline) { while (/$pattern/g) { ++$count{(keys %+)[0]}; } } $worksheet1-> write($row,$col,(split '/',$file)[-1],$format2); $col++; for (@keywords) { $worksheet1->write($row,$col, "$count{$_}", $format2); $col++; } close($FILEHANDLE); $row++; }
      Seems %count is not initialized and hence when no pattern is found my writing to XLS is impacted (columns are getting clubbed). I am using Spreadsheet::WriteExcel module. Let me know if this understanding is correct. Below is the message from the console:
      Use of uninitialized value within %count in string at Aldol.pl line 76 +, <$FILEHANDLE> line 550.
      Thanks Xhings!

        If you want to initialise all counts to zero, do this:

        my %count = map { $_ => 0 } @keywords;

        Also, see my update: that's probably a better choice of code for your application.

        -- Ken

Re: Better way of doing!
by jwkrahn (Abbot) on Aug 12, 2013 at 13:07 UTC
    while(<@eachline>){

    You are interpolating the array @eachline in a double quoted context and then using the glob function on the resulting string:

    $ perl -MO=Deparse -e'while(<@eachline>){}' use File::Glob (); while (defined($_ = glob(join($", @eachline)))) { (); } -e syntax OK

    this is a very obtuse way to split a string on whitespace:

    $ perl -le'my @eachline = ( "one two three four\n", "five six seven ei +ght\n" ); while(<@eachline>){print}' one two three four five six seven eight

    It looks like you may want something like this:

    use strict; use warnings; open my $FILEHANDLE, "<", "Action5.c" or die "cannot open < Action5.c: + $!"; my %patterns = ( transaction => 0, find => 0, think => 0, save_param => 0, start_sub => 0, url => 0, submit => 0, ); while ( my $line = <$FILEHANDLE> ) { for my $pattern ( keys %patterns ) { $patterns{ $pattern } += () = $line =~ /$pattern/g; } } for my $pattern ( keys %patterns ) { print "Number of patterns found for".$pattern." is ".$patterns{ $p +attern }."\n"; } close $FILEHANDLE;
Re: Better way of doing!
by Anonymous Monk on Aug 12, 2013 at 09:54 UTC
Re: Better way of doing!
by Laurent_R (Canon) on Aug 12, 2013 at 16:46 UTC

    What about something like this:

    use strict; use warnings; my (@count, @patterns); push @patterns, qr /$_/ for qw/transaction find think save_param start +_sub url submit/; # you could also write a slightly more Perlish: my @patterns = ma +p { qr /$_/ } qw/transaction find think save_param start_sub url subm +it/; my $max = $#patterns; open my $FILEHANDLE, "<", "Action5.c" or die "cannot open < Action5.c: + $!"; while (<$FILEHANDLE>) { for my $i (0..$max) { $count[$i]++ if $_ ~~ $patterns[$i]; } } for my $i (0..$max) { print "Number of patterns found for $patterns[$i] is $count[$i] \ +n"; } close $FILEHANDLE;

    This is untested, as you did not provide any data test file. A real data file might be useful also to clarify some of my questions below.

    Note that I did not change the "if $_ ~~ $patterns..." part in the most inner loop because I don't know what you are exactly trying to achieve with the smart match operator, but I would be tempted to rewrite the line as:

    $count[$i]++ if /$patterns[$i]/;

    I also slightly changed your logic, because it seemed flawed to me: if your current data line matches, say, the first pattern, your program does not test the other patterns, whereas the same line could very well match two or more different patterns and should probably counted for the other patterns. If you really want to stop testing the patterns as soon as you've got one match, then you need to change my most inner loop to this:

    $count[$i]++ and last if $_ ~~ $patterns[$i];

    or

    $count[$i]++ and last if /$patterns[$i]/;

    I would advise to read very carefully each line of my suggested program, it is very very different from yours and you might learn a few things from it if you take the time to try to understand everything. And don't hesitate to ask if there is something you don't understand, my fellow brothers and myself will be happy to explain.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1049091]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2024-04-25 19:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found