Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Better way of doing!

by kcott (Archbishop)
on Aug 12, 2013 at 10:44 UTC ( #1049101=note: print w/replies, xml ) Need Help??


in reply to Better way of doing!

G'day Xhings,

That's not a good way to write your code at all. There's no correlation between $i, ..., $o and the elements in @patterns that each is associated with. You have to write separate code for every single one of those. It scales exceptionally poorly: what happens if need to add another keyword? What you really want is a hash whose keys are the keywords and values are the count.

There's other problems as well in the while loop: you only count the first occurrence of the first keyword in each line; the smartmatch operator (~~) is experimental and unsuitable for production code.

Here's some skeleton code you can use as a basis for rewriting your script:

$ perl -Mstrict -Mwarnings -le ' my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url +submit}; my $pattern = join("|" => map { "(?<$_>\\b$_\\b)" } @keywords); my %count; for (@eachline) { while (/$pattern/g) { ++$count{(keys %+)[0]}; } } print "No. of matches for $_ is $count{$_}" for @keywords; ' No. of matches for transaction is 2 No. of matches for find is 2 No. of matches for think is 1 No. of matches for save_param is 1 No. of matches for start_sub is 1 No. of matches for url is 1 No. of matches for submit is 1

See perlre if you're unfamiliar with named capture groups, i.e. (?<name>pattern).

Update: Reviewing this code on the following day, I think named capture groups was probably overkill in this situation. Here's a simpler version using ordinary captures:

$ perl -Mstrict -Mwarnings -le ' my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url +submit}; my $pattern = "(" . join("|" => map { "\\b$_\\b" } @keywords) . ") +"; my %count; for (@eachline) { ++$count{$1} while /$pattern/g; } print "No. of matches for $_ is $count{$_}" for @keywords; ' No. of matches for transaction is 2 No. of matches for find is 2 No. of matches for think is 1 No. of matches for save_param is 1 No. of matches for start_sub is 1 No. of matches for url is 1 No. of matches for submit is 1

-- Ken

Replies are listed 'Best First'.
Re^2: Better way of doing!
by ramlight (Friar) on Aug 12, 2013 at 13:27 UTC
    If this seems a little hard to follow for someone who is new to Perl, here is a version that works very similarly to kcott's code. However, even for those who find his code hard to follow (or maybe especially those who find it hard to follow), it is worth the trouble to understand the way the pattern is built and the way it is used with named captures.
    use strict; use warnings; my @eachline = ( "transaction blah find blah blah think blah save_param", "start_sub blah url blah blah submit transaction blah find" ); my @keywords = qw{transaction find think save_param start_sub url subm +it}; my %count; for my $line (@eachline) { for my $pattern (@keywords) { $count{$pattern} += ($line =~ /$pattern/g); } } print "No. of matches for $_ is $count{$_}\n" for @keywords;
Re^2: Better way of doing!
by Anonymous Monk on Aug 12, 2013 at 11:01 UTC
    naming each word? thats obscene :)
Re^2: Better way of doing!
by Xhings (Acolyte) on Aug 21, 2013 at 09:26 UTC
    Thanks ken! I tested the above suggestion by you. Below is the whole code:
    @file = glob "$dir/MyAccount_OneTimePay_BaseLine.usr/Action.c"; my $row = 1; foreach $file (@file) { my $col = 0; open(my $FILEHANDLE, "<", $file) or die "cannot open file $file: $!"; my @eachline = <$FILEHANDLE>; my @keywords = qw{lr_start_transaction web_reg_find lr_think_time web_ +reg_save_param lr_start_sub_transaction web_url web_submit_data}; my $pattern = join("|" => map { "(?<$_>\\b$_\\b)" } @keywords); my %count; for (@eachline) { while (/$pattern/g) { ++$count{(keys %+)[0]}; } } $worksheet1-> write($row,$col,(split '/',$file)[-1],$format2); $col++; for (@keywords) { $worksheet1->write($row,$col, "$count{$_}", $format2); $col++; } close($FILEHANDLE); $row++; }
    Seems %count is not initialized and hence when no pattern is found my writing to XLS is impacted (columns are getting clubbed). I am using Spreadsheet::WriteExcel module. Let me know if this understanding is correct. Below is the message from the console:
    Use of uninitialized value within %count in string at Aldol.pl line 76 +, <$FILEHANDLE> line 550.
    Thanks Xhings!

      If you want to initialise all counts to zero, do this:

      my %count = map { $_ => 0 } @keywords;

      Also, see my update: that's probably a better choice of code for your application.

      -- Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1049101]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2022-10-01 13:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I prefer my indexes to start at:




    Results (126 votes). Check out past polls.

    Notices?