Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

One way is almost what you have, but with some changes,

use strict; my (@keywords, %keyword)=qw/foo bar 12345 abcd/; my ($string, %result) = "foobarfoo1234523423412345abcdefsadfabc";
Precompile the regexen, @keyword{@keywords} =  map {qr/\Q$_\E/} @keywords; Now get the count directly without any named temporary,
$result{$_} = () = $string =~ $keyword{$_} for (keys %keyword);
That's no big change over what you have, but uses some idiomatic optimizations.

Another way is to count within the big regex you mention. You can do that with a code construction in the re,

my @regexen = map { qr/(?:\Q$_\E(?{$result{$_}++}))/ } @keywords; my $re = do { local $" = '|'; qr/@regexen/; };
I've used my favorite tricky way of getting alternation into an array there (qr// is an interpolating quote operator).
$string =~ /$re/g; print "$_: $result{$_}\n" for @keywords;
There, the regex engine should only evaluate the code part if the text has matched, and then restart the regex at pos for the next match. Untested,

Update: Perl qr// dosn't seem to like running the (?{$result{$_}++}) bit. I'm not sure why. Anybody know?

A third way is to munch through the string with index for each word you want to match.

It may be worthwhile to study your text before running the regex matches on it. Benchmark your different approaches, chances are that each will be best for some cases.

After Compline,
Zaxo


In reply to Re: Count multiple pattern matches by Zaxo
in thread Count multiple pattern matches by johnnywang

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2024-04-18 10:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found