 
PerlMonks  


For this matching, is there only one user per line? Once a user has been found on a line, can it be found again on a later line? I've written samples for you which answer those questions differently. There may be ways to optimize each of these for your specific problem. It just depends on which problem you are solving.

There is a bug in your original code: you iterated over keys %$users and then dereferenced each key directly, as in $user->{'Pattern'}. $user is a plain string, so that is a symbolic reference. Using strict would have caught that bug for you. You meant to write $users->{ $user }{ 'Pattern' }, which looks up the entry named $user in the hash reference $users.
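A minimal sketch of that bug and its fix. The contents of $users are not shown in the thread, so the hash below (user names as keys, each with a 'Pattern' entry) is an assumption made up for illustration. With strict enabled, treating the key string as a reference dies at run time, which the eval captures here:

```perl
use strict;
use warnings;

# Hypothetical data: the real structure of $users is assumed, not quoted.
my $users = {
    alice => { Pattern => 'alice' },
    bob   => { Pattern => 'bob' },
};

my ( @patterns, @errors );
for my $user ( sort keys %$users ) {
    # Wrong: $user is a plain string, so this is a symbolic reference.
    # Under "strict refs" it dies instead of silently reading %alice.
    my $oops = eval { $user->{'Pattern'} };
    push @errors, $@ if $@;

    # Right: look the key up in the hash reference first.
    push @patterns, $users->{$user}{'Pattern'};
}

print "patterns: @patterns\n";
print "strict complained ", scalar @errors, " time(s)\n";
```

Without strict, the wrong form would quietly look in a package hash named after the user and return undef, which is much harder to spot.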

There is also a potential bug, depending on your data. The string "aa" is matched by both the pattern "a" and the pattern "aa". If you stop at the first match, the longer and perhaps more correct pattern may never be tried. You may need to adjust your logic to compare the lengths of the matches and see which pattern matched "better". None of my examples correct for this.
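One possible way to account for match length, as a sketch kept separate from the samples in this post. It assumes the same made-up $users layout (names as keys, a 'Pattern' entry each), and best_match is a hypothetical helper name. It tries every pattern and keeps the one that matched the most characters, using the built-in @- and @+ arrays for the match offsets:

```perl
use strict;
use warnings;

# Hypothetical data, invented for illustration.
my $users = {
    short => { Pattern => 'a'  },
    long  => { Pattern => 'aa' },
};
$_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

# Return the user whose pattern matched the most characters of $line,
# or undef if no pattern matched. Ties go to the name that sorts first.
sub best_match {
    my ( $users, $line ) = @_;
    my ( $best, $best_len ) = ( undef, -1 );
    for my $user ( sort keys %$users ) {
        next unless $line =~ $users->{$user}{'CompiledPattern'};
        my $len = $+[0] - $-[0];    # length of the text that matched
        ( $best, $best_len ) = ( $user, $len ) if $len > $best_len;
    }
    return $best;
}

print best_match( $users, "aa" ), "\n";
```

The trade-off is that every pattern has to be tried on every line, so this gives up the early exit that a first-match search provides. Whether "longest match" is the right definition of "better" depends on your data.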

Each line may match multiple users, and once a user is found it is not looked for anymore. This may be the fastest because each iteration can remove several users from the search space at once.

    # Precompile all the patterns and store them into the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i
        for values %$users;

    my %unmatched_users;
    @unmatched_users{ keys %$users } = ();

    while ( my $line = <> ) {
        ...
        my @users = grep $line =~ $users->{$_}{'CompiledPattern'},
            keys %unmatched_users;

        if ( @users ) {
            warn "Great, we found "
                . join( ', ', map $_->{'Pattern'}, @{$users}{ @users } )
                . " user(s)!\n";
            delete @unmatched_users{ @users };
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match one user. Once a user is found, it is not looked for anymore. This may be the fastest because the search space shrinks with each successful match, and the search of a line stops as soon as any match is found.

    use List::Util 'first';

    # Precompile all the patterns and store them into the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i
        for values %$users;

    my %unmatched_users;
    @unmatched_users{ keys %$users } = ();

    while ( my $line = <> ) {
        ...
        # $user is a key (a plain string), not a hash reference
        my $user = first { $line =~ $users->{$_}{'CompiledPattern'} }
            keys %unmatched_users;

        if ( defined $user ) {
            warn "Great, we found pattern $users->{$user}{'Pattern'}!\n";
            delete $unmatched_users{ $user };
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match *one* user but users may be found on multiple lines. The search space remains constant.

    use List::Util 'first';

    # Precompile all the patterns and store them into the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i
        for values %$users;

    while ( my $line = <> ) {
        ...
        # $user is a key (a plain string), not a hash reference
        my $user = first { $line =~ $users->{$_}{'CompiledPattern'} }
            keys %$users;

        # Test definedness so a key such as "0" still counts as found
        if ( defined $user ) {
            warn "Great, we found pattern $users->{$user}{'Pattern'}!\n";
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }

Each line may match multiple users and users may be found on multiple lines. This is the worst case sample you already had.

    # Precompile all the patterns and store them into the key
    # CompiledPattern
    $_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i
        for values %$users;

    while ( my $line = <> ) {
        ...
        my @users = grep $line =~ $users->{$_}{'CompiledPattern'},
            keys %$users;

        if ( @users ) {
            warn "Great, we found "
                . join( ', ', map $_->{'Pattern'}, @{$users}{ @users } )
                . " user(s)!\n";
        }
        else {
            warn "$line didn't match any users.\n";
        }
    }
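To try this last variant without an input file, here is a small self-contained sketch. The $users contents and the sample lines are invented for illustration, and the @lines array stands in for what <> would read. It records which users matched each line so the result is easy to inspect:

```perl
use strict;
use warnings;

# Hypothetical data: one entry per user, each holding its pattern source.
my $users = {
    alice => { Pattern => 'alice' },
    bob   => { Pattern => 'bob' },
};
$_->{'CompiledPattern'} = qr/$_->{'Pattern'}/i for values %$users;

# Stand-in for the lines that <> would read.
my @lines = ( "Alice met Bob", "no one here", "BOB again" );

my %found;    # line => list of matching user names, in sorted order
for my $line (@lines) {
    # Every user is tried against every line; nothing is ever removed.
    my @matched = grep { $line =~ $users->{$_}{'CompiledPattern'} }
        sort keys %$users;
    $found{$line} = \@matched;
    print "$line: ", ( @matched ? "@matched" : "no match" ), "\n";
}
```

The sort is only there to make the output order predictable; it is not needed for the matching itself.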

In reply to Re: Matching against list of patterns by diotalevi
in thread Matching against list of patterns by Eyck
