in reply to Alternation vs. looping for multiple searches.
It depends on whether you need to find every match or are happy with finding any match. With alternation, once one alternative matches at a given position, the regex engine stops there and never tries the remaining alternatives at that position. If you're satisfied with that, then by all means use it. If you need to find *each* match, then alternation can easily become troublesome:
    Roboticus@Roboticus-PC ~ $ cat regex_alt.pl
    use strict;
    use warnings;

    my $text=<<EOT;
    Now is the time for all good men to come to the aid of their party.
    EOT

    print "ORDER MATTERS FOR COLLISIONS:\n";
    while ($text=~/(the|their)/g) {
        print "\tfound $1 at $-[0], $+[0].\n";
    }

    print "VS:\n";
    while ($text=~/(their|the)/g) {
        print "\tfound $1 at $-[0], $+[0].\n";
    }

    Roboticus@Roboticus-PC ~ $ perl regex_alt.pl
    ORDER MATTERS FOR COLLISIONS:
            found the at 7, 10.
            found the at 44, 47.
            found the at 55, 58.
    VS:
            found the at 7, 10.
            found the at 44, 47.
            found their at 55, 60.
Here you can see that if you're not careful when building your regex from a collection, you can have problems. If you put 'the' before 'their' in your alternation, you'll never match 'their'. In an automated system, you can't simply sort by string length and put the longer strings before the shorter ones, because the entries may be patterns rather than literal strings. For example, if one of your regex strings were "t[ho]e", then placing it before "the" would still prevent the "the" alternative from ever being the one that matches.
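To make the "t[ho]e" case concrete, here is a rough sketch that reports which alternative gets the credit for each match. The helper name `which_alternative` is made up for illustration, and it assumes the patterns in your collection contain no capture groups of their own, since it wraps each one in its own group.

```perl
use strict;
use warnings;

my $text = "Now is the time for all good men to come to the aid of their party.\n";

# Hypothetical helper (not from the original post): wrap each pattern in its
# own capture group so we can tell which alternative produced each match.
# Assumes the patterns themselves contain no capture groups.
sub which_alternative {
    my ($str, @patterns) = @_;
    my $alternation = join '|', map { "($_)" } @patterns;
    my @hits;
    while ($str =~ /$alternation/g) {
        # $-[N] is defined only for groups that took part in the match.
        my ($idx) = grep { defined $-[ $_ + 1 ] } 0 .. $#patterns;
        push @hits, [ $patterns[$idx], $-[0], $+[0] ];
    }
    return @hits;
}

my @hits = which_alternative($text, 't[ho]e', 'the');
printf "\t%s matched at %d, %d.\n", @$_ for @hits;
```

With "t[ho]e" listed first, all three matches (at offsets 7, 44 and 55) are credited to it, and the "the" alternative never gets a turn even though it is the same length.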
Now all of this may or may not matter depending on your requirements. (After all, if "the" can match either expression, is it significant which one matched?) My point is simply that you'll need to think about things before simply building an alternation...
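If your requirements do call for *each* match, the looping approach from the question sidesteps the ordering issue, because every pattern is searched on its own. A minimal sketch, assuming your patterns live in a plain list (the helper name `find_each_match` is made up for illustration):

```perl
use strict;
use warnings;

my $text = "Now is the time for all good men to come to the aid of their party.\n";

# Hypothetical helper (not from the original post): run every pattern
# separately over the text, so no pattern can hide another one's matches.
sub find_each_match {
    my ($str, @patterns) = @_;
    my @found;
    for my $pattern (@patterns) {
        # $str is a private copy; each pattern's /g scan starts at offset 0
        # because a failed match resets pos().
        while ($str =~ /$pattern/g) {
            push @found, [ $pattern, $-[0], $+[0] ];
        }
    }
    return @found;
}

my @found = find_each_match($text, 'the', 'their');
printf "\tfound %s at %d, %d.\n", @$_ for @found;
```

Here 'their' is reported at 55, 60 even though 'the' comes first in the list. The trade-off is one pass over the text per pattern instead of a single pass, so it may or may not suit a large collection.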
...roboticus
Re^2: Alternation vs. looping for multiple searches.
by kprice++ (Novice) on Nov 21, 2010 at 06:03 UTC