in reply to Alternation vs. looping for multiple searches.

kprice:

It depends on whether you want to find all matches or are happy with finding any match. With alternation, once one alternative matches at a given position, the regex engine stops trying the remaining alternatives there, so it won't report the rest. If you're satisfied with that, then by all means use it. If you need to find *each* match, then alternation could easily become troublesome:

Roboticus@Roboticus-PC ~
$ cat regex_alt.pl
use strict;
use warnings;

my $text=<<EOT;
Now is the time for all good men to come to the aid of their party.
EOT

print "ORDER MATTERS FOR COLLISIONS:\n";
while ($text=~/(the|their)/g) {
    print "\tfound $1 at $-[0], $+[0].\n";
}
print "VS:\n";
while ($text=~/(their|the)/g) {
    print "\tfound $1 at $-[0], $+[0].\n";
}

Roboticus@Roboticus-PC ~
$ perl regex_alt.pl
ORDER MATTERS FOR COLLISIONS:
        found the at 7, 10.
        found the at 44, 47.
        found the at 55, 58.
VS:
        found the at 7, 10.
        found the at 44, 47.
        found their at 55, 60.

Here you can see that if you're not careful when building your regex from a collection, you can have problems. If you put 'the' before 'their' in your alternation, you'll never match 'their'. In an automated system, you can't simply sort by string length and put the longer strings ahead of the shorter ones. For example, if one of your regex strings was "t[ho]e", then placing it before "the" would still prevent "the" from ever being the alternative that matches.
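To make the ordering problem concrete, here is a minimal sketch that compares the two approaches on the same text. The helper names `winning_alternative` and `patterns_that_match` are made up for this illustration, and the sketch assumes none of the pattern strings contains capture groups of its own.

```perl
use strict;
use warnings;

# Build one alternation with a capture group per pattern, then report
# which pattern "won" the first match. Assumes the patterns themselves
# contain no capture groups (hypothetical helper, for illustration only).
sub winning_alternative {
    my ($text, @patterns) = @_;
    my $alt = join '|', map { "($_)" } @patterns;
    return unless $text =~ /$alt/;
    # $-[$i] is defined only for the group that took part in the match.
    for my $i (1 .. @patterns) {
        return $patterns[$i - 1] if defined $-[$i];
    }
    return;
}

# Test each pattern on its own, so every pattern that matches anywhere
# in the text is reported, regardless of order (hypothetical helper).
sub patterns_that_match {
    my ($text, @patterns) = @_;
    return grep { $text =~ /$_/ } @patterns;
}

my $text     = "Now is the time for all good men to come to the aid of their party.";
my @patterns = ('t[ho]e', 'the', 'their');

print "alternation winner: ", winning_alternative($text, @patterns), "\n";
print "loop found: $_\n" for patterns_that_match($text, @patterns);
```

With this pattern order the alternation credits "t[ho]e" for the first hit, while the per-pattern loop reports all three patterns as matching somewhere in the text.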

Now all of this may or may not matter depending on your requirements. (After all, if "the" can match either expression, is it significant which one matched?) My point is simply that you'll need to think about things before simply building an alternation...

...roboticus

Re^2: Alternation vs. looping for multiple searches.
by kprice++ (Novice) on Nov 21, 2010 at 06:03 UTC

    Sorry, I should have been much more specific. The problem is to search for a set of regexes that mark blocks of text. So it doesn't matter how many are on a line, or where. Once one regex is matched, all lines following, up to and including the next match, are considered within the block. The lines within blocks are to be returned to standard output.

    I have two working scripts, one that uses alternation, and one that uses a loop. The way my teacher is, it's usually apparent what he expects that doesn't make it into the spec, and it seems to me, he intends for there to be a loop. But, I don't see the problem with using just regex. Would it be slower than having a loop that searches one regex at a time and can stop once it finds one? Is that even a relevant question?

    [18:27:43 kelly@kudzu lab6]$ cat 4.pl
    #! /usr/bin/perl
    #
    # CIS 33A Programming In Perl
    # Lab #6.4
    # Due Date: Monday, Nov 22
    # Written By: Kelly Price
    #
    use strict;
    use warnings;

    die "Usage: $0 infile outfile regexs...\n" if @ARGV < 3;
    open(FIN, "$ARGV[0]") or die "Error opening $ARGV[0] for reading: $!\n";
    open(FOUT, ">$ARGV[1]") or die "Error opening $ARGV[1] for writing: $!\n";

    my (@regexs) = splice(@ARGV, 2);
    my $rexs = '(' . join('|', @regexs) . ')';

    my $inblock = 0;
    print FOUT grep {
        my $match = 0;
        if (/$rexs/)
        {
            $inblock = $inblock ? 0 : 1;
            $match = 1;
        }
        $inblock or $match;
    } <FIN>;

    [18:27:46 kelly@kudzu lab6]$ 4.pl 4.pl 4.out \\{ \\}
    [18:28:00 kelly@kudzu lab6]$ cat 4.out
    print FOUT grep {
        my $match = 0;
        if (/$rexs/)
        {
        }
        $inblock or $match;
    } <FIN>;
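The loop version mentioned above is not shown in the post. As a rough sketch of what such a variant might look like (not the poster's actual script), the following tests one precompiled pattern at a time and stops at the first hit. The names `matches_any` and `lines_in_blocks` are hypothetical, and the sample lines are invented for the demonstration.

```perl
use strict;
use warnings;

# Return true as soon as any one of the precompiled patterns matches.
sub matches_any {
    my ($line, @compiled) = @_;
    for my $re (@compiled) {
        return 1 if $line =~ $re;    # stop at the first hit
    }
    return 0;
}

# Keep every line from one marker through the next marker, inclusive.
# $patterns is an array reference of regex strings (hypothetical interface).
sub lines_in_blocks {
    my ($patterns, @lines) = @_;
    my @compiled = map { qr/$_/ } @$patterns;    # compile each pattern once
    my $inblock  = 0;
    return grep {
        my $match = matches_any($_, @compiled);
        $inblock = !$inblock if $match;          # a marker toggles the state
        $inblock || $match;                      # keep block lines and both markers
    } @lines;
}

# Invented sample input: only the lines from "begin {" through "} end" are kept.
my @sample = ("a\n", "begin {\n", "b\n", "} end\n", "c\n");
print lines_in_blocks(['\{', '\}'], @sample);
```

Since only "did any pattern match this line?" matters here, both approaches select the same lines. Which one is faster depends on the patterns and the input, so it is best measured rather than guessed, for example with `cmpthese` from the core Benchmark module.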