How to replace greedy alternation?

adamcrussell has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to replace greedy alternation? by ikegami (Patriarch) on Jan 28, 2009 at 04:32 UTC
It can be done with one regexp. `/^(?=.a)(?=.b)(?=.c).d/ or die;` [download] If you want to know the order in which they came, `my %order; @order{qw( a b c d )} = map length, /^(?=(.?)a)(?=(.?)b)(?=(.?)c)(.?)d/ or die; my @ordered = sort { $order{$a} <=> $order{$b} } keys %order;` [download] This is probably slow for long strings. Update: Actually, no need for a hash. `my @order = map length, /^(?=(.?)a)(?=(.?)b)(?=(.?)c)(.?)d/ or die; my @ordered = ( qw( a b c d ) )[ sort { $order[$a] <=> $order[$b] } 0..$#order ];` [download] Update: Back to the practical, using multiple regexp and saving `@-` should be much faster. Especially if scanning for constants. `my @unordered = qw( a b c d ); my @order; for my $s (@unordered) { push @order, /\Q$s/ ? $-[0] : die; } my @ordered = @unordered[ sort { $order[$a] <=> $order[$b] } 0..$#order ];` [download]	[reply] [d/l] [select]
Re^2: How to replace greedy alternation? by jwkrahn (Abbot) on Jan 28, 2009 at 06:52 UTC
Another way to do it: `my %order; /(?=.(a))(?=.(b))(?=.*(c))/ and @order{ qw/a b c/ } = @-[ 1 .. $#- ] +;` [download]	[reply] [d/l]
Re^2: How to replace greedy alternation? by Anonymous Monk on Jan 28, 2009 at 07:17 UTC
This code does not work: `/^(?=.a)(?=.b)(?=.c).d/` [download] For example, when used in the code below nothing s returned. `echo bacd\| perl -ne '/^(?=.a)(?=.b)(?=.c).d/; print "$1 $2 $3\n";'` [download]	[reply] [d/l] [select]
Re^3: How to replace greedy alternation? by jwkrahn (Abbot) on Jan 28, 2009 at 07:25 UTC
That is because there are no capturing parentheses in your regular expression and so `$1`, `$2` and `$3` are not affected and thus contain nothing.	[reply] [d/l] [select]
Re: How to replace greedy alternation? by BrowserUk (Patriarch) on Jan 28, 2009 at 04:25 UTC
`>echo 1b22a3d3ccccccccccccccc \| perl -nle"/a/&&/b/&&/c/&&/d/ and print 'Matched: ', pack '(a1)4', m[([ +abcd])]g" Matched: badc` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re: How to replace greedy alternation? by kyle (Abbot) on Jan 28, 2009 at 04:27 UTC
This is what I came up with. `my $rx = qr/ ([abcd]) # a, b, c, or d -- stored in \1 .* # well, whatever (?!\1) # the next character can't be \1 ([abcd]) # a, b, c, or d -- stored in \2 .* # y'know, stuff (?!\1\|\2) # next thing not \1 or \2 ([abcd]) # a, b, c, or d -- stored in \3 .* (?!\1\|\2\|\3) # next thing not \1, \2, or \3 [abcd] # you get the idea /x;` [download] I have a feeling there's still an easier way, but that's not too bad. Here is some testing. Read more... (438 Bytes) Update: See below, ikegami has the better way.	[reply] [d/l] [select]
Re: How to replace greedy alternation? by repellent (Priest) on Jan 28, 2009 at 05:29 UTC
Another way to find out the order: `my $test_string="1b22a3d3c"; my @ordered; push(@ordered, $1) while $test_string =~ /([abcd])/cg; pos($test_string) = 0; # reset m//cg offset` [download] Update: Used capture as per ikegami's reply. Update #2: This may be better: `my %seen; my @order = grep { !$seen{$_}++ } $test_string =~ /([abcd])/g;` [download]	[reply] [d/l] [select]
Re^2: How to replace greedy alternation? by ikegami (Patriarch) on Jan 28, 2009 at 05:35 UTC
`$&` imposes a needless slowdown on all regexp in your process that aren't already slowed down by captures. Use a capture instead. Your solution assumes none of the four patterns will match twice.	[reply] [d/l]
Re^2: How to replace greedy alternation? by Anonymous Monk on Jan 28, 2009 at 06:07 UTC
TMTOW to find the order: `my $test = "1b22a3d3cccccccba866.afgg"; (my $ordered = $test) =~ tr/abcd//cd; print $ordered, "\n"; # badcccccccbaa` [download]	[reply] [d/l]
Re^3: How to replace greedy alternation? by gone2015 (Deacon) on Jan 28, 2009 at 11:55 UTC
Indeed, getting rid of the clutter first, and then remove all but the first instance of each letter: `sub reduc { my ($s) = @_ ; $s =~ tr/abcd//cd ; 1 while $s =~ s/(.).*?\K\1+//g ; return $s ; } ; print reduc('1b22a7b3d3ccaacccbcccacccddcceqcc11oabepd'), "\n" ; # badc print reduc('1b22w7b3d3ccsqcccbccc2cccddcceqcc11o9bepd'), "\n" ; # bdc` [download] I imagine a more cunning monk can improve the reduction step -- though the amount of work it does obviously depends on how many repeated letters need to be removed.	[reply] [d/l]
Re^4: How to replace greedy alternation? by Anonymous Monk on Jan 29, 2009 at 06:37 UTC
Re^5: How to replace greedy alternation? by gone2015 (Deacon) on Jan 29, 2009 at 13:26 UTC
Re: How to replace greedy alternation? by Marshall (Canon) on Jan 28, 2009 at 08:11 UTC
I'll offer yet another solution.... There is a requirement that each of the letters much appear at least once. And I interpret "order" to mean the order that they appeared in the string, not in frequency or order of the list that we are looking for...maybe that is wrong. The match global will just go through the input string once. The while() loop iterates over a list that the match global expression produced. I'm not sure how "smart" regex is with /o option (compile regex once). Having a join may be slower than I think, but my testing so far with Perl 5.10 says that regex is way smarter (and faster) than it used to be! You can do the join other ways or declare other variables. One main point about regex is that picking from a set `[xyz]` is a lot faster than /x/\|/y/\|/z\|. Map and sort are pretty "heavy weight" critters. A hash table especially with a small number of keys is very efficient. A Perl hash gets created by default with 8 hash "buckets". Don't be afraid to use a hash. This code is verbose, but is straightforward and will run quickly. `#!/usr/bin/perl -w use strict; my $str = "1b22a3d3cab"; my @letters = qw (a b 2 c d); my @order_found =(); my %seen =(); #UPDATE: don't need the $found variable, oops! #while (my $found = $str =~ m/([join("",@letters)])/go) #should have been: while ($str =~ m/([join("",@letters)])/go) { push (@order_found, $1) unless $seen{$1}; $seen{$1}++; } if (keys %seen != @letters) { print "not all letters found\n"; } else { print @order_found, "\n"; # prints b2adc }` [download]	[reply] [d/l] [select]
Re: How to replace greedy alternation? by jink (Scribe) on Jan 28, 2009 at 12:36 UTC
It's been ages since I last posted here, but here goes. Your first statement is false. This egrep doesn't make sure that all characters occur, so it makes sense that it seems to be harder in perl. Egrep tries to match either of the characters you specified, and spits out only the parts that match, because of the -o: `echo 1b22a3d3 \| egrep -o 'a\|b\|c\|d' b a d` [download] All Camels are equal, but some Camels are more equal than others.	[reply] [d/l]
Re^2: How to replace greedy alternation? by adamcrussell (Hermit) on Jan 28, 2009 at 16:05 UTC
egrep finds all the cases because it has greedy alternation(the subject of this Seekers...). -o simply has egrep print out the matches it does find. A complete non-perl solution could be something like `echo 1b22cc333a \| egrep -o 'a\|b\|c' \| sort -u \| wc -l` [download] This will simply return 3. If I want 3 chars in any order and this command returns 3 than I had a successful match of all 3 in any order!	[reply] [d/l]