Re: How to replace greedy alternation?
by ikegami (Patriarch) on Jan 28, 2009 at 04:32 UTC
|
It can be done with one regexp.
/^(?=.*a)(?=.*b)(?=.*c).*d/
or die;
If you want to know the order in which they came,
my %order;
@order{qw( a b c d )} =
map length, /^(?=(.*?)a)(?=(.*?)b)(?=(.*?)c)(.*?)d/
or die;
my @ordered = sort { $order{$a} <=> $order{$b} } keys %order;
This is probably slow for long strings.
Update: Actually, no need for a hash.
my @order = map length, /^(?=(.*?)a)(?=(.*?)b)(?=(.*?)c)(.*?)d/
or die;
my @ordered = ( qw( a b c d ) )[
sort { $order[$a] <=> $order[$b] } 0..$#order
];
Update: Back to the practical, using multiple regexp and saving @- should be much faster. Especially if scanning for constants.
my @unordered = qw( a b c d );
my @order;
for my $s (@unordered) {
push @order, /\Q$s/ ? $-[0] : die;
}
my @ordered = @unordered[
sort { $order[$a] <=> $order[$b] } 0..$#order
];
| [reply] [d/l] [select] |
|
|
my %order;
/(?=.*(a))(?=.*(b))(?=.*(c))/ and @order{ qw/a b c/ } = @-[ 1 .. $#- ]
+;
| [reply] [d/l] |
|
|
/^(?=.*a)(?=.*b)(?=.*c).*d/
For example, when used in the code below nothing s returned.
echo bacd| perl -ne '/^(?=.*a)(?=.*b)(?=.*c).*d/; print "$1 $2 $3\n";'
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
Re: How to replace greedy alternation?
by BrowserUk (Patriarch) on Jan 28, 2009 at 04:25 UTC
|
>echo 1b22a3d3ccccccccccccccc |
perl -nle"/a/&&/b/&&/c/&&/d/ and print 'Matched: ', pack '(a1)4', m[([
+abcd])]g"
Matched: badc
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
Re: How to replace greedy alternation?
by kyle (Abbot) on Jan 28, 2009 at 04:27 UTC
|
my $rx = qr/
([abcd]) # a, b, c, or d -- stored in \1
.* # well, whatever
(?!\1) # the next character can't be \1
([abcd]) # a, b, c, or d -- stored in \2
.* # y'know, stuff
(?!\1|\2) # next thing not \1 or \2
([abcd]) # a, b, c, or d -- stored in \3
.*
(?!\1|\2|\3) # next thing not \1, \2, or \3
[abcd] # you get the idea
/x;
I have a feeling there's still an easier way, but that's not too bad. Here is some testing.
Update: See below, ikegami has the better way. | [reply] [d/l] [select] |
Re: How to replace greedy alternation?
by repellent (Priest) on Jan 28, 2009 at 05:29 UTC
|
Another way to find out the order:
my $test_string="1b22a3d3c";
my @ordered;
push(@ordered, $1) while $test_string =~ /([abcd])/cg;
pos($test_string) = 0; # reset m//cg offset
Update: Used capture as per ikegami's reply.
Update #2: This may be better:
my %seen;
my @order = grep { !$seen{$_}++ } $test_string =~ /([abcd])/g;
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
|
|
my $test = "1b22a3d3cccccccba866.afgg";
(my $ordered = $test) =~ tr/abcd//cd;
print $ordered, "\n"; # badcccccccbaa
| [reply] [d/l] |
|
|
sub reduc {
my ($s) = @_ ;
$s =~ tr/abcd//cd ;
1 while $s =~ s/(.).*?\K\1+//g ;
return $s ;
} ;
print reduc('1b22a7b3d3ccaacccbcccacccddcceqcc11oabepd'), "\n" ;
# badc
print reduc('1b22w7b3d3ccsqcccbccc2cccddcceqcc11o9bepd'), "\n" ;
# bdc
I imagine a more cunning monk can improve the reduction step -- though the amount of work it does obviously depends on how many repeated letters need to be removed. | [reply] [d/l] |
|
|
|
|
Re: How to replace greedy alternation?
by Marshall (Canon) on Jan 28, 2009 at 08:11 UTC
|
I'll offer yet another solution.... There is a requirement that each of the letters much appear at least once. And I interpret "order" to mean the order that they appeared in the string, not in frequency or order of the list that we are looking for...maybe that is wrong.
The match global will just go through the input string once. The while() loop iterates over a list that the match global expression produced. I'm not sure how "smart" regex is with /o option (compile regex once). Having a join may be slower than I think, but my testing so far with Perl 5.10 says that regex is way smarter (and faster) than it used to be! You can do the join other ways or declare other variables. One main point about regex is that picking from a set [xyz] is a lot faster than /x/|/y/|/z|.
Map and sort are pretty "heavy weight" critters. A hash table especially with a small number of keys is very efficient. A Perl hash gets created by default with 8 hash "buckets". Don't be afraid to use a hash. This code is verbose, but is straightforward and will run quickly.
#!/usr/bin/perl -w
use strict;
my $str = "1b22a3d3cab";
my @letters = qw (a b 2 c d);
my @order_found =();
my %seen =();
#UPDATE: don't need the $found variable, oops!
#while (my $found = $str =~ m/([join("",@letters)])/go)
#should have been:
while ($str =~ m/([join("",@letters)])/go)
{
push (@order_found, $1) unless $seen{$1};
$seen{$1}++;
}
if (keys %seen != @letters)
{
print "not all letters found\n";
}
else
{
print @order_found, "\n"; # prints b2adc
}
| [reply] [d/l] [select] |
Re: How to replace greedy alternation?
by jink (Scribe) on Jan 28, 2009 at 12:36 UTC
|
It's been ages since I last posted here, but here goes.
Your first statement is false. This egrep doesn't make sure that all characters occur, so it makes sense that it seems to be harder in perl. Egrep tries to match either of the characters you specified, and spits out only the parts that match, because of the -o:
echo 1b22a3d3 | egrep -o 'a|b|c|d'
b
a
d
All Camels are equal, but some Camels are more equal than others.
| [reply] [d/l] |
|
|
egrep finds all the cases because it has greedy alternation(the subject of this Seekers...).
-o simply has egrep print out the matches it does find.
A complete non-perl solution could be something like
echo 1b22cc333a | egrep -o 'a|b|c' | sort -u | wc -l
This will simply return 3. If I want 3 chars in any order and this command returns 3 than I had a successful match of all 3 in any order! | [reply] [d/l] |