Re: A Quick Regex Question
by GrandFather (Saint) on Oct 08, 2006 at 07:58 UTC
|
use strict;
use warnings;
my @words = qw(Perl purile pretty reputation reputable);
for (@words) {
print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//;
}
Prints:
Matched Perl
Matched purile
Matched reputable
Note that you have to test separately for each letter and ensure that at least one of each is present. That is why there are four different tests, one for each letter.
Note too that for a case-insensitive match, both the upper- and lower-case versions of each letter need to be in the tr search list.
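As a minimal sketch of why the counting trick works: with an empty replacement list and no /d modifier, tr leaves the string unchanged and returns the number of characters it matched, so each test is true only when that letter occurs at least once. The helper name has_all_letters and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if the word
# contains at least one p, e, r and l in either case, 0 otherwise.
sub has_all_letters {
    my ($word) = @_;
    return ( $word =~ tr/pP// ) && ( $word =~ tr/eE// )
        && ( $word =~ tr/rR// ) && ( $word =~ tr/lL// ) ? 1 : 0;
}

# tr/// used purely as a counter: the string is not modified.
my $sample = "Perl people";
my $count  = ( $sample =~ tr/pP// );
print "p or P occurs $count times in '$sample'\n";

for my $word (qw(Perl purile pretty reputation reputable)) {
    print "Matched $word\n" if has_all_letters($word);
}
```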
DWIM is Perl's answer to Gödel
|
|
I like Gramp's solution: simple, elegant and efficient.
Too often people automatically go for a regex when the good old
tr operator is a better choice.
Due to my unfortunate golfing past, I can't restrain myself from
noting that, if you are feeling in a silly mood, you could
replace his simple and clear line:
print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//;
with this silly one:
y&pP&&&&y&eE&&&&y&rR&&&&y&lL&&&&print"Matched $_\n";
Deparse confirming their equivalence:
# cat sensible.pl
print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//;
# perl -MO=Deparse sensible.pl
print "Matched $_\n" if tr/Pp// and tr/Ee// and tr/Rr// and tr/Ll//;
sensible.pl syntax OK
# cat silly.pl
y&pP&&&&y&eE&&&&y&rR&&&&y&lL&&&&print"Matched $_\n";
# perl -MO=Deparse silly.pl
print "Matched $_\n" if tr/Pp// and tr/Ee// and tr/Rr// and tr/Ll//;
silly.pl syntax OK
|
|
Actually, compared to the translate (counting) technique, McDarren's equivalent regex technique is slightly faster for a case-sensitive match, but slightly slower for a case-insensitive one. Case sensitivity makes no significant difference to the translate version.
For the sake of code clarity I'd actually go with the multiple regex solution, but offered the translate solution because the technique gets forgotten about somewhat.
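For anyone who wants to check the timing claim on their own machine, here is a rough sketch using the core Benchmark module. The sub names count_tr and count_re and the word list are assumptions for illustration, and the results will vary with Perl version and input data, so treat it as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = qw(Perl purile pretty reputation reputable);

# Counting approach: one tr/// per required letter, both cases listed.
sub count_tr {
    my $n = 0;
    for (@words) { $n++ if tr/pP// && tr/eE// && tr/rR// && tr/lL//; }
    return $n;
}

# Regex approach: one case-insensitive match per required letter.
sub count_re {
    my $n = 0;
    for (@words) { $n++ if /p/i && /e/i && /r/i && /l/i; }
    return $n;
}

# Run each sub for about one CPU second and print a comparison table.
cmpthese( -1, { 'translate' => \&count_tr, 'regex' => \&count_re } );
```

Drop the /i flags and the upper-case letters in the tr lists to compare the case-sensitive variants.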
DWIM is Perl's answer to Gödel
Re: A Quick Regex Question
by McDarren (Abbot) on Oct 08, 2006 at 08:01 UTC
|
You know, you could simply do:
while (<>) {
next if !/p/;
next if !/e/;
next if !/r/;
next if !/l/;
print;
}
or (if you don't mind using unless)...
while (<>) {
next unless /p/;
next unless /e/;
next unless /r/;
next unless /l/;
print;
}
Re: A Quick Regex Question
by Zaxo (Archbishop) on Oct 08, 2006 at 08:00 UTC
|
To match a sequence of "p"'s, "e"'s, "r"'s and "l"'s, you can use a character class,
while (<>) {
print if m/[perl]+/;
}
You are capturing the last matched character with those parens. Is that what you mean to do? If you want to capture the whole string, the parens should contain the '+'.
|
|
That was my first thought also, but I'm not positive that the original poster meant this.
Instead of p or e or r or l, I thought that the requirement was p and e and r and l (in any order). So I would expect that "orange" wouldn't be a valid match, given that it doesn't contain 'p' and 'l'. The given character class does, of course, match "orange" just fine.
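To make the difference concrete, here is a small sketch contrasting the two readings. The sub names matches_any and matches_all and the sample words are made up for illustration.

```perl
use strict;
use warnings;

# "Any of" reading: the character class matches if at least one of
# p, e, r or l occurs anywhere in the word.
sub matches_any {
    my ($word) = @_;
    return $word =~ /[perl]+/ ? 1 : 0;
}

# "All of" reading: every one of p, e, r and l must occur somewhere.
sub matches_all {
    my ($word) = @_;
    for my $char (qw(p e r l)) {
        return 0 if index( $word, $char ) < 0;
    }
    return 1;
}

for my $word (qw(orange perl apple)) {
    printf "%-6s any=%d all=%d\n", $word, matches_any($word), matches_all($word);
}
```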
Re: A Quick Regex Question
by jbert (Priest) on Oct 08, 2006 at 17:55 UTC
|
Firstly, the "all a's" or "all b's" is due to the (a|b) evaluating to an a, or to a b then subsequently being repeated by the +.
A better way to achieve that goal is with a 'character class', where you'd write: /[ab]+/
Regarding matching p.*e.*r.*l, the various solutions so far lack 'data driven'-ness, in that they embed the sought-after characters in code.
An approach which allows the sought-after chars to be specified would be:
my @required_chars = qw/p e r l/; # Or anything else...
LINE:
while (<>) {
foreach my $char (@required_chars) {
next LINE unless index($_, $char) >= 0;
}
print;
}
A more insane (and presumably less efficient) way of doing it would be to compile the N sought after characters into a regexp which would match each of the N! ways these chars might occur in the string.
#!perl -w
# Warning - silly way of solving problem
use strict;
# CPAN, how we love thee
use Math::Combinatorics;
my @required_chars = qw/p e r l/; # Or anything else...
# For bonus points, we could create a package which blessed
# the regexp into an object. But that would be silly.
my $re = make_re(@required_chars);
while (<>) {
print if /$re/;
}
sub make_re
{
# Get all N! ways of arranging the chars
my @combinations = permute(@_);
# Construct a.*b.*c.*d strings, one for each ordering
my @res = map { join(".*", @$_); } @combinations;
# Put them together into a honking great regexp
my $re = "(" . join("|", @res) . ")";
return qr/$re/;
}
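For comparison, a less explosive way to keep things data driven is to swap the N! alternation for one positive look-ahead per required character. This is a different technique from the one above, not a rewrite of it, and the name make_lookahead_re is hypothetical. A minimal sketch:

```perl
use strict;
use warnings;

my @required_chars = qw/p e r l/;    # Or anything else...

# Hypothetical alternative to make_re: one look-ahead per character, so the
# pattern grows linearly with N instead of listing all N! orderings.
sub make_lookahead_re {
    # quotemeta keeps characters such as '.' or '*' literal
    my $re = join '', map { '(?=.*' . quotemeta($_) . ')' } @_;
    return qr/^$re/s;
}

my $re = make_lookahead_re(@required_chars);

for my $word (qw(perl lrep purple orange)) {
    print "$word\n" if $word =~ $re;
}
```

Each look-ahead starts from the beginning of the string and only asserts that its character appears somewhere, so the order of the characters does not matter.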
Re: A Quick Regex Question
by marto (Cardinal) on Oct 08, 2006 at 09:23 UTC
|
chinamox,
Here is some advice: read the copyright section of the source you linked to. It would seem that they don't want you to reproduce any of it, by any means.
Martin
|
|
chinamox said at the top of their post: " Brothers I fear this novice may have been duped ".
While you raise a good point that it's important to check copyright of material before reposting, I would say in this case it's probably OK. Granted, I'm not an intellectual property lawyer, but taking a look at the linked copyright page:
No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews.
(Emphasis added)
I'd say that someone thinking they were duped by code from the electronic version of the book qualifies as a critical article.
--chargrill
s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; =
qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
|
|
Thank you chargrill, that was my general viewpoint. I did not try to pass it off as my own work and even linked to the original source.
Thanks for keeping me honest Martin. In a place with $0.50 DVDs for sale on the sidewalks, one could easily slip...
Re: A Quick Regex Question
by sgifford (Prior) on Oct 09, 2006 at 04:22 UTC
|
Here's a way to do it in a regular expression. Note that there's no particular reason to use this instead of tr, except that it's kind of fun, and it will work as part of a larger pattern (for example to find any matching words within a paragraph). :-)
/([perl]).*(?!\1)([perl]).*(?!\1)(?!\2)([perl]).*(?!\1)(?!\2)(?!\3)([perl])/
This makes use of zero-width negative look-ahead assertions (see perlre(1)) to say essentially "any letters from this group, except for the ones you've already seen". So the first [perl] will match any of those four letters; the second time, it's preceded by an assertion that the character cannot be the character that matched the first time; and so forth.
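A quick way to convince yourself of that behaviour is to run the pattern over a few words. This is only a sketch: the variable name $all_four and the sample words are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the post, on one line. Each (?!\N) forbids reusing a
# letter that an earlier capture group has already matched.
my $all_four =
    qr/([perl]).*(?!\1)([perl]).*(?!\1)(?!\2)([perl]).*(?!\1)(?!\2)(?!\3)([perl])/;

# "pepper" has only three of the four letters, so it should be rejected.
for my $word (qw(perl lrep purple pepper reputable)) {
    printf "%-10s %s\n", $word, $word =~ $all_four ? "match" : "no match";
}
```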
Here's a variation that will pull out any words containing those four letters from a line of text:
while (<>)
{
my $i=0;
print join(" ",
grep { (($i++) % 5) == 0 }
# \w* at both ends lets the word start or end with letters outside [perl]
/\b(\w*([perl])\w*(?!\2)([perl])\w*(?!\2)(?!\3)([perl])\w*(?!\2)(?!\3)(?!\4)([perl])\w*)\b/g),"\n";
}
Re: A Quick Regex Question
by jdporter (Paladin) on Oct 09, 2006 at 14:04 UTC
|
while (<>) {
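# Lowercase before sorting so mixed-case input such as "Perl" still sorts into e, l, p, r order
print if join('', sort split //, lc $_) =~ /e.*l.*p.*r/;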
}
We're building the house of the future together.