A Quick Regex Question

chinamox has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: A Quick Regex Question by GrandFather (Saint) on Oct 08, 2006 at 07:58 UTC
One way to do it is to use tr to count the characters of interest in each word:
`use strict; use warnings; my @words = qw(Perl purile pretty reputation reputable); for (@words) { print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//; }`
Prints:
`Matched Perl`
`Matched purile`
`Matched reputable`
Note that you have to test separately for each letter and ensure that at least one of each is present. That is why there are four different tests - one for each letter. Note too that if you want a case-insensitive match, both the upper- and lower-case versions of each letter need to be present in its tr search list.
DWIM is Perl's answer to Gödel
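As a rough sketch of why the tr tests work: with an empty replacement list and no modifiers, tr leaves the string alone and returns the number of characters it matched, so each test is true only when that letter occurs at least once. The sub name `has_all_perl_letters` is made up for this illustration and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the word contains p, e, r and l (any case).
sub has_all_perl_letters {
    my ($word) = @_;
    # tr with an empty replacement list only counts; $word is not modified.
    my $ok = ( $word =~ tr/pP// )
          && ( $word =~ tr/eE// )
          && ( $word =~ tr/rR// )
          && ( $word =~ tr/lL// );
    return $ok ? 1 : 0;
}

# The return value of a single tr is a count, not just true/false.
my $sample  = 'reputable';
my $e_count = ( $sample =~ tr/eE// );
print "e count in $sample: $e_count\n";

print "$_: ", has_all_perl_letters($_), "\n" for qw(Perl pretty);
```

Because tr does not interpolate variables, the letters have to be written out in the source, which is why this approach uses one test per letter.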
Re^2: A Quick Regex Question by eyepopslikeamosquito (Archbishop) on Oct 08, 2006 at 09:00 UTC
I like Gramp's solution: simple, elegant and efficient. Too often people automatically go for a regex when the good old `tr` operator is a better choice. Due to my unfortunate golfing past, I can't restrain myself from noting that, if you are feeling in a silly mood, you could replace his simple and clear line:
`print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//;`
with this silly one:
`y&pP&&&&y&eE&&&&y&rR&&&&y&lL&&&&print"Matched $_\n";`
Deparse confirming their equivalence:
`# cat sensible.pl
print "Matched $_\n" if tr/pP// && tr/eE// && tr/rR// && tr/lL//;
# perl -MO=Deparse sensible.pl
print "Matched $_\n" if tr/Pp// and tr/Ee// and tr/Rr// and tr/Ll//;
sensible.pl syntax OK
# cat silly.pl
y&pP&&&&y&eE&&&&y&rR&&&&y&lL&&&&print"Matched $_\n";
# perl -MO=Deparse silly.pl
print "Matched $_\n" if tr/Pp// and tr/Ee// and tr/Rr// and tr/Ll//;
silly.pl syntax OK`
Re^3: A Quick Regex Question by GrandFather (Saint) on Oct 08, 2006 at 09:19 UTC
Actually, compared to the translate (counting) technique, McDarren's solution using the equivalent regex technique is slightly faster for a case-sensitive match, but slightly slower for a case-insensitive match. Case sensitivity makes no significant difference to the translate version. For the sake of code clarity I'd actually go with the multiple regex solution, but offered the translate solution because the technique gets forgotten about somewhat.
DWIM is Perl's answer to Gödel
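The benchmark code itself is not reproduced in this listing. For anyone who wants to try the comparison, here is a minimal sketch using the core Benchmark module. It is not GrandFather's benchmark: the word list is borrowed from the earlier reply, the sub names `by_tr` and `by_regex` are invented here, and timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = qw(Perl purile pretty reputation reputable);

# Case-insensitive counting with tr (both cases listed explicitly).
sub by_tr {
    return scalar grep { tr/pP// && tr/eE// && tr/rR// && tr/lL// } @_;
}

# Case-insensitive matching with four separate regexes.
sub by_regex {
    return scalar grep { /p/i && /e/i && /r/i && /l/i } @_;
}

# Run each candidate for about one CPU second and print a comparison table.
cmpthese( -1, {
    tr    => sub { by_tr(@words) },
    regex => sub { by_regex(@words) },
} );
```

Both subs return the number of matching words, so it is easy to check that they agree before trusting any timing difference.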
Re^2: A Quick Regex Question by chinamox (Scribe) on Oct 09, 2006 at 09:01 UTC
Smiles... I found && in perldoc yesterday and thought it might be useful. I will certainly make use of it in the very near future. Thank you! -mox
Re: A Quick Regex Question by McDarren (Abbot) on Oct 08, 2006 at 08:01 UTC
You know, you could simply do: `while (<>) { next if !/p/; next if !/e/; next if !/r/; next if !/l/; print; }` or (if you don't mind using unless)... `while (<>) { next unless /p/; next unless /e/; next unless /r/; next unless /l/; print; }`
Re: A Quick Regex Question by Zaxo (Archbishop) on Oct 08, 2006 at 08:00 UTC
To match a sequence of "p"'s, "e"'s, "r"'s and "l"'s, you can use a character class, `while (<>) { print if m/[perl]+/; }` You are capturing the last matched character with those parens. Is that what you mean to do? If you want to capture the whole string, the parens should contain the '+'.
After Compline, Zaxo
Re^2: A Quick Regex Question by Nkuvu (Priest) on Oct 09, 2006 at 21:46 UTC
That was my first thought also, but I'm not positive that the original poster meant this. Instead of p or e or r or l, I thought that the requirement was p and e and r and l (in any order). So I would expect that "orange" wouldn't be a valid match, given that it contains neither 'p' nor 'l'. The given character class does, of course, match "orange" just fine.
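The difference between the two readings can be sketched as follows. The sub names `has_any` and `has_all` are invented for this illustration, and the match is case-sensitive.

```perl
use strict;
use warnings;

# "Any of" test: true if at least one of p, e, r, l appears.
sub has_any {
    my ($word) = @_;
    return $word =~ /[perl]/ ? 1 : 0;
}

# "All of" test: true only if each of p, e, r, l appears somewhere.
sub has_all {
    my ($word) = @_;
    for my $char (qw(p e r l)) {
        return 0 if index( $word, $char ) < 0;
    }
    return 1;
}

printf "%-8s any=%d all=%d\n", $_, has_any($_), has_all($_)
    for qw(orange purple);
```

Here "orange" passes the "any" test through its r and e but fails the "all" test, while "purple" passes both.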
Re: A Quick Regex Question by jbert (Priest) on Oct 08, 2006 at 17:55 UTC
Firstly, the "all a's" or "all b's" is due to the (a\|b) evaluating to an a, or to a b, which is then repeated by the +. A better way to achieve that goal is with a 'character class', where you'd write: `/[ab]+/`
Regarding matching p.e.r.l, the various solutions so far lack 'data driven'-ness, in that they embed the sought-after characters in code. An approach which allows the sought-after chars to be specified would be:
`my @required_chars = qw/p e r l/; # Or anything else...
LINE: while (<>) {
    foreach my $char (@required_chars) {
        next LINE unless index($_, $char) >= 0;
    }
    print;
}`
A more insane (and presumably less efficient) way of doing it would be to compile the N sought-after characters into a regexp which would match each of the N! orders in which these chars might occur in the string.
`#!perl -w
# Warning - silly way of solving problem
use strict;
# CPAN, how we love thee
use Math::Combinatorics;
my @required_chars = qw/p e r l/; # Or anything else...
# For bonus points, we could create a package which blessed
# the regexp into an object. But that would be silly.
my $re = make_re(@required_chars);
while (<>) {
    print if /$re/;
}
sub make_re {
    # Get all N! ways of arranging the chars
    my @combinations = permute(@_);
    # Construct an a.*b.*c.*d string for each arrangement
    my @res = map { join(".*", @$_); } @combinations;
    # Put them together into a honking great regexp
    my $re = "(" . join("\|", @res) . ")";
    return qr/$re/;
}`
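A third data-driven option, not proposed in the thread, is to build one regex with a positive look-ahead per required character. The sketch below assumes that technique is acceptable here; the sub name `make_lookahead_re` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that requires every listed character,
# in any order, using one positive look-ahead per character.
sub make_lookahead_re {
    my @chars = @_;
    # quotemeta keeps regex metacharacters in the list from being special.
    my $lookaheads = join '', map { '(?=.*' . quotemeta($_) . ')' } @chars;
    return qr/^$lookaheads/s;
}

my $re = make_lookahead_re(qw(p e r l));

# Case-sensitive, so "Perl" (capital P) is not printed here.
print "$_\n" for grep { /$re/ } qw(Perl purile pretty reputable);
```

The pattern grows by one look-ahead per required character rather than by one alternative per permutation, so it stays small even for longer character lists.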
Re: A Quick Regex Question by marto (Cardinal) on Oct 08, 2006 at 09:23 UTC
chinamox, here is some advice: read the Copyright section of the source you have linked to. It would seem that they don't want you to reproduce any of it, by any means. Martin
Re^2: A Quick Regex Question by chargrill (Parson) on Oct 08, 2006 at 15:44 UTC
chinamox said at the top of their post: "Brothers I fear this novice may have been duped". While you raise a good point that it's important to check copyright of material before reposting, I would say in this case it's probably OK. Granted, I'm not an intellectual property lawyer, but taking a look at the linked copyright page:
"No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews." (Emphasis added)
I'd say that someone thinking they were duped by code from the electronic version of the book qualifies as a critical article.
--chargrill `s*lil; $=join'',sort split q; s;.;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$,$/)`
Re^3: A Quick Regex Question by chinamox (Scribe) on Oct 09, 2006 at 09:11 UTC
Thank you chargrill. That was my general viewpoint. I did not try to pass it off as my own work and even linked to the original source. Thanks for keeping me honest, Martin; in a place with $0.50 DVDs for sale on the sidewalks, one could easily slip...
Re: A Quick Regex Question by sgifford (Prior) on Oct 09, 2006 at 04:22 UTC
Here's a way to do it in a regular expression. Note that there's no particular reason to use this instead of tr, except that it's kind of fun, and it will work as part of a larger pattern (for example to find any matching words within a paragraph). :-)
`/([perl]).*(?!\1)([perl]).*(?!\1)(?!\2)([perl]).*(?!\1)(?!\2)(?!\3)([perl])/`
This makes use of zero-width negative look-ahead assertions (see perlre(1)) to say essentially "any letters from this group, except for the ones you've already seen". So the first `[perl]` will match any of those four letters; the second time, it's preceded by an assertion that the character cannot be the character that matched the first time; and so forth. Here's a variation that will pull out any words containing those four letters from a line of text:
`while (<>) { my $i=0; print join(" ", grep { (($i++) % 5) == 0 } /\b(([perl])\w*(?!\2)([perl])\w*(?!\2)(?!\3)([perl])\w*(?!\2)(?!\3)(?!\4)([perl]))\b/g),"\n"; }`
-- sgifford's Web page
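To see the look-ahead idea in action, here is a small sketch that applies the first pattern to a few sample words. It assumes the gaps between the groups are `.*`, and it uses the /x modifier only to lay the pattern out; the variable name `$all_four` and the word list are illustrative.

```perl
use strict;
use warnings;

# The pattern, assuming ".*" between groups, split up with /x for readability.
my $all_four = qr/
    ([perl]) .*                          # first letter from the set
    (?!\1) ([perl]) .*                   # second, different from the first
    (?!\1)(?!\2) ([perl]) .*             # third, different from both
    (?!\1)(?!\2)(?!\3) ([perl])          # fourth, different from all three
/x;

for my $word (qw(perl purile pretty reputable peel)) {
    printf "%-10s %s\n", $word, $word =~ $all_four ? 'match' : 'no match';
}
```

Since the set has exactly four letters, finding four mutually different ones is the same as finding all of them, which is why "pretty" and "peel" fail while the other three words match.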
Re: A Quick Regex Question by jdporter (Paladin) on Oct 09, 2006 at 14:04 UTC
`while (<>) { print if join('', sort split //, lc) =~ /e.*l.*p.*r/; }`
We're building the house of the future together.
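The trick is that sorting a word's characters puts e, l, p and r into alphabetical order, so a single left-to-right pattern can check for all four. A minimal sketch for one word at a time follows; the sub name `sorted_has_all` is invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the sort-then-match idea for a single word.
sub sorted_has_all {
    my ($word) = @_;
    # Lowercase first: uppercase letters sort ahead of lowercase ones,
    # which would otherwise put "P" before "e".
    my $sorted = join '', sort split //, lc $word;
    return $sorted =~ /e.*l.*p.*r/ ? 1 : 0;
}

print "$_: ", sorted_has_all($_), "\n" for qw(Perl pretty);
```

For "Perl" the sorted string is "elpr", which matches; for "pretty" it is "eprtty", which has no l and so fails.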