Re: regular expressions

This is actually pretty good. But...

One flaw is that the regex does not capture multiple tokens that match the pattern. The parens below do that, and the result is an array. This is called "match global" in Perl lingo.
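For anyone who has not met "match global" before, here is a minimal sketch of it in list context. The sample string and variable names are made up for illustration; only the consonant character class comes from this thread. Note that this collects the matched runs themselves, whereas a grep over split tokens keeps whole tokens.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the original post.
my $line = 'bckz abc klmx';

# In list context, m//g returns every match of the capture group.
my @runs = $line =~ /([bcdfghjklmnpqrstvwxz]{4,})/gi;

print "@runs\n";   # prints "bckz klmx"
```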

Another problem is that the regex syntax to match 4 or more is not quite right. {4,} should be {4,}?. The first version would just match 4 at a minimum, but no more. That following ? does matter!

Also, to split on "words" (space-separated tokens), I used the default "split". There are actually two different versions of this "default" split, one without parens and one with parens, and they work slightly differently when dealing with the beginning of a line. Here, it makes no difference.
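The post does not name the two forms. Assuming they are the awk-style default (plain "split", which behaves like split ' ', $_) and an explicit whitespace pattern such as split /\s+/, the difference at the beginning of a line can be sketched like this. The sample line is hypothetical.

```perl
use strict;
use warnings;

my $line = "  abca    xyzz";   # note the leading spaces

# Awk-style default: leading whitespace is skipped.
my @default = split ' ', $line;

# Explicit pattern: leading whitespace yields an empty first field.
my @pattern = split /\s+/, $line;

print scalar(@default), "\n";   # prints 2
print scalar(@pattern), "\n";   # prints 3
```

None of the sample lines in the posted data start with whitespace, which would explain why the choice makes no difference there.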

I also used a Perl "trick" (POD, Plain Old Documentation) that can embed comments within the code. This "trick" can also be used to generate documentation in web format. Here I just used it to put my output/comments into the compilable and runnable code. That way I don't have to send you two different files, one with code and one with output.

Oh, using the -w switch for a single program like this turns on warnings. The "use warnings;" is not necessary. This also works under Windows. Wow!

I always use "use strict;" and "use warnings;". There is a small performance hit for this, but it is almost always worth it. Keep doing that!

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   print "INPUT LINE: $_";

   my @four_constants = 
      grep{/([bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ]{4,}?)/g}
      split; #the ? allows more than a min of 4!
   next unless @four_constants;
   print "output: @four_constants", "\n";
}

=EXAMPLE OUTPUT

INPUT LINE: xyy xyz
INPUT LINE: bBbB
output: bBbB
INPUT LINE: abc bacx
INPUT LINE: abca    xyzz
INPUT LINE: abCA    XXZZ
output: XXZZ
INPUT LINE: xxyyzzz
INPUT LINE: bckz  klmx
output: bckz klmx
INPUT LINE: BKZXXXXXXXXXXXX
output: BKZXXXXXXXXXXXX


=cut

__DATA__
xyy xyz
bBbB
abc bacx
abca    xyzz
abCA    XXZZ
xxyyzzz
bckz  klmx
BKZXXXXXXXXXXXX

Replies are listed 'Best First'.

Re^2: regular expressions
by AnomalousMonk (Archbishop) on Jun 07, 2015 at 16:45 UTC

... the regex syntax to match 4 or more is not quite right. {4,} should be {4,}?. The first version would just match 4 at a minimum, but no more.

The quantifier {4,} will match as much as possible (while still allowing an overall match), but at least four of the quantified atom. The quantifier {4,}? will match as little as necessary for an overall match, but at least four of the quantified atom.

c:\@Work\Perl\monks>perl -wMstrict -le
"my @strings = qw(vw vwx vwxz vwxzp vwxzpd vwxzpdq);
 ;;
 my $consonant = qr{ [bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ] }xms;
 ;;
 for my $s (@strings) {
   print qq{'$s'};
   print qq{{4,} matched; captured '$1'}  if $s =~ m{ ($consonant{4,}) }xms;
   print qq{{4,}? matched; captured '$1'} if $s =~ m{ ($consonant{4,}?) }xms;
   print '';
   }
"
'vw'

'vwx'

'vwxz'
{4,} matched; captured 'vwxz'
{4,}? matched; captured 'vwxz'

'vwxzp'
{4,} matched; captured 'vwxzp'
{4,}? matched; captured 'vwxz'

'vwxzpd'
{4,} matched; captured 'vwxzpd'
{4,}? matched; captured 'vwxz'

'vwxzpdq'
{4,} matched; captured 'vwxzpdq'
{4,}? matched; captured 'vwxz'
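One consequence worth spelling out: the greedy/lazy choice changes what gets captured, not whether a match exists. When the regex is used only as a true/false filter, as in a grep over split tokens, both quantifiers should select the same tokens. A minimal sketch, using a made-up token list:

```perl
use strict;
use warnings;

my $consonant = qr/[bcdfghjklmnpqrstvwxz]/i;

# Hypothetical tokens for illustration.
my @tokens = qw(vwx vwxz BKZXXXXXXXXXXXX abca);

# grep only asks "did it match?", so both quantifiers keep the same tokens.
my @greedy = grep { /(?:$consonant){4,}/  } @tokens;
my @lazy   = grep { /(?:$consonant){4,}?/ } @tokens;

print "@greedy\n";   # prints "vwxz BKZXXXXXXXXXXXX"
print "@lazy\n";     # prints "vwxz BKZXXXXXXXXXXXX"
```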

See perlre, perlretut, and perlrequick.

Give a man a fish: <%-(-(-(-<
