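One flaw is that the regex does not pick up every token on the line that meets the pattern. In the code below, grep over the split tokens does that, and the result is an array. The /g modifier is what Perl lingo calls "match global"; in list context it returns every match instead of just the first.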
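To make the difference concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration, and the consonant class is written with ranges instead of spelling out every letter. It compares a list-context /g match, which returns the matched runs, with grep, which returns the whole tokens that contain a run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A run of 4 or more consonants (y is not counted as a consonant here).
my $run = qr/[b-df-hj-np-tv-xz]{4,}/i;

my $line = "bckz aklmxe abca";   # hypothetical sample input

# List context + /g: every captured run, wherever it occurs in the line.
my @runs = $line =~ /($run)/g;

# grep over the tokens: every whole token that contains such a run.
my @tokens = grep { /$run/ } split ' ', $line;

print "runs:   @runs\n";     # runs:   bckz klmx
print "tokens: @tokens\n";   # tokens: bckz aklmxe
```

Which of the two you want depends on whether you need the matched text itself or the word it came from.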
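A note on the regex syntax to match 4 or more: {4,} already means "4 or more" and is greedy, so it takes as many characters as it can. {4,}? is the non-greedy version: it still needs at least 4, but stops as soon as it has them. Inside grep only success or failure counts, so both forms select the same tokens. That following ? only matters when you capture the matched text!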
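A quick way to see what the trailing ? does is to capture the match with both forms. This is only a sketch; the variable names are mine, and the test word is the last line of the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = 'BKZXXXXXXXXXXXX';

# Greedy: at least 4 consonants, then as many more as possible.
my ($greedy) = $word =~ /([b-df-hj-np-tv-xz]{4,})/i;

# Non-greedy: at least 4 consonants, then stop as early as possible.
my ($lazy) = $word =~ /([b-df-hj-np-tv-xz]{4,}?)/i;

print "greedy: $greedy\n";   # greedy: BKZXXXXXXXXXXXX
print "lazy:   $lazy\n";     # lazy:   BKZX
```

Both matches succeed, which is why the choice makes no difference to grep.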
Also, to split on "words" (whitespace-separated tokens), I used the default "split". There are actually 2 slightly different ways to split on whitespace: the default split (the same as split ' ') and split /\s+/. They work slightly differently when dealing with the beginning of a line: the default form skips leading whitespace, while split /\s+/ returns an empty first field. Here, it makes no difference.
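The two whitespace splits can be compared on a line that starts with spaces. This is a small sketch with a made-up input line and variable names of my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "  abc def\n";   # hypothetical line with leading whitespace

# Default-style split: leading whitespace is skipped.
my @default = split ' ', $line;

# Regex split: the leading whitespace produces an empty first field.
my @regex = split /\s+/, $line;

print scalar(@default), " fields: ", join('|', @default), "\n";   # 2 fields: abc|def
print scalar(@regex),   " fields: ", join('|', @regex),   "\n";   # 3 fields: |abc|def
```

Neither form returns a trailing empty field, so the newline at the end of the line causes no trouble.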
I also used a Perl "trick", POD (Plain Old Documentation), that can embed comments within the code: perl skips everything from a line that starts with =something down to the =cut line. POD can also be used to generate documentation in web format (with pod2html, for example). Here I just used it to put my output/comments into the compilable and runnable code. That way I don't have to send you 2 different files, one with code and one with output.
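As a minimal sketch of that trick (the @seen array is only there to show which statements run), anything between the = line and =cut is ignored by perl, even if it looks like code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;
push @seen, 'before';

=EXAMPLE OUTPUT

push @seen, 'inside pod';   # never executed: perl treats this as documentation

=cut

push @seen, 'after';
print "@seen\n";   # before after
```

The = line and the =cut line both have to start in the first column.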
Oh, using the -w switch on the #! line of a single program like this turns on warnings, so the "use warnings;" is not necessary. This also works under Windows, because perl itself reads the switches on the #! line. Wow!
I always "use strict;" and "use warnings;". There is a small performance hit for this, but it is almost always worth it. Keep doing that!
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    print "INPUT LINE: $_";
    # keep only the tokens that contain 4 or more consonants in a row
    # ({4,} alone already means "4 or more"; the trailing ? just makes it non-greedy)
    my @four_constants = grep{/([bBcCdDfFgGhHjJkKlLmMnNpPqQrRsStTvVwWxXzZ]{4,}?)/g} split;
    next unless @four_constants;
    print "output: @four_constants", "\n";
}

=EXAMPLE OUTPUT
INPUT LINE: xyy xyz
INPUT LINE: bBbB
output: bBbB
INPUT LINE: abc bacx
INPUT LINE: abca xyzz
INPUT LINE: abCA XXZZ
output: XXZZ
INPUT LINE: xxyyzzz
INPUT LINE: bckz klmx
output: bckz klmx
INPUT LINE: BKZXXXXXXXXXXXX
output: BKZXXXXXXXXXXXX
=cut

__DATA__
xyy xyz
bBbB
abc bacx
abca xyzz
abCA XXZZ
xxyyzzz
bckz klmx
BKZXXXXXXXXXXXX
In reply to Re: regular expressions
by Marshall
in thread regular expressions
by mbgbioinfo