Actually, I'm glad you brought this up. In 5.8.4, there's improved ability (thanks to me) to create your own Unicode classes, and even build cascading ones. The documentation is in perlunicode, and here's an example (you must have Perl 5.8.4 for this to work):
package MyUnicode; sub InLetters { return << 'END'; 0041 005a 0061 007a END } sub InVowels { return << 'END'; 0041 0045 0049 004f 0055 0061 0065 0069 006f 0075 END } sub InConsonants { return << 'END'; +MyUnicode::InLetters -MyUnicode::InVowels END } package main; my $string = "Chicken Stromboli"; while ($string =~ /(\p{MyUnicode::InConsonants}+)/g) { print "consonant cluster: '$1'\n"; } __END__ consonant cluster: 'Ch' consonant cluster: 'ck' consonant cluster: 'n' consonant cluster: 'Str' consonant cluster: 'mb' consonant cluster: 'l'
I could write about that, and explain the new '&' class operand, which allows you to do the intersection of two or more Unicode classes.

I like this idea. Maybe I can do this and one other topic -- I don't want the article to be too widely scoped.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

In reply to Re: Re: japhy's regex article for the TPJ by japhy
in thread japhy's regex article for the TPJ by japhy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.