The training of a monk

After years of silent dedication, the young monk's patience and dedication pay off when his skill is put to the ultimate test.

I've been studying regular expressions quite a bit to try and get the hang of them and have finally gotten to the point where I am truly grokking them. However, despite getting some minor tweaks and efficiencies in my code, I never have really had the need to write a truly complex regular expression. Until now. My training has paid off :)

I am routinely coming up against data like the following:

Std_English,A2,B3|Std_Arts,A2,B6|Std_Cultural,A1,E8

Whenever I encounter a variable containing a value like "Cultural" or "English", I need to extract the A2,B6-style data from the matching section, where sections are delimited by a | or by the end of the line. It's not terribly difficult, but the script in question was running terribly slowly, and the regex I was using sat inside a loop that was doing its part in slowing things down. Thus, I was forced to write the most efficient regex I could for this. Here's the result:

$input =~ /^
            (?:                 # Non-capturing parens
               [^_]             # All non-underscores
               |                # or
               _                # underscore
               (?!              # not followed by
                 ${value}       # the current standards
               )
            )+                  # 1 or more characters like above
            _${value},          # Current standard followed by comma
            (                   # Capture to $1
              (?:               # Non-capturing parens
                 [A-Z]\d{1,2},? # One cap, one or two digits, and an optional comma
              )+                # Above one or more times
            )                   # End capture
            .*                  # Rest of line (okay to be greedy here)
         /ix;
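
For anyone who wants to try the pattern against the sample line, here is a minimal sketch. The sub name codes_for, the variable $name, and the use of quotemeta are assumptions added for illustration and are not part of the original script. The trailing .* is left out because it does not affect what is captured.

```perl
use strict;
use warnings;

# Sample record from the post.
my $input = 'Std_English,A2,B3|Std_Arts,A2,B6|Std_Cultural,A1,E8';

# Hypothetical helper: return the codes that follow the given standard,
# or nothing if that standard does not appear in the line.
sub codes_for {
    my ($line, $value) = @_;
    my $name = quotemeta $value;    # assumption: the name is literal text
    return unless $line =~ /^
        (?: [^_] | _ (?! $name ) )+    # walk forward, stopping before the wanted standard
        _ $name ,                      # the standard itself, then a comma
        ( (?: [A-Z]\d{1,2},? )+ )      # capture the run of codes
    /ix;
    return $1;
}

print codes_for($input, 'Cultural'), "\n";   # prints A1,E8
print codes_for($input, 'Arts'), "\n";       # prints A2,B6
```

Note that variables are still interpolated inside the comments of an /x pattern, so the comments in this sketch deliberately avoid dollar signs.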

Wow! That was a mouthful. Just a couple of months ago when I joined Perlmonks, I never would have dreamed of writing anything like that. Thanks to all of you for your helpfulness and patience.

Cheers,
Ovid

P.S.: To monks less familiar with regex, see my node Death to Dot Star! for an explanation of why the above is efficient.


RE: The training of a monk by coreolyn (Parson) on Aug 12, 2000 at 19:13 UTC
While the efficiency of the regex may be wonderful, I want to thank you for posting this node simply for the style through which you broke out the functionality of the expression. As I need to share a lot of code at work, I may adopt this style of expression building as it provides clarity and insight as to its functionality (and might alleviate the number of verbal explanations I need to provide). coreolyn Duct tape devotee.
RE: The training of a monk by Buckaroo Buddha (Scribe) on Aug 14, 2000 at 18:24 UTC
WOW! this is the best commenting of a regex i've seen. never even thought of doing it down to that depth. cool :) thanks