I've been studying regular expressions quite a bit to try to get the hang of them, and I have finally gotten to the point where I am truly grokking them. However, despite getting some minor tweaks and efficiencies in my code, I have never really had the need to write a truly complex regular expression. Until now. My training has paid off :)
I am routinely coming up against data like the following:

    Std_English,A2,B3|Std_Arts,A2,B6|Std_Cultural,A1,E8

Whenever I encounter a variable containing a value like "Cultural" or "English", I need to extract the A2,B6-type data that follows it, where each section is delimited by a | or by the end of the line. It's not terribly difficult, but the script in question was running terribly slowly, and the regex I was using sat inside a loop that was doing its part to slow things down. Thus, I was forced to write the most efficient regex I could for this. Here's the result:

    $input =~ /^
        (?:                     # Non-capturing parens
            [^_]                # All non-underscores
            |                   # or
            _                   # underscore
            (?!                 # not followed by
                ${value}        # the current standards
            )
        )+                      # 1 or more characters like above
        _${value},              # Current standard followed by comma
        (                       # Capture to $1
            (?:                 # Non-capturing parens
                [A-Z]\d{1,2},?  # One cap, one or two digits, and an optional comma
            )+                  # Above one or more times
        )                       # End capture
        .*                      # Rest of line (okay to be greedy here)
    /ix;

Wow! That was a mouthful. Just a couple of months ago, when I joined Perlmonks, I never would have dreamed of writing anything like that. Thanks to all of you for your helpfulness and patience.
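For anyone who wants to try the pattern, here is a minimal sketch of how it might be applied to the sample line above. The `codes_for` helper and the loop around it are hypothetical and not part of the original script. The sketch also assumes `$value` contains no regex metacharacters, as is true of the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: returns the list of
# codes recorded for one standard, or an empty list if it is absent.
# Assumes $value contains no regex metacharacters, as in the sample data.
sub codes_for {
    my ($input, $value) = @_;
    return unless $input =~ /^
        (?:                         # Non-capturing parens
            [^_]                    # All non-underscores
            |                       # or
            _ (?! ${value} )        # underscore not followed by the standard
        )+                          # 1 or more characters like above
        _${value},                  # Current standard followed by comma
        (                           # Capture the codes
            (?: [A-Z]\d{1,2},? )+   # One letter, one or two digits, optional comma
        )                           # End capture
        .*                          # Rest of line
    /ix;
    return split /,/, $1;
}

my $input = 'Std_English,A2,B3|Std_Arts,A2,B6|Std_Cultural,A1,E8';
for my $value (qw(English Arts Cultural)) {
    print "$value: ", join(' ', codes_for($input, $value)), "\n";
}
# Prints:
# English: A2 B3
# Arts: A2 B6
# Cultural: A1 E8
```

If `$value` could ever contain characters that are special in a regex, wrapping it as `\Q${value}\E` would be a reasonable precaution.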
Cheers,
Ovid
P.S.: Monks less familiar with regexes should see my node "Death to Dot Star!" for an explanation of why the above is efficient.
RE: The training of a monk
by coreolyn (Parson) on Aug 12, 2000 at 19:13 UTC

RE: The training of a monk
by Buckaroo Buddha (Scribe) on Aug 14, 2000 at 18:24 UTC