When I made the original post, tilly pointed out right away that he wouldn't use a regex to solve the problem (gasp!). That got me to thinking: since I love regex, I tend to employ them a lot. They're fast (if properly written), but many programmers don't grok them. Heck, even some of my simpler regexes are complicated:

    $number =~ /((?:[\d]{1,6}\.[\d]{0,5})|(?:[\d]{0,5}\.[\d]{1,6})|(?:[\d]{1,7}))/;

That one just guarantees that a user-entered number fits my format. Aack!

    $data =~ s/ (                    # Capture to $1
                  <a\s               # <a and a space character
                  (?:                # Non-capturing parens
                      [^>](?!href)   # All non > not followed by href
                  )*                 # zero or more of them
                  .?
                  href\s*            # href followed by zero or more space characters
                )
                (                    # Capture to $2
                  &\#61;\s*          # = plus zero or more spaces
                  (                  # Capture to $3
                      &[^;]+;        # some HTML character code (probably " or ')
                  )?                 # which might not exist
                  (?:                # Non-grouping parens
                      .(?!\3)        # any character not followed by $3
                  )+                 # one or more of them
                  .?
                  (?:
                      \3             # $3
                  )?                 # (which may not exist)
                )
                (                    # Capture to $4
                  [^>]+              # Everything up to final >
                  >                  # Final >
                )
              /$1 . decode_entities($2) . $4/gsexi;

Note that the regex is complicated enough that I've even indented the comments to help some poor programmer behind me maintain it. As it turns out, it still has two very subtle problems (which are irrelevant to this discussion) which arise only under rare circumstances. How would you even find those problems? Heck, if I were really evil, I could put the regex on one line and make the task virtually impossible for the average programmer:

    $data =~ s/(<a\s(?:[^>](?!href))*.?href\s*)(&\#61;\s*(&[^;]+;)?(?:.(?!\3))+.?(?:\3)?)([^>]+>)/$1.decode_entities($2).$4/gsei;
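For what it's worth, here is a minimal sketch of how the number-format regex could be spelled out with /x and kept in a qr// object, so each alternative gets its own comment. The three alternatives are unchanged (only `[\d]` is written as the equivalent `\d`). The helper name `extract_number` is made up for illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Same three alternatives as the one-line version, spelled out with /x.
# Like the original, the pattern is unanchored: it captures the first
# substring that fits rather than validating the whole string.
my $number_re = qr/
    (
        (?: \d{1,6} \. \d{0,5} )    # 1-6 digits, a dot, up to 5 decimals
      |
        (?: \d{0,5} \. \d{1,6} )    # up to 5 digits, a dot, 1-6 decimals
      |
        (?: \d{1,7} )               # 1-7 digits with no decimal point
    )
/x;

# extract_number is a hypothetical helper name, used here for illustration.
sub extract_number {
    my ($input) = @_;
    return $input =~ $number_re ? $1 : undef;
}

print extract_number('123.45'), "\n";    # prints 123.45
```

Keeping the pattern in a named variable also means it can be exercised on its own, which is one way to hunt for the kind of subtle problems that hide in the one-line form.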
tilly's comment, however, got me to thinking: how do Perlmonks create maintainable regexes, or do they avoid them in favor of more obvious solutions? I pride myself on writing clear, maintainable code with tons of comments. My beloved regexes, however, are the fly in my ointment of clarity. How do YOU deal with this?
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just go to the link and check out our stats.
In reply to Regexes vs. Maintainability by Ovid