in reply to Silly Regex question...
If you go down the mis-guided path of trying to write regexes to parse HTML, eventually, you'll wind up with monstrosities like this:
Which I was trying to debug in this node.$data =~ s/ ( # Capture to $1 <a\s # <a and a space character (?: # Non-capturing parens [^>](?!href) # All non > not followe +d by href )* # zero or more of them href\s* # href followed by zero or +more space characters ) ( # Capture to $2 =\s* # = plus zero or more space +s ( # Capture to $3 &[^;]+; # some HTML character c +ode (probably " or ') )? # which might not exist (?: # Non-grouping parens .(?!\3) # any character not fol +lowed by $3 )+ # one or more of them (?: \3 # $3 )? # (which may not exist) ) ( # Capture to $4 [^>]+ # Everything up to final > > # Final > ) /$1 . decode_entities($2) . $4/gsexi;
Sorry for the bad news.
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: (Ovid) Re: Silly Regex question...
by chipmunk (Parson) on Jan 11, 2001 at 07:59 UTC |