Re: Question why this Regex isn't matching

Re^2: Question why this Regex isn't matching by AnomalousMonk (Archbishop) on Sep 30, 2011 at 17:46 UTC
`$pieces[4] =~ m/>(^<+)</;` This regex wants a '>' character followed by ^ (the hat metacharacter), which outside a character class means the start of the string! That's not likely to occur in any string unless the `/m` regex modifier is used to allow ^ to match at embedded newlines. (Update: Actually, even that won't happen. With the /m switch, ^ matches only at the very start of the string or immediately after a newline, so the match would have to be with something like `/ > \n ^ /xm`.) Did you perhaps mean something like `m/>([^<]+)</`?
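To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample string is an assumption for illustration only; the original post does not show what `$pieces[4]` actually contains.

```perl
use strict;
use warnings;

# Hypothetical sample data: the real contents of $pieces[4] are not shown
# in the thread, so this table cell is an assumption.
my $cell = '<td class="price">42.50</td>';

# Original pattern: outside of [], '^' is an anchor, so this asks for a '>'
# immediately followed by the start of the string. That can never match.
my $broken_matched = ( $cell =~ m/>(^<+)</ ) ? 1 : 0;

# Suggested pattern: inside [], a leading '^' negates the class, so
# '[^<]+' means "one or more characters that are not '<'".
# The parentheses capture that run (it would also be available as $1).
my ($text) = $cell =~ m/>([^<]+)</;

print "broken pattern matched: $broken_matched\n";
print "captured: ", ( defined $text ? $text : '(nothing)' ), "\n";
```

The parentheses only group and capture; it is the square brackets that turn ^ from an anchor into a negation.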
Re^3: Question why this Regex isn't matching by OfficeLinebacker (Chaplain) on Sep 30, 2011 at 17:53 UTC
YES! I thought that the rules were the same inside () as inside []. Thanks for clearing that up. And yes, the latter is what I want, because I want to group and capture that part of the match into $1. I like computer programming because it's like Legos for the mind.
Re^4: Question why this Regex isn't matching by ww (Archbishop) on Sep 30, 2011 at 18:54 UTC
There are far better ways to achieve your goal than using regexen. Parsing HTML is notoriously fraught with difficulties, the more so when that HTML is not compliant with well-known standards (HTML 4.01 Strict and 4.01 Transitional, a.k.a. "loose", in particular). That means rolling your own flies in the face of the caution against re-inventing wheels. To minimize your problems, take a look at the various modules built for the job. A search of CPAN (or ActiveState with ppm if you're on Windows and using AS's Perl) will present a wealth of well-tested and stable (reliable) options. HTML::Parser, HTML::TableParser, and HTML::Extract are just a few of the many that may suit your needs.
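As a rough illustration of the module route, here is a minimal sketch using HTML::Parser (version 3 API) to collect the text between tags. It assumes HTML::Parser is installed from CPAN, and the sample HTML is made up for the example; it is not the markup from the original question.

```perl
use strict;
use warnings;
use HTML::Parser ();    # CPAN module, not part of core Perl

# Hypothetical input: the thread does not show the real HTML.
my $html = '<tr><td class="price">42.50</td><td>n/a</td></tr>';

my @texts;
my $parser = HTML::Parser->new(
    api_version => 3,
    # Call the handler for every text segment; 'dtext' passes the text
    # with HTML entities already decoded.
    text_h => [
        sub {
            my ($text) = @_;
            push @texts, $text if $text =~ /\S/;    # skip whitespace-only runs
        },
        'dtext',
    ],
);
$parser->unbroken_text(1);    # deliver each text run in one piece
$parser->parse($html);
$parser->eof;                 # signal end of document and flush pending text

print "$_\n" for @texts;
```

Unlike a hand-rolled regex, the parser copes with attributes containing '>' characters, entities, and comments, which is the main reason to prefer it for anything beyond a one-off.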