in reply to RE: RE: A grammar for HTML matching
in thread A grammar for HTML matching
<cite>What do you want to match?</cite>
Well, certainly nothing too complicated. The point is probably to express the possible relationships between tags (e.g., contained in, preceding, following) rather than the tags themselves. Obviously this is not trivial, because of all the nifty exceptions that HTML allows. Maybe it would be good to split the work into a "candidate recognition" phase (purely regex based) and an "HTML parsing" phase, in which you would expand the snippet found into canonical HTML syntax.
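To make the two-phase idea concrete, here is a minimal sketch in Python using only the standard library. Everything named in it is an assumption for illustration: the `CANDIDATE` pattern (it looks only for table-like regions), the `IMPLIED_CLOSE` rules (a tiny subset of the end tags HTML lets you omit), and the helper names `find_candidates` and `canonicalize`. It is not a full HTML grammar, just one way the split could look.

```python
import re
from html import escape
from html.parser import HTMLParser

# Phase 1: candidate recognition, purely regex based.
# Hypothetical pattern: anything that looks like <table> ... </table>.
CANDIDATE = re.compile(r"<table\b.*?</table\s*>", re.IGNORECASE | re.DOTALL)

# Assumed subset of HTML's optional-end-tag rules: opening the key tag
# implicitly closes any of the listed tags sitting on top of the stack.
IMPLIED_CLOSE = {
    "li": {"li"},
    "p": {"p"},
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never get an end tag


def find_candidates(text):
    """Return every substring that the regex considers worth parsing."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]


class Canonicalizer(HTMLParser):
    """Phase 2: expand a snippet to explicit, lowercase, quoted HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.out = []        # pieces of canonical output
        self.relations = []  # (child, parent) "contained in" pairs

    def _close_top(self):
        self.out.append("</%s>" % self.stack.pop())

    def handle_starttag(self, tag, attrs):
        # Supply the end tags the author was allowed to leave out.
        while self.stack and self.stack[-1] in IMPLIED_CLOSE.get(tag, ()):
            self._close_top()
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, rendered))
        if self.stack:
            self.relations.append((tag, self.stack[-1]))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        while self.stack[-1] != tag:
            self._close_top()
        self._close_top()

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def canonicalize(snippet):
    """Return (canonical_html, contained_in_pairs) for one candidate."""
    parser = Canonicalizer()
    parser.feed(snippet)
    parser.close()
    while parser.stack:  # close whatever is still open at the end
        parser._close_top()
    return "".join(parser.out), parser.relations


page = "<p>intro<TABLE border=1><tr><td>a<td>b<tr><td>c</table><p>outro"
candidates = find_candidates(page)
canonical, relations = canonicalize(candidates[0])
print(canonical)
print(relations)
```

Once the snippet is in canonical form, relationships such as "contained in" fall out of the element stack, and "preceding" or "following" could be derived from the order in which the pairs were recorded.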
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com
XML::XPath, anyone?
by merlyn (Sage) on Nov 02, 2000 at 19:00 UTC