in reply to Re: Search for repeating but slightly different patterns
in thread Search for repeating but slightly different patterns

Having done this kind of thing a few times, let me suggest that, IMHO, this is a case where regex works better than parsing. Leaving aside sites that keep changing their formats, the main problem I found was that a lot of websites serve broken HTML: it won't validate, so parsing it is problematic.

This is one of those cases where parsing is better in theory, but the parser has to be so forgiving that I found it unreliable. Disheartening.
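
To make that concrete, here is a rough sketch in Python of the kind of thing I mean. Everything in it is made up for illustration: the markup, the `item`/`price` class names and the `extract_prices` helper are assumptions, not anything from the original question. The fragment has unclosed tags, mixed case, inconsistent quoting and a stray `</b>`, and the regex only anchors on the two cells it cares about:

```python
import re

# Hypothetical scraped fragment: unclosed <td>/<tr> tags, mixed case,
# inconsistent attribute quoting and a stray </b>.
BROKEN_HTML = """
<table>
<tr><td class=item>Widget<td class="price">$4.50
<TR><TD CLASS='item'>Gadget</b><TD class=price> $12.00
<tr><td class=item>Gizmo<td class=price>$0.99</table>
"""

# Anchor only on the two cells of interest and tolerate junk between them.
ROW_RE = re.compile(
    r"""<td\s+class=["']?item["']?\s*>\s*([^<]+?)\s*   # item name, up to the next tag
        (?:<[^>]*>\s*)*?                                 # skip any stray tags
        <td\s+class=["']?price["']?\s*>\s*\$([\d.]+)    # price digits after the dollar sign
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_prices(html):
    """Return (name, price) pairs for every item/price cell pair found."""
    return [(name, float(price)) for name, price in ROW_RE.findall(html)]


for name, price in extract_prices(BROKEN_HTML):
    print(name, price)
```

The catch, of course, is the one mentioned above: the pattern is tied to this exact layout, so it needs fixing whenever the site changes its format.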

Re^3: Search for repeating but slightly different patterns
by Sly_G (Novice) on Oct 27, 2010 at 16:31 UTC
    I'm with you on this. If all sites were built to a strict HTML standard, the web would be a much better place. So, regexp it is!