in reply to Search for repeating but slightly different patterns

The best solution would be an algorithm that could find repeating chunks and return the differences between them.
Uhm, either chunks repeat, or they are different. They cannot be both.
The HTML-stripping technique doesn't always help, because I need to keep image URLs and other HTML information about the products.
Have you tried parsing the pages instead of using regular expressions?
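To make the parsing suggestion concrete, here is a rough sketch in Python using only the standard library's html.parser. It is not what the original poster has; it assumes (hypothetically) that each product sits in its own <div class="product"> block, and the names ProductParser and parse_products are made up for illustration. The point is that a parser keeps the image URLs and the text per product, so nothing has to be stripped.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect image URLs and text for each <div class="product"> block.

    The "product" class name is an assumption for this sketch; adjust it
    to whatever wraps a product on the real pages.
    """

    def __init__(self):
        super().__init__()
        self.products = []      # one dict per finished product block
        self._current = None    # the product being collected, if any
        self._depth = 0         # nesting depth of <div> inside the product

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            # Only start collecting when a product wrapper opens.
            classes = (attrs.get("class") or "").split()
            if tag == "div" and "product" in classes:
                self._current = {"images": [], "text": []}
                self._depth = 1
            return
        if tag == "div":
            self._depth += 1
        elif tag == "img" and attrs.get("src"):
            self._current["images"].append(attrs["src"])

    def handle_endtag(self, tag):
        if self._current is not None and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # The product wrapper itself just closed.
                self.products.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self._current["text"].append(data.strip())


def parse_products(html):
    """Return a list of {"images": [...], "text": [...]} dicts."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products
```

Once every product is a small record like this, comparing two of them field by field is an ordinary data comparison rather than a pattern-matching problem.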