Making glacial progress... I've fixed the failing tests. As for when to use HTML::ListScraper, the principal use case is parsing search engine results. But documenting a worked-out example would IMHO be misleading - the module just doesn't work well enough for lots of people to start using it right now...
HTML::ListScraper is different from HTML::TokeParser and HTML::TreeBuilder in that it doesn't return the same information (for the same input document); it drops the "irregular" parts, leaving something smaller and hopefully easier to interpret - except that as it stands, it drops rather too much...
Recently I've been reminded that biologists have an interest in sequence matching, and have some interesting algorithms I could try. They don't seem to be implemented as CPAN modules, though, so the next step looks like implementing one of them before trying to incorporate some form of sequence alignment into HTML::ListScraper (a bit like Algorithm::AhoCorasick, which turned out to be completely unnecessary :-) ). And obviously the algorithms will have variations and alternatives I've no idea about - any bioinformatics specialists around here?
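
To make "sequence alignment" a bit more concrete, here is a minimal sketch of one textbook option, Needleman-Wunsch global alignment, applied to two sequences of tag names. It's in Python purely for illustration; it is not a CPAN module, and not necessarily what would end up in HTML::ListScraper. The function name `align`, the scoring values (match +1, mismatch -1, gap -1) and the sample tag sequences are all assumptions made up for this example.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token sequences (Needleman-Wunsch).

    Returns (score, pairs), where pairs is a list of (x, y) tuples and
    None marks a gap on that side. Scoring values are illustrative only.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two tokens
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Walk back from the corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Two hypothetical list items; the second has an extra <b>...</b> inside.
item1 = ["li", "a", "/a", "/li"]
item2 = ["li", "a", "b", "/b", "/a", "/li"]
total, pairs = align(item1, item2)
```

With these inputs the alignment pairs up the four shared tags and leaves `b` and `/b` facing gaps, which is roughly the "regular versus irregular" distinction the module is after. Real HTML would of course need a smarter scoring scheme than equal/not equal.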