in reply to Re: RFC: HTML::ListScraper
in thread RFC: HTML::ListScraper
HTML::ListScraper differs from HTML::TokeParser and HTML::TreeBuilder in that it doesn't return the same information for the same input document: it drops the "irregular" parts, leaving something smaller and hopefully easier to interpret. Except that, as it stands, it drops rather too much...
Recently I've been reminded that biologists have an interest in sequence matching, and have some interesting algorithms I could try. They don't seem to be implemented as CPAN modules, though, so the next step looks like implementing them before trying to incorporate some form of sequence alignment into HTML::ListScraper (a bit like Algorithm::AhoCorasick, which turned out to be completely unnecessary :-) ). And obviously the algorithms will have variations and alternatives I've no idea about. Any bioinformatics specialists around here?
Re^3: RFC: HTML::ListScraper
by Anonymous Monk on Jun 22, 2007 at 07:23 UTC