in reply to Renaming html email dumps according to sender and date
Parsing HTML is always a pain, and if it's a moving target, as you explain, it's even harder.
I would recommend considering an HTML parser. I've been using HTML::TokeParser::Simple lately. It is very easy to use and maintain, so you could quickly adapt to any changes.
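To give an idea of what that might look like, here is a minimal sketch. Since I haven't seen your HTML, the markup below (a table with `sender` and `date` classes) and the `extract_fields` helper are made up purely for illustration; only the HTML::TokeParser::Simple calls (`new`, `get_token`, `is_start_tag`, `is_end_tag`, `is_text`, `get_attr`, `as_is`) come from the module itself.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical markup: the real email dump was not posted,
# so the tags and class names here are assumptions.
my $html = <<'HTML';
<table>
<tr><td class="label">From:</td><td class="sender">Jane Doe</td></tr>
<tr><td class="label">Date:</td><td class="date">09 Aug 2004</td></tr>
</table>
HTML

# Hypothetical helper: collect the text inside the cells we care about.
sub extract_fields {
    my ($doc) = @_;
    my $p = HTML::TokeParser::Simple->new(\$doc);   # parse from a string
    my ( %field, $current );
    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('td') ) {
            my $class = $token->get_attr('class') || '';
            # Remember which field we are inside, if any.
            $current = ( $class eq 'sender' || $class eq 'date' )
                ? $class
                : undef;
        }
        elsif ( $token->is_end_tag('td') ) {
            undef $current;
        }
        elsif ( $token->is_text && defined $current ) {
            $field{$current} .= $token->as_is;
        }
    }
    return %field;
}

my %field = extract_fields($html);
print "$field{sender} / $field{date}\n";
```

If the layout changes, only the tag and class tests inside the loop would need touching, which is the maintenance advantage I have in mind.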
Your impressive list of regexes may also be vulnerable to changes, and I would find that much more difficult to maintain.
Also, some of your regexes are decoding HTML entities. I use the imaginatively named HTML::Entities to do that.
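A rough sketch of the entity decoding, assuming a made-up sender string (the sample text is mine, not taken from your data); `decode_entities` is the function HTML::Entities exports for this:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical sample: a sender field as it might appear in the HTML source.
my $raw = 'Smith &amp; Jones &lt;info@example.com&gt;';

# decode_entities returns the string with named and numeric entities decoded.
my $decoded = decode_entities($raw);
print "$decoded\n";
```

That one call would stand in for the individual entity substitutions, and it also covers entities you haven't run into yet.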
Could you post a snippet of the html?
Re^2: Renaming html email dumps according to sender and date
by Sigmund (Pilgrim) on Aug 09, 2004 at 09:30 UTC
by wfsp (Abbot) on Aug 09, 2004 at 10:06 UTC