in reply to Renaming html email dumps according to sender and date

Hi Sigmund,

Parsing HTML is always a pain, and if it's a moving target, as you describe, it's even harder.

I would recommend considering an HTML parser. I've been using HTML::TokeParser::Simple lately; it is very easy to use and maintain, and you could quickly adapt your code to any changes in the markup.

Your impressive list of regexes may also be vulnerable to such changes, and I would find it much more difficult to maintain.
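Here is a minimal sketch of the parser approach. Since the real dumps weren't posted, it assumes (hypothetically) that each header is a bold label followed by its value and a `<br>`. The sample markup and the `extract_headers` sub are my own illustration, not your actual format. `get_trimmed_text` is inherited from HTML::TokeParser, which HTML::TokeParser::Simple subclasses, and it decodes entities as it collects text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample: the real dumps were not posted, so this assumes
# each header is a bold label followed by its value and a <br>.
my $html = <<'HTML';
<html><body>
<b>From:</b> Sigmund &lt;sigmund@example.com&gt;<br>
<b>Date:</b> Mon, 09 Aug 2004 09:30:00<br>
<p>Message body...</p>
</body></html>
HTML

# Walk the token stream and pair each bold label with the text after it.
sub extract_headers {
    my ($html) = @_;
    my $p = HTML::TokeParser::Simple->new(\$html);
    my %header;
    while ( my $token = $p->get_token ) {
        next unless $token->is_start_tag('b');
        my $label = $p->get_trimmed_text('/b');  # text inside <b>...</b>
        $p->get_token;                           # consume the </b> itself
        my $value = $p->get_trimmed_text('br');  # text up to the next <br>
        $label =~ s/:\z//;                       # "From:" -> "From"
        $header{$label} = $value;                # entities already decoded
    }
    return %header;
}

my %header = extract_headers($html);
print "$_: $header{$_}\n" for sort keys %header;
```

If the layout changes, only the tag tests (`'b'`, `'/b'`, `'br'`) should need adjusting, rather than a whole set of regexes.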

Also, some of your regexes decode HTML entities. I use the imaginatively named HTML::Entities for that.
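For the entity decoding, a minimal sketch with HTML::Entities. The sample string is made up; the point is that one `decode_entities` call handles named and numeric entities together, instead of one substitution per entity.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# One call covers named (&amp;, &lt;, &gt;) and numeric (&#64;) entities,
# replacing a separate s/// per entity.
my $raw  = 'Tom &amp; Jerry &lt;tj&#64;example.com&gt;';
my $text = decode_entities($raw);   # returns a decoded copy; $raw is unchanged
print "$text\n";                    # prints: Tom & Jerry <tj@example.com>
```

Called in void context, `decode_entities($raw)` decodes the string in place instead.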

Could you post a snippet of the HTML?

Re^2: Renaming html email dumps according to sender and date
by Sigmund (Pilgrim) on Aug 09, 2004 at 09:30 UTC
    I could easily have posted a snippet, but it's a real mess and incredibly huge!
    BTW, this is an example:

    Edit by tye, add READMORE

      Thanks. Sadly, I'm at work at the moment and have to clean a four-unit web offset press :(

      I'll have a look later today.