in reply to extract text from mbx

If it's HTML, use HTML::Parser or one of its subclasses. If it's well-formed, use one of the many XML parsers, such as XML::Parser.
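As a rough illustration of the HTML::Parser route, here is a minimal sketch that collects only the text between tags. The helper name html_to_text and the sample markup are made up for this example; a real mbx file would first need its non-HTML header bytes dealt with.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text content of an HTML string.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Append every text chunk; 'dtext' delivers it with entities decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;    # flush anything still buffered
    return $text;
}

# Made-up sample input, not taken from a real mailbox.
print html_to_text('<html><body><p>Hello <b>monks</b></p></body></html>'), "\n";
```

Note that the text handler also receives the contents of script and style elements, so those may need filtering if the mails contain them.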


holli, /regexed monk/

Re^2: extract text from mbx
by Anonymous Monk on Feb 23, 2005 at 06:42 UTC
    Thanks, holli.

    I have some more information to add. The file is not entirely an HTML file. It has junk characters (which I think represent the header of each mail), followed by the data. The data is enclosed in HTML tags.

    A single mbx file is a collection of many such mails.

      I don't think you are giving us enough information here. What software is creating these 'mbx' files? Could you give us an example that might help us determine the format? If it is a standard mail storage format, then Email::Folder or one of its friends might be able to deal with it. You might even be able to identify the type of the mailbox with the Email::FolderType module.
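
      A minimal sketch of that idea, assuming the file turns out to be in a format these modules recognise (which is not certain for this particular 'mbx' file). The helper name list_subjects is made up for the example.

      ```perl
      use strict;
      use warnings;
      use Email::FolderType qw(folder_type);
      use Email::Folder;

      # Hypothetical helper: return the Subject of every message in a mailbox.
      sub list_subjects {
          my ($path) = @_;
          my $folder = Email::Folder->new($path);
          my @subjects;
          # messages() returns one Email::Simple object per mail.
          for my $msg ($folder->messages) {
              my $subject = $msg->header('Subject');
              $subject = '(no subject)' unless defined $subject;
              push @subjects, $subject;
              # $msg->body would hold the HTML part to hand to a parser.
          }
          return @subjects;
      }

      if (@ARGV) {
          my $path = $ARGV[0];
          # folder_type() reports the module's guess at the mailbox format.
          print "Detected type: ", folder_type($path), "\n";
          print "Subject: $_\n" for list_subjects($path);
      }
      ```

      Run it with the mailbox path as the only argument. If the detected type looks wrong or no messages come back, the file is probably not in a format these modules handle.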

      /J\