First off, there are good reasons why Pegasus Mail uses its own format. Pegasus Mail's folders support some things that would not be possible with mbox format and are in some ways more robust. These days a mail directory format (e.g., nnml) would probably be better, but at the time the pmail folder format was designed, that would have been impractical for a number of reasons.
There is not a large overlap between Perl users and pmail users, but there may be a few. I have used Pegasus Mail myself in the past, before I learned Gnus, and might be interested in collaborating with you on creating a parser for pmail folders.
For parsing alone, you can ignore the .pmi files completely; those are indices. (They might improve performance, but figuring out a second file format just for that is probably not worth it.) If you wanted to *write* pmail folders, you would have to either update the indices or delete them. Updating them is much preferable: deleting them forces pmail to regenerate them the next time it starts, which causes a very noticeable wait for the user. But for parsing alone, which sounds like a good initial goal, this is not a consideration.
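As a small illustration of "just ignore the .pmi files", here is a minimal Perl sketch that filters a directory listing before parsing. The helper name `files_to_parse` is hypothetical. It only drops the .pmi indices and keeps everything else, because narrowing the list down to the actual folder files would depend on naming details not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper: keep every file name except the .pmi indices,
# which a read-only parser can skip. The match is case-insensitive
# because DOS/Windows-era file names are often stored in upper case.
sub files_to_parse {
    my @names = @_;
    return grep { !/\.pmi\z/i } @names;
}

# Example usage against a real mail directory ($dir is a placeholder):
# opendir(my $dh, $dir) or die "Cannot open $dir: $!";
# my @candidates = files_to_parse(grep { -f "$dir/$_" } readdir $dh);
# closedir $dh;
```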
The good news about the format is that most of the binary-encoded information is stuff you probably don't need unless you are trying to clone Pegasus Mail: labels, flags, annotations and the like. The bodies are stored *mostly* unaltered, though of course there are some provisions to keep any particular character in a message from causing problems, so messages with characters outside the printable ASCII range are encoded in some fashion (if they weren't already encoded for transport over SMTP, which in theory they ought to have been).
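When reverse-engineering, the messages most likely to be stored in an altered form are the ones containing characters outside printable ASCII, so they make good test cases. Below is a minimal Perl sketch for spotting them, assuming the raw message is already in a string. The name `needs_encoding` is hypothetical, and treating tab, CR and LF as harmless is an assumption, not something known about Pegasus Mail's actual rules.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the raw message contains any character
# outside printable ASCII (0x20-0x7E), ignoring tab, CR and LF (an assumed
# allowance for ordinary line structure), and 0 otherwise.
sub needs_encoding {
    my ($raw) = @_;
    return $raw =~ /[^\x20-\x7E\t\r\n]/ ? 1 : 0;
}
```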
I have a number of pmail folders myself and, as I said, have some interest in working on a Perl module for this. I am very unlikely to get around to doing it on my own, however.
As for a formal specification of the format, I do not believe one has ever existed. I suppose David Harris uses the source code, or the comments in it, as his documentation, and he is unwilling to release the source without an NDA. (This is also why there is no Linux port. David is amenable in theory to someone doing such a port, but they would have to sign an NDA and meet other criteria. That is perhaps an unusual attitude for a freeware developer, but Pegasus is of unusual quality, too. I have switched to Gnus because I was unwilling to be tied to a single platform, and I am not skilled enough in C/C++ to attempt the port even if I were willing to sign an NDA. I have not yet thought about that enough to decide; I would have considered it for pmail if it were written in a language I was comfortable doing a port in.) So we would have to reverse-engineer the format.
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
In reply to Re: Parsing Pegasus email boxes by jonadab, in thread Parsing Pegasus email boxes by peterr