cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy bros. I am trying to write a script to read archive files from GNU Mailman like this one. It's a .gz file, but when I run IO::Decompress on it, I get a bunch of binary chars. I expected it to be a flat file or something with XML markup, not binary. Does anyone know how to read it?

TIA...Steve

Re: Reading gnu mailman archives
by almut (Canon) on Jan 25, 2010 at 22:13 UTC

    This works for me (your sample file seems to be double-compressed, so it should more appropriately be named .txt.gz.gz):

    use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

    gunzip "2010-January.txt.gz" => "2010-January.tmp"
        or die "gunzip failed: $GunzipError\n";
    gunzip "2010-January.tmp" => "2010-January.txt"
        or die "gunzip failed: $GunzipError\n";

    P.S. it's not XML, but rather classic mbox format.
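    If you don't know in advance how many times a file has been compressed, one option is to keep decompressing while the data still starts with the gzip magic bytes (0x1f 0x8b). The following is only a minimal sketch of that idea: the helper name gunzip_all_layers and the limit of 5 layers are made up for illustration, and it works on an in-memory string rather than on the files used above.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper (not part of any module): peel off gzip layers
# until the data no longer starts with the gzip magic bytes 0x1f 0x8b.
# Returns the final data and the number of layers removed.
sub gunzip_all_layers {
    my ($data, $max_layers) = @_;
    $max_layers //= 5;    # arbitrary safety limit (an assumption)
    my $layers = 0;

    while (length($data) >= 2 && substr($data, 0, 2) eq "\x1f\x8b") {
        die "more than $max_layers gzip layers, giving up\n"
            if $layers >= $max_layers;

        my $out;
        # Scalar references make gunzip read from and write to memory.
        gunzip(\$data => \$out)
            or die "gunzip failed: $GunzipError\n";

        $data = $out;
        $layers++;
    }
    return ($data, $layers);
}
```

    To use it on a file, read the whole file into a scalar through a filehandle opened with the '<:raw' layer, pass that scalar in, and write the returned data out as the .txt file.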

      How strange. When I run the same code and open the resulting file in a text editor I get:

      ‹†]Kÿ/var/lib/mailman/archives/private/mythtv-users/2010-January.txt

      (some of the chars are rendering as escaped HTML here; they're extended ASCII on my screen) You're getting clear text?
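      A few binary bytes followed by a path could be a gzip header that stores the original file name, which would suggest that the output file is still compressed. That is a guess from the pasted output, not something confirmed in this thread. A quick way to check is to look at the first two bytes of the file. This is only a sketch, and the helper name looks_gzipped is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if the file starts with the
# gzip magic bytes 0x1f 0x8b, i.e. it still looks gzip-compressed.
sub looks_gzipped {
    my ($path) = @_;

    # Open in raw mode so no encoding layer alters the bytes.
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    my $got = read($fh, my $magic, 2);
    close $fh;

    die "cannot read $path: $!\n" unless defined $got;
    return $got == 2 && $magic eq "\x1f\x8b";
}
```

      If looks_gzipped("2010-January.txt") is true, the file probably needs another gunzip pass before it can be read as text.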

Re: Reading gnu mailman archives
by Khen1950fx (Canon) on Jan 26, 2010 at 10:05 UTC
      I tried Mail::Mbox::MessageParser, and when it opens the file it says it's not a Mailbox file.
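      One way to narrow this down before handing the file to a parser is to check whether the decompressed text contains mbox-style separator lines at all, meaning lines that begin with "From " followed by a non-space character. This is a rough sketch, not what any mbox module actually does internally, and the name count_mbox_separators is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: count lines that look like mbox message
# separators ("From " at the start of a line, then a non-space char).
# Quoted body lines such as ">From ..." do not match.
sub count_mbox_separators {
    my ($text) = @_;
    my $count = () = $text =~ /^From \S/mg;
    return $count;
}
```

      A count of zero on the supposedly decompressed data would hint that it is not plain mbox text, for example because it is still compressed.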