gossamer has asked for the wisdom of the Perl Monks concerning the following question:

I have a rather involved Perl script that reads an email from STDIN and uses MIME::Parser and a few other libraries to split the email into its components for writing to a database. The problem is that some emails now appear to have subjects that span two lines, and MIME::Parser doesn't appear to be able to process them properly.
    use strict;
    use Advisory;
    use MIME::Parser;
    use MIME::Entity;
    use MIME::WordDecoder;
    use MIME::Tools;
    use MIME::Decoder;
    use Email::MIME;

    my $parser = MIME::Parser->new;

    $parser->extract_uuencode(1);
    $parser->extract_nested_messages(1);
    $parser->output_to_core(1);

    my $buf;
    while(<STDIN> ){
        $buf .= $_;
    }

    my @mailData = split( '\n', $buf);
    my $entity   = $parser->parse_data($buf);
    my $subject  = $entity->head->get('Subject');
    my $from     = $entity->head->get('From');
    my $AdvDate  = $entity->head->get('Date');

    my $linecount = 0;
    my $inadvis   = 0;

    foreach my $line (@mailData) {
        chomp($line);

        if ($line =~ m/^Description:/) {
            $startShort = "true";
            next;
        }

        if ($linecount lt 5 && $startShort eq "true") {
            $shortDesc .= $line . " ";
            $linecount++;
        }

        #print "line: $line\n";
        # MGASA-2018-0463 - Updated roundcubemail packages fix security vulnerability & bugs
        # MGASA-2018-0439 - Updated ansible package fixes security vulnerabilities
        #print "line: $line\n";
        if ($line =~ m/MGASA-(\d+)-(\d+) - Updated (.*) package/) {
            $pkgname = $3;
            $subject = "Mageia $1-$2: $3 security update";
            $inadvis = 1;
        }
        # MGASA-2019-0151 - Virtualbox 6.0.6 fixes security vulnerabilities
        elsif ($line =~ m/MGASA-(\d+)-(\d+) - (.*) (.*) fixes security vulnerabilities/) {
            $pkgname = $3;
            $subject = "Mageia $1-$2: $3 security update";
            $inadvis = 1;
        }
        # [updates-announce] MGASA-2023-0355: New chromium-browser-stable 120.0.6099.129 fixes bugs and vulnerabilities
        elsif ($line =~ m/MGASA-(\d+)-(\d+): New (.*) (.*) fixes bugs and/) {
            $pkgname = $3;
            $subject = "Mageia $1-$2: $3 security update";
            $inadvis = 1;
        }

        if ($inadvis == 1) {
            $advisory .= $line . "\n";
        }
    }
The example subject I'm working with that spans two lines is as follows:
    Subject: [updates-announce] MGASA-2023-0355: New chromium-browser-stable 120.0.6099.129 fixes bugs and vulnerabilities
I've put the entire mbox email in a paste here:

https://pastebin.com/LRs9J4pd

Any ideas greatly appreciated.

Re: MIME::Parser and multi-line subjects
by choroba (Cardinal) on Dec 26, 2023 at 23:07 UTC
    MIME::Parser handles multiline subjects just fine.
        #!/usr/bin/perl
        use warnings;
        use strict;

        use Encode qw{ encode };
        use LWP::Simple qw{ get };
        use MIME::Parser;

        my $parser = 'MIME::Parser'->new;

        $parser->extract_uuencode(1);
        $parser->extract_nested_messages(1);
        $parser->output_to_core(1);

        my $email = encode('UTF-8', get('https://pastebin.com/raw/LRs9J4pd'));
        open my $f, '<', \$email or die $!;
        my $e = $parser->parse($f);
        print $e->head->get('Subject');

        __END__
        Output:
        [updates-announce] MGASA-2023-0355: New chromium-browser-stable 120.0.6099.129 fixes bugs and vulnerabilities

    The problem is that the code then processes the raw input line by line, independently of the parsed entity. That's where the multiline subject is not processed correctly, I'd guess.
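To illustrate the guess above, here is a minimal sketch in core Perl only (no MIME::Parser). The sample message, the position of the fold in the Subject header, the simplified `\S+` pattern, and the helper names `count_line_matches` and `unfold` are all assumptions made for the example, not taken from the original script or the pastebin.

```perl
use strict;
use warnings;

# Hypothetical raw message: the Subject header is folded onto a second line.
# The real fold position in the pastebin message may differ.
my $raw = join "\n",
    'From: sender@example.com',
    'Subject: [updates-announce] MGASA-2023-0355: New chromium-browser-stable',
    ' 120.0.6099.129 fixes bugs and vulnerabilities',
    '',
    'Body text',
    '';

# Simplified variant of the third pattern in the question's script.
my $pattern = qr/MGASA-(\d+)-(\d+): New (\S+) (\S+) fixes bugs and/;

# Count how many physical lines match the pattern (the line-by-line approach).
sub count_line_matches {
    my ($text) = @_;
    return scalar grep { $_ =~ $pattern } split /\n/, $text;
}

# Unfold: a line break followed by a space or tab continues the previous line.
# For illustration this is applied to the whole text, body included.
sub unfold {
    my ($text) = @_;
    (my $copy = $text) =~ s/\r?\n(?=[ \t])//g;
    return $copy;
}

print "folded:   ", count_line_matches($raw), " matching line(s)\n";
print "unfolded: ", count_line_matches(unfold($raw)), " matching line(s)\n";
```

With the folded input, no single physical line contains the whole subject, so the pattern never matches; after unfolding, it matches once. In a real script it is probably simpler to match against the header value returned by the parser than to unfold the raw text by hand.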

    Update: It's hard to tell what exactly is the problem, as you didn't specify what error you got, and I'm not able to run your script without changes, as the Advisory module doesn't exist on CPAN.

      That was very helpful, thanks. The "Advisory" module is one I created that manages the database portion of this.

      I was able to rework the script to check the $subject itself after stripping off the carriage return at the end. I didn't realize $subject did indeed contain the whole line, but it also contained the carriage return.
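The rework described above might look roughly like the following sketch. It is not the poster's actual code: the helper name `rewrite_subject` is hypothetical, the pattern is a simplified form of the one in the question, and the sample string stands in for whatever `$entity->head->get('Subject')` returns (assumed here to end in a newline and possibly to contain folded continuation lines).

```perl
use strict;
use warnings;

# Hypothetical helper: tidy a header value, then build the rewritten subject.
sub rewrite_subject {
    my ($subject) = @_;
    return unless defined $subject;

    $subject =~ s/\r?\n[ \t]+/ /g;   # join any folded continuation lines
    $subject =~ s/\s+\z//;           # drop the trailing newline and whitespace

    # Simplified form of the "New <package> <version> fixes bugs and" pattern.
    if ($subject =~ /MGASA-(\d+)-(\d+): New (\S+) \S+ fixes bugs and/) {
        return "Mageia $1-$2: $3 security update";
    }
    return $subject;                 # no advisory pattern: keep the cleaned value
}

# Stand-in for the value returned by $entity->head->get('Subject').
my $raw_subject = "[updates-announce] MGASA-2023-0355: New chromium-browser-stable "
                . "120.0.6099.129 fixes bugs and vulnerabilities\n";

print rewrite_subject($raw_subject), "\n";
```

For the sample value this prints `Mageia 2023-0355: chromium-browser-stable security update`. Matching the header value directly avoids depending on how the subject happens to be wrapped in the raw message.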