Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks.

I was wondering if you could help me today. I'm trying to create an email parser that reads an email and breaks it apart into sets of usable chunks. Never mind the email portion for now; I'm trying to get the regex logic for breaking the data apart down first.

The email will look something like this:

    [title] blah blah blah [title]
    [tags] tag tag blah blah [tags]
    [message] blah blah blah red riding hood runs away from the scary wolf blah blah [message]
    #############################################
    [title] another one [title]
    [tags] more tags [tags]
    [message] another message here [message]
    #############################################
    [title] last one [title]
    [tags] last one [tags]
    [message] more fun here [message]
I'm using the line of # characters as the separator, and so far I can successfully split the one email into my usable pieces:

    my @split_msg = split(/#############################################/, $message);

That part is good. Now I'm having more trouble breaking each piece down into TITLE, TAG and MESSAGE.

    foreach my $email (@split_msg) {
        my ($title, $tags, $msg);
        $email =~ m/\[title\](.+)\[title\]/i;
        $title = $1;
        $email =~ m/\[message\](.*)\[message\]/i;
        $msg = $2;
        print "$title\n\n$msg\n\n";
    }
The above code keeps warning that $msg is uninitialized. The pattern was originally (.+), but I tried (.*) to see if that would make any difference. Each portion (tag, title, message) can contain newlines, and I have to capture whatever is in between each pair of markers.

Can you help me figure this one out?

Replies are listed 'Best First'.
Re: regex problem
by raybies (Chaplain) on Mar 17, 2011 at 15:10 UTC
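    Each time a regex matches, your $1, $2, $3 vars are set afresh, numbered by the parentheses in that regex. Your second regex has only one set of parentheses, so your msg is in $1, not $2.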
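A minimal sketch of that point, using a made-up `$email` string rather than the OP's real data. Each regex has a single capture group, so each result is the first capture of its own match. Assigning the match in list context is one way to avoid reading `$1` by hand; the non-greedy `(.+?)` and the `/s` flag are additions here, assuming a field may span several lines.

```perl
use strict;
use warnings;

# Hypothetical sample chunk, standing in for one element of @split_msg.
my $email = "[title] last one [title]\n"
          . "[tags] last one [tags]\n"
          . "[message] more\nfun here [message]\n";

# One capture group per regex, so each value is that regex's first capture.
# In list context the match returns its captures; on failure the
# variable on the left is simply left undefined.
my ($title) = $email =~ m/\[title\](.+?)\[title\]/is;
my ($msg)   = $email =~ m/\[message\](.+?)\[message\]/is;

print "TITLE:$title\n" if defined $title;
print "MESSAGE:$msg\n" if defined $msg;
```

The captured values keep the spaces next to the markers; trim them afterwards if that matters.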
Re: regex problem
by arkturuz (Curate) on Mar 17, 2011 at 15:15 UTC
    I think this solves your problem:

        foreach my $email (@split_msg) {
            my ($title, $tags, $msg);
            $email =~ m/\[title\](.+)\[title\]/i;
            $title = $1;
            # match in single line mode
            $email =~ m/\[message\](.*)\[message\]/is;
            $msg = $1; # match is again in $1
            print "TITLE:$title\nMESSAGE:$msg\n";
        }

      And while you are at it, why not shrink the code to

      foreach (...) {
          my ($title,$tag,$message) = $email =~ /\[title](.*)\[title].*\[tags](.*)\[tags].*\[message](.*)\[message]/is;
          ...
      }
        Yeah, sure, but I didn't want to confuse the OP.
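For completeness, here is a hedged sketch that puts the split and the single-regex capture together. The sub name `parse_chunks`, the hash keys, and the choice to skip incomplete chunks are all assumptions for illustration, not something from the thread. It uses non-greedy `(.*?)` with `/s` so fields can span lines, and `/x` so the pattern can be laid out readably.

```perl
use strict;
use warnings;

# parse_chunks is a hypothetical helper name, not from the thread.
sub parse_chunks {
    my ($message) = @_;
    my @records;

    # Assumption: a separator is a line of five or more '#' characters.
    for my $email (split /^\#{5,}[ \t]*$/m, $message) {
        # Three capture groups, returned in order by the list-context match.
        # A failed match returns an empty list, so 'or next' skips the chunk.
        my ($title, $tags, $msg) = $email =~ m{
            \[title\]   \s* (.*?) \s* \[title\]   .*?
            \[tags\]    \s* (.*?) \s* \[tags\]    .*?
            \[message\] \s* (.*?) \s* \[message\]
        }isx or next;

        push @records, { title => $title, tags => $tags, message => $msg };
    }
    return @records;
}

# Usage with made-up data shaped like the email in the question.
my $sample = join "\n",
    "[title] blah blah blah [title]",
    "[tags] tag tag blah blah [tags]",
    "[message] red riding hood",
    "runs away from the scary wolf [message]",
    "#############################################",
    "[title] another one [title]",
    "[tags] more tags [tags]",
    "[message] another message here [message]",
    "";

for my $rec (parse_chunks($sample)) {
    print "TITLE:$rec->{title}\nTAGS:$rec->{tags}\nMESSAGE:$rec->{message}\n\n";
}
```

The `\s*` on either side of each group trims whitespace next to the markers, which may or may not be what you want for the message body.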
Re: regex problem
by Anonymous Monk on Mar 17, 2011 at 15:11 UTC
    $1, $2, ... are numbered by the capturing parens in the regex that just matched. If there is only one set of parens, $2 can NEVER be defined.

        'stuff' =~ /(.).../;      print "$1\n";
        'stuff' =~ /(.).(.)./;    print "$1 $2\n";
        'stuff' =~ /(.).(.).(.)/; print "$1 $2 $3\n";
        __END__

    More in perlre, perlretut
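A related habit worth sketching: the capture variables are only set by a successful match, so it is safer to test the match before reading them. The loop and the `@seen` array below are just illustration.

```perl
use strict;
use warnings;

my @seen;
for my $text ('stuff', 'x') {
    # Only read $1 and $2 inside the branch where the match succeeded.
    if ($text =~ /(.).(.)./) {
        push @seen, "$1 $2";
    }
    else {
        push @seen, "no match for '$text'";
    }
}
print "$_\n" for @seen;
```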