
"I think you're assuming that each e-mail will begin with '^From: '"

Actually, no. /^From:/im performs a case-insensitive, multi-line match. With /m, the ^ anchors at the start of any line (and is unaffected by setting $/), so the match will find "From:" at the start of the string or at the start of any following newline-delimited "line". Try taking the sample code I provided and reordering the header lines, adding new header lines, or whatever takes your fancy, so long as you don't add bogus blank lines before the "From:" line.
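
To see the effect of /m in isolation, here is a minimal sketch. The sample paragraph and the matches() helper are made up for illustration and are not part of the code posted earlier in the thread; the point is only that the same pattern succeeds with /m and fails without it when "From:" is not on the first line.

```perl
use strict;
use warnings;

# Hypothetical sample "paragraph": the From: header is on the second line
# and is written in lower case.
my $para = "Subject: Hello\nfrom: someone\@example.com\nTo: me\@example.com\n";

# Hypothetical helper: return 1 if the pattern matches, 0 otherwise.
sub matches {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

my $with_m    = matches($para, qr/^From:/im);  # ^ can match after any newline
my $without_m = matches($para, qr/^From:/i);   # ^ matches only at the start of the string

print "with /m: $with_m, without /m: $without_m\n";
# prints: with /m: 1, without /m: 0
```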

Another useful link may be perlretut. There's a lot of reading there, but it will be worth the time working through it!

True laziness is hard work

Re^4: Subsetting text files containing e-mails
by PeterCap (Initiate) on Jan 27, 2012 at 08:26 UTC

    Aha! I get it. So essentially when a paragraph is found that contains '^From:' it places a marker at the beginning of that paragraph?

    I could not figure out how it was handling all the blank lines within the e-mails until I realized that it wasn't and didn't need to.

    Just to be clear, in order to actually subset the file I would still need to close and reopen it, right? I'm thinking something like:

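        open(MYDATA, '<', $filein) or die "Can't open $filein: $!";
        # Open the output once, outside the loop, so it isn't truncated on every line
        open(MYOUTPUT, '>', $fileout) or die "Can't create $fileout: $!";
        while (<MYDATA>) {
            if (/^---- Email 1/ ... /^---- Email 2/) {
                print MYOUTPUT $_;
            }
        }
        close(MYOUTPUT);
        close(MYDATA);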

    I suppose I might create a loop so that a new value for the search terms (i.e., /^---- Email 2/ ... /^---- Email 3/ for the second iteration, etc.) is selected as well as a new output file to catch the results...

      You don't need more than one pass through the source file. Just create the output files as you need them. In sketch form you'd have something like:

          use strict;
          use warnings;

          my $emailNum;
          my $outFile;

          $/ = ''; # Set readline to "Paragraph mode"

          while (<DATA>) {
              if (!$emailNum || /^From:/im) {
                  close $outFile if $outFile;
                  my $fname = sprintf "mails_%06d.txt", ++$emailNum;
                  open $outFile, '>', $fname or die "Can't create $fname: $!\n";
              }

              print $outFile $_;
          }
      True laziness is hard work