Re^3: How best to strip text from a file?

Where does one record end and the next record start?

If FOO: marks the start of a new record, I wouldn't try to capture everything with a single regular expression. Instead, I would go through the input line by line: when a line starts with a field name, start collecting into that field, and when the start marker shows up again, flush the data collected so far and begin a new record:

use strict;
use warnings;
use Data::Dumper;

my %record;
# Print the record collected so far, then start over with an empty one
sub flush {
    print Dumper \%record;
    %record = ();
};

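my $current;   # name of the field currently being collected
while (<DATA>) {
    if( /^(FOO):(.*)/ ) {
        # FOO starts a new record: output the previous one first
        flush() if keys %record;
        $current = $1;
        $record{ $current }.= $2;
    } elsif( /^([A-Z]+):(.*)/ ) {
        # Any other field name: switch to collecting into that field
        $current = $1;
        $record{ $current }.= $2;
    } elsif( defined $current ) {
        # Continuation (or blank) line: append to the current field,
        # skipping anything that comes before the first field name
        $record{ $current }.= $_;
    };
};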
flush() if keys %record;

__DATA__
FOO: Lorem ipsum dolor sit amet, consectetur adipisicing
 elit, sed do eiusmod tempor incididunt ut labore et dolore 
BAR: 2012
BAZ: 1234-567-890

FOO: test

BAZ: 0987-654-321
FOO: test2
BAR: 2014
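
If you want to keep working with the records instead of just dumping them, the same line-by-line approach can return a list of hashes. This is only a sketch under a few assumptions of mine: parse_records is a made-up helper name, the start marker is passed in as a parameter, values are trimmed and continuation lines are joined with a single space, and a field repeated within one record overwrites the earlier value. It needs Perl 5.10 or later for the //= operator.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script above: read "NAME: value"
# lines from a filehandle and return one hash reference per record.
# $start is the field name assumed to open each record (FOO in the sample).
sub parse_records {
    my ($fh, $start) = @_;
    my (@records, $record, $current);

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^([A-Z]+):\s*(.*)/) {
            my ($name, $value) = ($1, $2);
            # The start marker closes the previous record, if there is one.
            if ($name eq $start) {
                push @records, $record if $record;
                $record = {};
            }
            $record //= {};    # tolerate input that does not begin with $start
            $current = $name;
            $record->{$current} = $value;    # a repeated field overwrites
        }
        elsif (defined $current and $line =~ /\S/) {
            # Continuation line: join it to the current field with one space.
            $line =~ s/^\s+|\s+$//g;
            $record->{$current} .= " $line";
        }
        # Blank lines and anything before the first field name are skipped.
    }
    push @records, $record if $record;
    return @records;
}

# Shortened sample input, read through an in-memory filehandle.
my $input = <<'END';
FOO: Lorem ipsum
 dolor sit amet
BAR: 2012

FOO: test
BAZ: 0987-654-321
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @records = parse_records($fh, 'FOO');
close $fh;

printf "%d records, first FOO: %s\n", scalar @records, $records[0]{FOO};
```

With the shortened sample this prints "2 records, first FOO: Lorem ipsum dolor sit amet". Returning the records rather than printing them inside the loop keeps the parsing separate from whatever you do with the data afterwards.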
