in reply to Probably very simple (for those in the know)

I don't know whether this will do what you want, and it may not be the best solution, so I pray other, wiser monks will comment so that we both may learn. Still, this is how I have handled it in the past. The sample input sits under __DATA__ for testing, and the fields carry alphabetical prefixes (a-, b-, c-, ...) so I could check the sorted results at a glance. Good luck in finding a solution.
#!/usr/bin/perl -w
use strict;
use warnings;

my $multiline_separator = "\n";
my ($fieldname, $fieldvalue, $line, %pkg);

while ($line = <DATA>) {
    # Lines following empty/space/asterisk-filled lines
    # are assumed to be comments
    $fieldname = undef
        if (($line =~ m/^\s+$/) or ($line =~ m/^\*+$/));
    chomp($line);

    # Assumes colons do not appear except in lines with field names
    if ($line =~ m/:/) {
        ($fieldname, $fieldvalue) = split(/:/, $line, 2);
        # Remove trailing spaces from field name,
        # leading spaces from field value
        $fieldname =~ s/\s+$//g;
        $fieldvalue =~ s/^\s+//g;
    }
    else {
        $fieldvalue = $line;
    }

    # Skip remaining steps if line was a comment
    next unless (defined($fieldname));

    if (exists($pkg{$fieldname})) {
        $pkg{$fieldname} .= $multiline_separator . $fieldvalue;
    }
    else {
        $pkg{$fieldname} = $fieldvalue
            if ((length($fieldvalue)) and (defined($fieldname)));
    }
}

# For testing only
foreach my $k (sort(keys(%pkg))) {
    print($k, "\t:\t", $pkg{$k}, "\n");
}

__DATA__
a-sendee: data1
data1
data1
b-sender: data2
c-date: data3
***************************************
Copyright blah blah blah
d-postage: data4
e-deliverydate: data5
***************************************
unimportant text
f-name: data6
g-paycode: data7
g-paycode: data8
location 1
h-state: data9
i-zip: data10
location 2
h-state: data11
i-zip: data12
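If you want to reuse the same idea on a real file rather than __DATA__, one possible way is to wrap the loop in a subroutine that takes a filehandle. This is only a sketch of the approach above, not a tested replacement: the name parse_fields, the ' | ' separator and the shortened sample text are all made up for illustration, and it still assumes that colons appear only on field-name lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): read "name: value"
# records from any filehandle and join continuation lines with $sep.
sub parse_fields {
    my ($fh, $sep) = @_;
    $sep = "\n" unless defined $sep;
    my (%pkg, $fieldname);

    while (my $line = <$fh>) {
        chomp $line;

        # A blank or all-asterisk line ends the current field; whatever
        # follows is treated as comment text until the next field line.
        if ($line =~ /^\s*$/ or $line =~ /^\*+$/) {
            undef $fieldname;
            next;
        }

        my $value;
        if ($line =~ /^([^:]+?)\s*:\s*(.*)$/) {
            # Field line: name before the first colon, value after it
            ($fieldname, $value) = ($1, $2);
        }
        else {
            next unless defined $fieldname;    # comment text, skip it
            ($value = $line) =~ s/^\s+//;      # continuation line
        }

        next unless length $value;
        $pkg{$fieldname} = exists $pkg{$fieldname}
            ? $pkg{$fieldname} . $sep . $value
            : $value;
    }
    return \%pkg;
}

# Assumed sample input, shortened from the test data above
my $sample = <<'END';
a-sendee: data1
  data1
b-sender: data2
***************************************
Copyright blah blah blah
d-postage: data4
END

# Read the sample through an in-memory filehandle
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $pkg = parse_fields($fh, ' | ');
close $fh;

print "$_\t:\t$pkg->{$_}\n" for sort keys %$pkg;
```

To read a real file, open it in the usual way and pass that handle instead of the in-memory one. Returning a hash reference keeps the parsed fields out of global variables, which may matter if you parse several files in one run.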
Update: I must admit that I read the question and answered before carefully reading all of the responses, especially the one by jonjacobmoon, which basically spelled out what I coded.