in reply to regex question: store multiple lines as a string

Am I missing something when I interpret OP's spec, "I'd like to store everything starting from 'line=ULMNm' till before the next 'line=ULMNm' as one string", as meaning the sample data should be divided into records, each beginning with "line=" and ending at the first instance of two newlines?

Missing something or not, that's how I read it in writing this to satisfy my understanding of the spec:

#!/usr/bin/perl
use strict;
use warnings;
# 864768

my @words = split /(line=)/, do { local $/="\n\n"; <DATA> };   # a variant of moritz' advice

for my $words(@words) {
    chomp $words;
    if ($words eq "line=") {
        print $words;
    }else{
        print "$words \n -------\n";   # the dashes visually separate the output records
    }
}
exit;

__DATA__
line=ULMNm 3 1fdy_07 N-ACETYLNEURAMINATE LYASE user 1 3 RMSD = 1.06 A
MATRIX: -0.3862 -0.2080 -0.8987 0.6457 0.6347 -0.4244 -0.6587 0.7442 0.1108 -16.917 -91.429 -35.632
D 47 SER A 57 SER.?
D 48 THR A 56 THR.?
D 165 LYS A 33 LYS~?
line=ULMNm 3 2tmd_00 TRIMETHYLAMINE DEHYDROGENASE user 1 3 RMSD = 1.15 A
MATRIX: 0.9011 -0.4313 0.0445 -0.1032 -0.3130 -0.9441 -0.4211 -0.8462 0.3266 52.913 23.262 25.449
A 169 TYR A 41 TYR~?
A 172 HIS A 95 HIS^?
A 267 ASP A 98 ASP~?
line=ULMNm 3 4fdy_07 P-HYDROOXIDE user 1 3 RMSD = 1.06 A
MATRIX: -0.3862 -0.2080 -0.8987 0.6457 0.6347 -0.4244 -0.6587 0.7442 0.1108 -16.917 -91.429 -35.632
D 47 SER A 57 SER.?
D 48 THR A 56 THR.?
D 165 PQR A 33 PRQ~?
line=ULMNm 3 5tmd_00 BAZ Blivitz user 1 3 RMSD = 1.15 A
MATRIX: 0.9011 -0.4313 0.0445 -0.1032 -0.3130 -0.9441 -0.4211 -0.8462 0.3266 52.913 23.262 25.449
A 169 TYR A 41 TYR~?
A 172 HIS A 95 HIS^?
A 267 XYZ A 98 XYZ~?

and we see this, upon execution:

F:\_wo\pl_test>perl 864768.pl

 -------
line=ULMNm 3 1fdy_07 N-ACETYLNEURAMINATE LYASE user 1 3 RMSD = 1.06 A
MATRIX: -0.3862 -0.2080 -0.8987 0.6457 0.6347 -0.4244 -0.6587 0.7442 0.1108 -16.917 -91.429 -35.632
D 47 SER A 57 SER.?
D 48 THR A 56 THR.?
D 165 LYS A 33 LYS~?
 -------
line=ULMNm 3 2tmd_00 TRIMETHYLAMINE DEHYDROGENASE user 1 3 RMSD = 1.15 A
MATRIX: 0.9011 -0.4313 0.0445 -0.1032 -0.3130 -0.9441 -0.4211 -0.8462 0.3266 52.913 23.262 25.449
A 169 TYR A 41 TYR~?
A 172 HIS A 95 HIS^?
A 267 ASP A 98 ASP~?
 -------
line=ULMNm 3 4fdy_07 P-HYDROOXIDE user 1 3 RMSD = 1.06 A
MATRIX: -0.3862 -0.2080 -0.8987 0.6457 0.6347 -0.4244 -0.6587 0.7442 0.1108 -16.917 -91.429 -35.632
D 47 SER A 57 SER.?
D 48 THR A 56 THR.?
D 165 PQR A 33 PRQ~?
 -------
line=ULMNm 3 5tmd_00 BAZ Blivitz user 1 3 RMSD = 1.15 A
MATRIX: 0.9011 -0.4313 0.0445 -0.1032 -0.3130 -0.9441 -0.4211 -0.8462 0.3266 52.913 23.262 25.449
A 169 TYR A 41 TYR~?
A 172 HIS A 95 HIS^?
A 267 XYZ A 98 XYZ~?
 -------

F:\_wo\pl_test>

Note the empty record that is the first output. Not good... hence, I'd welcome comments on my algorithm/code AND any comments rebutting my interpretation of the spec.
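For what it's worth, the empty first record appears to come from split itself: when the pattern matches with non-zero width at the very start of the string, split returns an empty leading field. One possible way around that, offered only as a sketch and not as the definitive answer to OP's spec, is to split on a zero-width lookahead so that "line=" stays attached to its record and no leading empty field is produced. The heredoc below is a shortened, hypothetical stand-in for the sample data, and the variable names are mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the sample data (not OP's full file).
my $text = <<'END';
line=ULMNm 3 1fdy_07 N-ACETYLNEURAMINATE LYASE user 1 3 RMSD = 1.06 A
D 47 SER A 57 SER.?
line=ULMNm 3 2tmd_00 TRIMETHYLAMINE DEHYDROGENASE user 1 3 RMSD = 1.15 A
A 169 TYR A 41 TYR~?
END

# Split at the start of every line that begins with "line=".
# The lookahead consumes nothing, so each record keeps its "line=" prefix,
# and a zero-width match at the start of the string yields no empty field.
my @records = split /^(?=line=)/m, $text;

for my $record (@records) {
    chomp $record;                   # drop the record's trailing newline
    print "$record\n -------\n";     # dashes visually separate the records
}
```

Each element of @records is then one multi-line string running from one "line=" up to, but not including, the next.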

Belated addition, 2125 EDT (U.S., roughly 10 hours later): Re OP's question about storing the munged data in variables. Whilst working this out, I used Data::Dumper to try to ascertain why an earlier iteration didn't work. After fixing my foolishness, but before removing D::D from the code, I observed that D::D's output had "line=" (captured by the parentheses in the split pattern) in $VAR2, $VAR4, ... and the rest of each munged data section in $VAR3, $VAR5, ....
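The Data::Dumper observation can be reproduced on a small scale. The following is a minimal sketch with a made-up two-record string (the data and variable names are hypothetical): it dumps what split returns when the delimiter is captured, then glues each "line=" back onto the text that follows it so that every record ends up in one variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # show embedded newlines as \n in the dump

# Hypothetical two-record sample, not OP's data.
my $sample = "line=AAA\nrest of A\nline=BBB\nrest of B\n";

# Capturing parentheses make split return the delimiter as its own element:
# "", "line=", "AAA\nrest of A\n", "line=", "BBB\nrest of B\n"
my @parts = split /(line=)/, $sample;
print Dumper(@parts);

# Discard the empty leading field, then rejoin delimiter/body pairs.
shift @parts if @parts && $parts[0] eq '';
my @records;
while ( my ( $delim, $body ) = splice( @parts, 0, 2 ) ) {
    push @records, $delim . ( $body // '' );
}

print "$_ -------\n" for @records;
```

Whether rejoining the pairs or avoiding the capture in the first place is preferable depends on what OP wants to do with the records afterwards.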