Parsing multi-line records

chuntoon has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parsing multi-line records by twerq (Deacon) on Dec 04, 2001 at 20:51 UTC
well, i guess you could make a small subroutine that returns an 11-line block out of the file. not taking into account any bad records, here's some pretty straightforward code:

```perl
sub returnrec {
    my ($i, $record);
    for ($i = 0; $i < 11; $i++) {
        $record .= <FILEHANDLE>;
    }
    return $record;
}
```

then you can say something like:

```perl
my @elements = split /\;/, returnrec();
```

--twerq
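A minimal sketch of the same idea with a lexical filehandle and an explicit end-of-file check, so that a short final record stops the loop instead of appending undefined values. The name `read_record`, the 3-line record size, and the in-memory sample data are assumptions made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample: two complete 3-line records plus one incomplete record.
my $sample = join '', map { "line $_\n" } 1 .. 7;
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

# Read up to $size lines from $handle; returns an empty list at end of file.
sub read_record {
    my ($handle, $size) = @_;
    my @lines;
    while (@lines < $size) {
        my $line = <$handle>;
        last unless defined $line;    # end of file reached
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

my @records;
while (my @rec = read_record($fh, 3)) {
    push @records, [@rec];            # copy the lines into a fresh array ref
}
close $fh;

printf "%d records, last one has %d line(s)\n",
    scalar @records, scalar @{ $records[-1] };
```

With this sample the last record holds a single line, so the caller can decide whether to keep or reject incomplete records.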
Re: Parsing multi-line records by strat (Canon) on Dec 04, 2001 at 21:32 UTC
```perl
my $recordSet = 11;
open (FILE, $file) or die $!;
my @list = <FILE>;
close (FILE);

my @datasets = ();
while (my @subList = splice( @list, 0, $recordSet) ){
    push (@datasets, join("\n", @subList));
}
undef @list;
```

So, now you've got your datasets in the list @datasets...
(Ovid) Re: Re: Parsing multi-line records by Ovid (Cardinal) on Dec 04, 2001 at 21:44 UTC
I think this is the best answer in the thread, but the join is a bad idea. You've already got newlines in there, so later someone will need to come along and do a "`split /\n\n/, @datasets;`". Assuming that the newlines are not needed, I would chomp them and use array references.

```perl
my $recordSet = 11;
open FILE, "< $file" or die "Cannot open $file for reading: $!";
chomp ( my @list = <FILE> );
close FILE;

my @datasets = ();
while (my @subList = splice( @list, 0, $recordSet) ){
    push @datasets, \@subList;
}
```

With this, you now have an array of array refs, each containing precisely one record and no worries about adding extra newlines. Remember to always use Perl's data structures when possible (and reasonable, of course).

Cheers,
Ovid
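For illustration, records stored as array refs can be walked with ordinary dereferencing. This is only a sketch: the 2-line record size and the sample lines are invented for the demo, and `[@subList]` is used to copy each chunk so that every element is clearly an independent array.

```perl
use strict;
use warnings;

my $recordSet = 2;    # assumed record size for this demo
my @list = ('name: a', 'size: 1', 'name: b', 'size: 2');    # already chomped

my @datasets;
while (my @subList = splice(@list, 0, $recordSet)) {
    push @datasets, [@subList];    # one array ref per record
}

# Each element of @datasets is a reference to the lines of one record.
for my $i (0 .. $#datasets) {
    my $record = $datasets[$i];
    printf "record %d, first line: %s (%d lines)\n",
        $i, $record->[0], scalar @$record;
}
```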
Re: (Ovid) Re: Re: Parsing multi-line records by strat (Canon) on Dec 04, 2001 at 22:10 UTC
Or just put it into an LDIF-like data structure:

```perl
my $recordSet = 11;
open FILE, "< $file" or die "Cannot open $file for reading: $!";
chomp ( my @list = <FILE> );
close FILE;

my @datasets = ();
while (my @subList = splice( @list, 0, $recordSet) ){
    my %hash = ();
    foreach (@subList){
        my ($key, $value) = split(/\s\:\s/, $_, 2);
        if (exists $hash{$key} ){
            # repeated key: turn the value into an array ref, then append
            $hash{$key} = [ $hash{$key} ] unless ref $hash{$key};
            push (@{ $hash{$key} }, $value);
        }
        else {
            $hash{$key} = $value;
        }
    }
    # if unique key exists, better use a Hash of Hashes
    # instead of array of hashes
    push @datasets, \%hash;
}
```
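The comment in the code above suggests a hash of hashes when each record carries a unique key. A minimal sketch of that variant, assuming lines of the form `key : value` and a hypothetical unique field named `id`; neither the field name nor the sample data comes from the original question.

```perl
use strict;
use warnings;

my $recordSet = 3;    # assumed record size for this demo
my @list = (
    'id : 1', 'name : alpha', 'tag : x',
    'id : 2', 'name : beta',  'tag : y',
);

my %datasets;         # hash of hashes, keyed by the hypothetical 'id' field
while (my @subList = splice(@list, 0, $recordSet)) {
    my %hash;
    for my $line (@subList) {
        my ($key, $value) = split /\s:\s/, $line, 2;
        $hash{$key} = $value;
    }
    $datasets{ $hash{id} } = { %hash };    # store a copy under the unique key
}

print "record 2 is named $datasets{2}{name}\n";
```

Looking a record up by its key is then a direct hash access rather than a scan over an array of hashes.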
Re: Parsing multi-line records by frankus (Priest) on Dec 04, 2001 at 20:42 UTC
Use `$/=undef` to slurp the whole file into a string. Replace all \n with commas except for every 11th?

```perl
$/ = undef;        # Slurps all data in file.
$_ = <DATA>;       # Put it in $_, convenient for regex.
my $lines = 0;     # Line count; $. might suffice, and then the pre-increment can be dropped.
my $records = 11;  # Number of lines in a record.
# Replaces all \n with commas except the recordth.
s/\n/(',',"\n")[++$lines % $records == 0]/eg;
print $_;
```

-- Brother Frankus. ¤
Re: Parsing multi-line records by danboo (Beadle) on Dec 04, 2001 at 22:34 UTC
I'm not sure if you want all the fields in a single data structure or in a nested one. In other words, do you want a single array of the lines, split on colons; or do you want an array of 11 arrays, with each child array containing the colon-separated fields of its corresponding line? Here is what I might do based on my current understanding:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use constant RECORD_SIZE => 11;

my (@record, @lines);

open FOO, 'test.plx' or die;

while (@lines = grep defined, map scalar <FOO>, 1 .. RECORD_SIZE) {
    if (@lines == RECORD_SIZE) {
        chomp @lines;
        # flatten into a single array ...
        @record = map split(':'), @lines;
        # ... or build an array of arrays
        # @record = map [ split ':' ], @lines;
    }
    else {
        # what to do if line count is not a multiple of RECORD_SIZE?
    }
}
```
Re: Parsing multi-line records by Fastolfe (Vicar) on Dec 05, 2001 at 04:37 UTC
Without fully understanding your input data, simply setting `$/` (which, under `use English`, is literally `$INPUT_RECORD_SEPARATOR`) to your separator (`:`?) could do exactly what you want.

```perl
my @data;
{
    local $/ = ":";    # or "\n:\n" for : on an empty line
    @data = <DATA>;
}
```

This would put each "record" (set of lines) into individual elements of `@data`. It'd be up to you then to split on newlines or whatever, and to eliminate the trailing : on most records.
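One way to eliminate the trailing separator is `chomp`, which removes whatever `$/` currently holds rather than only a newline. A small sketch, assuming the records are separated by a `:` on a line of its own; the sample data and the in-memory filehandle are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical input: records separated by a ':' on a line of its own.
my $sample = "a1\na2\n:\nb1\nb2\n:\nc1\nc2\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @data;
{
    local $/ = "\n:\n";    # record separator, for this block only
    @data = <$fh>;
    chomp @data;           # chomp strips the current $/, not just "\n"
}
close $fh;

# Each element is now one record; split it into its lines.
my @records = map { [ split /\n/ ] } @data;
printf "%d records, second record: %s\n",
    scalar @records, join('|', @{ $records[1] });
```

Keeping `local $/` inside a bare block restores the default separator afterwards, so later reads elsewhere in the program are not affected.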