chuntoon has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I know this is gonna sound stupid, but here goes...

I have an input file containing personnel info (name, boss, what-not) which I need to parse into a spreadsheet for my database-limited superiors. I figured out how to parse/split on a character (:). The problem is that each individual record actually spans 11 rows in the file.

Is there a reasonably simple way, one that a relative Perl newbie can understand, to parse on colons and take the multi-line records into account?

Thanks in advance,
chuntoon

Replies are listed 'Best First'.
Re: Parsing multi-line records
by twerq (Deacon) on Dec 04, 2001 at 20:51 UTC
    Well, I guess you could make a small subroutine that returns an 11-line block from the file. Not taking any bad records into account, here's some pretty straightforward code:
    sub returnrec {
        my $record = '';
        for (my $i = 0; $i < 11; $i++) {
            my $line = <FILEHANDLE>;
            last unless defined $line;   # stop at end of file
            $record .= $line;
        }
        return $record;
    }
    then you can say something like:
    my @elements = split /:/, returnrec();

    --twerq
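
A minimal sketch of the same idea that also stops cleanly at end of file and splits each line on its first colon. The sample data, the 3-line record size, and the name next_record are assumptions made up for illustration; the file in the question has 11 lines per record.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records of 3 lines each (the thread uses 11).
my $sample = "name:Ann\nboss:Bob\nsite:HQ\nname:Cy\nboss:Dee\nsite:Lab\n";
my $per_record = 3;

# Read from an in-memory filehandle so the sketch runs without a real file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

# Return the next $n lines as an array ref, or nothing at end of file.
sub next_record {
    my ($handle, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$handle>;
        last unless defined $line;   # stop cleanly at EOF
        chomp $line;
        push @lines, $line;
    }
    return @lines ? \@lines : ();
}

my @records;
while (my $rec = next_record($fh, $per_record)) {
    # Split each line on the first colon into a [field, value] pair.
    push @records, [ map { [ split /:/, $_, 2 ] } @$rec ];
}
close $fh;

print scalar(@records), " records\n";
print join(',', map { $_->[1] } @{ $records[0] }), "\n";
```

A short final record is still returned here; whether to keep or reject it depends on how strict the input format is.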
Re: Parsing multi-line records
by strat (Canon) on Dec 04, 2001 at 21:32 UTC
    my $recordSet = 11;
    open (FILE, $file) or die $!;
    my @list = <FILE>;
    close (FILE);
    my @datasets = ();
    while (my @subList = splice( @list, 0, $recordSet) ){
        push (@datasets, join("\n", @subList));
    }
    undef @list;
    So, now you've got your datasets in the list @datasets...

      I think this is the best answer in the thread, but the join is a bad idea. You've already got newlines in there, so later someone will need to come along and do a "split /\n\n/, @datasets;". Assuming that the newlines are not needed, I would chomp them and use array references.

      my $recordSet = 11;
      open FILE, "< $file" or die "Cannot open $file for reading: $!";
      chomp ( my @list = <FILE> );
      close FILE;
      my @datasets = ();
      while (my @subList = splice( @list, 0, $recordSet) ){
          push @datasets, \@subList;
      }

      With this, you now have an array of array refs, each containing precisely one record and no worries about adding extra newlines. Remember to always use Perl's data structures when possible (and reasonable, of course).

      Cheers,
      Ovid
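
For anyone new to references, here is a small sketch of how an array of array refs like this might be walked afterwards. The sample lines and the 3-line record size are made up for illustration, and the sketch copies each chunk into a fresh anonymous array, which is a choice of this example rather than part of the code above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the chomped lines of the file: 2 records of 3 lines.
my @list = ('name:Ann', 'boss:Bob', 'site:HQ', 'name:Cy', 'boss:Dee', 'site:Lab');
my $recordSet = 3;

my @datasets;
while (my @subList = splice(@list, 0, $recordSet)) {
    push @datasets, [ @subList ];   # copy the chunk into a fresh anonymous array
}

# Walk the array of array refs, one record at a time.
for my $i (0 .. $#datasets) {
    my @lines = @{ $datasets[$i] };
    print "record $i: ", join(' | ', @lines), "\n";
}

# A single element: the second line of the first record.
print $datasets[0][1], "\n";
```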

        Or just read it into an LDIF-like data structure:
        my $recordSet = 11;
        open FILE, "< $file" or die "Cannot open $file for reading: $!";
        chomp ( my @list = <FILE> );
        close FILE;
        my @datasets = ();
        while (my @subList = splice( @list, 0, $recordSet) ){
            my %hash = ();
            foreach (@subList){
                my ($key, $value) = split(/\s*:\s*/, $_, 2);
                if (exists $hash{$key} ){
                    # repeated key: turn the value into an array ref, then append
                    $hash{$key} = [ $hash{$key} ] unless ref $hash{$key};
                    push (@{ $hash{$key} }, $value);
                }
                else {
                    $hash{$key} = $value;
                }
            }
            # if unique key exists, better use a Hash of Hashes
            # instead of array of hashes
            push @datasets, \%hash;
        }
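
The hash-of-hashes idea mentioned in the comment could look roughly like this. It is only a sketch: the records, the field names, and the assumption that 'name' is unique are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records, already parsed into hashes. A repeated key
# ('phone' in the first record) holds an array ref, a single one a plain scalar.
my @datasets = (
    { name => 'Ann', boss => 'Bob', phone => ['555-0100', '555-0101'] },
    { name => 'Cy',  boss => 'Dee', phone => '555-0102' },
);

# Index the records by the 'name' field (assumed unique here).
my %by_name = map { ($_->{name} => $_) } @datasets;

for my $name (sort keys %by_name) {
    my $phone  = $by_name{$name}{phone};
    my @phones = ref $phone ? @$phone : ($phone);   # normalise to a list
    print "$name reports to $by_name{$name}{boss}; phones: @phones\n";
}
```

Looking a record up by name then becomes a single hash access instead of a scan through the array.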
Re: Parsing multi-line records
by frankus (Priest) on Dec 04, 2001 at 20:42 UTC

    Use $/=undef to slurp the whole file into a string, then replace every \n with a comma except each 11th one, which ends a record?

    $/=undef;         # Slurp mode: the next read returns the whole file.
    $_=<DATA>;        # Put it in $_, convenient for the regex.
    my $line=0;       # Newline counter ($. is no help here: after a slurp it is 1).
    my $records=11;   # Number of lines in a record.
    # Replace every \n with a comma, except each $records-th one.
    s/\n/(',',"\n")[++$line%$records==0]/eg;
    print $_;

    --

    Brother Frankus.
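
To see the counting trick in isolation, here is a small sketch on a hypothetical string with 3 lines per record instead of 11. It uses a ternary in place of the list slice, which is a stylistic choice of this sketch only.

```perl
use strict;
use warnings;

# Hypothetical input: 2 records of 3 lines each.
my $text       = "a:1\nb:2\nc:3\nd:4\ne:5\nf:6\n";
my $per_record = 3;
my $count      = 0;

# Every newline becomes a comma unless it is the one that closes a record.
(my $csv = $text) =~ s/\n/ ++$count % $per_record ? ',' : "\n" /eg;

print $csv;
```

Each output line then holds one whole record, which can be split on commas and then on colons.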

Re: Parsing multi-line records
by danboo (Beadle) on Dec 04, 2001 at 22:34 UTC
    I'm not sure if you want all the fields in a single data structure or in a nested one. In other words, do you want a single array of the lines, split on colons; or do you want an array of 11 arrays, with each child array containing the colon separated fields of its corresponding line?

    Here is what I might do based on my current understanding:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use constant RECORD_SIZE => 11;

    my (@record, @lines);
    open FOO, 'test.plx' or die;
    # read up to RECORD_SIZE lines per pass, dropping undefs at end of file
    while (@lines = grep defined, map scalar <FOO>, 1 .. RECORD_SIZE) {
        if (@lines == RECORD_SIZE) {
            chomp @lines;
            # flatten into a single array ...
            @record = map split(':'), @lines;
            # ... or build an array of arrays
            # @record = map [ split ':' ], @lines;
        }
        else {
            # what to do if line count is not a multiple of RECORD_SIZE?
        }
    }
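
The difference between the two shapes discussed above may be easier to see on a tiny example. The two sample lines are hypothetical; a real record would have 11.

```perl
use strict;
use warnings;

# Hypothetical chomped lines of one (very short) record.
my @lines = ('name:Ann', 'boss:Bob');

# Flat: every field of every line in one list.
my @flat = map { split /:/ } @lines;

# Nested: one array ref per line, each holding that line's fields.
my @nested = map { [ split /:/ ] } @lines;

print scalar(@flat), " flat fields\n";
print scalar(@nested), " nested rows; $nested[1][1]\n";
```

The flat form is handy for dumping a record as one spreadsheet row, while the nested form keeps track of which fields came from which line.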
Re: Parsing multi-line records
by Fastolfe (Vicar) on Dec 05, 2001 at 04:37 UTC
    Without fully understanding your input data, simply setting $/ (called $INPUT_RECORD_SEPARATOR under "use English") to your separator (:?) could do exactly what you want.
    my @data;
    {
        local $/ = ":";   # or "\n:\n" for : on an empty line
        @data = <DATA>;
    }
    This would put each "record" (set of lines) into individual elements of @data. It'd be up to you then to split on newlines or whatever, and to eliminate the trailing : on most records.
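
A rough sketch of that clean-up step, assuming (hypothetically) that records are separated by a lone ':' on its own line. Since chomp removes whatever $/ is currently set to, it can strip the trailing separator as long as it is called while the local value is still in effect.

```perl
use strict;
use warnings;

# Hypothetical layout: records separated by a lone ':' on its own line.
my $sample = "name Ann\nboss Bob\n:\nname Cy\nboss Dee\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @data;
{
    local $/ = "\n:\n";   # record separator, restored when the block ends
    @data = <$fh>;
    chomp @data;          # chomp strips the current $/, i.e. the "\n:\n"
}
close $fh;

for my $record (@data) {
    my @lines = split /\n/, $record;
    print scalar(@lines), " lines, first: $lines[0]\n";
}
```

The last record has no separator after it, so chomp leaves its final newline alone; split /\n/ discards the resulting trailing empty field.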