kerrya has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am trying to re-format the following data from multi-line to single records (1000001 and 1000002 are the primary keys):
-------
1000001 01.11.199600.00.00001 A1 1 SN Y
2001.11.200400098.0500073.5500083.35
5001.11.1997Professional attendance being an attendance at
5001.11.1997other than consulting rooms
1000002 01.11.199600.00.00001 A1 1 SN Y
2001.11.200400098.0500073.5500083.35
5001.11.1997Professional attendance being an attendance at
5001.11.1997consulting rooms
etc...
-------
Would be grateful for any suggestions that you may have. Thanks in advance.

Replies are listed 'Best First'.
Re: Format multiline to single line records
by kvale (Monsignor) on Oct 19, 2004 at 07:59 UTC
    A simple approach is to loop through the lines and start a new record each time a primary key is detected:
    my %record;
    my $item;
    while (my $line = <DATA>) {
        chomp $line;
        if ($line =~ /^(\d+) (.*)$/) {
            $item = $1;
            $record{$item} = " $2";
        }
        else {
            $record{$item} .= $line;
        }
    }
    foreach my $item (keys %record) {
        print "$item $record{$item}\n";
    }
    __DATA__
    1000001 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997other than consulting rooms
    1000002 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997consulting rooms
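
    Two details of the loop above are worth noting: continuation lines are appended with no separator, and keys %record returns the records in no particular order. A minimal sketch of a variant that keeps the input order and joins the pieces with a single space might look like this. The sub name join_records and the choice of a space as separator are assumptions, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: fold continuation lines into the record before them.
# A line starts a new record when it begins with digits followed by a space.
sub join_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\d+ / or !@records) {
            push @records, $line;        # new record (or very first line)
        }
        else {
            $records[-1] .= " $line";    # continuation of the current record
        }
    }
    return @records;
}

my @sample = (
    "1000001 01.11.199600.00.00001 A1 1 SN Y\n",
    "2001.11.200400098.0500073.5500083.35\n",
    "5001.11.1997Professional attendance being an attendance at\n",
    "5001.11.1997other than consulting rooms\n",
    "1000002 01.11.199600.00.00001 A1 1 SN Y\n",
    "2001.11.200400098.0500073.5500083.35\n",
    "5001.11.1997Professional attendance being an attendance at\n",
    "5001.11.1997consulting rooms\n",
);

# One output line per primary key, in the order the keys were read.
print "$_\n" for join_records(@sample);
```

    Using an array instead of a hash also means nothing is silently overwritten if a key happens to repeat in the input.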

    -Mark

      Thanks Mark.
Re: Format multiline to single line records
by thor (Priest) on Oct 19, 2004 at 11:24 UTC
    Given the format that you have, setting $/ (the input record separator) is an easy way to do this:
    use strict;
    use warnings;

    local $/ = "rooms\n";
    while (<DATA>) {
        print;
    }
    __DATA__
    1000001 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997other than consulting rooms
    1000002 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997consulting rooms
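
    With $/ set this way, each read returns one whole record, but the newlines inside the record are still there when it is printed. A minimal sketch of folding them into spaces follows. The sub name fold_chunks and the in-memory filehandle are assumptions made for illustration, and the approach only works while every record really ends in "rooms".

```perl
use strict;
use warnings;

# Hypothetical helper: read chunks terminated by $separator from $fh and
# return each chunk as one line, with internal newlines turned into spaces.
sub fold_chunks {
    my ($fh, $separator) = @_;
    local $/ = $separator;        # each read now returns one whole record
    my @records;
    while (my $chunk = <$fh>) {
        $chunk =~ s/\n\z//;       # drop the final newline of the chunk
        $chunk =~ tr/\n/ /;       # fold the remaining newlines into spaces
        push @records, $chunk;
    }
    return @records;
}

my $text = <<'END';
1000001 01.11.199600.00.00001 A1 1 SN Y
2001.11.200400098.0500073.5500083.35
5001.11.1997Professional attendance being an attendance at
5001.11.1997other than consulting rooms
1000002 01.11.199600.00.00001 A1 1 SN Y
2001.11.200400098.0500073.5500083.35
5001.11.1997Professional attendance being an attendance at
5001.11.1997consulting rooms
END

# Read the sample through an in-memory filehandle instead of DATA.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print "$_\n" for fold_chunks($fh, "rooms\n");
close $fh;
```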

    thor

    Feel the white light, the light within
    Be your own disciple, fan the sparks of will
    For all of us waiting, your kingdom will come

      Thanks thor
Re: Format multiline to single line records
by deibyz (Hermit) on Oct 19, 2004 at 11:06 UTC
    You can first put each record on a single line and then use split to separate the key from the rest of the record:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $text = do { local $/; <DATA> };
    $text =~ s/\n(?!\d{7})/ /g;    # Remove newline if no
                                   # new record
    my %records = map { split /\s+/, $_, 2 } split /\n/, $text;
    print Dumper(\%records);
    __DATA__
    1000001 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997other than consulting rooms
    1000002 01.11.199600.00.00001 A1 1 SN Y
    2001.11.200400098.0500073.5500083.35
    5001.11.1997Professional attendance being an attendance at
    5001.11.1997consulting rooms
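
    The same negative lookahead can also produce plain one-line records in input order, rather than a hash dump. The following is only a sketch: the sub name records_from_text is hypothetical, and the seven-digit key is an assumption taken from the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every newline that is NOT followed by a
# seven-digit key into a space, then split what is left into records.
sub records_from_text {
    my ($text) = @_;
    $text =~ s/\n+\z//;            # ignore trailing newlines
    $text =~ s/\n(?!\d{7})/ /g;    # keep only newlines that precede a key
    return split /\n/, $text;
}

my $sample = "1000001 01.11.199600.00.00001 A1 1 SN Y\n"
           . "2001.11.200400098.0500073.5500083.35\n"
           . "1000002 01.11.199600.00.00001 A1 1 SN Y\n"
           . "2001.11.200400098.0500073.5500083.35\n";

print "$_\n" for records_from_text($sample);
```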

    Hope it helps,
    deibyz

      Thanks Deibyz