in reply to Re-formatting multi-line records

This is a fairly easy task in Perl. As you say you are not a programmer, I will leave you to decipher this.

use strict;
use warnings;

local $/ = '';    # paragraph mode: each <DATA> read returns one blank-line-separated record

my $date = qr/\d\d\.\d\d\.\d\d\d\d/;    # dd.mm.yyyy
my $fee  = qr/\d\d\d\d\d\.\d\d/;        # nnnnn.nn

my $one = qr/
    ^
    \d\d        # record10 = integer (length 2).(10)
    (\d{5})     # primary key = integer (length 5).(00001)
    (\d{3})     # sub item number = integer (length 3).(225)
    ($date)     # start date = dd.mm.yyyy.(01.11.1996)
    ($date)     # end date = dd.mm.yyyy.(00.00.0000)
    (\w{3})     # category = alphanumeric (length 3).(BVF)
    (\w{3})     # group = alphanumeric (length 3).(AAA)
    (\w{3})     # subgroup = alphanumeric (length 3).(A65)
    ([SN])      # item type = text (options: S or N).(S)
    ([ND])      # fee type = text (options: N or D).(N)
    (\w{3})     # provider type = alphanumeric (length 3).(009)
    ([YN])      # new item = text (options: Y or N).(Y)
    ([YN])      # item change = text (options: Y or N).(N)
    ([YN])      # procedure change = text (options: Y or N).(Y)
    ([YN])      # description change = text (options: Y or N).(N)
    ([YN])      # fee change = text (options: Y or N).(Y)
    $
/x;

my $two = qr/
    ^
    \d\d        # record20 = integer (length 2).(20)
    ($date)     # start date = dd.mm.yyyy.(01.11.1996)
    ($fee)      # fee = decimal (nnnnn.nn).(00098.05)
    ($fee)      # benefit1 = decimal (nnnnn.nn).(00073.55)
    ($fee)      # benefit2 = decimal (nnnnn.nn).(00083.35)
    $
/x;

# Generic line (types 30, 40, 50): two-digit type, date, free text
my $gen = qr/ ^ \d\d ($date) (.*) $ /x;

while ( my $rec = <DATA> ) {
    my ( $pkey, @out );
    for my $line ( split /[\r\n]+/, $rec ) {
        my $type = substr $line, 0, 2;
        if ( $type eq '10' ) {
            my @data = $line =~ m/$one/;
            $pkey = $data[0];
            $out[$type] = join ',', $pkey, $type, @data;
        }
        elsif ( $type eq '20' ) {
            my @data = $line =~ m/$two/;
            $out[$type] = join ',', $pkey, $type, @data;
        }
        elsif ( $line =~ m/$gen/ ) {
            my @data = ( $1, $2 );
            if ( defined $out[$type] ) {
                $out[$type] .= ' ' . $2;    # continuing string: append the text only
            }
            else {
                $out[$type] = join ',', $pkey, $type, @data;
            }
        }
        else {
            die "Invalid record $rec\n\n$line\n";
        }
    }
    print join( "\n", grep { defined } @out[ 10, 20, 30, 40, 50 ] ), "\n";
}

__DATA__
100000122501.11.199600.00.0000BVFAAAA65SN009YNYNY
2001.11.199600098.0500073.5500083.35
3014.11.1996This derived fee is for professional attendances for GP and Specialist.
4023.12.1996(Anaes.)
5001.11.1997Professional attendance being an attendance at
5001.11.1997other than consulting rooms, by a general
5001.11.1997practitioner on not more than 1 patient.

100000222601.11.199600.00.0000BDGAABA66SN010YNYNY
2001.11.199600098.0500073.5500083.35
3014.11.1996This derived fee is for professional attendances by GP only.
4023.12.1996(Anaes.)
5001.11.1997Professional attendance being an attendance at
5001.11.1997other than consulting rooms, by a general
5001.11.1997practitioner only on not more than 1 patient.
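For anyone deciphering the script, the two ideas doing most of the work can be tried in isolation. The sketch below is illustrative only: `paragraphs` and `fold_lines` are hypothetical helper names, not part of the script. `paragraphs` shows how `local $/ = ''` (paragraph mode) makes each read return a whole blank-line-separated record; `fold_lines` shows how repeated lines of the same type are merged by appending only their text, keeping the first line's date.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a string,
# the same way `local $/ = ''` makes each <DATA> read return one record.
sub paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    my @recs = <$fh>;
    close $fh;
    return @recs;
}

# Hypothetical helper: fold repeated "TTdd.mm.yyyyText" lines of the same
# type TT into one entry, keeping only the first line's date.
sub fold_lines {
    my @lines = @_;
    my %out;
    for my $line (@lines) {
        my ( $type, $date, $text ) = $line =~ /^(\d\d)(\d\d\.\d\d\.\d{4})(.*)$/
            or die "Unrecognised line: $line\n";
        if ( exists $out{$type} ) {
            $out{$type} .= " $text";    # continuation line: append text only
        }
        else {
            $out{$type} = "$type,$date,$text";
        }
    }
    return %out;
}

my @recs = paragraphs("10aaa\n20bbb\n\n\n10ccc\n20ddd\n");
print scalar(@recs), "\n";    # prints 2

my %folded = fold_lines(
    '5001.11.1997Professional attendance being an attendance at',
    '5001.11.1997other than consulting rooms, by a general',
    '5001.11.1997practitioner on not more than 1 patient.',
);
print "$folded{50}\n";
```

The last line prints the three type-50 lines joined into one: `50,01.11.1997,Professional attendance being an attendance at other than consulting rooms, by a general practitioner on not more than 1 patient.`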

cheers

tachyon

Re^2: Re-formatting multi-line records
by Anonymous Monk on Oct 25, 2004 at 01:43 UTC
    Many thanks, tachyon. I have purchased the complete Perl reference and will go through your solution. In future I will post my attempted program. I must admit I was too embarrassed to post my first attempt.

      Good for you. Looks like you are working on Australian HIC data? Please note that you should include some data validation for line types 10 and 20: it is possible that the respective REs will not match, in which case @data will be empty. I just assumed they would match, for the sake of brevity. Something as simple as die "Match failed:\n$line\n" unless @data might do.
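      A minimal sketch of that check, assuming it is wrapped in a small sub so it can be reused for both the type-10 and type-20 patterns. The name `parse_or_die` is hypothetical, and `$two` here is a compact, non-/x copy of the script's type-20 pattern. A list-context match with capture groups returns an empty list on failure, which is what the `unless @data` test relies on.

```perl
use strict;
use warnings;

# Compact copy of the type-20 pattern: type, start date, fee, benefit1, benefit2
my $date = qr/\d\d\.\d\d\.\d\d\d\d/;
my $fee  = qr/\d\d\d\d\d\.\d\d/;
my $two  = qr/^\d\d($date)($fee)($fee)($fee)$/;

# Hypothetical wrapper: return the captured fields, or die with the
# offending line if the pattern did not match (an empty list).
sub parse_or_die {
    my ( $line, $re ) = @_;
    my @data = $line =~ m/$re/;
    die "Match failed:\n$line\n" unless @data;
    return @data;
}

my @ok = parse_or_die( '2001.11.199600098.0500073.5500083.35', $two );
print join( ',', @ok ), "\n";    # prints 01.11.1996,00098.05,00073.55,00083.35

# A truncated line no longer matches, so the die is caught by eval.
my $matched = eval { parse_or_die( '2001.11.199600098.05', $two ); 1 };
print $matched ? "matched\n" : "rejected: $@";
```

      In the script itself the same effect comes from adding the die line straight after each my @data = $line =~ m/.../; statement.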

      cheers

      tachyon

        Thanks for pointing out the potential matching problems with record types 10 and 20. The data is the Medicare Benefits Schedule, which is related to the HIC data.
        The code works well. Thanks for your help, tachyon.