Apologies Dave, will be more specific.
I am trying to re-format a medical file that describes 39 000 procedures. Each procedure has sub records of type 10, 20, 30, 40 and 50.
The problem is that the information for a procedure is not on one line but spread over several lines (as below). The first row starts with the record type (10) and the procedure number (00001), combined as 1000001, followed by the record 10 information. The next row starts with the record type 20, followed by the record 20 information, and so on.
1000001 01.11.199600.00.00001 A1 1 SN Y
2001.11.200400098.0500073.5500083.35
5001.11.1997Professional attendance being an attendance at
5001.11.1997other than consulting rooms, by a general
5001.11.1997practitioner on not more than 1 patient
I have managed to get all the information related to a procedure onto one line using:
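use strict;
use warnings;
use Data::Dumper;
my $text = do { local $/; <DATA> };   # slurp the whole DATA section
$text =~ s/\n(?!\d{7})//g;            # remove newline unless a new record (7 digits) follows
my %records = map { split /\s+/, $_, 2 } split /\n/, $text;   # key is "10" + procedure number
print Dumper(\%records);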
Which gives:
'1000001' => '01.11.199600.00.00001 A1 1 SN Y2001.11.200400098.0500073.5500083.355001.11.1997Professional attendance being an attendance at5001.11.1997other than consulting rooms, by a general5001.11.1997practitioner on not more than 1 patient',
I am now trying to split each sub record into its own field.
The position of the sub record data is not consistent between procedures. Also, not every procedure has all of the sub records (20-50).
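One possible way to get there, sketched here rather than tested against the real file, is to skip the joining step and group the lines by their two-digit prefix as they are read. The sketch assumes that every record 10 line starts with "10" plus a five-digit procedure number and whitespace, and that every other line starts with its record type (20, 30, 40 or 50). The names @lines, %records and $proc are only for illustration, and the sample lines are the ones shown above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample lines copied from the post; a real script would read the file instead.
my @lines = (
    '1000001 01.11.199600.00.00001 A1 1 SN Y',
    '2001.11.200400098.0500073.5500083.35',
    '5001.11.1997Professional attendance being an attendance at',
    '5001.11.1997other than consulting rooms, by a general',
    '5001.11.1997practitioner on not more than 1 patient',
);

my %records;   # procedure number => { record type => [ data, ... ] }
my $proc;      # procedure currently being filled

for my $line (@lines) {
    if ($line =~ /^10(\d{5})\s+(.*)$/) {
        # Assumption: a record 10 line starts a new procedure.
        $proc = $1;
        push @{ $records{$proc}{10} }, $2;
    }
    elsif (defined $proc and $line =~ /^([2-5]0)(.*)$/) {
        # Assumption: records 20-50 belong to the most recent record 10.
        push @{ $records{$proc}{$1} }, $2;
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper(\%records);
```

Because each record type gets its own list, a procedure that has no record 30 or 40 simply has no such key, and repeated record 50 lines stay in their original order so they can be joined later if the description is wanted as one string.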