in reply to Extracting fields

One sample is not enough, but maybe something like this?

#! perl -slw
use strict;

my %data = (
    1047633 => '01.12.199100.00.00003 T8 15 SN Y2001.11.200400095.8000071.8500081.454001.11.1994(Anaes.)5001.12.1991Metatarsal, 1 of, treatment of fracture of'
);

my $re_date = qr[\d{2}\.\d{2}\.\d{4}];
my $re_float= qr[\d{5}\.\d{2}];

for my $key ( keys %data ) {
    my @fields = $data{ $key } =~ m[
        ( 20 $re_date $re_float{3} )
        ( 40 $re_date \( [^)]+ \) )
        ( 50 $re_date .* $ )
    ]x;
    print "'$_'" for @fields;
}

__END__
P:\test>401374
'2001.11.200400095.8000071.8500081.45'
'4001.11.1994(Anaes.)'
'5001.12.1991Metatarsal, 1 of, treatment of fracture of'

To explain the regex:

m[
    ## Capture, starting with '20', one date, and 3x %8.2 floats
    ( 20 $re_date $re_float{3} )
    ## Capture, starting with '40', one date, '(', non-')' to the ')'
    ( 40 $re_date \( [^)]+ \) )
    ## Capture, '50', a date, everything to the end of line.
    ( 50 $re_date .* $ )
]x; ## ignore whitespace.
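
Once the '20' group has been captured, its fixed-width parts can be pulled apart without another regex. The following is only a sketch: the unpack template (2-character type, 10-character date, three 8-character amounts) is my assumption based on the single sample above, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The '20' capture as printed in the sample output above.
my $rec20 = '2001.11.200400095.8000071.8500081.45';

# Assumed layout: A2 = record type, A10 = dd.mm.yyyy date,
# (A8)3 = three zero-padded amounts of 8 characters each.
my ( $type, $date, @amounts ) = unpack 'A2 A10 (A8)3', $rec20;

# Adding 0 turns a string such as '00095.80' into the number 95.8.
@amounts = map { $_ + 0 } @amounts;

printf "%s %s %s\n", $type, $date,
    join ' ', map { sprintf( '%.2f', $_ ) } @amounts;
# prints: 20 01.11.2004 95.80 71.85 81.45
```

If real records turn out to carry a different number of amounts, the fixed count in the template would need to change.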

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^2: Extracting fields
by kerrya (Novice) on Oct 22, 2004 at 06:05 UTC
    Thanks for your suggestion.

    I have included 3 records below. Please see my response to Dave's feedback for a more complete description of the problem.

    Appreciate your help.

    '1056070' => '01.05.200000.00.00005 I2 SN Y2001.11.200400113.1500084.9000096.205001.05.2000Computed tomography - scan of facial bones,5001.05.2000paranasal sinuses or both, with scan of brain,5001.05.2000without intravenous contrast medium (R) (NK)5001.05.2000(Anaes.)',
    '1032042' => '01.12.199100.00.00003 T8 2 SN Y2001.11.200401097.2000822.904001.11.1995(Anaes.)5001.12.1991Rectum and anus, abdominoperineal resection of,5001.12.1991combined synchronous operation, abdominal5001.12.1991resection5001.12.1991(Assist.)',
    '1021432' => '01.11.200100.00.00003 T1011 SN Y2001.11.200400084.2500063.2000071.655001.11.2001Initiation of management of anaesthesia for5001.11.2001repair of arteriovenous fistula of knee or5001.11.2001popliteal area5001.11.2001(005)',

      I agree with bobf: stripping the record separators and then trying to put them back is the wrong way to approach the problem.

      Much better would be setting $/="\n10"; to read the combined records one block at a time, retaining the newlines. These then give you a very easy way to further break up the combined records.
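
      A rough sketch of that idea, reading from an in-memory filehandle so it runs standalone. The input layout is an assumption on my part: I have guessed that each combined record starts with a '10' line followed by '20', '40' and '50' lines, and the contents of the '10' lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical raw input: the real file layout is not shown in this thread.
my $raw = join '',
    "101047633 01.12.1991\n",
    "2001.11.200400095.8000071.8500081.45\n",
    "4001.11.1994(Anaes.)\n",
    "5001.12.1991Metatarsal, 1 of, treatment of fracture of\n",
    "101032042 01.12.1991\n",
    "2001.11.200401097.2000822.90\n",
    "5001.12.1991Rectum and anus\n";

open my $fh, '<', \$raw or die "Cannot open in-memory file: $!";

my @records;    # one array ref of lines per combined record
{
    local $/ = "\n10";    # a block ends where the next '10' line begins
    while ( my $block = <$fh> ) {
        chomp $block;     # removes the trailing "\n10", if present
        # The separator swallowed the leading '10' of every block but the first.
        $block = "10$block" if @records;
        push @records, [ split /\n/, $block ];
    }
}
close $fh;

for my $rec (@records) {
    printf "%d lines, first: %s\n", scalar @$rec, $rec->[0];
}
```

      Because the newlines survive, each block splits cleanly into its '10', '20', '40' and '50' lines, and a missing '40' line simply means one element fewer.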

