in reply to Re^4: tab delimited extraction, formatting the output
in thread tab delimited extraction, formatting the output

It never ends! While validating the data I noticed that there are occasions where more than one mapping exists for a phrase, as in the example below. I thought I could simply add /g to (/\s\s/), but apparently that doesn't do the job. Any suggestions?

Phrase: "of hemorrhage"
Meta Mapping (1000)
<2 spaces>1000 D0046004:HAEMORRHAGE (BLEEDING) Finding
Meta Mapping (1000)
<2 spaces>1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding

Re^6: tab delimited extraction, formatting the output
by hbm (Hermit) on Feb 17, 2009 at 16:58 UTC

    Below I made two small changes that may do what you want. If not, you've been given all the concepts; try yourself!

    use strict;
    use warnings;

    my $file = "z.txt";
    open my $fh, "<", $file or die "Unable to open $file: $!";

    my ($p_val, $m_val);
    {
        # Double quotes so that \n means a newline; in single quotes the
        # separator would be the literal characters backslash and n.
        local $/ = "\n\n\n";
        while (<$fh>) {
            foreach (split /\n/) {
                if (s/\bProcessing\s\d+\.tx\.\d+: //) {
                    print "$_\n";
                }
                elsif (s/\bPhrase: //) {
                    s/"//g;
                    $p_val = $_;
                }
                elsif (/^\s\s(.+)/) {
                    $m_val = $_;
                    if (defined $p_val) {
                        print "\t$p_val\t$m_val\n";
                        undef $p_val;    ### moved undef here
                    }
                    else {               ### new else condition
                        print "\t\t$m_val\n";
                    }
                }
            }
        }
    }
    close $fh;

    __END__
    Pulmonary embolism at the time of hip replacement
            Pulmonary embolism      1000 D0076131:PULMONARY EMBOLISM Disease or Syndrome
            of hip replacement      1000 D0554893:HIP REPLACEMENT (STATUS POST HIP REPLACEMENT) Finding
            of hemorrhage   1000 D0046004:HAEMORRHAGE (BLEEDING) Finding
                            1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding
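For experimenting without the input file, here is a minimal, self-contained sketch of the same idea. It is not the code from the post: the helper name `format_mappings` is hypothetical, it returns the output lines instead of printing them, and it captures `$1` so the two leading spaces of each mapping line are dropped. The sample text is the "of hemorrhage" example from the question, with the two leading spaces typed out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): takes the raw text
# and returns the formatted lines rather than printing them.
sub format_mappings {
    my ($text) = @_;
    my @out;
    my $p_val;    # current phrase, cleared after its first mapping is emitted

    for my $line (split /\n/, $text) {
        if ($line =~ s/\bProcessing\s\d+\.tx\.\d+: //) {
            push @out, $line;
        }
        elsif ($line =~ s/\bPhrase: //) {
            $line =~ s/"//g;
            $p_val = $line;
        }
        elsif ($line =~ /^\s\s(.+)/) {
            my $m_val = $1;    # assumption: strip the two leading spaces
            if (defined $p_val) {
                # First mapping for this phrase: phrase column is filled.
                push @out, "\t$p_val\t$m_val";
                undef $p_val;
            }
            else {
                # Further mappings for the same phrase: phrase column is empty.
                push @out, "\t\t$m_val";
            }
        }
    }
    return @out;
}

my $sample = <<'END';
Phrase: "of hemorrhage"
Meta Mapping (1000)
  1000 D0046004:HAEMORRHAGE (BLEEDING) Finding
Meta Mapping (1000)
  1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding
END

print "$_\n" for format_mappings($sample);
```

Adding /g to the two-space match does not help because each mapping sits on its own line and is matched separately; what decides the output is whether `$p_val` is still defined when the second mapping line arrives. Here the second mapping comes out with an empty phrase column, so it stays aligned under the first.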
      Thank you again, hbm, for your great help, explanations, and recommendation for the book.