in reply to Re^3: tab delimited extraction, formatting the output
in thread tab delimited extraction, formatting the output

Thank you, hbm. Yes, my record format has changed. I started with a tab-delimited file that used "EOF." as the record separator, but that didn't work out: the script processed only up to 14,000 lines and then exited without any error. I was also missing one of the components from my original text, so I started working on another output format from the same text. In the new format, records are separated by two blank lines, so I thought I would use \n\n\n as the separator.
For some reason, when I tried to post my question the spaces were stripped from my post, so by typing (\s\s) I was trying to indicate that in my real record there are two spaces before each mapping starts. Sorry, my bad!
Thank you again for all your help and explanation. Is there any book you recommend for a Perl beginner like me to start with? I started with "Perl Programming for Medicine and Biology" but I think some basic (and fundamental) concepts are not covered in that book.
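
For readers following along, the record-separator idea above can be sketched as follows. This is only a minimal illustration, not the poster's actual script: the sample data and the names $data and @records are made up, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by two blank lines.
my $data = "Phrase: \"a\"\n  1000 X\n\n\nPhrase: \"b\"\n  1000 Y\n";

# Open a read handle on the in-memory string instead of a real file.
open my $fh, '<', \$data or die "Unable to open in-memory data: $!";

my @records;
{
    # Double quotes are needed here: '\n\n\n' in single quotes is six
    # literal characters (backslash, n, ...), not three newlines.
    local $/ = "\n\n\n";
    while ( my $record = <$fh> ) {
        chomp $record;    # chomp strips the current value of $/
        push @records, $record;
    }
}
close $fh;

printf "%d records\n", scalar @records;
```

Localizing $/ inside a bare block keeps the change from leaking into any later line-by-line reads in the same program.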

Re^5: tab delimited extraction, formatting the output
by Anonymous Monk on Feb 17, 2009 at 14:00 UTC
    It never ends! While validating the data I noticed that there are occasions where more than one mapping exists, as shown below. I thought I could simply add /g to (/\s\s/), but apparently that doesn't do the job. Any suggestions?

    Phrase: "of hemorrhage"
    Meta Mapping (1000)
    <2 spaces>1000 D0046004:HAEMORRHAGE (BLEEDING) Finding
    Meta Mapping (1000)
    <2 spaces>1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding

      Below I made two small changes that may do what you want. If not, you've been given all the concepts; try it yourself!

      use strict;
      use warnings;

      my $file = "z.txt";
      open my $fh, "<", $file or die "Unable to open $file: $!";

      my ($p_val, $m_val);
      {
          # Double quotes so that \n is interpolated as a newline;
          # records are separated by two blank lines.
          local $/ = "\n\n\n";
          while (<$fh>) {
              foreach (split /\n/) {
                  if (s/\bProcessing\s\d+\.tx\.\d+: //) {
                      print "$_\n";
                  }
                  elsif (s/\bPhrase: //) {
                      s/"//g;
                      $p_val = $_;
                  }
                  elsif (/^\s\s(.+)/) {
                      $m_val = $_;
                      if (defined $p_val) {
                          print "\t$p_val\t$m_val\n";
                          undef $p_val;   ### moved undef here
                      }
                      else {              ### new else condition
                          print "\t\t$m_val\n";
                      }
                  }
              }
          }
      }
      close $fh;

      __END__
      Pulmonary embolism at the time of hip replacement
              Pulmonary embolism      1000 D0076131:PULMONARY EMBOLISM Disease or Syndrome
              of hip replacement      1000 D0554893:HIP REPLACEMENT (STATUS POST HIP REPLACEMENT) Finding
              of hemorrhage           1000 D0046004:HAEMORRHAGE (BLEEDING) Finding
                                      1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding
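
      As a possible variation on the script above (a sketch only, not hbm's method): because each mapping sits on its own line, adding /g to a per-line match has nothing extra to find; what matters is remembering which phrase the line belongs to. One way to do that is to collect the mappings per phrase before printing. The names $record and @phrases are made up for illustration, the sample is the record from the question with the two leading spaces written out, and unlike the script above this version repeats the phrase on every row.

```perl
use strict;
use warnings;

# Sample record taken from the question; the two leading spaces before
# each mapping are written out literally.
my $record = join "\n",
    'Phrase: "of hemorrhage"',
    'Meta Mapping (1000)',
    '  1000 D0046004:HAEMORRHAGE (BLEEDING) Finding',
    'Meta Mapping (1000)',
    '  1000 D0046011:HAEMORRHAGE NOT OTHERWISE SPECIFIED (HEMORRHAGE NOT OTHERWISE SPECIFIED) Finding';

my @phrases;    # each element: { phrase => ..., mappings => [ ... ] }
for my $line ( split /\n/, $record ) {
    if ( $line =~ /\bPhrase: "?([^"]*)"?/ ) {
        # A new phrase starts a new, initially empty, list of mappings.
        push @phrases, { phrase => $1, mappings => [] };
    }
    elsif ( @phrases and $line =~ /^\s\s(\S.*)/ ) {
        # A line starting with two spaces is a mapping of the latest phrase.
        push @{ $phrases[-1]{mappings} }, $1;
    }
}

# One tab-separated row per mapping, with the phrase repeated each time.
for my $p (@phrases) {
    print join( "\t", $p->{phrase}, $_ ), "\n" for @{ $p->{mappings} };
}
```

      Keeping the mappings in a list also makes it easy to count them per phrase or to join them onto a single row if that suits the downstream format better.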
        Thank you again, hbm, for your great help, explanations, and recommendation for the book.
Re^5: tab delimited extraction, formatting the output
by hbm (Hermit) on Feb 17, 2009 at 14:20 UTC

    I do recommend O'Reilly's Perl Cookbook. I have the second edition, and its chapter titles include: Strings; Numbers; Dates and Times; Arrays; Hashes; Pattern Matching; File Access; Subroutines; References and Records; Database Access; Process Management; CGI Programming; and XML. It provides many examples and has a robust 30-page index, and certainly covers all the questions you've raised in this thread.