zzgulu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new to Perl and am trying to figure out how to select a block of text and extract some data from it. All blocks have the same layout: they are separated from each other by \n\n and start with the string "Processing...". From each block I want to extract what comes after "Processing.....:", then a tab, then what comes after "Phrase:", then a tab, then the line underneath "Meta Mapping (some digit)". If a block has more than one phrase, the same process happens for each, and the "Processing..:" data is repeated for all phrases until the block ends at \n\n. So for the block below I should get two rows: the first column, "Mental status change over the last 5 days", is repeated in rows 1 and 2; the "Phrase:" results appear in column 2; and the line below "Meta Mapping" appears in the third column (which is "none" for the second row). The loop should then jump to the next block, until the end of the file. The "Meta Candidates" results can be ignored. I'd appreciate any help or hints.

Processing 00000000.tx.2: Mental status change over the last 5 days.
Phrase: "Mental status change"
Meta Mapping (1000)
1000 D0072276:MENTAL STATUS CHANGE (ALTERED MENTAL STATUS) (Mental or Behavioral Dysfunction)

Phrase: "over the last 5 days."
Meta Candidates (0): <none>
Meta Mappings: <none>


Processing 00000000.tx.1:........

Desired output:
Mental status change over the last 5 days (tab) Mental status change (tab) 1000 D0072276:MENTAL .......
Mental status change over the last 5 days (tab) over the last 5 days (tab) none

Re: block extraction
by kennethk (Abbot) on Feb 03, 2009 at 15:20 UTC
    Sounds like you need to open a file, read data in from a file, parse it into an array (perhaps using a regular expression), and print it joined by tabs. What have you tried so far, or what resources are you using for reference? Note that people on this site are happy to help plan and debug, but we don't generally write whole-hog scripts (unless you want to give me money...)
Re: block extraction
by hbm (Hermit) on Feb 03, 2009 at 16:35 UTC

    Be encouraged! This is a great case to learn how wonderful Perl is. In addition to the other suggestions, look up $/, which will allow you to break your file into records very easily.

      Thank you for the hints. I started with the code below, but I guess I'm way off! Would you suggest going in another direction? Thank you for the help.

      open IN, "." || "die, can't open";
      undef($/);
      $string=<IN>;
      $string=~m/(^Processing\s\d+\.tx\.\d+:)(.*?)(^Phrase:)(.*?)(Meta Mapping)(.*?)(\n)/g;
      print "$2\t$4\t$6";
      exit;

        Here's a simple example to open the file and print out the records. When you understand this, you can begin to manipulate the records before printing.

        use strict;
        use warnings;

        my $file = "t.txt";
        $/ = "\n\n";
        open IN, "<", $file or die "Unable to open $file: $!";
        while (<IN>) {
            print "=======\n", $_, "=======\n\n";
        }
        close IN;
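Putting the hints in this thread together, here is a minimal sketch of the whole extraction, not a definitive solution. It reads records in paragraph mode ($/ = "", a variant of the $/ = "\n\n" setting in the example above that treats any run of blank lines as one separator), remembers the most recent "Processing" sentence, and emits one tab-separated row per "Phrase:" record. The name extract_rows, the in-memory filehandle used for the demo, and the stripping of one trailing period (inferred from the desired output in the question) are assumptions.

```perl
use strict;
use warnings;

# extract_rows is a hypothetical helper, not code from the thread.
# It takes an open filehandle and returns a list of tab-separated rows.
sub extract_rows {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: a run of blank lines ends a record
    my $sentence = '';
    my @rows;

    while (my $chunk = <$fh>) {
        # A "Processing <id>: <sentence>" line starts a new block.
        # Assumption: one trailing period is dropped, as in the desired output.
        if ($chunk =~ /^Processing[ \t]+\S+:[ \t]*(.*?)\.?[ \t]*$/m) {
            $sentence = $1;
        }

        # Records without a Phrase line produce no row.
        next unless $chunk =~ /^Phrase:[ \t]*"(.*?)\.?"[ \t]*$/m;
        my $phrase = $1;

        # Take the line directly under "Meta Mapping (<digits>)", else "none".
        my $mapping =
            $chunk =~ /^Meta Mapping \(\d+\)[^\n]*\n[ \t]*([^\n]+)/m ? $1 : 'none';

        push @rows, join "\t", $sentence, $phrase, $mapping;
    }
    return @rows;
}

# Demo on the sample from the question, read through an in-memory filehandle.
my $sample = <<'END';
Processing 00000000.tx.2: Mental status change over the last 5 days.
Phrase: "Mental status change"
Meta Mapping (1000)
1000 D0072276:MENTAL STATUS CHANGE (ALTERED MENTAL STATUS) (Mental or Behavioral Dysfunction)

Phrase: "over the last 5 days."
Meta Candidates (0): <none>
Meta Mappings: <none>


Processing 00000000.tx.1:........
END

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for extract_rows($in);
close $in;
```

To run it against a real file, open that file instead of \$sample and pass the handle to extract_rows. Because each phrase is its own record, the sketch does not depend on whether blocks are separated by one blank line or two.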
Re: block extraction
by Bloodnok (Vicar) on Feb 03, 2009 at 15:50 UTC
    Looks like you need a range operator (see Range Operators).

    Update: Updated the link - thanx to kennethk

    A user level that continues to overstate my experience :-))
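For readers who have not met the range operator suggested in this reply, here is a small sketch of how it behaves in scalar context, where it acts as a flip-flop. The helper name block_lines and the choice of an empty line as the end condition are assumptions made for illustration; the input lines are the first block from the question.

```perl
use strict;
use warnings;

# block_lines is a hypothetical helper, not code from the thread.
# It keeps each line from one that starts with "Processing" through
# the next empty line, and drops everything outside such a range.
sub block_lines {
    my @kept;
    for my $line (@_) {
        # In scalar context ".." is a flip-flop: it turns true when the
        # left test matches and stays true until the right test matches.
        push @kept, $line if $line =~ /^Processing/ .. $line =~ /^$/;
    }
    return @kept;
}

# Lines from the question's sample, already chomped.
my @lines = (
    'Processing 00000000.tx.2: Mental status change over the last 5 days.',
    'Phrase: "Mental status change"',
    'Meta Mapping (1000)',
    '1000 D0072276:MENTAL STATUS CHANGE (ALTERED MENTAL STATUS) (Mental or Behavioral Dysfunction)',
    '',
    'Phrase: "over the last 5 days."',
    'Meta Candidates (0): <none>',
    'Meta Mappings: <none>',
    '',
);

my @kept = block_lines(@lines);
print "$_\n" for @kept;
```

With this end condition the range closes at the first empty line, so the second phrase of the block is left out. Since phrases inside a block are themselves separated by an empty line, a flip-flop on its own would need a different end test, such as two consecutive empty lines, to cover a whole block.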