Your script would be significantly more efficient if you detected the start and end of each extraction region while you are reading in the file, something like this (assuming MEDICAL HISTORY: begins a line):
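    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $in,  '<', 'input.txt')  or die "Cannot open input.txt: $!";
    open(my $out, '>', 'output.out') or die "Cannot open output.out: $!";

    my $sHistory   = '';
    my $bInHistory = 0;

    while (my $line = <$in>) {
        if ($line =~ /^MEDICAL HISTORY:(.*)$/) {
            # Start of a region; flush one that is still open.
            print $out $sHistory if $bInHistory;
            $bInHistory = 1;
            $sHistory   = "$1\n";   # $1 excludes the newline
        }
        elsif ($line =~ /^[A-Z]/) {
            # Any other line starting with a capital ends the region.
            print $out $sHistory if $bInHistory;
            $bInHistory = 0;
            $sHistory   = '';
        }
        elsif ($bInHistory) {
            $sHistory .= $line;
        }
    }

    # Flush a region that runs to the end of the file.
    print $out $sHistory if $bInHistory;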
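
To see which lines end up in a region, here is a minimal sketch that applies the same start/end rules to an in-memory list instead of a file. The helper name extract_history and the sample record are made up for illustration, and it assumes continuation lines do not start with a capital letter.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of each MEDICAL HISTORY region
# found in a list of lines.
sub extract_history {
    my @lines = @_;
    my @regions;
    my $current;    # undef while we are outside a region

    for my $line (@lines) {
        if ($line =~ /^MEDICAL HISTORY:(.*)$/) {
            push @regions, $current if defined $current;
            $current = "$1\n";
        }
        elsif ($line =~ /^[A-Z]/) {
            # Any other capitalised line closes the open region.
            push @regions, $current if defined $current;
            undef $current;
        }
        elsif (defined $current) {
            $current .= $line;
        }
    }
    push @regions, $current if defined $current;
    return @regions;
}

# Made-up sample record; the continuation line is indented.
my @sample = (
    "NAME: Jane Doe\n",
    "MEDICAL HISTORY: asthma since childhood,\n",
    "  seasonal allergies\n",
    "MEDICATIONS: none\n",
);

my @found = extract_history(@sample);
print scalar(@found), " region(s) found:\n", @found;
```

Note that a continuation line beginning with a capital letter would close the region early, so the end-of-region test may need tightening for real data.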

Also it is a very good idea to start your script with the two lines:

    use strict;
    use warnings;
as I did above. You will save yourself a world of debugging pain by doing so.
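
As a rough illustration of the kind of bug strict catches, the sketch below compiles a small snippet containing a misspelled variable name. The snippet and the names $total and $totl are invented for the example; string eval is used only so that the compile error can be captured and printed rather than stopping the script.

```perl
use strict;
use warnings;

# Hypothetical snippet containing a typo: $totl instead of $total.
my $snippet = <<'END_SNIPPET';
use strict;
my $total = 10;
$totl = $total + 1;
1;
END_SNIPPET

# String eval compiles the snippet at run time, so the compile error
# lands in $@ instead of aborting this script.
my $ok    = eval $snippet;
my $error = $ok ? '' : $@;

print $ok ? "snippet compiled\n" : "strict rejected the snippet: $error";
```

Without use strict the typo would quietly create a new global variable and the mistake would only show up later as a wrong result.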

Another point: the variables $a and $b are special in Perl. sort uses them to hold the two elements being compared, and for that reason use strict does not complain when they are undeclared. It is best to stay away from those names and call your variables something more descriptive.
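
For reference, this is roughly how sort uses the two variables. The list @ages and the other names are made up for the example.

```perl
use strict;
use warnings;

my @ages = (42, 7, 19);

# sort sets $a and $b to the pair being compared; no declaration needed.
my @ascending  = sort { $a <=> $b } @ages;
my @descending = sort { $b <=> $a } @ages;

# Descriptive names for your own variables keep them out of sort's way.
my ($youngest, $oldest) = @ascending[0, -1];

print "ascending:  @ascending\n";
print "descending: @descending\n";
print "youngest $youngest, oldest $oldest\n";
```

If you declare my $a or my $b in your own code, the sort block can no longer see the values sort puts there, which is another reason to pick different names.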

And another point: always check for errors when you open file handles. Sometimes they don't open as you expect (a missing file, wrong permissions). If you don't check, you'll get strange results without any proper explanation of what went wrong.
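
A minimal sketch of such a check is below. The helper try_open and the path are hypothetical, and the path is assumed not to exist so that the failure branch runs. $! holds the operating system's reason for the failure.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or return undef plus the reason.
sub try_open {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or return (undef, "Cannot open '$path': $!");
    return ($fh, '');
}

# Made-up path that is assumed not to exist.
my ($fh, $why) = try_open('no-such-directory/input.txt');

if ($fh) {
    print "opened fine\n";
    close $fh;
}
else {
    print "$why\n";
}
```

In a short script the usual form is simply open(...) or die "Cannot open $path: $!"; the helper here only exists so the failure can be reported without ending the program.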

Best, beth

Update: fixed some bugs (including one pointed out in private msg by almut.)


In reply to Re: content extraction by ELISHEVA
in thread content extraction by zzgulu
