in reply to Re: multiple OR match fails
in thread multiple OR match fails

Thank you very much for your input, and sorry for the typo; one parenthesis was missing from the code. My text files are operative notes. Each note consists of sections, and each section starts with a title at the beginning of a line, all in upper case and ending in a colon. Sections are usually separated by an empty line, although this is not always the case. The input directory contains 1000 files, and my intention is to write each file to an output directory with only the designated matched sections (title + content). As recommended, adding a while loop around my regex match seems to have fixed the issue, but please do let me know if you find other problems in the code. I seldom write code, but since I work with text files, regular expressions are a powerful help for occasional data extraction. I am sure there are much easier ways to write what I coded below. This is a sample input file:

PREOPERATIVE DIAGNOSIS: Left invasive cancer, positive margins.

TITLE OF OPERATION:

1. Left needle-localized segmental mastectomy.

2. intraoperative axillary lymphatic mapping.

3. lymphadenectomy.

ANESTHESIA: General.

INDICATIONS FOR SURGERY: Invasive carcinoma with positive margins and residual calcifications.

COMPLICATIONS : None.

#!/usr/bin/perl
use strict;
use warnings;
my $indir;
my $file;
my $new;
my $string;
my $outdir;
$indir = 'C:/input';
$outdir ='C:/output';
if(-d $indir) {
opendir(DIR, $indir) or die "can't open $!";
}
while ($file=readdir(DIR)) {
my $fullpath=$indir.'/'.$file;
open IN, "$indir/$file";
$new= "$outdir/$file";
open OUT, ">$new";
while(<IN>) {
undef ($/);
$string=$_;
while ($string =~m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm) {
print "processing $file\n";
print OUT "$1$2\t$3";
}
}
close IN;
close OUT;
}
closedir(DIR);
exit;

Re^3: multiple OR match fails
by Marshall (Canon) on Jan 31, 2012 at 22:41 UTC
    Since you asked for comments, I'll make a few:
    - The main improvement is better indentation.
    - if(-d $indir) was unnecessary; the "or die" on opendir already reports a directory that cannot be opened.
    - readdir returns only the names (not full paths), and the list includes any directories (including the . and .. ones!). It is common to use a grep to filter out the stuff that you don't want.
    - Always check whether a file operation succeeded.
    - Declare variables where you first use them.
    I didn't actually run this so excuse me if I made a mistake.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $indir  = 'C:/input';
    my $outdir ='C:/output';

    opendir(DIR, $indir) or die "can't open directory $indir $!";

    foreach my $file (grep{-f "$indir/$_"}readdir DIR)
    {
        open IN, '<', "$indir/$file" or die "can't open $indir/$file $!";
        my $new= "$outdir/$file";
        open OUT, '>', $new or die "can't open $new for output $!";
        while (my $string = <IN>)
        {
            undef ($/);
            while ($string =~m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm)
            {
                print "processing $file\n";
                print OUT "$1$2\t$3";
            }
        }
        close IN;
        close OUT;
    }
    closedir(DIR);
    update: these "close" statements aren't strictly necessary; all file handles get closed when your program exits. When you open IN for the next file, this automatically closes the current IN file (if there is one). exit() wasn't necessary, so I took it out.
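
For what it's worth, here is a minimal sketch of the extraction step on its own, checked only against the sample note in this thread. The names extract_sections and slurp_file are hypothetical helpers, not anything from the posts above. Three things are assumptions on my part, based on the sample file: a title may have a space before its colon ("COMPLICATIONS : None."), the wanted section may be the last one in the file (so there is no following title to stop at), and every title consists only of upper-case letters and spaces. The sketch also sets $/ with local before reading, since undef($/) inside the read loop only takes effect after the first line has already been read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
# "local $/" switches to slurp mode only inside this sub, so the
# record separator is restored for the rest of the program.
sub slurp_file {
    my ($path) = @_;
    open my $in, '<', $path or die "can't open $path: $!";
    local $/;
    my $text = <$in>;
    close $in;
    return $text;
}

# Hypothetical helper: return [title, body] pairs for the wanted sections,
# in the order they appear in the note.
sub extract_sections {
    my ($text, @titles) = @_;
    my $wanted = join '|', map { quotemeta } @titles;
    my @found;

    # ^($wanted)        wanted title at the start of a line (/m)
    # [ \t]*:           optional blanks before the colon (assumption)
    # (.*?)             section body, may span lines (/s)
    # (?=^[A-Z][A-Z ]*:|\z)
    #                   stop before the next all-caps title or at end of text;
    #                   the lookahead does not consume the next title
    while ($text =~ m/^($wanted)[ \t]*:(.*?)(?=^[A-Z][A-Z ]*:|\z)/msg) {
        my ($title, $body) = ($1, $2);
        $body =~ s/^\s+|\s+\z//g;    # trim surrounding whitespace
        push @found, [$title, $body];
    }
    return @found;
}

# Demo on part of the sample note from this thread.
my $note = <<'END';
PREOPERATIVE DIAGNOSIS: Left invasive cancer, positive margins.

ANESTHESIA: General.

INDICATIONS FOR SURGERY: Invasive carcinoma with positive margins and residual calcifications.

COMPLICATIONS : None.
END

for my $section (extract_sections($note, 'FINDINGS', 'COMPLICATIONS')) {
    my ($title, $body) = @$section;
    print "$title:\t$body\n";
}
```

In the directory loop, the body would then be something like extract_sections(slurp_file("$indir/$file"), 'FINDINGS', 'COMPLICATIONS'), with each pair printed to the output file. If some titles contain digits, hyphens, or slashes, the [A-Z ] character class in the lookahead would need widening.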