NewMonk2Perl has asked for the wisdom of the Perl Monks concerning the following question:

Good Morning Perl Gurus, I have been trying to figure out all morning but can not understand why the 2nd set of if statements for $rows2 does not capture all data within the R brackets like I intended. Any help would be greatly appreciated! Below is what the file for OUTFILE file handler looks like:

Patient_ID || Med_Hist || Sur_Hist ZZZZZ00DFL || [R] [R]No pertinent past medical history [R] [R] || + [R]UNREMARKABLE [R] [R] ZZZZZ00DFL || [R] [R]No pertinent past medical history [R] [R] || + [R]UNREMARKABLE [R] [R] [R] ZZZZZ00C4H || || ZZZZZ00C4H || [R] [R]High Blood Pressure [R]Kidney stones [R] [R]S +OAPP-R Score : Moderate/High Risk: 19 [R] [R]Sleep Apnea Assessment +(STOP-BANG): ( 2/4 ); 7/2/2014 [R] [R]Orthotic brace tried for pain + relief: Yes: limited benefit [R]TENS Unit tried for pain relief: Ye +s: limited benefit [R]Tried Topical Compound Cream?: Yes: limited be +nefit [R] [R]Acupuncture therapy tried for pain relief: Yes: limited + benefit [R]Chiropractic therapy tried for pain relief: Yes: limited + benefit [R] [R]Physical therapy tried for pain relief: Yes: limited + benefit [R]Massage therapy tried for pain relief: Yes: limited bene +fit \[R\] \[R\] || [R] [R]Hysterectomy [R]Spinal Fusion: L4-5, L5 +-S!; 2010 \[R\] \[R\]

Here is the code:

open (OUTFILE, "< C:\\Scripts\\TEST\\History21.txt") or die "Could not + open source file. $!"; open (NEW_MED, "> C:\\Scripts\\TEST\\History_MED1.txt") or die "Could + not open source file. $!"; open (NEW_SUR, "> C:\\Scripts\\TEST\\History_SUR1.txt") or die "Could + not open source file. $!"; $count2 = 1; while ($line = <OUTFILE>) { @rows = split(/\|\|/, $line); if ($rows[1] =~ m/\[R\]/g) { if ((@var)= $rows[1] =~ m/\](.*?)\[/g) { foreach $var (@var) { if ($var =~ /^\s$/g) { #DO NOTHING SINCE WE ONLY WANT IF THERE IS ANYTHI +NG INSIDE THE 2 R BRACKETS } else { print NEW_MED "$rows[0] || $var\n"; } } } } if ($rows[2] =~ m/\[R\]/g) { if ((@var2)= $rows[2] =~ m/\](.*?)\[/g) { # I AM ASSUMING IT +WILL RETURN SIMILAR TO THE PREVIOUS IF STATEMENT FOR ROW1 ABOVE foreach $var2 (@var2) { if ($var =~ /^\s$/g) { #DO NOTHING SINCE WE ONLY WANT IF THERE IS ANYTHI +NG INSIDE THE 2 R BRACKETS } else { print NEW_SUR "$rows[0] || $var2\n"; } } } } $count2++; } close(NEW_MED); close(NEW_SUR);

The code brings back most of what I want, but it excluded the data from lines 2 and 3 of the file(line 1 is headers) which has "UNREMARKABLE" for "Sur_Hist" column, I need ALL data within the 2 brackets

[R]....[R]

captured unless its just a space like below then I need it excluded

[R].[R]

Replies are listed 'Best First'.
Re: Extracting multiple match with regex
by choroba (Cardinal) on Apr 14, 2016 at 19:02 UTC
    There are several problems in your code. The main problem is the /g on line 21: it's called in scalar context, so it remembers where the search stopped, and the next matching (line 22) starts there. The single [R] before UNREMARKABLE is therefore skipped.

    Your comment rightly indicates the second if is almost the same as the first one. Usually, one would abstract the common behaviour into a subroutine. Using the name OUTPUT for an input handle is a bit confusing, too (and lexical filehandles are safer than barewords). Here's my take:

    #!/usr/bin/perl use warnings; use strict; sub process_column { my ($patient, $column, $fh_out) = @_; if ($column =~ m/\[R\]/) { if (my @vars = $column =~ m/\](.*?)\[/g) { for my $var (@vars) { if ($var !~ /^\s+$/g) { print {$fh_out} "$patient || $var\n"; } } } } } open my $OUTFILE, '<', 'C:/Scripts/TEST/History21.txt' or die "Could not open source file. $!"; open my $NEW_MED, '>', 'C:/Scripts/TEST/History_MED1.txt' or die "Could not open target file. $!"; open my $NEW_SUR, '>', 'C:/Scripts/TEST/History_SUR1.txt' or die "Could not open target file. $!"; my $count = 1; while (my $line = <$OUTFILE>) { my @rows = split /\|\|/, $line; process_column($rows[0], $rows[1], $NEW_MED); process_column($rows[0], $rows[2], $NEW_SUR); $count++; } close $NEW_SUR or die "Cannot close target file: $!"; close $NEW_MED or die "Cannot close target file: $!"; print "COUNT: $count\n";

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Awesome!! Thank you!! Thank you for the explanation and solution.
Re: Extracting multiple match with regex
by hexcoder (Curate) on Apr 14, 2016 at 19:21 UTC
    Hello to the monastery!

    First, in the for loop of the second block I think you should use $var2 instead of $var in the if condition. The Perl compiler told me this after I inserted

    use strict; use warnings;
    at the beginning, which is recommended.

    Second, the g modifier in the second block

    if ($rows[2] =~ m/\[R\]/g) {
    skips the first [R] of the second column. So, it is not available any more for the next match.

    If I change your script to

    use strict; use warnings; open (OUTFILE, "< History21.txt") or die "Could not open source file. +$!"; open (NEW_MED, "> History_MED1.txt") or die "Could not open med file. +$!"; open (NEW_SUR, "> History_SUR1.txt") or die "Could not open sur file. +$!"; my $count2 = 1; while (my $line = <OUTFILE>) { my @rows = split(/\|\|/, $line); if ($rows[1] =~ m/\[R\]/g) { if ((my @var)= $rows[1] =~ m/\](.*?)\[/g) { foreach my $var (@var) { if ($var =~ /^\s$/g) { #DO NOTHING SINCE WE ONLY WANT IF THERE IS ANYTHI +NG INSIDE THE 2 R BRACKETS } else { print NEW_MED "$rows[0] || $var\n"; } } } } if ($rows[2] =~ m/\[R\]/) { if ((my @var2)= $rows[2] =~ m/\](.*?)\[/g) { # I AM ASSUMING +IT WILL RETURN SIMILAR TO THE PREVIOUS IF STATEMENT FOR ROW1 ABOVE foreach my $var2 (@var2) { if ($var2 =~ /^\s$/g) { #DO NOTHING SINCE WE ONLY WANT IF THERE IS ANYTHI +NG INSIDE THE 2 R BRACKETS } else { print NEW_SUR "$rows[0] || $var2\n"; } } } } $count2++; } close(NEW_MED); close(NEW_SUR);
    I get the missing output.
    ZZZZZ00DFL || No pertinent past medical history ZZZZZ00DFL || No pertinent past medical history ZZZZZ00C4H || High Blood Pressure ZZZZZ00C4H || Kidney stones ZZZZZ00C4H || SOAPP-R Score : Moderate/High Risk: 19 ZZZZZ00C4H || Sleep Apnea Assessment(STOP-BANG): ( 2/4 ); 7/2/2014 ZZZZZ00C4H || Orthotic brace tried for pain relief: Yes: limited bene +fit ZZZZZ00C4H || TENS Unit tried for pain relief: Yes: limited benefit ZZZZZ00C4H || Tried Topical Compound Cream?: Yes: limited benefit ZZZZZ00C4H || Acupuncture therapy tried for pain relief: Yes: limited + benefit ZZZZZ00C4H || Chiropractic therapy tried for pain relief: Yes: limite +d benefit ZZZZZ00C4H || Physical therapy tried for pain relief: Yes: limited be +nefit ZZZZZ00C4H || Massage therapy tried for pain relief: Yes: limited ben +efit \ ZZZZZ00C4H || \
    and
    ZZZZZ00DFL || UNREMARKABLE ZZZZZ00DFL || UNREMARKABLE ZZZZZ00C4H || Hysterectomy ZZZZZ00C4H || Spinal Fusion: L4-5, L5-S!; 2010 \ ZZZZZ00C4H || \
      Thank you! Yea I didnt know that the /g modify did that, I love this place, thanks to all the perl gurus in here for helping us newbies! Cheers!