Re: Regex Not Grabbing Everything

I find your code very hard to understand. I would spend more time on the indenting style. Many Monks love the old style indenting. I prefer the newer style for new code although I will "go with the flow for old code". You are writing new code, so I would go with the newer indenting style. Both are "correct", but whatever style you choose (old vs new), do it right according to that style.

Also, when you present code that doesn't do what you want, the more clearly you explain what it should be doing the better!

When using the range operator (2-dot ".." or 3-dot "..."), keep things simple: finish capturing the complete record, then process it. Don't try to "back up" in the middle of the if statement. If needed, setting a flag ("hey, this record is of interest") would be fine. My point is: do more complex things only if there is a performance reason. The first objective should be simplicity and clarity.
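A small sketch of the "capture the whole record, then process it" idea. The START/END markers and the sample lines are made up for illustration and are not the real report format. It relies on the range operator returning a sequence number in scalar context, with "E0" appended on the last line of the range.

```perl
use strict;
use warnings;

# Hypothetical input: two records delimited by START/END markers.
my @input = ("junk\n", "START a\n", "x 1\n", "END\n", "junk\n",
             "START b\n", "y 2\n", "END\n");

my @records;   # each element is an array ref holding one complete record
my @current;   # lines of the record currently being collected

for (@input) {
    # Scalar context: 1, 2, 3, ... while inside the range, "E0" appended
    # on the final line, and an empty string outside the range.
    my $seq = /^START/ ... /^END/;
    next unless $seq;              # not inside a record
    push @current, $_;
    next unless $seq =~ /E0$/;     # record not complete yet
    push @records, [@current];     # complete: hand it off for processing
    @current = ();
}

printf "record %d has %d lines\n", $_ + 1, scalar @{ $records[$_] }
    for 0 .. $#records;
```

All the filtering and printing can then work on one finished record at a time, with no need to look backwards while the record is still being read.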

Usually some combination of regex and split is going to be more flexible, easier to write, and easier to understand and maintain than using substr(). substr() will be the fastest, but that does not necessarily mean "best". I've got code with a solid 1-2 pages of substr() calls, but I needed it for the performance.
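To make the comparison concrete, here is a rough sketch on one detail line in the style of the sample data (spacing shortened). The field positions (PROC in field 4, BILLED in field 6, PROV PD last, counting from 0) are my reading of the header line, so treat them as assumptions.

```perl
use strict;
use warnings;

# One detail line modelled on the sample data, with the spacing shortened.
my $line = "12351141821118 111809 23 001 71010 26 31.00 0.00 0.00 0.00 CO-18 31.00 0.00\n";

# split ' ' splits on runs of whitespace and ignores leading and
# trailing whitespace, including the newline.
my @f = split ' ', $line;
my ($proc, $billed, $prov_pd) = @f[4, 6, -1];

# The substr() way needs hard-coded offsets; here only the first field.
my $rend_prov = substr($line, 0, 14);

print "proc=$proc billed=$billed paid=$prov_pd\n";
```

One caveat: split on whitespace shifts the field numbers if a column is blank on some lines. That is the case where fixed offsets with substr() (or unpack) are the safer choice.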

Below, I used a regex that looks for lines with some number at the beginning and 0.00 at the end. The number at the beginning could be required to be some huge number like what you have, although I didn't see the need. Adjust to your requirements. Note that \s matches any whitespace character (space, \t, \r, \n, \f), so the \s*$ at the end of the regex absorbs the line ending and there is no need to "chomp" the line.
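A quick way to see how the pattern behaves on different line endings. The sample lines are invented, and the "strict" variant, which insists on whitespace before the final 0.00, is an assumption of this sketch and not part of the program below. It is there because the pattern as written would also accept an amount such as 10.00.

```perl
use strict;
use warnings;

# The pattern used in the post, plus a stricter variant (an assumption
# of this sketch) that requires whitespace before the final 0.00.
my $loose  = qr/^\d+.*\s*0\.00\s*$/;
my $strict = qr/^\d+.*\s0\.00\s*$/;

my @samples = (
    "123 111809 31.00 0.00\n",       # ends in 0.00 plus newline
    "123 111809 31.00 0.00  \r\n",   # trailing blanks and CRLF
    "123 111809 97.30 35.76\n",      # non-zero amount
    "123 111809 97.30 10.00\n",      # "0.00" only as the tail of 10.00
    "   N347  \n",                   # does not start with a digit
);

my @loose_hits  = map { $_ =~ $loose  ? 1 : 0 } @samples;
my @strict_hits = map { $_ =~ $strict ? 1 : 0 } @samples;

for my $i (0 .. $#samples) {
    (my $shown = $samples[$i]) =~ s/\s+$//;   # trim only for display
    printf "%-24s loose=%d strict=%d\n", $shown, $loose_hits[$i], $strict_hits[$i];
}
```

Neither pattern needs a chomp first. Whether the stricter form is worth it depends on whether a paid amount like 10.00 can appear in the real data.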

As another piece of unsolicited advice: try to write code that is "flat", meaning that fewer levels of indentation == better. Think about how to reformulate things when you get to the 4th level of indentation.

#!/usr/bin/perl -w
use strict;

my @data=();
while (<DATA>)
{
   if (my $flag_EOR = /NAME /.../ADJ TO TOTALS:/) 
   {
       push (@data, $_);            #accumulates this record's data
       # add print "$flag_EOR\n"; to see what is happening...
       next unless $flag_EOR =~ /E0$/;
   }
   
   #print header/trailer and only the zero lines
   
   if (my @lines = grep{/^\d+.*\s*0\.00\s*$/}@data)
   {
       print $data[0];  # header of record
       print @lines;    # lines that start with numbers and 
                        # end with 0.00
       print $data[-1]; # trailer of record
   }
   
   @data=();
}

=prints I manually chopped lines down to prevent word wrap
NAME DOE, JOHN                       HIC 1111111111   ...blah...   
12351141821118 111809    23 001 71010 ... 0.00 CO-18   31.00 0.00
12351141821118 111809    23 001 74150 ... 0.00 CO-18  199.00 0.00
12351141821118 111809    23 001 72192 ... 0.00 CO-18  182.00 0.00
ADJ TO TOTALS: PREV PD  INTEREST  0.00 LATE FILING CHARGE 0.00    NET       84.25
=cut


__DATA__
REND PROV    SERV DATE  POS NOS PROC  MODS      BILLED     ALLOWED      DEDUCT       COINS GRP/RC           AMT     PROV PD
________________________________________________________________________________________________________________________________
NAME DOE, JOHN                       HIC 1111111111     ACNT 1111111          ICN 1111111111111    ASG Y MOA MA01  MA18
12351141821118 111809    23 001 71010 26         31.00        0.00        0.00        0.00 CO-18          31.00        0.00
                                                                                           N347
12351141821118 111809    23 001 70450 26        142.00       44.70        0.00        8.94 OA-45          97.30       35.76
                                                                                           N265
                                                                                           PR-2            8.94
12351141821118 111809    23 001 74150 26        199.00        0.00        0.00        0.00 CO-18         199.00        0.00
                                                                                           N347
12351141821118 111809    23 001 72192 26        182.00        0.00        0.00        0.00 CO-18         182.00        0.00
                                                                                           N347
12351141821118 111809    23 001 72131 26        195.00       60.61        0.00       12.12 OA-45         134.39       48.49
                                                                                           N265
                                                                                           PR-2           12.12
PT RESP        21.06          CLAIM TOTALS      749.00      105.31        0.00       21.06               643.69       84.25
ADJ TO TOTALS: PREV PD                INTEREST        0.00       LATE FILING CHARGE        0.00    NET       84.25
CLAIM INFORMATION FORWARDED TO : XXXXXX XXXXXXXX INSURANCE CO

Re^2: Regex Not Grabbing Everything by JonDepp (Novice) on Sep 20, 2010 at 18:21 UTC
Thanks so much for your reply and explanation. My style and confusing code stem from the fact that I started teaching myself Perl a few months ago. Your code works great. The only problem I have is that it doesn't grab the N347 and N265 codes that sit in between the lines. That is why I wanted to print the whole of NAME...ADJ TO TOTALS: it includes those codes. Is there a way to grab those N codes as part of the @lines array? I tried to grep for those codes and add them to the print statements, but they don't line up correctly (the lines they correspond to are directly above the code lines). That is why I thought of using substr to test for the 0.00 condition and grab all the information. Do you think I need to add another if conditional, or modify the @lines array to grab those N codes in the right spot?
Re^3: Regex Not Grabbing Everything by Marshall (Canon) on Sep 20, 2010 at 20:29 UTC
To get the next line after a line ending in 0.00, change

    if (my @lines = grep{/^\d+.*\s*0\.00\s*$/}@data)

to:

    if (my @lines = grep{/^\d+.*\s*0\.00\s*$/..././}@data)

Perl grep is a very powerful critter! What the above says is to filter lines from @data. If the condition in the grep is true, a copy of the line goes into @lines. The condition in the grep means: true if we are between a line that starts with a digit and ends with 0.00, and the next line containing any character at all. The 3 dots mean that this "any character" has to be on a separate line, because the 3-dot form does not test the right-hand regex on the same line that started the range. So this will print the line ending in 0.00 and the next line, whatever it has, which from your data format happens to be this N347 stuff! Pretty cool!

As before, I deleted some of the characters so that the output wouldn't word wrap.

    NAME DOE, JOHN ..... ASG Y MOA MA01 MA18
    12351141821118 ..... CO-18 31.00 0.00
    N347
    12351141821118 ..... CO-18 199.00 0.00
    N347
    12351141821118 111809 CO-18 182.00 0.00
    N347
    ADJ TO TOTALS: PREV PD INTEREST ... 84.25

I wish you well on your Perl learning adventure! Perl is certainly not considered a beginner's language, so you are starting in a hard place. You will need a number of books; perhaps start with "Learning Perl".
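For anyone who wants to try the range-inside-grep trick on its own, here is a cut-down sketch. The @data lines are shortened stand-ins for one record, not the real report lines.

```perl
use strict;
use warnings;

# Shortened, made-up stand-ins for the lines of one record.
my @data = (
    "NAME DOE, JOHN\n",
    "111 71010 31.00 0.00\n",
    "      N347\n",
    "111 70450 97.30 35.76\n",
    "      N265\n",
    "111 74150 199.00 0.00\n",
    "      N347\n",
    "ADJ TO TOTALS: NET 84.25\n",
);

# The range turns on at a line that starts with digits and ends in 0.00,
# and turns off at the following line that contains any character.
my @lines = grep { /^\d+.*\s*0\.00\s*$/ ... /./ } @data;

print @lines;   # the two 0.00 lines, each followed by its N347 line
```

Two caveats with this approach: if two lines ending in 0.00 follow each other directly, the second one closes the range, so the code line after it is not picked up. Also, /./ does not match a completely empty line, so a blank line after a 0.00 line would keep the range open until the next non-empty line.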