in reply to Regex Not Grabbing Everything

I find your code very hard to understand. I would spend more time on your indenting style. Many Monks love the old indenting style; I prefer the newer style for new code, although I will "go with the flow" for old code. You are writing new code, so I would go with the newer indenting style. Both are "correct", but whichever style you choose (old or new), apply it consistently according to that style.

Also, when you present code that doesn't do what you want, the more clearly you explain what it should be doing, the better!

When using the 2- or 3-dot range (flip-flop) operator, keep things simple: finish capturing the complete record, then process it. Don't try to "back up" in the middle of the if statement. Setting a flag ("hey, this record is of interest") would be fine. My point is: do more complex things only if there is a performance reason. The first objective should be simplicity and clarity.
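To see what the range operator actually hands back, here is a minimal sketch on toy lines (the START/END markers and the range_flags() helper are made up for illustration). In scalar context the operator returns an empty string outside the range and a sequence number 1, 2, 3, ... inside it, with "E0" appended on the line that closes the range. The "E0" does not change the numeric value; it is what the `$flag_EOR =~ /E0$/` test in the code below uses to spot the end of a record.

```perl
use strict;
use warnings;

# Hypothetical helper: run a START...END flip-flop over a list of lines
# and collect the value it returns for each line.
sub range_flags {
    my @flags;
    for (@_) {
        # Scalar context: "" outside the range, 1, 2, 3, ... inside it,
        # and "E0" appended to the number on the line that ends it.
        my $flag = /START/ ... /END/;
        push @flags, $flag;
    }
    return @flags;
}

my @input = ("junk\n", "START\n", "a\n", "b\n", "END\n", "junk\n");

# Show "" as "-" so the empty values are visible.
print join(",", map { $_ eq "" ? "-" : $_ } range_flags(@input)), "\n";
# prints: -,1,2,3,4E0,-
```

Each range operator in a program keeps its own state between calls, so it works best when it sees whole records, as in the while loop below.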

Usually some combination of regex and split will turn out to be more flexible, easier to write, and easier to understand and maintain than substr(). substr() will be the fastest, but that does not necessarily mean "best". I have code with a solid 1-2 pages of substr() calls, but I needed it for the performance.
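As a rough illustration of the split approach, here is a hypothetical is_zero_service_line() that tests a service line without any column offsets. It assumes, as in the sample data, that a service line's first field is all digits and its last field is the final amount column; it is a sketch, not a drop-in replacement for the regex used below.

```perl
use strict;
use warnings;

# Hypothetical check for a line that starts with a number and ends in 0.00,
# using split instead of fixed substr() offsets.
sub is_zero_service_line {
    my ($line) = @_;
    # split ' ' skips leading whitespace and splits on runs of whitespace,
    # so the trailing newline never lands in the last field.
    my @fields = split ' ', $line;
    return 0 unless @fields;
    # First field all digits, last field exactly "0.00".
    return ($fields[0] =~ /^\d+$/ && $fields[-1] eq '0.00') ? 1 : 0;
}

my $line = "12351141821118 111809 23 001 71010 26 31.00 0.00 0.00 0.00 CO-18 31.00 0.00\n";
print is_zero_service_line($line), "\n";   # prints 1
```

Comparing whole fields with `eq` also means an amount such as 10.00 is never mistaken for 0.00.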

Below, I used a regex that looks for lines with a number at the beginning and 0.00 at the end. The number at the beginning is matched with a plain \d+; you could require a specific long number like yours, but I didn't see the need. Adjust to your requirements. Note that \s matches whitespace characters, including space, \t, \r, \n and \f, so the trailing \s* also consumes the line ending and there is no need to chomp the line.
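A quick check of the no-chomp claim, plus one thing worth knowing: because the \s* in front of 0.00 may match nothing, the pattern also accepts a line ending in 10.00 or 20.00. The stricter $zero_field variant below is my own assumption, not part of the original code; it uses \s+ so that 0.00 has to be a whole field. The check() helper is hypothetical.

```perl
use strict;
use warnings;

# Pattern used in the code below: digits at the start, 0.00 at the end.
my $zero_line  = qr/^\d+.*\s*0\.00\s*$/;
# Assumed stricter variant: \s+ requires whitespace right before 0.00.
my $zero_field = qr/^\d+.*\s+0\.00\s*$/;

# Hypothetical helper: 1 if $line matches $re, else 0.
sub check {
    my ($re, $line) = @_;
    return $line =~ $re ? 1 : 0;
}

# No chomp needed: the trailing \s* swallows "\r\n" as well as "\n".
print check($zero_line,  "123 CO-18 31.00 0.00\r\n"), "\n";   # prints 1
# The original pattern also accepts an amount of 10.00 ...
print check($zero_line,  "123 CO-18 31.00 10.00\n"), "\n";    # prints 1
# ... while the stricter one does not.
print check($zero_field, "123 CO-18 31.00 10.00\n"), "\n";    # prints 0
```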

As another piece of unsolicited advice: try to write code that is "flat", meaning fewer levels of indentation == better. When you get to the 4th level of indentation, think about how to reformulate things.
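Applied to the loop below, a flatter version might look like this sketch. process_lines() is a hypothetical wrapper that takes a list of lines instead of reading <DATA>, so it can be tried on a few made-up sample lines; the `next unless` guard clauses keep everything at a single level inside the loop.

```perl
use strict;
use warnings;

# Hypothetical flat variant of the record loop: returns the header,
# the zero-paid lines and the trailer of every record that has any.
sub process_lines {
    my (@out, @record);
    for (@_) {
        my $flag = /NAME / ... /ADJ TO TOTALS:/;
        next unless $flag;                 # not inside a record
        push @record, $_;                  # accumulate this record's data
        next unless $flag =~ /E0$/;        # record not finished yet

        # Whole record captured: keep header, zero lines, trailer.
        my @zero = grep { /^\d+.*\s*0\.00\s*$/ } @record;
        push @out, $record[0], @zero, $record[-1] if @zero;
        @record = ();
    }
    return @out;
}

my @sample = (
    "HEADER\n",
    "NAME DOE, JOHN\n",
    "111 A 31.00 0.00\n",
    "      N347\n",
    "222 B 97.30 35.76\n",
    "ADJ TO TOTALS: NET 84.25\n",
    "TRAILER\n",
);
# prints the NAME line, the "111 A 31.00 0.00" line and the ADJ TO TOTALS line
print process_lines(@sample);
```

Like the original, this relies on every record being complete; a range left open at the end of one call would carry over into the next.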

```perl
#!/usr/bin/perl -w
use strict;

my @data = ();
while (<DATA>)
{
    if (my $flag_EOR = /NAME / ... /ADJ TO TOTALS:/)
    {
        push (@data, $_);   # accumulates this record's data
                            # add print "$flag_EOR\n"; to see what is happening...
        next unless $flag_EOR =~ /E0$/;
    }

    # print header/trailer and only the zero lines
    if (my @lines = grep {/^\d+.*\s*0\.00\s*$/} @data)
    {
        print $data[0];     # header of record
        print @lines;       # lines that start with numbers and
                            # end with 0.00
        print $data[-1];    # trailer of record
    }
    @data = ();
}

=prints
I manually chopped lines down to prevent word wrap

NAME DOE, JOHN HIC 1111111111 ...blah...
12351141821118 111809 23 001 71010 ... 0.00 CO-18 31.00 0.00
12351141821118 111809 23 001 74150 ... 0.00 CO-18 199.00 0.00
12351141821118 111809 23 001 72192 ... 0.00 CO-18 182.00 0.00
ADJ TO TOTALS: PREV PD INTEREST 0.00 LATE FILING CHARGE 0.00 NET 84.25

=cut

__DATA__
REND PROV SERV DATE POS NOS PROC MODS BILLED ALLOWED DEDUCT COINS GRP/RC AMT PROV PD
________________________________________________________________________________________________________________________________
NAME DOE, JOHN HIC 1111111111 ACNT 1111111 ICN 1111111111111 ASG Y MOA MA01 MA18
12351141821118 111809 23 001 71010 26 31.00 0.00 0.00 0.00 CO-18 31.00 0.00
                                        N347
12351141821118 111809 23 001 70450 26 142.00 44.70 0.00 8.94 OA-45 97.30 35.76
                                        N265
                                        PR-2 8.94
12351141821118 111809 23 001 74150 26 199.00 0.00 0.00 0.00 CO-18 199.00 0.00
                                        N347
12351141821118 111809 23 001 72192 26 182.00 0.00 0.00 0.00 CO-18 182.00 0.00
                                        N347
12351141821118 111809 23 001 72131 26 195.00 60.61 0.00 12.12 OA-45 134.39 48.49
                                        N265
                                        PR-2 12.12
PT RESP 21.06 CLAIM TOTALS 749.00 105.31 0.00 21.06 643.69 84.25
ADJ TO TOTALS: PREV PD INTEREST 0.00 LATE FILING CHARGE 0.00 NET 84.25
CLAIM INFORMATION FORWARDED TO : XXXXXX XXXXXXXX INSURANCE CO
```

Re^2: Regex Not Grabbing Everything
by JonDepp (Novice) on Sep 20, 2010 at 18:21 UTC

    Thanks so much for your reply and explanation. My style and confusing code stem from the fact that I started teaching myself Perl a few months ago.

    Your code works great. The only problem I have is that it doesn't grab the N347 and N265 codes that sit between the lines. That is why I wanted to print the whole NAME...ADJ TO TOTALS block, because it included those codes. Is there a way to include those N codes in the @lines array? I tried to grep those codes and add them to the print statements, but they don't line up correctly (the lines they correspond to are directly above the code lines). That is why I thought of using substr to test for the 0.00 condition and grab all the information. Do you think I need to add another if conditional, or modify the @lines array to grab those N codes in the right spot?

      To get the next line after a line ending in 0.00, change

      ```perl
      if (my @lines = grep{/^\d+.*\s*0\.00\s*$/}@data)
      ```

      to:

      ```perl
      if (my @lines = grep{/^\d+.*\s*0\.00\s*$/ ... /./}@data)
      ```
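      Perl grep is a very powerful critter! What the above says is: filter the lines in @data, and whenever the condition in the grep block is true, copy that line into @lines. The condition is not a single regex but the range (flip-flop) operator joining two regexes: it becomes true on a line that starts with a digit and ends with 0.00, and stays true up to and including the next line that contains at least one non-newline character. The 3 dots mean that this "any character" test is not applied to the same line that started the range; with 2 dots the 0.00 line itself would satisfy /./, the range would end right there, and only that line would be kept. So this will print the line ending in 0.00 plus the next line, whatever it holds, which in your data format happens to be this N347 stuff! Pretty cool!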
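      The difference between 2 and 3 dots is easy to see on a tiny made-up record. two_dot() and three_dot() are hypothetical helpers that differ only in the operator; grep evaluates its block in scalar context, which is why the range operator acts as a flip-flop here.

```perl
use strict;
use warnings;

# Hypothetical helpers: identical filters except for .. versus ...
sub two_dot   { return grep { /^\d+.*\s*0\.00\s*$/ ..  /./ } @_ }
sub three_dot { return grep { /^\d+.*\s*0\.00\s*$/ ... /./ } @_ }

my @record = (
    "NAME DOE, JOHN\n",
    "111 A 31.00 0.00\n",
    "      N347\n",
    "222 B 97.30 35.76\n",
    "      N265\n",
    "ADJ TO TOTALS: NET 84.25\n",
);

# With .. the 0.00 line also satisfies /./, so the range opens and
# closes on that same line: only the 0.00 line is kept.
print "two dots:\n", two_dot(@record);
# With ... /./ is first tested on the following line, so the
# "      N347" line is kept as well.
print "three dots:\n", three_dot(@record);
```

      One caveat: the flip-flop remembers its state between calls. If a record could end on a 0.00 line, the range would still be open and the first line of the next record would be picked up. In the program above every record ends with its ADJ TO TOTALS trailer, so that cannot happen there.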

      As before, I deleted some of the characters so that the output wouldn't word wrap.

      ```
      NAME DOE, JOHN ..... ASG Y MOA MA01 MA18
      12351141821118 ..... CO-18 31.00 0.00
      N347
      12351141821118 ..... CO-18 199.00 0.00
      N347
      12351141821118 111809 CO-18 182.00 0.00
      N347
      ADJ TO TOTALS: PREV PD INTEREST ... 84.25
      ```
      I wish you well on your Perl learning adventure! Perl is certainly not considered a beginner's language, so you are starting in a hard place. You will need a number of books; perhaps start with "Learning Perl".