in reply to Text Extraction

There is probably a clever way to do it with grep and Range Operators, but here is a way using state variables. You should change the flags to more meaningful names for your application:
use strict;
use warnings;

my $flag1 = 0;
my $flag2 = 0;
my @lines;
while (<DATA>) {
    $flag2 = 1 if $flag1 and /A1/;
    $flag1 = 1 if /SUBSCRIBER/;
    push @lines, $_ if $flag1;
    if (/NATIONAL/) {
        print @lines if $flag2;
        $flag1 = 0;
        $flag2 = 0;
        @lines = ();
    }
}

__DATA__
foo
bar
SUBSCRIBER
goo
hoo
nada
NATIONAL
SUBSCRIBER
goo
A1
NATIONAL
junk
junk
junk
Prints:
SUBSCRIBER
goo
A1
NATIONAL

Update: Ok, here's my solution with Range Operators:

use strict;
use warnings;

my $flag = 0;
my @lines;
while (<DATA>) {
    if (/SUBSCRIBER/ .. /NATIONAL/) {
        push @lines, $_;
        $flag = 1 if /A1/;
        if (/NATIONAL/) {
            print @lines if $flag;
            $flag = 0;
            @lines = ();
        }
    }
}
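In case the range operator is unfamiliar: in scalar context `..` acts as a flip-flop. It becomes true when the left pattern matches and stays true through the line where the right pattern matches. The small sketch below only illustrates that behavior; the sub name `range_trace` and the input words are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the flip-flop value seen for each input line.
# The value is an empty string outside the range and a sequence number
# inside it.
sub range_trace {
    my @seq;
    for (@_) {
        # Scalar assignment puts ".." in scalar context, so it is a flip-flop.
        my $state = /SUBSCRIBER/ .. /NATIONAL/;
        push @seq, $state;
    }
    return @seq;
}

# Made-up input; "-" marks lines that fall outside the range.
my @trace = range_trace(qw(foo SUBSCRIBER goo A1 NATIONAL junk));
print join(',', map { $_ eq '' ? '-' : $_ } @trace), "\n";
```

This should print `-,1,2,3,4E0,-`. The `E0` suffix marks the last line of a range, which is another way to detect the end of a record without testing /NATIONAL/ a second time.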

Re^2: Text Extraction
by JonDepp (Novice) on Feb 08, 2010 at 19:16 UTC

    Here is an example record from my input file. Records with this same structure repeat over and over:

    SUBSCRIBER DEMOGRAPHIC INFORMATION

    BIRTHDATE GENDER MEMBER IDENTIFICATION NUMBER

    NAME

    XXXXXXXXX

    TRACE NUMBER: XXXXXXXX

    CLAIM CLAIM PAYORS CLAIM NUMBER: XXXXXXXXXXXXXXXXXXXX

    PERIOD BEG PERIOD END MEDICAL RECORD NUMBER: 01/14/2010 01/14/2010 BILLING TYPE:

    EFFECTIVE ADJUDICATION PAYMENT CHARGE PAYMENT CHECK STATUS DATE PAYMENT DATE METHOD AMOUNT AMOUNT CHECK DATE NUMBER 02/01/2010 XX.XX 0.00

    CLAIM LEVEL STATUS CATEGORY: A1 STATUS: 19

    MODIFIER: PR PAGE: 11 CLINIC # XXXXXX (C980 ) XXXXXX REPORT NO: CPR601.01 SOMEINSURANCE HEALTH CARE CLAIM STATUS NOTIFICATION ISA CONTROL NO: XXXXXXXXXX ISA PROCESS DATE: 10/02/02 ISA PROCESS TIME: 04:52 GROUP CONTROL NO: XXXXXX ST CONTROL NO: XXXXXXXXX BHT REFERENCE ID: XXXXXXXXX BHT DATE: 02/02/2010 PAYOR NAME: SOMEINSURANCE ID: XXXXX PROVIDER NAME: XXXXXXXXXXX XXXXXXXXX XXXXX XX

    NATIONAL PROVIDER ID: XXXXXXXXXX

    I need everything between SUBSCRIBER DEMOGRAPHIC and NATIONAL PROVIDER ID, but only if the CLAIM STATUS CATEGORY CODE is other than A1 (A3, A4, F2... there's a bunch).

    Here is the code I have so far.

    use strict;
    use warnings;

    open TEST, "tests.txt" or die $!;
    open OUTPUT, "> output1.txt" or die $!;

    my @data;
    my $data;
    while (<TEST>) {
        if (/SUBSCRIBER DEMOGRAPHIC/../CLAIM LEVEL STATUS CATEGORY/) {
            @data = $_;
            next;
            foreach ( $data, @data) {
                if ($data =~ /A1/) {
                    print OUTPUT @data;
                }
            }
        }
    }
    close TEST;
    close OUTPUT;

    This code runs without any syntax errors, but I get a 0 KB output file. Please help!!
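One likely reason for the empty file: in the posted loop, `next` comes before the `foreach`, so the print statement is never reached. In addition, `@data = $_` replaces the array on every line instead of appending to it, `$data` is never assigned, and the range ends at the status line rather than at NATIONAL PROVIDER ID. Below is a minimal sketch of one way to collect whole records and keep only the non-A1 ones. The sub name `extract_non_a1`, the capture pattern for the category code, and the sample text are assumptions for illustration, and the sketch assumes a record with no category line at all should be dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: reads from a filehandle and returns each record
# (SUBSCRIBER DEMOGRAPHIC through NATIONAL PROVIDER ID) as one string,
# keeping only records whose status category is something other than A1.
sub extract_non_a1 {
    my ($fh) = @_;
    my (@records, @lines, $category);
    while (my $line = <$fh>) {
        # Skip anything outside a SUBSCRIBER .. NATIONAL block.
        next unless $line =~ /SUBSCRIBER DEMOGRAPHIC/
                 .. $line =~ /NATIONAL PROVIDER ID/;
        push @lines, $line;
        # Assumed layout: "CLAIM LEVEL STATUS CATEGORY: A1 STATUS: 19".
        $category = $1
            if $line =~ /CLAIM LEVEL STATUS CATEGORY:\s*(\S+)/;
        if ($line =~ /NATIONAL PROVIDER ID/) {
            # End of record: keep it only if a non-A1 category was seen.
            push @records, join('', @lines)
                if defined $category and $category ne 'A1';
            @lines    = ();
            $category = undef;
        }
    }
    return @records;
}

# Made-up sample with one A1 record and one A3 record.
my $sample = <<'END';
junk
SUBSCRIBER DEMOGRAPHIC INFORMATION
CLAIM LEVEL STATUS CATEGORY: A1 STATUS: 19
NATIONAL PROVIDER ID: XXXXXXXXXX
SUBSCRIBER DEMOGRAPHIC INFORMATION
CLAIM LEVEL STATUS CATEGORY: A3 STATUS: 19
NATIONAL PROVIDER ID: XXXXXXXXXX
END

open my $in, '<', \$sample or die $!;
print extract_non_a1($in);    # prints only the A3 record
close $in;
```

With the real file, the in-memory handle would be replaced by something like `open my $in, '<', 'tests.txt' or die $!;` and the returned records printed to the output handle. Comparing the captured code with `ne 'A1'` avoids a false hit when the text A1 happens to appear elsewhere in a record.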

Re^2: Text Extraction
by JonDepp (Novice) on Feb 05, 2010 at 16:44 UTC

    That worked a lot better. I just realized that the text file I'm parsing has those regular expressions occurring all over the place so I have to refine the ones in my code. This is a great start and I'm sure I'll be back once I refine those expressions. Thanks for all the help!!