Extracting a block of text between start and end point

shonurulez has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am really new to Perl. I have multiple files in my log directory, and I need to extract text from them. Each block of interest starts at "PROC SQL" (the start point) and ends at "QUIT" (the end point). I want to print a block to another file only if "AS KEYWORD" is found inside it. An example is below.

     %put %str(NOTE: Mapping columns ...);
NOTE: Mapping columns ...
481        proc sql;
482           create table work.INPUT_ACCT_POOL_3 as
483              select
484                 Acct_Key,
485                 Arrears_Bal_Amt,
486                 ARREARS_DAYS,
531                    end) as DELQ_STATUS length = 8
532                    format = 8.
533                    informat = 8.
534                    label = 'DELQ_STATUS',
535                 Arrears_Start_Date,
536                 PERIODENDING,
537                 PRODUCT_TYPE_CODE
538           from &SYSLAST
SYMBOLGEN:  Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
539           ;
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
      Compressed is 9043 pages; un-compressed would require 13923 pages.
NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.

540        quit;
480        %put %str(NOTE: Mapping columns ...);
NOTE: Mapping columns ...
481        proc sql;
482           create table work.INPUT_ACCT_POOL_3 as
483              select
484                 Acct_Key,
485                 Arrears_Bal_Amt,
486                 ARREARS_DAYS,
532                    format = 8.
533                    informat = 8.
534                    label = 'DELQ_STATUS',
535                 Arrears_Start_Date,
536                 PERIODENDING,
537                 PRODUCT_TYPE_CODE
538           from &SYSLAST
SYMBOLGEN:  Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
539           ;
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
      Compressed is 9043 pages; un-compressed would require 13923 pages.
NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.

540        quit;

The text above is in a log file named log1.log. I want to extract only the first block to a file named output.txt, because it contains "as DELQ_STATUS" and I am passing delq_status as the variable. Any help in this regard is highly appreciated. Thanks


Re: Extracting a block of text between start and end point
by Athanasius (Archbishop) on Jun 17, 2015 at 06:21 UTC

Hello shonurulez, and welcome to the Monastery!

The Perl range operator (.. in scalar context) is useful for this kind of task:

use strict;
use warnings;

my ($start, $keyword, $end) = ('PROC SQL', 'DELQ_STATUS', 'QUIT');
my  @block;

while (<DATA>)
{
    if (/$start/i .. /$end/i)
    {
        push @block, $_;
    }

    if (/$end/i)
    {
        for (@block)
        {
            if (/AS \s+ $keyword/ix)
            {
                print join('', @block);
                last;
            }
        }

        @block = ();
    }
}

__DATA__
...

(The contents of file “log1.log” are included immediately following the __DATA__ line; but I omit them here for the sake of brevity.) The output is as follows:

16:19 >perl 1275_SoPW.pl
481        proc sql;
482           create table work.INPUT_ACCT_POOL_3 as
483              select
484                 Acct_Key,
485                 Arrears_Bal_Amt,
486                 ARREARS_DAYS,
531                    end) as DELQ_STATUS length = 8
532                    format = 8.
533                    informat = 8.
534                    label = 'DELQ_STATUS',
535                 Arrears_Start_Date,
536                 PERIODENDING,
537                 PRODUCT_TYPE_CODE
538           from &SYSLAST
SYMBOLGEN:  Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
539           ;
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
      Compressed is 9043 pages; un-compressed would require 13923 pages.
NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.

540        quit;

16:19 >
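
For readers new to the range operator, here is a minimal sketch of the same idea on an in-memory list. It leans on a documented detail of scalar-context `..`: the value it returns on the last line of a range has "E0" appended, so a single test can both collect lines and detect the end of a block. The helper name `matching_blocks` and the sample lines are made up for illustration and are not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every
# $start .. $end block, as one string, that contains "AS $keyword".
sub matching_blocks {
    my ($lines, $start, $end, $keyword) = @_;
    my (@found, @block);

    for (@$lines) {
        # In scalar context, .. is false outside a range and returns a
        # sequence number (1, 2, 3, ...) while inside one.
        my $seq = /$start/i .. /$end/i;
        next unless $seq;

        push @block, $_;

        # The last sequence number of a range has "E0" appended.
        if ($seq =~ /E0$/) {
            push @found, join('', @block)
                if grep { /AS \s+ \Q$keyword\E/ix } @block;
            @block = ();
        }
    }
    return @found;
}

# Made-up sample input: two blocks, only the first has the keyword.
my @lines = (
    "noise\n",
    "proc sql;\n", "  end) as DELQ_STATUS\n", "quit;\n",
    "proc sql;\n", "  select Acct_Key\n",      "quit;\n",
);

print matching_blocks(\@lines, 'PROC SQL', 'QUIT', 'DELQ_STATUS');
```

Testing the sequence number means a stray "quit" outside any block is simply skipped, at the cost of being a little less obvious than a second match on `/$end/i`.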

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Extracting a block of text between start and end point

by shonurulez (Initiate) on Jun 17, 2015 at 06:36 UTC

Thanks a lot, sir. But in this case, how do I generalise this to all my files? As I understand it, I would need to copy and paste every file below __DATA__. Won't that be too much manual work? Sorry for being so naive; I really am a beginner here.


Re^3: Extracting a block of text between start and end point

by Athanasius (Archbishop) on Jun 17, 2015 at 06:50 UTC

My code was just a proof of concept to show how to use the range operator for this task. For a working script operating on, say, the files “log1.log”, “logX.log”, and “logZ.log”, and assuming you want all the output to go to a single file “output.txt” (do you?), you would do (untested):

my @filenames = qw(log1.log logX.log logZ.log);

open(my $out, '>', 'output.txt') or die "Cannot open file 'output.txt' for writing: $!";

for my $filename (@filenames)
{
    open(my $in, '<', $filename) or die "Cannot open file '$filename' for reading: $!";

    while (<$in>)
    {
        ...

            if (/AS \s+ $keyword/ix)
            {
                print $out join('', @block);
                last;
            }

        ...
    }

    close $in or die "Cannot close file '$filename': $!";
}

close $out or die "Cannot close file 'output.txt': $!";

See perlintro and perlopentut.
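
For completeness, here is one way the elided parts might be filled in by combining the two snippets from this thread. Treat it as a sketch rather than the definitive script: the sub name `extract_blocks`, its argument order, and the returned block count are my own assumptions, and it has only been tried on small made-up inputs.

```perl
use strict;
use warnings;

# Hypothetical helper: copies every $start .. $end block containing
# "AS $keyword" from the named files into one output file and returns
# the number of blocks written.
sub extract_blocks {
    my ($outfile, $start, $keyword, $end, @filenames) = @_;
    my $count = 0;

    open(my $out, '>', $outfile)
        or die "Cannot open file '$outfile' for writing: $!";

    for my $filename (@filenames) {
        open(my $in, '<', $filename)
            or die "Cannot open file '$filename' for reading: $!";

        my @block;
        while (<$in>) {
            # Collect lines while the range operator is "on".
            push @block, $_ if /$start/i .. /$end/i;

            # At the end marker, keep the block only if it has the keyword.
            if (/$end/i) {
                if (grep { /AS \s+ \Q$keyword\E/ix } @block) {
                    print $out @block;
                    $count++;
                }
                @block = ();
            }
        }

        close $in or die "Cannot close file '$filename': $!";
    }

    close $out or die "Cannot close file '$outfile': $!";
    return $count;
}

# Runs only when log file names are given on the command line.
extract_blocks('output.txt', 'PROC SQL', 'DELQ_STATUS', 'QUIT', @ARGV)
    if @ARGV;
```

The keyword is wrapped in `\Q...\E` so that any regex metacharacters in it are matched literally; drop that if you want to pass a pattern instead.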

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^3: Extracting a block of text between start and end point

by QM (Parson) on Jun 17, 2015 at 07:51 UTC

So instead of opening an output file, and instead of looping over a list of files, do this:

while (<>) {
    do_something_here();
    print $blah if $accepted;
}

Then you run it like so:

my_script blah*.log > my_output_file
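
To make the placeholders concrete, here is a sketch of how the block filter from earlier in the thread might look with the diamond operator. The wrapper sub `filter_logs` is hypothetical; it exists only so the loop can be called with an explicit file list, and the start, keyword, and end values are the ones used earlier in the thread.

```perl
use strict;
use warnings;

my ($start, $keyword, $end) = ('PROC SQL', 'DELQ_STATUS', 'QUIT');

# Hypothetical wrapper: reads every file named in its arguments through
# the diamond operator and returns the accepted blocks as one string.
sub filter_logs {
    local @ARGV = @_;    # <> reads the files listed in @ARGV, in order
    my $accepted = '';
    my @block;

    while (<>) {
        push @block, $_ if /$start/i .. /$end/i;

        if (/$end/i) {
            $accepted .= join('', @block)
                if grep { /AS \s+ \Q$keyword\E/ix } @block;
            @block = ();
        }
    }
    return $accepted;
}

# Only run when file names are given, so an empty command line does not
# leave the script waiting on standard input.
print filter_logs(@ARGV) if @ARGV;
```

One caveat: the range operator keeps its state from one file to the next, so a block left unterminated at the end of one log would continue into the following file.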

-QM
--
Quantum Mechanics: The dreams stuff is made of


Re^3: Extracting a block of text between start and end point

by robby_dobby (Hermit) on Jun 17, 2015 at 06:52 UTC

One of the things Perl does really well is I/O: it is really good at reading files and extracting data from them. It does it so well that speed is not even a consideration, in most cases. :-)

Like Athanasius said, you can keep all of that log content in its own files and read them from Perl code. Then you can use the flip-flop operator as shown by Athanasius, reworking the code to your needs. For starters, look at open and work from there. :-)


Re: Extracting a block of text between start and end point
by sandy105 (Scribe) on Jun 17, 2015 at 08:42 UTC

Assuming you want to process all log files in a directory and extract the lines containing a keyword:

use strict;
use warnings;

my $keyword = "your keyword here";
my $dirpath = '.';    # placeholder: replace with your log directory

# '>' opens the output file for writing
open(my $out, '>', 'output.txt') or die "could not open output file: $!";

opendir(my $dh, $dirpath) or die "could not open directory '$dirpath': $!";
my @logfiles = grep { /\.log$/i } readdir($dh);    # keep only names ending in .log
closedir($dh);

for my $logfile (@logfiles)
{
    # readdir returns bare file names, so prepend the directory;
    # '<' opens the log for reading
    open(my $in, '<', "$dirpath/$logfile") or die "could not open logfile '$logfile': $!";
    while (<$in>)
    {
        print $out $_ if /$keyword/;
        # last;  -- if it will occur only once
    }
    close $in;
}

close $out or die "could not close output file: $!";

If you are sure the keyword will occur inside a block, then use the flip-flop operator as suggested above:

 if (/$start/i .. /$end/i)
# where start and end are your start and end keywords
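
As an aside on collecting the log files, a glob can stand in for the opendir/readdir loop. The sketch below is one possible way to do it, not something proposed in this thread: `log_files_in` is a made-up helper name and '.' is only a placeholder directory.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: returns the paths of the plain files matching
# *.log directly inside $dirpath, sorted by name.
sub log_files_in {
    my ($dirpath) = @_;

    # bsd_glob, unlike the built-in glob, does not split the pattern on
    # whitespace, so directory names containing spaces are handled.
    my @files = grep { -f $_ } bsd_glob("$dirpath/*.log");
    @files = sort @files;
    return @files;
}

# '.' is only a placeholder for your log directory.
print "$_\n" for log_files_in('.');
```

The returned paths already include the directory, so they can be passed straight to open. Note that on most Unix-like systems the `*.log` pattern is case-sensitive, unlike the `/\.log$/i` test above.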


Re: Extracting a block of text between start and end point
by robby_dobby (Hermit) on Jun 17, 2015 at 05:19 UTC

shonurulez

Your block of text is unreadable. Can you put it between <CODE></CODE> tags or pre-format it? I'm sure someone here in the monastery will be along to guide you.

Welcome to the monastery. Have fun!

