shonurulez has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am really new to Perl. I have multiple files in my log directory, and I need to extract the text between "PROC SQL" (the start point) and "QUIT" (the end point), printing a block to another file only if "AS KEYWORD" is found inside that block. Below is an example:
%put %str(NOTE: Mapping columns ...);
NOTE: Mapping columns ...
481 proc sql;
482 create table work.INPUT_ACCT_POOL_3 as
483 select
484 Acct_Key,
485 Arrears_Bal_Amt,
486 ARREARS_DAYS,
531 end) as DELQ_STATUS length = 8
532 format = 8.
533 informat = 8.
534 label = 'DELQ_STATUS',
535 Arrears_Start_Date,
536 PERIODENDING,
537 PRODUCT_TYPE_CODE
538 from &SYSLAST
SYMBOLGEN: Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
539 ;
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
Compressed is 9043 pages; un-compressed would require 13923 pages.
NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.
540 quit;
480 %put %str(NOTE: Mapping columns ...);
NOTE: Mapping columns ...
481 proc sql;
482 create table work.INPUT_ACCT_POOL_3 as
483 select
484 Acct_Key,
485 Arrears_Bal_Amt,
486 ARREARS_DAYS,
532 format = 8.
533 informat = 8.
534 label = 'DELQ_STATUS',
535 Arrears_Start_Date,
536 PERIODENDING,
537 PRODUCT_TYPE_CODE
538 from &SYSLAST
SYMBOLGEN: Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
539 ;
NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
Compressed is 9043 pages; un-compressed would require 13923 pages.
NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.
540 quit;
The text above is from a log file named log1.log. I want to extract only the first block to a file named output.txt, because it contains "as DELQ_STATUS" (I am passing delq_status as a variable). Any help in this regard is highly appreciated. Thanks

Re: Extracting a block of text between start and end point
by Athanasius (Archbishop) on Jun 17, 2015 at 06:21 UTC

    Hello shonurulez, and welcome to the Monastery!

    The Perl range operator (.. in scalar context) is useful for this kind of task:

    use strict;
    use warnings;

    my ($start, $keyword, $end) = ('PROC SQL', 'DELQ_STATUS', 'QUIT');
    my @block;

    while (<DATA>) {
        if (/$start/i .. /$end/i) {
            push @block, $_;
        }
        if (/$end/i) {
            for (@block) {
                if (/AS \s+ $keyword/ix) {
                    print join('', @block);
                    last;
                }
            }
            @block = ();
        }
    }

    __DATA__
    ...

    (The contents of the file “log1.log” go immediately after the __DATA__ line, but I omit them here for the sake of brevity.) The output is as follows:

    16:19 >perl 1275_SoPW.pl
    481 proc sql;
    482 create table work.INPUT_ACCT_POOL_3 as
    483 select
    484 Acct_Key,
    485 Arrears_Bal_Amt,
    486 ARREARS_DAYS,
    531 end) as DELQ_STATUS length = 8
    532 format = 8.
    533 informat = 8.
    534 label = 'DELQ_STATUS',
    535 Arrears_Start_Date,
    536 PERIODENDING,
    537 PRODUCT_TYPE_CODE
    538 from &SYSLAST
    SYMBOLGEN: Macro variable SYSLAST resolves to WORK.INPUT_ACCT_POOL_2
    539 ;
    NOTE: A CASE expression has no ELSE clause. Cases not accounted for by the WHEN clauses will result in a missing value for the CASE expression.
    NOTE: Compressing data set WORK.INPUT_ACCT_POOL_3 decreased size by 35.05 percent.
    Compressed is 9043 pages; un-compressed would require 13923 pages.
    NOTE: Table WORK.INPUT_ACCT_POOL_3 created, with 5986698 rows and 17 columns.
    540 quit;

    16:19 >
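A possible variant of the loop above, offered as a sketch rather than a replacement: in scalar context `..` returns the empty string while false and otherwise a sequence number (1, 2, ...), and on the line where the end pattern matches that number has "E0" appended (this is documented in perlop under "Range Operators"). Checking for "E0" tells you a block has just closed, so the separate `if (/$end/i)` test is not needed. The helper name `extract_blocks` and the shortened sample log are illustrative assumptions; reading from a filehandle means the same sub works on a real file, STDIN, or an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical helper: return every START ... END block that contains
# "AS <keyword>", reading lines from any filehandle.
sub extract_blocks {
    my ($fh, $start, $end, $keyword) = @_;
    my (@block, @matches);
    while (my $line = <$fh>) {
        # Scalar-context range operator: "" while false, otherwise a
        # sequence number; the line matching $end gets "E0" appended.
        my $seq = ($line =~ /$start/i .. $line =~ /$end/i);
        next unless $seq;
        push @block, $line;
        if ($seq =~ /E0$/) {    # this line closed the block
            my $text = join '', @block;
            push @matches, $text if $text =~ /AS \s+ \Q$keyword\E \b/ix;
            @block = ();
        }
    }
    return @matches;
}

# Shortened sample modelled on log1.log: only the first block has "as DELQ_STATUS".
my $log = <<'END_LOG';
481 proc sql;
531 end) as DELQ_STATUS length = 8
540 quit;
481 proc sql;
534 label = 'DELQ_STATUS',
540 quit;
END_LOG

open(my $fh, '<', \$log) or die "Cannot open in-memory log: $!";
print for extract_blocks($fh, 'PROC SQL', 'QUIT', 'DELQ_STATUS');
close $fh;
```

This prints only the first three-line block. One thing to keep in mind: each `..` operator keeps its own state even across calls to the sub that contains it, so if one input ends in the middle of a block, the next call starts "inside" a block.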

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks a lot, sir, but how do I generalise this to all my files? As I understand it, I would need to copy and paste every file below __DATA__; won't that be too much manual work? Sorry for being naive, I really am a beginner here.

        My code was just a proof-of-concept to show how to use the range operator for this task. For a working script operating on, say, the files “log1.log”, “logX.log”, and “logZ.log” (and assuming you want all the output to go to a single file “output.txt”; do you?), you would do something like this (untested):

        my @filenames = qw(log1.log logX.log logZ.log);

        open(my $out, '>', 'output.txt')
            or die "Cannot open file 'output.txt' for writing: $!";

        for my $filename (@filenames) {
            open(my $in, '<', $filename)
                or die "Cannot open file '$filename' for reading: $!";

            while (<$in>) {
                ...
                        if (/AS \s+ $keyword/ix) {
                            print $out join('', @block);    # print to the output file, not STDOUT
                            last;
                        }
                ...
            }

            close $in or die "Cannot close file '$filename': $!";
        }

        close $out or die "Cannot close file 'output.txt': $!";

        See perlintro and perlopentut.
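Since the original question mentions a whole log directory rather than a fixed list of names, here is a hedged sketch that collects the file names with Perl's built-in `glob` instead of hard-coding them, and keeps the range-operator logic from the first script. The directory name `logs` and the sub name `process_logs` are hypothetical; change them to suit. The sketch has been written to be self-contained, but treat it as a starting point rather than a finished tool.

```perl
use strict;
use warnings;

my ($start, $keyword, $end) = ('PROC SQL', 'DELQ_STATUS', 'QUIT');

# Hypothetical helper: scan every *.log file in $logdir and write each
# START ... END block that contains "AS <keyword>" to $outfile.
sub process_logs {
    my ($logdir, $outfile) = @_;
    open(my $out, '>', $outfile)
        or die "Cannot open file '$outfile' for writing: $!";
    for my $filename (glob "$logdir/*.log") {
        open(my $in, '<', $filename)
            or die "Cannot open file '$filename' for reading: $!";
        my @block;    # fresh buffer for each file
        while (<$in>) {
            push @block, $_ if /$start/i .. /$end/i;
            if (/$end/i) {
                # keep the block only if one of its lines has "AS <keyword>"
                print {$out} @block if grep { /AS \s+ $keyword/ix } @block;
                @block = ();
            }
        }
        close $in or die "Cannot close file '$filename': $!";
    }
    close $out or die "Cannot close file '$outfile': $!";
}

# 'logs' is a hypothetical directory name
process_logs('logs', 'output.txt') if -d 'logs';
```

One caveat: the `..` operator keeps its own state between files, so a log that ends without a QUIT line would make the next file start "inside" a block. Note also that `glob` splits its pattern on whitespace, so a directory path containing spaces needs extra quoting.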

        Hope that helps,


        I normally write my IO scripts to take files on the command line and write their output to STDOUT. (That makes the script a filter, and all kinds of wonderful things follow from there.)

        So instead of opening an output file, and instead of looping over a list of files, do this:

        while (<>) {
            do_something_here();
            print $blah if $accepted;
        }

        Then you run it like so:

        my_script blah*.log > my_output_file
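Filled in for this question, the filter might look like the sketch below (assumptions: the same start, end and keyword patterns as earlier in the thread; `filter_blocks` is a hypothetical name). It swaps the range operator for an explicit `$in_block` flag, because a flag can be reset when `eof` (with no parentheses, true at the end of each file read through `<>`) fires, so an unterminated block never spills into the next file. The "# from FILE" line is an optional addition that uses `$ARGV`, the name of the file `<>` is currently reading.

```perl
use strict;
use warnings;

my ($start, $keyword, $end) = ('PROC SQL', 'DELQ_STATUS', 'QUIT');

# Hypothetical filter: read the files named in @ARGV through <> and print
# each START ... END block that contains "AS <keyword>" to STDOUT.
sub filter_blocks {
    my ($in_block, @block) = (0);
    while (<>) {
        $in_block = 1 if /$start/i;          # a block starts on this line
        push @block, $_ if $in_block;
        if ($in_block && /$end/i) {          # block finished: keep it or drop it
            print "# from $ARGV\n", @block if grep { /AS \s+ $keyword/ix } @block;
            ($in_block, @block) = (0);
        }
        ($in_block, @block) = (0) if eof;    # bare eof: end of the current file
    }
}

# Only run when file names are given, so the script never waits on STDIN.
filter_blocks() if @ARGV;
```

Saved under a name of your choice (say `extract_blocks.pl`, hypothetical), it would be run as `perl extract_blocks.pl log*.log > output.txt`.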

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        One of the things Perl does really well is I/O; that is, it's really good at reading files and extracting data from them. It does it so well that, in most cases, speed is not even a consideration. :-)

        As Athanasius said, you can keep all that log content in its own files and read them from Perl code. Then you can use the flip-flop operator as shown by Athanasius, reworking it to suit your needs. For starters, look at open and work from there. :-)

Re: Extracting a block of text between start and end point
by sandy105 (Scribe) on Jun 17, 2015 at 08:42 UTC

    Assuming you want to process all log files in a directory and extract the lines that contain the keyword:

    use strict;
    use warnings;

    my $keyword = "your keyword here";
    my $dirpath = '.';    # replace with your DIR

    # Open the output file for writing ('>'), not for reading.
    open(my $out, '>', 'output.txt') or die "could not open output file: $!";

    # Keep only the file names that end in .log
    opendir(my $dh, $dirpath) or die "could not open directory '$dirpath': $!";
    my @logfiles = grep { /\.log$/i } readdir($dh);
    closedir($dh);

    foreach my $logfile (@logfiles) {
        # readdir returns bare names, so prepend the directory, and open
        # the log for reading ('<') so it is not truncated.
        open(my $in, '<', "$dirpath/$logfile")
            or die "could not open logfile '$logfile': $!";
        while (<$in>) {
            if (/$keyword/) {
                print $out $_;
                #last;    # if it will occur only once
            }
        }
        close($in);
    }
    close($out);

    If you are sure the keyword will occur inside a block, then use the flip-flop operator code suggested above:

    if (/$start/i .. /$end/i) # where start and end are your start and end keywords
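One caveat with `/$keyword/` in the code above: the keyword is interpolated as a regular expression, so metacharacters in it (`.`, `(`, `*` and so on) are not matched literally. The small sketch below shows the difference that `\Q...\E` (quotemeta) makes, using a "keyword" taken from the sample log; `count_matches` is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines matching $keyword, treated either
# as a regex ($literal false) or as a literal string ($literal true).
sub count_matches {
    my ($keyword, $literal, @lines) = @_;
    my $re = $literal ? qr/\Q$keyword\E/ : qr/$keyword/;
    return scalar grep { /$re/ } @lines;
}

my @lines = ("532 format = 8.\n", "532 format = 80\n");
print 'as a regex: ', count_matches('format = 8.', 0, @lines), "\n";   # prints 2
print 'literally:  ', count_matches('format = 8.', 1, @lines), "\n";   # prints 1
```

As a regex, the trailing `.` matches any character, so "format = 80" is counted too; quoting it restricts the match to the literal text.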
Re: Extracting a block of text between start and end point
by robby_dobby (Hermit) on Jun 17, 2015 at 05:19 UTC
    Hello shonurulez,

    Your block of text is unreadable. Can you put it between <CODE></CODE> tags or pre-format it? I'm sure someone here in the Monastery will be along to guide you.

    Welcome to the monastery. Have fun!
