Extract ranges of lines from a file, saving each range to a separate file

perlato has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that looks like this

     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************5845
DATE 01-02-16    TIME 05:45:52
 05:46:26 GENAC 1 : ARQC
EXTERNAL AUTHENTICATE: NO ARPC
 05:46:30 GENAC 2 : AAC
 01 FEB 2016     05:47:41      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
INVALID TRANSCATION ON TERMINAL.
-----------------------
[020t 05:47:05 CARD TAKEN
[020t 05:47:07 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*742*02/01/2016*05:47*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************2584
DATE 01-02-16    TIME 05:47:27
 05:48:00 GENAC 1 : ARQC
 05:48:05 GENAC 2 : TC
[020t 05:48:16 CARD TAKEN
[020t 05:48:22 NOTES PRESENTED 0,0,2,0
 01 FEB 2016     05:48:52      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
AMT   GHC40.00
[020t 05:48:31 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*743*02/01/2016*05:57*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************3862
DATE 01-02-16    TIME 05:57:28
 01 FEB 2016     05:58:33      10160021

     INQUIRY
     FROM XXXXXXXX90018
AVAIL          GHC1260.20  
LEDGER         GHC1260.20  
[020t 05:58:06 CARD TAKEN
[020t 05:58:11 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*744*02/01/2016*06:43*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************1972
DATE 01-02-16    TIME 06:43:53
 01 FEB 2016     06:44:56      10160021
5029110111271972
4490    4490
     INQUIRY
     FROM XXXXXXXX23013
AVAIL          GHC14.28
LEDGER         GHC14.28
[020t 06:44:25 CARD TAKEN
[020t 06:44:29 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*745*02/01/2016*06:56*[/CODE]

and need to extract what is between *TRANSACTION STARTED* and TRANSACT
+ION END, ignoring all other information, and create a new file for ea
+ch range.
The new file will contain only 

<CODE>
[020t CARD INSERTED
    [020tCARD: *************2584
    DATE 01-02-16    TIME 05:47:27
     05:48:00 GENAC 1 : ARQC
     05:48:05 GENAC 2 : TC
    [020t 05:48:16 CARD TAKEN
    [020t 05:48:22 NOTES PRESENTED 0,0,2,0
     01 FEB 2016     05:48:52      10160021

         WITHDRAW
         FROM XXXXXXXX    ?
    AMT   GHC40.00
    [020t 05:48:31
[download]

this is What I have but it is not working it dose not save the files in the folder

#! /usr/bin/perl/ 

use warnings;
use strict;

#file to open
my $somefile = "/home/lord-ivan/Soures_Code/Perl/projects/Data/EJDATA.
+LOG";

open (my $fh, '<:encoding(UTF-8)', $somefile) or die "Could not open f
+ile '$somefile' $!"; 
print "$somefile open";

#Extract ranges of lines from a file

while (<$fh>) {
    
    if (/TRANSACTION STARTED/ .. /TRANSACTION END/) 
    {
         next if /TRANSACTION\s*(STARTED|END)/;
        print $_;   
    }
}


my $ofh;

my $outputfile = "/home/lord-ivan/Soures_Code/Perl/projects/EJ Transpo
+rt/Queue/.txt";

BEGIN {$outputfile= "EJ"};

open ($fh, ">>${$outputfile}.txt", print $fh $_);

close($outputfile);
$outputfile++;
[download]

Comment on Extract ranges of lines from a file, saving each range to a separate file Select or Download Code

Replies are listed 'Best First'.

Re: Extract ranges of lines from a file, saving each range to a separate file
by Athanasius (Archbishop) on Feb 24, 2016 at 08:06 UTC

Hello perlato,

First, you need to open the output file before the while loop. Second, you need to change:

print $_;
[download]

to:

print $ofh $_;
[download]

Note: Do not put a comma between the output filehandle and the (first) item to be printed. See print.

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: Extract ranges of lines from a file, saving each range to a separate file
by kcott (Archbishop) on Feb 24, 2016 at 11:06 UTC

G'day perlato,

Welcome to the Monastery.

It looks like you were doing fine up to the flip-flop conditional (if (/TRANSACTION STARTED/ .. /TRANSACTION END/) {...}) and then got a bit lost.

You can do all the remaining processing within that if block. Here you'll want to do one of three things:

If /TRANSACTION STARTED/ is TRUE, open a new file for writing. (Don't output the line.)
If /TRANSACTION END/ is TRUE, close the filehandle. (Don't output the line.)
Output all lines that don't match either condition in 1 or 2.

The coding required is very straightforward:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $filename_prefix = 'pm_1155986_out_';
my $filename_suffix = '.txt';
my $filename_number = 0; 
my $out_fh;
my ($start_re, $end_re) = (qr{TRANSACTION STARTED}, qr{TRANSACTION END
+});

open my $in_fh, '<', 'pm_1155986_in.txt';

while (<$in_fh>) {
    if (/$start_re/ .. /$end_re/) {
        if (/$start_re/) {
            open $out_fh, '>',
                $filename_prefix . $filename_number++ . $filename_suff
+ix;
            next;
        }

        if (/$end_re/) {
            close $out_fh;
            next;
        }

        print $out_fh $_;
    }
}
[download]

[Note I've used the autodie pragma. This avoids having to hand-craft ... or die "..." messages for all the I/O operations: a tedious and error-prone activity (which Perl will do for you if you ask it nicely).]

Here's all the input and output data (within the spoiler):

— Ken

[reply]
[d/l]
[select]

Re^2: Extract ranges of lines from a file, saving each range to a separate file

by RonW (Parson) on Feb 27, 2016 at 00:47 UTC

Just FYI, FWIW, the .. operator has a couple of features that can replace the duplicated regex matching.

First, the value of .. isn't just FALSE or TRUE, it's also a line number relative to the start of the range. Before the start, the value is 0 (aka FALSE). When the start of the range is matched, the value is 1. this number increments until the end of the range. So, you can:

my $rln = /$start_re/ .. /$end_re/;
if $rln == 1
{
    # open output file
    next;
}
if $rln > 1
{
   print $out_fh $_;
}
[download]

Second, when the range ends, the number has 'E0' appended. So, you can:

if rindex($rln, 'E0')
{
    close $out_fh;
    next;
}
[download]

rindex is a simple string search that works backwards, so has much less overhead than another regex match. And appending 'E0' to a string of digits is still a valid number - numerically equal to the number without the 'E0'.

[reply]
[d/l]
[select]

Re: Extract ranges of lines from a file, saving each range to a separate file
by Discipulus (Canon) on Feb 24, 2016 at 08:09 UTC

perlato

your definition of $outputfile is wrong: it is a directory, impossible to write there, then you are never writing to $ofh anyway. In addition where are you creating a different outfile for each transaction?

Try to describe the correct procedure in english an then try to translate to Perl.

Doing everything during the while loop can be tricky for your Perl level (maybe), so try to semplify the process: put your distinct part of interest into a datastructure you feel comfortable with (array? an hash? arrayOFarras?).

Then cycle the datastructure and for each element decide a filename, open it, write to it what you need, close it and pass to next element.

It not so hard, I think, but post the code in case of problems.

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

[reply]
[d/l]
[select]