perlato has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that looks like this
*TRANSACTION STARTED* [020t CARD INSERTED [020tCARD: *************5845 DATE 01-02-16 TIME 05:45:52 05:46:26 GENAC 1 : ARQC EXTERNAL AUTHENTICATE: NO ARPC 05:46:30 GENAC 2 : AAC 01 FEB 2016 05:47:41 10160021 WITHDRAW FROM XXXXXXXX ? INVALID TRANSCATION ON TERMINAL. ----------------------- [020t 05:47:05 CARD TAKEN [020t 05:47:07 TRANSACTION END [0r(1)2[000p[040qe1w3h162[020t*742*02/01/2016*05:47* *TRANSACTION STARTED* [020t CARD INSERTED [020tCARD: *************2584 DATE 01-02-16 TIME 05:47:27 05:48:00 GENAC 1 : ARQC 05:48:05 GENAC 2 : TC [020t 05:48:16 CARD TAKEN [020t 05:48:22 NOTES PRESENTED 0,0,2,0 01 FEB 2016 05:48:52 10160021 WITHDRAW FROM XXXXXXXX ? AMT GHC40.00 [020t 05:48:31 TRANSACTION END [0r(1)2[000p[040qe1w3h162[020t*743*02/01/2016*05:57* *TRANSACTION STARTED* [020t CARD INSERTED [020tCARD: *************3862 DATE 01-02-16 TIME 05:57:28 01 FEB 2016 05:58:33 10160021 INQUIRY FROM XXXXXXXX90018 AVAIL GHC1260.20 LEDGER GHC1260.20 [020t 05:58:06 CARD TAKEN [020t 05:58:11 TRANSACTION END [0r(1)2[000p[040qe1w3h162[020t*744*02/01/2016*06:43* *TRANSACTION STARTED* [020t CARD INSERTED [020tCARD: *************1972 DATE 01-02-16 TIME 06:43:53 01 FEB 2016 06:44:56 10160021 5029110111271972 4490 4490 INQUIRY FROM XXXXXXXX23013 AVAIL GHC14.28 LEDGER GHC14.28 [020t 06:44:25 CARD TAKEN [020t 06:44:29 TRANSACTION END [0r(1)2[000p[040qe1w3h162[020t*745*02/01/2016*06:56*[/CODE] and need to extract what is between *TRANSACTION STARTED* and TRANSACT +ION END, ignoring all other information, and create a new file for ea +ch range. The new file will contain only <CODE> [020t CARD INSERTED [020tCARD: *************2584 DATE 01-02-16 TIME 05:47:27 05:48:00 GENAC 1 : ARQC 05:48:05 GENAC 2 : TC [020t 05:48:16 CARD TAKEN [020t 05:48:22 NOTES PRESENTED 0,0,2,0 01 FEB 2016 05:48:52 10160021 WITHDRAW FROM XXXXXXXX ? AMT GHC40.00 [020t 05:48:31
this is What I have but it is not working it dose not save the files in the folder
#! /usr/bin/perl/ use warnings; use strict; #file to open my $somefile = "/home/lord-ivan/Soures_Code/Perl/projects/Data/EJDATA. +LOG"; open (my $fh, '<:encoding(UTF-8)', $somefile) or die "Could not open f +ile '$somefile' $!"; print "$somefile open"; #Extract ranges of lines from a file while (<$fh>) { if (/TRANSACTION STARTED/ .. /TRANSACTION END/) { next if /TRANSACTION\s*(STARTED|END)/; print $_; } } my $ofh; my $outputfile = "/home/lord-ivan/Soures_Code/Perl/projects/EJ Transpo +rt/Queue/.txt"; BEGIN {$outputfile= "EJ"}; open ($fh, ">>${$outputfile}.txt", print $fh $_); close($outputfile); $outputfile++;

Replies are listed 'Best First'.
Re: Extract ranges of lines from a file, saving each range to a separate file
by Athanasius (Archbishop) on Feb 24, 2016 at 08:06 UTC
Re: Extract ranges of lines from a file, saving each range to a separate file
by kcott (Archbishop) on Feb 24, 2016 at 11:06 UTC

    G'day perlato,

    Welcome to the Monastery.

    It looks like you were doing fine up to the flip-flop conditional (if (/TRANSACTION STARTED/ .. /TRANSACTION END/) {...}) and then got a bit lost.

    You can do all the remaining processing within that if block. Here you'll want to do one of three things:

    1. If /TRANSACTION STARTED/ is TRUE, open a new file for writing. (Don't output the line.)
    2. If /TRANSACTION END/ is TRUE, close the filehandle. (Don't output the line.)
    3. Output all lines that don't match either condition in 1 or 2.

    The coding required is very straightforward:

    #!/usr/bin/env perl use strict; use warnings; use autodie; my $filename_prefix = 'pm_1155986_out_'; my $filename_suffix = '.txt'; my $filename_number = 0; my $out_fh; my ($start_re, $end_re) = (qr{TRANSACTION STARTED}, qr{TRANSACTION END +}); open my $in_fh, '<', 'pm_1155986_in.txt'; while (<$in_fh>) { if (/$start_re/ .. /$end_re/) { if (/$start_re/) { open $out_fh, '>', $filename_prefix . $filename_number++ . $filename_suff +ix; next; } if (/$end_re/) { close $out_fh; next; } print $out_fh $_; } }

    [Note I've used the autodie pragma. This avoids having to hand-craft ... or die "..." messages for all the I/O operations: a tedious and error-prone activity (which Perl will do for you if you ask it nicely).]

    Here's all the input and output data (within the spoiler):

    — Ken

      Just FYI, FWIW, the .. operator has a couple of features that can replace the duplicated regex matching.

      First, the value of .. isn't just FALSE or TRUE, it's also a line number relative to the start of the range. Before the start, the value is 0 (aka FALSE). When the start of the range is matched, the value is 1. this number increments until the end of the range. So, you can:

      my $rln = /$start_re/ .. /$end_re/; if $rln == 1 { # open output file next; } if $rln > 1 { print $out_fh $_; }

      Second, when the range ends, the number has 'E0' appended. So, you can:

      if rindex($rln, 'E0') { close $out_fh; next; }

      rindex is a simple string search that works backwards, so has much less overhead than another regex match. And appending 'E0' to a string of digits is still a valid number - numerically equal to the number without the 'E0'.

Re: Extract ranges of lines from a file, saving each range to a separate file
by Discipulus (Canon) on Feb 24, 2016 at 08:09 UTC
    Hello perlato and welcome to the monastery and to the wonderful world of Perl!

    your definition of $outputfile is wrong: it is a directory, impossible to write there, then you are never writing to $ofh anyway. In addition where are you creating a different outfile for each transaction?

    Try to describe the correct procedure in english an then try to translate to Perl.

    Doing everything during the while loop can be tricky for your Perl level (maybe), so try to semplify the process: put your distinct part of interest into a datastructure you feel comfortable with (array? an hash? arrayOFarras?).

    Then cycle the datastructure and for each element decide a filename, open it, write to it what you need, close it and pass to next element.

    It not so hard, I think, but post the code in case of problems.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.