in reply to Extract ranges of lines from a file, saving each range to a separate file

G'day perlato,

Welcome to the Monastery.

It looks like you were doing fine up to the flip-flop conditional (if (/TRANSACTION STARTED/ .. /TRANSACTION END/) {...}) and then got a bit lost.

You can do all the remaining processing within that if block. Here you'll want to do one of three things:

  1. If /TRANSACTION STARTED/ is TRUE, open a new file for writing. (Don't output the line.)
  2. If /TRANSACTION END/ is TRUE, close the filehandle. (Don't output the line.)
  3. Output all lines that don't match either condition in 1 or 2.

The coding required is very straightforward:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my $filename_prefix = 'pm_1155986_out_';
    my $filename_suffix = '.txt';
    my $filename_number = 0;
    my $out_fh;

    my ($start_re, $end_re) = (qr{TRANSACTION STARTED}, qr{TRANSACTION END});

    open my $in_fh, '<', 'pm_1155986_in.txt';

    while (<$in_fh>) {
        if (/$start_re/ .. /$end_re/) {
            if (/$start_re/) {
                open $out_fh, '>',
                    $filename_prefix . $filename_number++ . $filename_suffix;
                next;
            }

            if (/$end_re/) {
                close $out_fh;
                next;
            }

            print $out_fh $_;
        }
    }

[Note I've used the autodie pragma. This avoids having to hand-craft ... or die "..." messages for all the I/O operations: a tedious and error-prone activity (which Perl will do for you if you ask it nicely).]
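As a rough illustration of what autodie does, here is a minimal sketch that opens a file without any "or die" and catches the resulting exception. The path and the helper name try_open are hypothetical, chosen only for this demonstration; the path is assumed not to exist.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = '/nonexistent_dir_pm_demo/input.txt';

# Try to open a file for reading.
# Returns '' on success, or the error text that autodie raised on failure.
sub try_open {
    my ($path) = @_;
    my $ok = eval {
        open my $fh, '<', $path;    # no "or die": autodie throws on failure
        close $fh;
        1;
    };
    return $ok ? '' : "$@";
}

my $error = try_open($missing);
print "open failed: $error\n" if $error;
```

Without the eval, the same failure would simply end the script with that message, which is the behaviour the script above relies on.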

Here's all the input and output data (within the spoiler):

— Ken

Re^2: Extract ranges of lines from a file, saving each range to a separate file
by RonW (Parson) on Feb 27, 2016 at 00:47 UTC

    Just FYI, FWIW, the .. operator has a couple of features that can replace the duplicated regex matching.

    First, the value of .. isn't just FALSE or TRUE; it's also a line number relative to the start of the range. Before the start, the value is the empty string (FALSE). When the start of the range is matched, the value is 1. This number increments until the end of the range. So, you can:

        my $rln = /$start_re/ .. /$end_re/;
        next unless $rln;    # outside a range: value is FALSE

        if ($rln == 1) {
            # open output file
            next;
        }

        if ($rln > 1) {
            print $out_fh $_;
        }

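    Second, when the range ends, the number has 'E0' appended. So, ahead of the $rln > 1 test (which the end line would otherwise also satisfy), you can:

        # rindex returns -1 when 'E0' is absent, so compare explicitly
        if (rindex($rln, 'E0') != -1) {
            close $out_fh;
            next;
        }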

    rindex is a simple string search that works backwards, so it has less overhead than another regex match. And a string of digits with 'E0' appended is still a valid number, numerically equal to the number without the 'E0'.
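A small sketch can make those range values visible. It assumes hypothetical in-memory sample lines rather than the original poster's file, and the helper names range_values and classify are invented for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample lines; only the marker text comes from the thread.
my @lines = (
    'noise before',
    'TRANSACTION STARTED',
    'payload A',
    'payload B',
    'TRANSACTION END',
    'noise after',
);

# Collect the scalar value of the range operator for each line in turn.
sub range_values {
    my @values;
    for (@lines) {
        my $rln = /TRANSACTION STARTED/ .. /TRANSACTION END/;
        push @values, $rln;
    }
    return @values;
}

# Name the role of a line from its range value alone (no second regex).
sub classify {
    my ($rln) = @_;
    return 'outside' if !$rln;                      # empty string: FALSE
    return 'end'     if rindex($rln, 'E0') != -1;   # e.g. "4E0"
    return 'start'   if $rln == 1;
    return 'body';
}

for my $rln (range_values()) {
    printf "%-5s %s\n", "'$rln'", classify($rln);
}
```

For this sample the successive values are the empty string, 1, 2, 3, 4E0 and the empty string again. The 'end' test comes before the 'start' test so that a line matching both markers (value 1E0) is still recognised as closing the range.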