ozosan has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to write a script for my task, which is to extract the text between two strings on separate lines. I have a command that does it from the command line, but I'm not sure how to do the same in a script so that the results end up in a file. Any idea how to do it? Here is the one-liner that works fine:

perl -ne "BEGIN { @ARGV = map glob, @ARGV }; print if /^start\b$/ .. /^end\b$/ " input/*
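For a one-off run, redirecting the one-liner's output (appending > output.txt to the command) is the simplest option. To do the same from a script, note that the -n switch wraps the code in an implicit while (<>) { ... } loop (see perlrun), so the one-liner translates almost line for line. A minimal sketch, assuming the output path output/output.txt used in the script in this question and an existing output directory; extract_to_file is a hypothetical name, and this version does not yet reset the range between files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append every start..end block from the files that
# match the glob patterns to $out_path, prefixing each line with its file name.
sub extract_to_file {
    my ($out_path, @patterns) = @_;

    # Same glob expansion as the one-liner's BEGIN block
    # (cmd.exe passes input/* to Perl unexpanded).
    local @ARGV = map { glob($_) } @patterns;
    return 0 unless @ARGV;             # an empty @ARGV would make <> read STDIN

    open my $out, '>>', $out_path or die "Cannot open $out_path: $!";
    local $_;
    my $count = 0;
    while (<>) {                       # the loop that perl -n adds implicitly
        if (/^start$/ .. /^end$/) {
            # $ARGV holds the path of the file <> is currently reading
            print {$out} "$ARGV;$_";   # $_ already ends in "\n"
            $count++;
        }
    }
    close $out or die "Cannot close $out_path: $!";
    return $count;                     # number of lines written
}

# Usage: perl extract.pl "input/*"
extract_to_file('output/output.txt', @ARGV) if @ARGV;
```

Unlike the question's script, $ARGV gives the path as matched by the glob (for example input/foo.txt), not the bare file name.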

So far I have the following routine, but it prints the entire content of the files, which is not what I want :(

use strict;
use warnings;

my $record = "";
opendir (DIR, "C:/Users/input/") or die "$!";
my @files = readdir DIR;
close DIR;
splice (@files,0,2);
open(MYOUTFILE, ">>output/output.txt");
foreach my $file (@files) {
    open (CHECKBOOK, "binput/$file") || die "$!";
    while ($record = <CHECKBOOK>) {
        if ($record=~ /^start\b$/ .. /^end\b$/) {
            print MYOUTFILE "$file;$record\n";
        }
    }
    close(CHECKBOOK);
}
close(MYOUTFILE);

Re: how to extract text between 2 strings on separate lines
by NetWallah (Canon) on Nov 13, 2013 at 16:44 UTC
    A couple of other nits:

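    • You do not need the "\b" in the regex: /^start$/ is already anchored at both ends.
    • The flip-flop operator does not reset between files: if one file has a start line without a matching end line, the range stays open into the next file. See the articles about this on PerlMonks (PM) and Stack Overflow (SO).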
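    To illustrate the second point: <> reads all the files as one stream, and the range operator only remembers whether it is currently "on". A minimal sketch, assuming the same start/end markers; extract_naive and extract_per_file are hypothetical names, and forcing the range closed with eof is one common idiom, not the only fix.

```perl
use strict;
use warnings;

# Naive version: a single range operator shared by every file <> reads.
sub extract_naive {
    local @ARGV = @_;
    return () unless @ARGV;            # empty @ARGV would make <> read STDIN
    local $_;
    my @found;
    while (<>) {
        push @found, $_ if /^start$/ .. /^end$/;
    }
    return @found;
}

# Per-file version: eof (without parentheses) is true on the last line of
# each file read through <>, so an unterminated range is forced closed
# before the next file starts. eof() with parentheses would only be true
# at the end of the last file.
sub extract_per_file {
    local @ARGV = @_;
    return () unless @ARGV;
    local $_;
    my @found;
    while (<>) {
        push @found, $_ if /^start$/ .. (/^end$/ || eof);
    }
    return @found;
}
```

    With a first file containing start and A, and a second containing B, start, C and end, extract_naive also returns B, while extract_per_file returns only start, A, start, C, end.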

    Update: Fixed "flop" typo (thanks, LanX)

                 When in doubt, mumble; when in trouble, delegate; when in charge, ponder. -- James H. Boren

      Thank you very much for your help. Indeed, there was a precedence issue, and $record =~ /^start\b$/ .. $record =~ /^end\b$/ fixed it. Many thanks. As for the "\b" in the regex, yes, you are right, it's not needed. Thank you.

Re: how to extract text between 2 strings on separate lines
by Eily (Monsignor) on Nov 13, 2013 at 16:27 UTC

    Look at the precedence of operators: =~ has higher precedence than the range operator .., so $record=~ /^start\b$/ .. /^end\b$/ is interpreted as (scalar $record =~ /^start\b$/) .. (scalar /^end\b$/). (Here the scalar keyword doesn't change the result; it only marks the grouping, so you can ignore it if that is clearer.) Because a match without =~ works on $_ by default, what you actually have is (scalar $record =~ /^start\b$/) .. (scalar $_ =~ /^end\b$/). Your loop never assigns to $_, so the end condition never matches, and everything from the first start line onward is printed.
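    A small sketch of that parse, assuming the lines are already in a list; buggy_select and fixed_select are hypothetical names. In buggy_select the end test is applied to $_, which never contains "end", so once the range switches on it stays on.

```perl
use strict;
use warnings;

# Buggy: =~ binds tighter than .., so this parses as
#   ($record =~ /^start$/) .. ($_ =~ /^end$/)
# and the end test looks at $_, not at $record.
sub buggy_select {
    my @out;
    # Stands in for the never-assigned $_ of the question; set to ''
    # instead of undef only to avoid an "uninitialized" warning.
    local $_ = '';
    for my $record (@_) {
        push @out, $record if $record =~ /^start$/ .. /^end$/;
    }
    return @out;
}

# Fixed: bind both patterns to $record explicitly.
sub fixed_select {
    my @out;
    for my $record (@_) {
        push @out, $record if $record =~ /^start$/ .. $record =~ /^end$/;
    }
    return @out;
}
```

    Given ("start\n", "keep\n", "end\n", "skip\n"), buggy_select returns all four lines and fixed_select returns only the first three. In buggy_select the range is also still on after the call returns, so a second call would start out already inside it.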

    while (<CHECKBOOK>) {
        if (/^start\b$/ .. /^end\b$/) {
            print MYOUTFILE "$file;$_";    # $_ already ends in "\n"
        }
    }
    should work better. Alternatively, use $record =~ /^start\b$/ .. $record =~ /^end\b$/ if you want to keep $record, but that is noisier.
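    Putting the suggestions from this thread together, here is a sketch of the question's script with lexical filehandles, three-argument open, both patterns bound to $record, and a per-file reset for the flip-flop point raised by NetWallah. Assumptions: the input directory and output file are passed as arguments instead of being hard-coded, only plain files are read, and extract_dir is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the question's script.
sub extract_dir {
    my ($in_dir, $out_path) = @_;

    opendir my $dh, $in_dir or die "Cannot open $in_dir: $!";
    # Keep plain files only; this skips "." and ".." without assuming they
    # are the first two entries, which the splice in the question relied on.
    my @files = sort { $a cmp $b } grep { -f "$in_dir/$_" } readdir $dh;
    closedir $dh;    # closedir, not close, for directory handles

    open my $out, '>>', $out_path or die "Cannot open $out_path: $!";
    for my $file (@files) {
        open my $in, '<', "$in_dir/$file" or die "Cannot open $file: $!";
        while (my $record = <$in>) {
            # Both patterns test $record; eof($in) closes an unterminated
            # range on the file's last line so it cannot leak into the next file.
            if ($record =~ /^start$/ .. ($record =~ /^end$/ || eof($in))) {
                print {$out} "$file;$record";    # $record already ends in "\n"
            }
        }
        close $in;
    }
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage: perl extract.pl C:/Users/input output/output.txt
extract_dir(@ARGV) if @ARGV == 2;
```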