Here's another way to approach the problem. It is, frankly, over-engineered, but I want to try to illustrate some general ideas I have found useful. Among them are:

This script runs correctly against the example input/output files posted here (and fixed here!) (update: but see Update below about use of  /o modifier in the  s/// substitution).
File scrub_ref_1.pl:

use warnings; use strict; use autodie; use Data::Dump qw(dd); use constant TRIGGER => qr{ \b bar \b }xms; # arms a reference number MAIN: { # all lexical variables within this scope are isolated die "usage: $0 <input filename>" unless @ARGV == 1; # slurp entire file to memory. my $filename = $ARGV[0]; my $allfile = do { local $/; open my $fh_in, '<', $filename; <$fh_in>; }; # capture list of ref object numbers to delete. my @object_numbers = do { # all these substrings appear alone on a line. my $rx_blk_start = qr{ ^ begin $ }xms; # block start my $rx_ref_n = qr{ ^ \d+ $ }xms; # ref number my $rx_blk_end = qr{ ^ end $ }xms; # block end # this matches any stuff before end of block. my $rx_not_blk_end = qr{ (?! $rx_blk_end) . }xms; # return list of captures. $allfile =~ m{ $rx_blk_start \n ($rx_ref_n) \n # capture valid ref number for deletion $rx_not_blk_end* # any stuff before block end ${ \TRIGGER } # must appear at least once in block $rx_not_blk_end* # any stuff before block end $rx_blk_end }xmsg; }; # dd 'object_numbers', \@object_numbers; # FOR DEBUG # build regex of ref object numbers to delete. my ($rx_del_ref_n) = map qr{ \b (?: $_) \b }xms, join '|', reverse sort @object_numbers ; # print 'delete ref n regex', $rx_del_ref_n, "\n"; # FOR DEBUG # delete all ref n objects from text. $allfile =~ s{ ^ (?: [ ]* foo)? [ ]+ ref [ ] $rx_del_ref_n \n } {}xmsgo; # print "edited allfile [[$allfile]] \n"; # FOR DEBUG # save processed file to new file. my $out_filename = "$filename.removed"; open my $fh_out, '>', $out_filename; print $fh_out $allfile; close $fh_out; exit; } # end MAIN block die "unexpected exit from MAIN";
Output:
c:\@Work\Perl\monks\Anonymous Monk\1232492>perl scrub_ref_1.pl text.in c:\@Work\Perl\monks\Anonymous Monk\1232492>fc /b text.in.removed text. +in.removed.au Comparing files text.in.removed and TEXT.IN.REMOVED.AU FC: no differences encountered

Update: For some inexplicable reason, I used the  /o modifier with the  s/// substitution in the code above. This modifier (see Regexp Quote-Like Operators in perlop) forces the regex to be compiled once and only once during execution of the script. The substitution should "properly" be
    $allfile =~ s{ ^ (?: [ ]* foo)? [ ]+ ref [ ] $rx_del_ref_n \n }
                 {}xmsg;

As the script stands, processing only one file per invocation, the  /o modifier does no harm, but confers no benefit; the  s/// match regex is compiled and executed only once in any case. A problem arises if this code, which appears to work perfectly well, is recycled into another script that processes multiple files per invocation, a natural extension. In this case, the  $rx_del_ref_n dynamic regex compiled for the first file processed will be used for all subsequent files because the  s/// into which it is interpolated will never be re-compiled. Depending on the data being processed, this bug may be very difficult to spot!


Give a man a fish:  <%-{-{-{-<


In reply to Re: some efficiency, please (updated) by AnomalousMonk
in thread some efficiency, please by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.