So I do (think I) need to pass through the file twice - once to find the references I want to remove, and once to actually remove them.

I took that as a challenge ;-) This needs only a single pass: it reverses both the input and the output by piping them through tac, and it produces your desired output:

    use warnings;
    use strict;

    die "Usage: $0 INFILE\n" unless @ARGV==1;
    my $INFILE = shift @ARGV;

    # read the input reversed through tac, and reverse the output back through tac
    open my $ofh, '|-', 'tac' or die "tac (out): $!";
    open my $ifh, '-|', 'tac', $INFILE or die "tac $INFILE: $!";

    my ($aminblock,$prevnum,$foundstr);
    my %found;  # block number => true if that block contains "bar"
    while (<$ifh>) {
        chomp;
        my $out=1;
        if (!$aminblock) {
            # reading backwards, so "end" is what opens a block
            if (/^end$/) { undef $foundstr; $aminblock=1 }
            elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
                die "ref $1 without block?" unless exists $found{$1};
                $out = !$found{$1};
            }
            else { die "unexpected outside of a block: $_" }
        }
        else {
            # the block number is the last line seen before "begin"
            if (/^\s*(\d+)\s*$/) { $prevnum=$1 }
            elsif (/^begin$/) {
                die "block ended without number?" unless defined $prevnum;
                $found{$prevnum} = $foundstr;
                undef $prevnum;
                $aminblock=0;
            }
            else { undef $prevnum; if (/bar/) { $foundstr=1 } }
        }
        print {$ofh} $_, "\n" if $out;
    }
    close $ifh or die "tac $INFILE: ".($!||"\$?=$?");
    close $ofh or die "tac (out): ".($!||"\$?=$?");
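To see why reading backwards lets one pass suffice, here is a minimal in-memory sketch of the same state machine. It is not one of the scripts in this post: filter_refs is a hypothetical helper, the sample lines are made up, and it assumes (as the tac script does) that every ref N line comes before its block in the file, so that in reverse order the block has already been seen.

```perl
use warnings;
use strict;

# Hypothetical helper (an illustration, not from the original post):
# takes chomped lines, returns them without the "ref N" lines whose
# block N contains "bar". Walks the lines last-to-first, like tac does.
sub filter_refs {
    my @lines = @_;
    my ( $inblock, $prevnum, $foundstr, %found, @out );
    for ( reverse @lines ) {
        my $keep = 1;
        if ( !$inblock ) {
            # in reverse order, "end" is the first line of a block
            if (/^end$/) { undef $foundstr; $inblock = 1 }
            elsif (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) {
                die "ref $1 without block?" unless exists $found{$1};
                $keep = !$found{$1};
            }
            else { die "unexpected outside of a block: $_" }
        }
        else {
            # the block number is the last line seen before "begin"
            if (/^\s*(\d+)\s*$/) { $prevnum = $1 }
            elsif (/^begin$/) {
                die "block ended without number?" unless defined $prevnum;
                $found{$prevnum} = $foundstr;
                undef $prevnum;
                $inblock = 0;
            }
            else { undef $prevnum; $foundstr = 1 if /bar/ }
        }
        # unshift restores the original order (the job of the second tac)
        unshift @out, $_ if $keep;
    }
    return @out;
}

# Made-up sample data: block 1 contains "bar", block 2 does not
my @in = ( 'ref 1', 'foo ref 2',
           'begin', '1', 'bar', 'end',
           'begin', '2', 'baz', 'end' );
print "$_\n" for filter_refs(@in);
```

With this sample, the output is the input minus the ref 1 line; foo ref 2 stays because block 2 has no bar. The sketch holds every line in memory, whereas the tac script streams the file.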

However, the two passes through tac might actually make the single-pass script less efficient for large files. Here's a two-pass version:

    use warnings;
    use strict;

    die "Usage: $0 INFILE\n" unless @ARGV==1;
    my $INFILE = shift @ARGV;

    use constant { STATE_IDLE=>0, STATE_BEGIN=>1, STATE_INBLOCK=>2 };

    open my $fh, '<', $INFILE or die "$INFILE: $!";
    my %found;  # block number => 1 if that block contains "bar"
    my $state = STATE_IDLE;
    my $curnum;
    # pass 1 only fills %found, pass 2 prints the filtered lines
    for my $pass (1..2) {
        while (<$fh>) {
            chomp;
            my $out = 1;
            if ($state==STATE_IDLE) {
                if (/^\s*(?:foo\s+)?ref\s+(\d+)\s*$/) { $out=!$found{$1} }
                elsif (/^begin$/) { $state=STATE_BEGIN }
                else { die "unexpected in state $state: $_" }
            }
            elsif ($state==STATE_BEGIN) {
                # the line right after "begin" must be the block number
                if (/^\s*(\d+)\s*$/) { $curnum=$1; $state=STATE_INBLOCK }
                else { die "unexpected in state $state: $_" }
            }
            elsif ($state==STATE_INBLOCK) {
                if (/^end$/) { $state=STATE_IDLE }
                elsif (/bar/) { $found{$curnum}=1 }
            }
            else { die "bad state $state" }
            print $_, "\n" if $pass==2 && $out;
        }
        die "unexpected state at eof: $state" unless $state==STATE_IDLE;
        # rewind for the next pass
        seek $fh, 0, 0 or die "seek $INFILE: $!";
    }
    close $fh;

Update: Note that these solutions don't remove ref N lines if they appear inside begin...end blocks; this was an assumption I made, but it's actually unclear what the desired behavior is in that case.


In reply to Re^3: some efficiency, please (updated) by haukex
in thread some efficiency, please by Anonymous Monk
