in reply to Regex question

The challenge to turn this into a Perl one-liner proved too enticing.

I've assumed that the pattern can match more than once in the file, but that a match inside the lines being deleted after an earlier match is ignored. This may or may not be an edge case that matters to you.

My approach buffers the lines as they are read. Once the buffer holds more than five lines, the oldest line is shifted off the FIFO queue and printed. When the "match" line is detected, that line is discarded along with the whole buffer, and the next three lines are read and ignored. Finally, once the whole file has been read, whatever lines are still in the buffer get printed too.

The following is a long-hand version of my script first:

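use strict;
use warnings;

my @buffer;
my $re = qr/10/;

while (<DATA>) {
    if ( $_ =~ $re ) {
        # Match: drop it together with the buffered "before" lines...
        @buffer = ();
        my $count = 0;
        # ...then read and ignore the next three lines.
        while( <DATA> and $count++ < 2 ){};
    }
    else {
        # Hold back up to five lines; print the oldest once a sixth arrives.
        push(@buffer, $_);
        if( defined( $buffer[5] ) ) {
            print shift(@buffer);
        }
    }
}
# Flush whatever is still buffered at end of input.
print @buffer;

__DATA__
Line 01
Line 02
Line 03
Line 04
Line 05
Line 06
Line 07
Line 08
Line 09
Line 10
Line 11
Line 12
Line 13
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20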

To convert this to a one-liner, we'll use the -n option, which implicitly wraps the code in the outer while(<>){} loop. The only problem with -n is that whatever is left in the buffer would be dropped unless we deal with it after the implicit loop. The way to do this is to define an END{} block, which is executed after the while() loop completes, just before the one-liner terminates. The -i.bak switch specifies in-place editing with the creation of a backup file. And as you can see from the code, we're hard-wiring the match value. Why not? One-liners are disposable.

Here's how it looks:

perl -ni.bak -e "if(/10/){@buf=();$cnt=0;while(<> and $cnt++<2){}}else{push @buf,$_;if(defined $buf[5]){print shift(@buf)}}END{print @buf}" mytest.txt
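One thing worth checking before relying on END{} together with -i: perlrun's description of -i shows Perl selecting STDOUT again once the loop over the files has finished, so lines printed from END{} may go to the terminal rather than into the edited file. A minimal long-form sketch that flushes the buffer at end of file instead, while the edited file is still the selected output, might look like this. The temporary file and the `$skip` counter are assumptions made for illustration and are not part of the one-liner above; like the one-liner, it ignores a match that falls inside the three skipped lines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test file: 20 numbered lines, like the __DATA__ section above.
my ( $tmp, $file ) = tempfile( UNLINK => 1 );
print {$tmp} map { sprintf "Line %02d\n", $_ } 1 .. 20;
close $tmp or die "close failed: $!";

my $re = qr/10/;

{
    # Same effect as the -i.bak switch, limited to this block.
    local ( $^I, @ARGV ) = ( '.bak', $file );
    my @buf;
    my $skip = 0;    # lines still to be dropped after a match

    while (<>) {
        if ( $skip > 0 ) {
            $skip--;                 # inside the "after" window: drop the line
        }
        elsif ( $_ =~ $re ) {
            @buf  = ();              # drop the buffered "before" lines
            $skip = 3;               # and the three lines that follow
        }
        else {
            push @buf, $_;
            print shift(@buf) if @buf > 5;
        }

        # Flush at end of file, while output still goes to the edited file.
        if (eof) {
            print @buf;
            @buf = ();
        }
    }
}

# Read the edited file back to inspect the result.
open my $check, '<', $file or die "cannot reopen $file: $!";
my @result = <$check>;
close $check;
unlink "$file.bak";

print @result;
```

Counting down with `$skip` rather than reading ahead with an inner `<>` also keeps all reads in the outer loop, so the end-of-file test is only ever made in one place.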

Enjoy!


Dave

Re^2: Regex question
by shmem (Chancellor) on Jan 26, 2009 at 10:32 UTC
    The way to do this is to define an END{} block.

    With the -n or -p switch, there's another solution - Abigail's trick, the 'eskimo greeting':

    perl -ne '}{ print $.' # count lines
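The trick works because -n wraps the code in `while (<>) { ... }` as plain text: the leading `}` closes that loop early and the `{` opens a bare block that runs once after the loop ends. A rough hand-written equivalent, using an in-memory filehandle with made-up contents in place of `<>` so that it is self-contained:

```perl
use strict;
use warnings;

# Stand-in for the file that -n would read via <>; the contents are made up.
my $input = "alpha\nbeta\ngamma\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my $count;
while (<$in>) { }        # loop opened by -n, closed early by the leading `}`
{ $count = $. }          # bare block opened by `{`, closed by -n's own `}`

print "$count\n";
```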

    My solution made into a one-liner

    perl -ne 'BEGIN{$_=shift for$B,$A,$p}/$p/and@l=(),do{<>for 1..$A},next;push@l,$_;print shift@l if$B<@l}{print@l' 3 5 pattern test.txt

    where 3 is the number of lines to drop before the pattern and 5 the number of lines to drop after it.

    Sample data:
Re^2: Regex question
by ikegami (Patriarch) on Jan 26, 2009 at 13:38 UTC
    That will also fail if you have two matches within "A" lines of each other:

    01 keep
    02 discard B5
    03 discard B4
    04 discard B3
    05 discard B2
    06 discard B1
    07 somepat
    08 discard A1
    09 somepat
    10 discard A1
    11 discard A2 <- Not discarded
    12 discard A3 <- Not discarded
    13 keep

    By the way,
    defined( $buffer[5] )
    reads better as
    @buffer > 5
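For lines read from a file the two tests give the same answer, since every buffered element is defined; the count-based form simply states the intent (more than five lines held) directly. A small sketch with made-up data showing where they agree and the one case where they would differ:

```perl
use strict;
use warnings;

my @buffer = map { "Line $_\n" } 1 .. 6;

# With lines read from a file every element is defined, so both tests agree.
my $by_element = defined $buffer[5] ? 1 : 0;
my $by_count   = @buffer > 5        ? 1 : 0;

# They diverge only if the sixth slot exists but holds undef.
my @sparse = ( (undef) x 6 );
my $sparse_by_element = defined $sparse[5] ? 1 : 0;
my $sparse_by_count   = @sparse > 5        ? 1 : 0;

print "$by_element $by_count $sparse_by_element $sparse_by_count\n";
```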

      Yes, I mentioned that in my writeup. But the fix is pretty straightforward, and is presented in the code below:

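      use strict;
      use warnings;

      my @buffer;
      my $re = qr/10/;

      while (<DATA>) {
          if ( $_ =~ $re ) {
              # Match: drop it together with the buffered "before" lines.
              @buffer = ();
              my $count = 0;
              # Read and ignore the following lines; a match among the
              # first two of them restarts the count.
              while( defined( my $discard = <DATA> ) and $count++ < 2 ){
                  if( $discard =~ $re ) {
                      $count = 0;
                  }
              }
          }
          else {
              # Hold back up to five lines; print the oldest once a sixth arrives.
              push(@buffer, $_);
              if( @buffer > 5 ) {
                  print shift(@buffer);
              }
          }
      }
      # Flush whatever is still buffered at end of input.
      print @buffer;

      __DATA__
      Line 01
      Line 02
      Line 03
      Line 04
      Line 05
      Line 06
      Line 07
      Line 08
      Line 09
      Line 10
      Line 11
      Line 12
      Line 13
      Line 14
      Line 15
      Line 16
      Line 17
      Line 18
      Line 19
      Line 20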
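In the skip loop above, the line that ends the loop (the third one read) is consumed without being tested against the pattern. One way to test every skipped line is to count down instead of reading ahead. The following is only a sketch: `drop_around` is a hypothetical helper name, it works on a list of lines rather than a filehandle, and the sample data is rebuilt from ikegami's example with placeholder text.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the lines that survive
# after dropping each match, $before lines ahead of it and $after lines behind it.
sub drop_around {
    my ( $re, $before, $after, @lines ) = @_;
    my ( @buffer, @kept );
    my $skip = 0;                      # lines still to be dropped after a match

    for my $line (@lines) {
        if ( $line =~ $re ) {
            @buffer = ();              # discard the "before" lines
            $skip   = $after;          # (re)start the "after" window
        }
        elsif ( $skip > 0 ) {
            $skip--;                   # inside the window: drop the line
        }
        else {
            push @buffer, $line;
            push @kept, shift(@buffer) if @buffer > $before;
        }
    }
    return ( @kept, @buffer );         # flush what is still held back
}

# ikegami's example: matches on lines 07 and 09, five before, three after.
my @input = map {
    my $tag = ( $_ == 7 || $_ == 9 ) ? 'somepat' : 'text';
    sprintf "%02d %s\n", $_, $tag;
} 1 .. 13;

my @result = drop_around( qr/somepat/, 5, 3, @input );
print @result;
```

Because the match test comes first, a match anywhere inside the window restarts it, so only lines 01 and 13 of the example survive.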

      Dave