Regex question

jamsda has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex question by ikegami (Patriarch) on Jan 26, 2009 at 03:18 UTC
You can do that by buffering the last "B" lines. `my $B = 5; my $A = 3; my $re = qr/somepat/; my $skip = 0; my @lines; while (<>) { if (/$re/) { @lines = (); $skip = $A; } elsif ($skip) { --$skip; } else { push @lines, $_; print(shift(@lines)) if @lines > $B; } } print for @lines;` [download] For test data, I used `keep 1 discard B5 discard B4 discard B3 discard B2 discard B1 somepat discard A1 discard A2 discard A3 keep 2 keep 3 discard B5 discard B4 discard B3 discard B2 discard B1 somepat discard A1 somepat discard A1 discard A2 discard A3 keep 4 discard B5 discard B4 discard B3 discard B2 discard B1 somepat discard A1 discard A2 discard A3 discard B1 somepat discard A1 discard A2 discard A3 keep 5` [download] For output, you should get `keep 1 keep 2 keep 3 keep 4 keep 5` [download]	[reply] [d/l] [select]
Re: Regex question by kyle (Abbot) on Jan 26, 2009 at 01:59 UTC
You could do it from the command line using a combination of `grep`, `head`, and `tail`. Use `grep -n` to find out the line number that your string is on. Then you can use the others to get all the lines you want, specified by line number. Otherwise, I'm inclined to keep an array of the lines you might want to remove if you find a match on the next line. When you don't find the match, output the earliest line and insert the unmatching line. When you do find a match, discard the buffered lines, discard the next three lines, and then just "`print while (<>)`" for the rest of your life.	[reply] [d/l] [select]
Re: Regex question by ww (Archbishop) on Jan 26, 2009 at 04:16 UTC
TIMTOWTDI (and inelegant and verbose, but I think, 'linear and explicit'). Think in terms of removing 6 lines, rather than 5 and 1. #!C:/perl/bin use strict; use warnings; # /me is too bleary at this hour to eliminate the "Usel +ess use of private variable in void context at ..." (lines 56 & 61) if (!$ARGV[0]) { print "Useage: perl 738855.pl String_to_search_for\n"; exit; } my $regex_string = $ARGV[0]; my $re = qr/$regex_string/; my @data_array; while (my $line = <DATA>) { chomp($line); push @data_array, $line; } my $flag = ''; my $test_line; my @cache; for $test_line(@data_array) { $test_line =~ /($re)/; if ($1) { $flag = "Found"; push @cache, "$flag"; next; } else { push @cache, $test_line } } # print Dumper @cache; my $i = 0; my $cache; for $cache(@cache) { if ( $cache =~ /Found/ ) { last; } else { $i++; } } # print "\$i: $i\n"; # $i = 6 when using the __DATA__ below my $j = ( $i - 5 ); # "-5": 5 preceding lines (and current + line) to delete for ($j; $j <= $i; ++$j ) { $cache[$j] = ''; } $j = $i; for ( $j; $j <= ($i+3); ++$j ) { # "$i+3" 3 following lines to disc +ard $cache[$j] = ''; } for $cache(@cache) { if ($cache) { print "$cache \n"; } else { next; } } __DATA__ Line 1 Line 2 should be removed line 3 should be removed line 4 should be removed line 5 should be removed line 6 should be removed line 7 should be removed this has more than 5 preceding lines and more + than 3 following lines. FOO line 8 should be removed line 9 should be removed line 10 should be removed line 11 [download] Except for the warnings noted above, output is as desired: `Line 1 line 11` [download] hth	[reply] [d/l] [select]
Re^2: Regex question by jamsda (Novice) on Jan 26, 2009 at 05:07 UTC
Thanks for responses guys! Here's my first whack at it `#!/usr/bin/perl use Getopt::Long; GetOptions( 'd=s' => \$delete, 'i=s' => \$patternmatch, 'c=s' => \$confile, ) open (FILE, $confile); while (<FILE>) { $line[$.] = $_; if (/$patternmatch/) { delete $line[$_] for ($.-5 .. $.); } } close(FILE); open (OUTPUT, ">$confile"); print (OUTPUT @line); close (OUTPUT);` [download] perl ./dellines.pl -i <patternmatch> -c ./someconfile -d I'm also trying to get the (-d) to be a stand-alone flag. What would be the best approach to this? Thanks, Jim	[reply] [d/l]
Re^3: Regex question by ikegami (Patriarch) on Jan 26, 2009 at 05:16 UTC
That can delete too many lines. `01 keep 02 keep <- gets discarded by your code. 03 keep <- gets discarded by your code. 04 keep <- gets discarded by your code. 05 keep <- gets discarded by your code. 06 keep <- gets discarded by your code. 07 discard B5 08 discard B4 09 discard B3 10 discard B2 11 discard B1 12 somepat 13 somepat` [download] That's in addition to the problems with `$.-5` going negative. By the way, `splice` would be more efficient than `delete` since it wouldn't shrink the allocated buffer. I'm also trying to get the (-d) to be a stand-alone flag. What would be the best approach to this? `'d!' => \$delete,` [download] `-c ./someconfile` That's weird. Why not just print to STDOUT and redirect output? It's way more flexible at no cost. `perl dellines.pl -i somepat -d > someconfile` [download] The name of the option doesn't even make sense. It's an output file, not a config file as far as the tool is concerned.	[reply] [d/l] [select]
Re: Regex question by davido (Cardinal) on Jan 26, 2009 at 08:28 UTC
The challenge to turn this into a Perl one-liner proved too enticing. I've assumed that the pattern can match more than one time in the file, but that if one of the lines deleted following the match contains the same pattern again, it gets ignored. This may or may not be an edge condition that you need to be aware of. My approach builds a buffer of read lines. When the buffer reaches five lines long, the first line in gets shifted off the FIFO queue and printed. If the "match" line is detected, that line along with the buffer will be discarded. Then three more lines are read but ignored. Finally, upon completion of reading the file, whatever lines are still in the buffer get printed too. The following is a long-hand version of my script first: `use strict; use warnings; my @buffer; my $re = qr/10/; while (<DATA>) { if ( $_ =~ $re ) { @buffer = (); my $count = 0; while( <DATA> and $count++ < 2 ){}; } else { push(@buffer, $_); if( defined( $buffer[5] ) ) { print shift(@buffer); } } } print @buffer; __DATA__ Line 01 Line 02 Line 03 Line 04 Line 05 Line 06 Line 07 Line 08 Line 09 Line 10 Line 11 Line 12 Line 13 Line 14 Line 15 Line 16 Line 17 Line 18 Line 19 Line 20` [download] To convert this to a one-liner, we'll use the `-n` option, which implicitly creates the outer `while(){}` loop. The only problem with the `-n` option is that the remaining items in the buffer would get dropped if we didn't come up with a plan to deal with them after the implicit `while()` loop. The way to do this is to define an `END{}` block. The END{} block gets executed just after the `while()` loop completes, and just before the one-liner terminates execution. The `-i.bak` switch is used to specify in-place editing with the creation of a backup file. And as you see by looking at the code, we're hard-wiring the match value. Why not? One liners are disposable. Here's how it looks: `perl -ni.bak -e "if(/10/){@buf=();$cnt=0;while(<> and $cnt++<2){}}else +{push @buf,$_;if(defined $buf[5]){print shift(@buf)}}END{print @buf}" + mytest.txt` [download] Enjoy! Dave	[reply] [d/l] [select]
Re^2: Regex question by shmem (Chancellor) on Jan 26, 2009 at 10:32 UTC
The way to do this is to define an END{} block. With the -n or -p switch, there's another solution - Abigail's trick, the 'eskimo greeting': `perl -ne '}{ print $.' # count lines` [download] My solution made into a one-liner `perl -ne 'BEGIN{$_=shift for$B,$A,$p}/$p/and@l=(),do{<>for 1..$A},next +;push@l,$_;print shift@l if$B<@l}{print@l' 3 5 pattern test.txt` [download] where 3 is lines before, 5 is lines after the pattern. Run through B::Deparse `sub BEGIN { ($_ = shift(@ARGV)) foreach (($B, $A, $p)); } LINE: while (defined(($_ = <ARGV>))) { (/pat/ and ((@l = ()), do { <ARGV> foreach (1 .. $A) }, next)); push(@l, $_); (($B < @l) and print(shift(@l))); } { print(@l); }` [download] Sample data: Read more... (1057 Bytes)	[reply] [d/l] [select]
Re^2: Regex question by ikegami (Patriarch) on Jan 26, 2009 at 13:38 UTC
That will also fail if you have two matches within "A" lines of each other `01 keep 02 discard B5 03 discard B4 04 discard B3 05 discard B2 06 discard B1 07 somepat 08 discard A1 09 somepat 10 discard A1 11 discard A2 <- Not dicarded 12 discard A3 <- Not dicarded 13 keep` [download] By the way, `defined( $buffer[5] )` reads better as `@buffer > 5`	[reply] [d/l] [select]
Re^3: Regex question by davido (Cardinal) on Jan 27, 2009 at 04:49 UTC
Yes, I mentioned that in my writeup. But the fix is pretty straightforward, and is presented in the code below: `use strict; use warnings; my @buffer; my $re = qr/10/; while (<DATA>) { if ( $_ =~ $re ) { @buffer = (); my $count = 0; while( defined( my $discard = <DATA> ) and $count++ < 2 ){ if( $discard =~ $re ) { $count = 0; } } } else { push(@buffer, $_); if( @buffer > 5 ) { print shift(@buffer); } } } print @buffer; __DATA__ Line 01 Line 02 Line 03 Line 04 Line 05 Line 06 Line 07 Line 08 Line 09 Line 10 Line 11 Line 12 Line 13 Line 14 Line 15 Line 16 Line 17 Line 18 Line 19 Line 20` [download] Dave	[reply] [d/l]
Re: Regex question by shmem (Chancellor) on Jan 26, 2009 at 07:53 UTC
Keep an array of the last lines read: `#!/usr/bin/perl use Getopt::Std; my %o; getopt 'b:a:',\%o; my $pat = shift; my @last; while(<>) { if (/$pat/) { @last = (); <> for 1..$o{a}; next; } push @last, $_; print shift @last if $o{b} < @last; } print @last;` [download]	[reply] [d/l]
Re^2: Regex question by ikegami (Patriarch) on Jan 26, 2009 at 13:15 UTC
That will fail if you have two matches within "A" lines of each other `01 keep 02 discard B5 03 discard B4 04 discard B3 05 discard B2 06 discard B1 07 somepat 08 discard A1 09 somepat 10 discard A1 11 discard A2 <- Not dicarded 12 discard A3 <- Not dicarded 13 keep` [download]	[reply] [d/l]
Re^3: Regex question by shmem (Chancellor) on Jan 26, 2009 at 14:34 UTC
Okaay... that will do: `#!/usr/bin/perl use Getopt::Std; my %o; getopt 'b:a:',\%o; my $pat = shift; my @last; OUTER: while(<>) { if (/$pat/) { @last = (); for my $i(1..$o{a}) { $_ = <>; redo OUTER if /$pat/; } next; } push @last, $_; print shift @last if $o{b} < @last; } print @last;` [download]	[reply] [d/l]