Re: Regex question
by ikegami (Patriarch) on Jan 26, 2009 at 03:18 UTC
|
my $B = 5;
my $A = 3;
my $re = qr/somepat/;
my $skip = 0;
my @lines;
while (<>) {
if (/$re/) {
@lines = ();
$skip = $A;
}
elsif ($skip) {
--$skip;
}
else {
push @lines, $_;
print(shift(@lines)) if @lines > $B;
}
}
print for @lines;
For test data, I used
keep 1
discard B5
discard B4
discard B3
discard B2
discard B1
somepat
discard A1
discard A2
discard A3
keep 2
keep 3
discard B5
discard B4
discard B3
discard B2
discard B1
somepat
discard A1
somepat
discard A1
discard A2
discard A3
keep 4
discard B5
discard B4
discard B3
discard B2
discard B1
somepat
discard A1
discard A2
discard A3
discard B1
somepat
discard A1
discard A2
discard A3
keep 5
For output, you should get
keep 1
keep 2
keep 3
keep 4
keep 5
| [reply] [d/l] [select] |
Re: Regex question
by kyle (Abbot) on Jan 26, 2009 at 01:59 UTC
|
You could do it from the command line using a combination of grep, head, and tail. Use grep -n to find out the line number that your string is on. Then you can use the others to get all the lines you want, specified by line number.
Otherwise, I'm inclined to keep an array of the lines you might want to remove if you find a match on the next line. When you don't find the match, output the earliest line and insert the unmatching line. When you do find a match, discard the buffered lines, discard the next three lines, and then just "print while (<>)" for the rest of your life.
| [reply] [d/l] [select] |
Re: Regex question
by ww (Archbishop) on Jan 26, 2009 at 04:16 UTC
|
TIMTOWTDI (and inelegant and verbose, but I think, 'linear and explicit'). Think in terms of removing 6 lines, rather than 5 and 1.
#!C:/perl/bin
use strict;
use warnings; # /me is too bleary at this hour to eliminate the "Usel
+ess use of private variable in void context at ..." (lines 56 & 61)
if (!$ARGV[0]) {
print "Useage: perl 738855.pl String_to_search_for\n";
exit;
}
my $regex_string = $ARGV[0];
my $re = qr/$regex_string/;
my @data_array;
while (my $line = <DATA>) {
chomp($line);
push @data_array, $line;
}
my $flag = '';
my $test_line;
my @cache;
for $test_line(@data_array) {
$test_line =~ /($re)/;
if ($1) {
$flag = "Found";
push @cache, "$flag";
next;
} else {
push @cache, $test_line
}
}
# print Dumper @cache;
my $i = 0;
my $cache;
for $cache(@cache) {
if ( $cache =~ /Found/ ) {
last;
} else {
$i++;
}
}
# print "\$i: $i\n"; # $i = 6 when using the __DATA__ below
my $j = ( $i - 5 ); # "-5": 5 preceding lines (and current
+ line) to delete
for ($j; $j <= $i; ++$j ) {
$cache[$j] = '';
}
$j = $i;
for ( $j; $j <= ($i+3); ++$j ) { # "$i+3" 3 following lines to disc
+ard
$cache[$j] = '';
}
for $cache(@cache) {
if ($cache) {
print "$cache \n";
} else {
next;
}
}
__DATA__
Line 1
Line 2 should be removed
line 3 should be removed
line 4 should be removed
line 5 should be removed
line 6 should be removed
line 7 should be removed this has more than 5 preceding lines and more
+ than 3 following lines. FOO
line 8 should be removed
line 9 should be removed
line 10 should be removed
line 11
Except for the warnings noted above, output is as desired:
Line 1
line 11
hth | [reply] [d/l] [select] |
|
|
#!/usr/bin/perl
use Getopt::Long;
GetOptions(
'd=s' => \$delete,
'i=s' => \$patternmatch,
'c=s' => \$confile,
)
open (FILE, $confile);
while (<FILE>) {
$line[$.] = $_;
if (/$patternmatch/) {
delete $line[$_] for ($.-5 .. $.);
}
}
close(FILE);
open (OUTPUT, ">$confile");
print (OUTPUT @line);
close (OUTPUT);
perl ./dellines.pl -i <patternmatch> -c ./someconfile -d
I'm also trying to get the (-d) to be a stand-alone flag. What would be the best approach to this?
Thanks, Jim | [reply] [d/l] |
|
|
01 keep
02 keep <- gets discarded by your code.
03 keep <- gets discarded by your code.
04 keep <- gets discarded by your code.
05 keep <- gets discarded by your code.
06 keep <- gets discarded by your code.
07 discard B5
08 discard B4
09 discard B3
10 discard B2
11 discard B1
12 somepat
13 somepat
That's in addition to the problems with $.-5 going negative.
By the way, splice would be more efficient than delete since it wouldn't shrink the allocated buffer.
I'm also trying to get the (-d) to be a stand-alone flag. What would be the best approach to this?
'd!' => \$delete,
-c ./someconfile
That's weird. Why not just print to STDOUT and redirect output? It's way more flexible at no cost.
perl dellines.pl -i somepat -d > someconfile
The name of the option doesn't even make sense. It's an output file, not a config file as far as the tool is concerned.
| [reply] [d/l] [select] |
Re: Regex question
by davido (Cardinal) on Jan 26, 2009 at 08:28 UTC
|
The challenge to turn this into a Perl one-liner proved too enticing.
I've assumed that the pattern can match more than one time in the file, but that if one of the lines deleted following the match contains the same pattern again, it gets ignored. This may or may not be an edge condition that you need to be aware of.
My approach builds a buffer of read lines. When the buffer reaches five lines long, the first line in gets shifted off the FIFO queue and printed. If the "match" line is detected, that line along with the buffer will be discarded. Then three more lines are read but ignored. Finally, upon completion of reading the file, whatever lines are still in the buffer get printed too.
The following is a long-hand version of my script first:
use strict;
use warnings;
my @buffer;
my $re = qr/10/;
while (<DATA>) {
if ( $_ =~ $re ) {
@buffer = ();
my $count = 0;
while( <DATA> and $count++ < 2 ){};
} else {
push(@buffer, $_);
if( defined( $buffer[5] ) ) {
print shift(@buffer);
}
}
}
print @buffer;
__DATA__
Line 01
Line 02
Line 03
Line 04
Line 05
Line 06
Line 07
Line 08
Line 09
Line 10
Line 11
Line 12
Line 13
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20
To convert this to a one-liner, we'll use the -n option, which implicitly creates the outer while(){} loop. The only problem with the -n option is that the remaining items in the buffer would get dropped if we didn't come up with a plan to deal with them after the implicit while() loop. The way to do this is to define an END{} block. The END{} block gets executed just after the while() loop completes, and just before the one-liner terminates execution. The -i.bak switch is used to specify in-place editing with the creation of a backup file. And as you see by looking at the code, we're hard-wiring the match value. Why not? One liners are disposable.
Here's how it looks:
perl -ni.bak -e "if(/10/){@buf=();$cnt=0;while(<> and $cnt++<2){}}else
+{push @buf,$_;if(defined $buf[5]){print shift(@buf)}}END{print @buf}"
+ mytest.txt
Enjoy!
| [reply] [d/l] [select] |
|
|
perl -ne '}{ print $.' # count lines
My solution made into a one-liner
perl -ne 'BEGIN{$_=shift for$B,$A,$p}/$p/and@l=(),do{<>for 1..$A},next
+;push@l,$_;print shift@l if$B<@l}{print@l' 3 5 pattern test.txt
where 3 is lines before, 5 is lines after the pattern.
Sample data:
| [reply] [d/l] [select] |
|
|
That will also fail if you have two matches within "A" lines of each other
01 keep
02 discard B5
03 discard B4
04 discard B3
05 discard B2
06 discard B1
07 somepat
08 discard A1
09 somepat
10 discard A1
11 discard A2 <- Not dicarded
12 discard A3 <- Not dicarded
13 keep
By the way,
defined( $buffer[5] )
reads better as
@buffer > 5
| [reply] [d/l] [select] |
|
|
use strict;
use warnings;
my @buffer;
my $re = qr/10/;
while (<DATA>) {
if ( $_ =~ $re ) {
@buffer = ();
my $count = 0;
while( defined( my $discard = <DATA> ) and $count++ < 2 ){
if( $discard =~ $re ) { $count = 0; }
}
} else {
push(@buffer, $_);
if( @buffer > 5 ) {
print shift(@buffer);
}
}
}
print @buffer;
__DATA__
Line 01
Line 02
Line 03
Line 04
Line 05
Line 06
Line 07
Line 08
Line 09
Line 10
Line 11
Line 12
Line 13
Line 14
Line 15
Line 16
Line 17
Line 18
Line 19
Line 20
| [reply] [d/l] |
Re: Regex question
by shmem (Chancellor) on Jan 26, 2009 at 07:53 UTC
|
#!/usr/bin/perl
use Getopt::Std;
my %o;
getopt 'b:a:',\%o;
my $pat = shift;
my @last;
while(<>) {
if (/$pat/) {
@last = ();
<> for 1..$o{a};
next;
}
push @last, $_;
print shift @last if $o{b} < @last;
}
print @last;
| [reply] [d/l] |
|
|
That will fail if you have two matches within "A" lines of each other
01 keep
02 discard B5
03 discard B4
04 discard B3
05 discard B2
06 discard B1
07 somepat
08 discard A1
09 somepat
10 discard A1
11 discard A2 <- Not dicarded
12 discard A3 <- Not dicarded
13 keep
| [reply] [d/l] |
|
|
#!/usr/bin/perl
use Getopt::Std;
my %o;
getopt 'b:a:',\%o;
my $pat = shift;
my @last;
OUTER: while(<>) {
if (/$pat/) {
@last = ();
for my $i(1..$o{a}) {
$_ = <>;
redo OUTER if /$pat/;
}
next;
}
push @last, $_;
print shift @last if $o{b} < @last;
}
print @last;
| [reply] [d/l] |