Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by GrandFather (Saint) on Jan 20, 2016 at 05:43 UTC
|
You need to keep a buffer containing the lines you may need to throw away and print lines from the buffer when you know that they won't be thrown away. This stuff is tricky to get right because you have to correctly handle special cases for start up and ending conditions. Here's one way to do it:
use strict;
use warnings;
my $kPreLines = 1;
my $kPostLines = 1;
my $skipping;
my @bufferedLines;
while (defined (my $line = <DATA>) || @bufferedLines) {
push @bufferedLines, $line if defined $line;
while (@bufferedLines && $skipping) {
shift @bufferedLines;
--$skipping;
}
print shift @bufferedLines while @bufferedLines > $kPreLines + 1;
next if !@bufferedLines;
if ($bufferedLines[-1] =~ /XXXXX/) {
$skipping = $kPostLines;
@bufferedLines = ();
next;
}
next if defined $line;
print @bufferedLines;
last;
}
__DATA__
QQQQQ
PPPPP
XXXXX is my name
YYYYY
KKKKK
UUUUU
BBBBB
CCCCCC
XXXXX is what I play
KKKKK
NNNNN
Prints:
QQQQQ
KKKKK
UUUUU
BBBBB
NNNNN
Premature optimization is the root of all job security
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
I was sure it would be easier with splice but i falled in the 'modifying an array while iterating over it' pitfall..
To avoid a copied data I needed to undef some value and grep for defined values when printing the copy output.. anyway it works.
I have modified the DATA a little to be easier to follow
use strict;
use warnings;
my @lin = <DATA>;
map {
$lin[$_] and $lin[$_] =~/XXXXX/ ?
$_ == 0 ? splice @lin,0,2,undef,undef :
$_ == $#lin ? splice @lin,$_-1,2,undef,undef :
splice @lin,$_-1,3,undef,undef,undef :
1
} 0..$#lin;
print "$_" for grep {defined} @lin;
__DATA__
XXXXX
Aoooo
XXXXX is my name
Boooo
11111
22222
33333
Coooo
XXXXX is what I play
Doooo
44444
# OUTPUT
11111
22222
33333
44444
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [Watch: Dir/Any] [d/l] |
|
use strict;
use warnings;
my $file_content = do {
local $/ = undef;
<DATA>;
};
$file_content =~ s/
(?:.*\n)?
^X{5}.*\n
(?:.*\n)?//xmg;
print $file_content;
__DATA__
XXXXX
Aoooo
XXXXX is my name
Boooo
11111
22222
33333
Coooo
XXXXX is what I play
Doooo
44444
| [reply] [Watch: Dir/Any] [d/l] |
|
use strict;
use warnings;
my @unv;
my $pos;
while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng
+th $_ unless $pos}
seek DATA,$pos,0;
while (<DATA>) {print unless $.-__LINE__ ~~ @unv}
__DATA__
XXXXX
Aoooo
XXXXX is my name
Boooo
11111
22222
33333
Coooo
XXXXX is what I play
Doooo
44444
# OUTPUT
11111
22222
33333
44444
If you spot the trick please explain it to... me!
L*
UPDATE:
the trick can be avoided with a cleaner solotion assigning to $.
use strict;
use warnings;
my @unv;
my $pos;
while (<DATA>){push @unv,$.-1,$.,$.+1 if /XXXXX/;$pos=(tell DATA)-leng
+th $_ unless $pos}
seek DATA,$pos,0;
$.=0;
while (<DATA>) {print unless $. ~~ @unv}
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Less tricky than the above oneliner version:
# file used below has the same content of the previous example's DATA
+token<P>
perl -e "open $fh,$ARGV[0];while(<$fh>){push @unv,$.-1,$.,$.+1 if /XXX
+XX/}seek $fh,0,0;while(<$fh>){print unless ++$index ~~ @unv}" file
11111
22222
33333
44444
i was not able to resolve to use perl -lne because i've encountered some problem using seek ARGV even if in a BEGIN block.
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Another special case is when /xxxxx/ appears on two (or more) consecutive lines.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by johngg (Canon) on Jan 20, 2016 at 11:59 UTC
|
Here's a slightly different approach that runs a while loop checking that end of file hasn't been reached rather than reading each line directly. Inside the loop a line is pushed onto the buffer then tested for the "reject" condition. If it is a reject then the next line is also added to the buffer then the whole buffer is discarded so as to reject all three lines. If the most recent line was not a reject then the previous line is shifted off the buffer and printed.
use strict;
use warnings;
open my $inFH, q{<}, \ <<EOD or die $!;
XXXXX - additional test 1
XXXXX - additional test 1
ZZZZZ
HHHHH - additional test 1 output expected
PPPPP
XXXXX is my name 1
XXXXX is my name 2
YYYYY
KKKKK
UUUUU
BBBBB
CCCCCC
XXXXX is what I play
KKKKK
NNNNN
ZZZZZ - additional test 2b
XXXXX - additional test 2a
QQQQQ - additional test 3
RRRRR - additional test 4
GGGGG
XXXXX - additional test 5
XXXXX - additional test 5
EOD
my @buffer;
while ( not eof $inFH )
{
push @buffer, scalar <$inFH>;
if ( $buffer[ -1 ] =~ m{XXXXX} )
{
last if eof $inFH;
push @buffer, scalar <$inFH>;
push @buffer, scalar <$inFH>
while $buffer[ -1 ] =~ m{XXXXX} and not eof $inFH;
@buffer = ();
}
else
{
print shift @buffer
unless scalar @buffer == 1 and not eof $inFH;
}
}
close $inFH or die $!;
The output.
HHHHH - additional test 1 output expected
KKKKK
UUUUU
BBBBB
NNNNN
RRRRR - additional test 4
I hope this is of interest.
Update: Added the output.
Update: 2: Inserted push @buffer, scalar <$inFH> while ... line and added some paired "XXXXX" lines to address the point raised by mr_mischief. The output is unchanged.
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by hdb (Monsignor) on Jan 20, 2016 at 12:56 UTC
|
use strict;
use warnings;
my $previous="";
while(<DATA>){
if(/XXXXX/){
<DATA>;
$previous="";
}else{
print $previous;
$previous=$_;
}
}
print $previous;
__DATA__
PPPPP
XXXXX is my name
YYYYY
KKKKK
UUUUU
BBBBB
CCCCCC
XXXXX is what I play
KKKKK
NNNNN
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by kcott (Archbishop) on Jan 20, 2016 at 07:31 UTC
|
#!/usr/bin/env perl
use strict;
use warnings;
my ($skip, @buf);
while (<DATA>) {
if ($skip) {
$skip = 0;
next;
}
if (/XXXXX/) {
@buf = ();
$skip = 1;
}
else {
push @buf, $_;
$skip = 0;
}
if ($. > 1 && @buf >= 2) {
print for @buf[0 .. $#buf - 1];
@buf = ($buf[-1]);
}
}
print for @buf;
__DATA__
XXXXX - additional test 1
PPPPP
XXXXX is my name
YYYYY
KKKKK
UUUUU
BBBBB
CCCCCC
XXXXX is what I play
KKKKK
NNNNN
ZZZZZ - additional test 2b
XXXXX - additional test 2a
QQQQQ - additional test 3
RRRRR - additional test 4
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by mr_mischief (Monsignor) on Jan 20, 2016 at 18:35 UTC
|
You need to figure out if you're worried about overlapping matches, and specify your problem accordingly. There are solutions which will check excluded lines to see if there are additional lines to exclude after those. There are also solutions that will exclude a set of lines and move on as if the excluded lines after (but not before of course) the matched line can't possibly match. Here's an example of the difference.:
print me
exclude me
match me
match me
exclude me
print me
vs:
print me
exclude me
match me
would match me but the above line excluded me
print me anyway because the previous line wasn't matched after exclusi
+on
print me
Here's a filter that will do either, based on a command-line argument. It also is configurable for how many additional lines to exclude (before, after, or both).:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long ();
process( init() );
exit;
sub process {
my $opts = shift;
my @buffer = ();
my $wait = 0;
while ( <> ) {
if ( ( scalar @buffer ) > $opts->{ 'b' } ) {
print ( shift @buffer );
}
if ( $wait ) {
$wait--;
next unless $opts->{ 'nest' };
} else {
push @buffer, $_;
}
if ( /$opts->{ 'pattern' }/ ) {
@buffer = ();
$wait = $opts->{ 'a' };
}
}
print @buffer;
}
sub init {
my %options = (
'help' => 0,
'a' => 1,
'b' => 1,
'c' => 0,
'nest' => 0,
'pattern' => '',
);
Getopt::Long::Configure( 'gnu_getopt' );
Getopt::Long::GetOptions( \%options, 'help+', 'a=i', 'b=i', 'c=i',
+ 'nest+', 'pattern=s' );
if ( $options{ 'c' } ) {
$options{ 'a' } = $options{ 'b' } = $options{ 'c' };
}
if ( $options{ 'help' } ) {
warn "Usage: $0 [[--help]|[[-a <n>] [-b <n>]|[-c <n>]] [--nest
+] --pattern <s>\n\n"
. "Print the input file excluding the matched line provided by
+ the -p argument and as many lines before and after that line as spec
+ified.\n\n"
. "\t--help\t\t\tthis help message\n"
. "\t-a <number>\t\texclude <number> lines after the matched l
+ine\n"
. "\t-b <number>\t\texclude <number> lines before the matched
+line\n"
. "\t-c <number>\t\texclude <number> lines before and after th
+e matched line, overriding -a and -b in the process\n"
. "\t--nest\t\t\tmatch lines already excluded by a preceding m
+atch, and exclude the following lines accordingly\n"
. "\t--pattern <string>\texclude the line matching this patter
+n (may be a regular expression) and other lines as specified by the o
+ther options\n\n";
exit 0;
}
return \%options;
}
I made it a filter as the file handling itself is kind of secondary to the problem at hand. It's possible to add input and output arguments either positionally or via flags. My preference if I were to flesh it out a bit more would be to default to STDIN and STDOUT but allow flags to override one, the other, or both.
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file. (step by step)
by Anonymous Monk on Jan 20, 2016 at 03:57 UTC
|
this is how you figure it out
so you read first line, it doesn't match
so you read second line, it doesn't match
so you read third line, it matches
so you read fourth line, it doesn't match
so you read fifth line, it doesn't match
so when should you print lines you want to print?
how many lines do you have to remember?
| [reply] [Watch: Dir/Any] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by ikegami (Patriarch) on Jan 22, 2016 at 16:35 UTC
|
my $prev = '';
my $skip_next = 0;
while (<>) {
if (/XXXXX/) {
$prev = '';
$skip_next = 1;
}
elsif ($skip_next) {
--$skip_next;
}
else {
print($prev);
$prev = $_;
}
}
print($prev);
If you want to skip more than one line:
my $skip_before = 1;
my $skip_after = 1;
my @prev;
my $skip_next = 0;
while (<>) {
if (/XXXXX/) {
@prev = ();
$skip_next = $skip_after;
}
elsif ($skip_next) {
--$skip_next;
}
else {
push @prev, $_;
print(shift(@prev)) if @prev > $skip_before;
}
}
print(@prev);
One read. One check. Handles subsequent matching lines. Handles matches on the first line. Handles matches on the last line.
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: how to check for a word in a file and if found remove that line, the above line and the below line from the file.
by oiskuu (Hermit) on Jan 22, 2016 at 17:53 UTC
|
That is a variation of the grep exercise. Check this thread for further examples. (The main difference between your task and the other one is negation of printing sense. Btw, the grep -v option only negates matching sense and so grep -v -A1 -B1 can not be used for your purposes.)
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: how to check for a word in a file and if found remove previous and next -- 3 oneliner variations
by Discipulus (Canon) on Jan 26, 2016 at 11:50 UTC
|
Yes, i got fixed with this and i made unsuccesfull tries using eof..eof flip-flop to process even files..
anyway now i get two semi-obfuscated oneliners, the first, simpler, is called with a single arg (the name of file):
# $y is <C>yet</C> set for the second read, @u is <C>unwanted</C> <P>
perl -lnE "BEGIN{push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV:/XXXXX/&&!
+$y?push @u,$.-1..$.+1:$y&&!($.~~@u)?say:0" next-and-previous.txt
11111
22222
33333
44444<P>
while the latter is parametrizable one: firts arg is the regex used to match unwanted strings, the second is the number of lines to be removed after and before and then the filename:
perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift@ARGV;push @ARGV,$ARGV[0]}
+eof&&!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?s
+ay:0" XXXXX 1 next-and-previous.txt
11111
22222
33333
44444
perl -lnE "BEGIN{$r=shift;$r=qr/$r/;$n=shift;push @ARGV,$ARGV[0]}eof&&
+!$y?$y=close ARGV:/$r/&&!$y?push @u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0"
+ XXXXX 2 next-and-previous.txt
22222
I promise i'll try to no bother you anymore with this.. :=)
update the parametrized oneliner can be shortend using -s switch (that has some problem anyhow)
It is the second time i use -- too.
perl -slnE "BEGIN{$r=qr/$r/;push @ARGV,$ARGV[0]}eof&&!$y?$y=close ARGV
+:/$r&&!$y?push@u,$.-$n..$.+$n:$y&&!($.~~@u)?say:0"
-- -r=XXXXX -n=1 next-and-previous.txt
11111
22222
33333
44444
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [Watch: Dir/Any] [d/l] [select] |