Bama_Perl has asked for the wisdom of the Perl Monks concerning the following question:

I wish to loop through multiple files and the lines in each file. I have done this successfully already. What I want to do now is remove lines from a file based on a numeric value in one of the columns. If I have an input such as this:
    XP.sta1 -41.5166 0.0513 0.6842 0.1794 0 CPHI.BHZ 300.2458 -42.2436
    XP.sta2 3.5972 0.0500 0.7699 0.1213 0 E000.BHZ 300.5616 2.5545
    XP.sta3 3.7112 0.0267 0.7813 0.1457 0 E002.BHZ 300.6140 2.6160
    XP.sta4 4.2891 0.0214 0.6870 0.1308 0 E004.BHZ 301.2073 2.6006
I want to remove the line IF the 8th column numeric value is < -10 or > 10. Here's my code so far:
    open(TABLEC,$mFile);
    @tablec = <TABLEC>;
    for ($j = 2; $j < $stop; $j++) {
        chomp ($tablec[$j]);
        ($netSta,$delayTime) = (split /\s+/,$tablec[$j])[1,9];
        next if $delayTime < -10 or $delayTime > 10;
I am reading through multiple files, and for the lines between 2 and "stop" I want to remove a line if its 8th column value is < -10 or > 10. From the input above, I want to remove the line beginning with XP.sta1, since its 8th column value is -42.2436. How do I do this without simply deleting the 8th column value? This script is slightly modified, but if I run it with the above code structure, the output is this:
    XP.sta1 -41.5166 0.0513 0.6842 0.1794 0 CPHI.BHZ 300.2458 2.5545
    XP.sta2 3.5972 0.0500 0.7699 0.1213 0 E000.BHZ 300.5616 2.6160
    XP.sta3 3.7112 0.0267 0.7813 0.1457 0 E002.BHZ 300.6140 2.6006
    XP.sta4 4.2891 0.0214 0.6870 0.1308 0 E004.BHZ 301.2073
Here the 8th column value of -42.2436 is removed, but not the entire line. How do I delete the entire line, rather than just the value in column 8? Thanks for the help.

Replies are listed 'Best First'.
Re: Remove line from file based on numeric column value
by aaron_baugher (Curate) on May 25, 2015 at 21:59 UTC

    Instead of thinking in terms of "removing lines," think of it as, "print the rest of the lines, skipping certain ones." Also, the array you read the file into is not the file. You have to print the results back out. The best method is usually to read line-by-line from the input file, printing to a new file as you go, then move that new file into place if necessary. So in pseudocode:

        open inputfile for reading
        open outputfile for writing
        while get a line from inputfile
            split it into fields
            if the 8th field is >= -10 and <= 10
                print the line to outputfile
        close the files
        move outputfile to inputfile
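    The pseudocode above could be sketched in Perl roughly as follows. This is a minimal sketch, not the poster's own code: the sub name filter_file and its parameters are hypothetical, the field index is 0-based (so the 9th column is index 8), and lines too short to have that field are kept rather than dropped, which is an assumption. It also filters every line, not just the range between 2 and "stop" from the original question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: copy $path to a temporary file, skipping lines whose
# field at $index (0-based) lies outside [-$limit, $limit], then rename the
# temporary file over the original.
sub filter_file {
    my ($path, $index, $limit) = @_;

    open my $in, '<', $path or die "Can't read $path: $!";

    # Create the temp file next to the original so rename() stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path));

    while (my $line = <$in>) {
        my @fields = split ' ', $line;    # whitespace split, leading blanks ignored
        my $value  = $fields[$index];

        # Assumption: lines without this field are kept unchanged.
        next if defined $value and ($value < -$limit or $value > $limit);

        print {$out} $line;
    }

    close $in;
    close $out or die "Can't finish writing $tmp: $!";
    rename $tmp, $path or die "Can't replace $path: $!";
    return;
}
```

    A call such as filter_file('data.txt', 8, 10) would then drop the XP.sta1 line from the sample data. File::Temp creates its file with restrictive permissions, so a chmod on the result may be needed if other users must read it.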

    If, for some reason, you can't print the results to a new file (perhaps you don't have enough disc space to accommodate two copies of the data at once), look into the tie function and related modules. They allow you to tie an array variable to a file, so that each line is one element of the array. Then you could splice elements you don't want out of the array, and those lines will be removed from the file. But that method is more advanced, usually unnecessary, and more dangerous -- the "write to a new file then move it" method gives you a chance to make sure the new copy is what you wanted before wiping out the old copy.
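    For comparison, the tie approach might look something like the sketch below, using the core Tie::File module. The sub name prune_file_in_place and its parameters are hypothetical, the field index is again 0-based, and lines without that field are left alone by assumption. Since this edits the file directly, it is worth trying on a copy first.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: delete, in place, every line of $path whose field at
# $index (0-based) lies outside [-$limit, $limit].
sub prune_file_in_place {
    my ($path, $index, $limit) = @_;

    # Each element of @lines is one line of the file (without its newline).
    tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";

    # Walk backwards so that splicing does not shift lines not yet visited.
    for my $i (reverse 0 .. $#lines) {
        my @fields = split ' ', $lines[$i];
        my $value  = $fields[$index];
        next unless defined $value;    # assumption: leave short or blank lines alone
        splice @lines, $i, 1 if $value < -$limit or $value > $limit;
    }

    untie @lines;
    return;
}
```

    Iterating from the last index down is the design choice that matters here: removing element $i only renumbers the elements after it, which have already been checked.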

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

Re: Remove line from file based on numeric column value
by johngg (Canon) on May 25, 2015 at 23:06 UTC

    I wondered whether using abs to get absolute value then making one comparison rather than two would be quicker. Apparently not.

        use strict;
        use warnings;

        use Benchmark qw{ cmpthese };

        my @data = map { rand( 20 ) * ( 1, -1 )[ int rand 2 ] } 1 .. 1e6;

        cmpthese(
            -10,
            {
                abs => sub { my @arr = grep { abs $_ <= 10 } @data },
                two => sub { my @arr = grep { $_ >= -10 and $_ <= 10 } @data },
            }
        );

          Rate  abs  two
        abs 5.11/s   --  -9%
        two 5.62/s  10%   --

    Cheers,

    JohnGG

      Ha, yeah, it doesn't look like too big of a difference in terms of computation time. Interesting.
Re: Remove line from file based on numeric column value
by ww (Archbishop) on May 26, 2015 at 12:10 UTC

    Canonical answer from aaron_baugher: Think in terms of writing your original content to a new file with some elements removed.

    BUT, re your narrative: Your phrase "the 8th column" should be "the 9th column" because your removal target is column 9. Just so it's clear, column 9 is element 8, $arr[8], of the array (@arr) formed by splitting

    XP.sta1    -41.5166    0.0513    0.6842    0.1794    0  CPHI.BHZ   300.2458   -42.2436

    on whitespace, because array indices start at 0 and "XP.sta1" is $arr[0].

        use 5.010;    # enables say

        my $dataline = "XP.sta1 -41.5166 0.0513 0.6842 0.1794 0 CPHI.BHZ 300.2458 -42.2436";
        my @arr = split /\s+/, $dataline;
        say "$arr[0], \t $arr[8]";

        =head OUTPUT:
        XP.sta1,         -42.2436
        =cut

    UPDATE: Your pseudocode (I hope it was intended as pseudocode) is far from actually useful and makes very little sense in terms of your problem statement. Something on this order might serve you better:

        use 5.018;
        use Data::Dumper;

        my $mFile = 'DATA1127720.txt';    # OP's sample data
        open(my $tablec, "<", $mFile) or die "Can't open the datafile: $!";
        my @tablec = <$tablec>;

        for (@tablec) {
            my @XPinfo = split /\s+/, $_;
            chomp @XPinfo;
            my $delayTime = $XPinfo[8];    # 9th column, index 8
            if ( $delayTime < -10 or $delayTime > 10 ) {
                say "\t OUT OF BOUNDS!!! \$delayTime: $delayTime";
            }
            else {
                say $delayTime;    # revision to save desired lines left as an exercise
            }
        }

        =head EXECUTION
        D:\>perl 1127720FIXED.pl
                 OUT OF BOUNDS!!! $delayTime: -42.2436
        2.5545
        2.6160
        2.6006
        =cut

    Your questions will benefit from careful use of language. Your "code so far" could be read as reflecting the PM admonition 'show us your code' ONLY if it accurately reflected your question ...and were it compilable.