Remove row if the absolute difference between two columns is greater than a threshold

Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Remove row if the absolute difference between two columns is greater than a threshold by TomDLux (Vicar) on Feb 15, 2011 at 17:08 UTC
`grep -v` is an acceptable Unix command-line way to exclude a line, but it does not qualify as a Perl solution, or at least not as a GOOD Perl solution. Besides, each invocation would drop only one line from the file. Imagine the worst case, in which you wind up excluding every line in a 1000-line file: you would have to copy the file, minus one line, 1000 times.

What you want is to go through the file line by line, using open(), while(), and close(), test each line, and, if it is acceptable, copy it to the output file. That means the file is copied only once, whether you drop 0 lines or a million.

You say "the absolute value of column 2 minus column 2 (I guess you mean column 3) is greater than or equal to 1". Except for the absolute value bit, I would test for $col2 > $col3. But it matters a great deal whether you mean abs( $col2 ) > abs( $col3 ) or abs( $col2 - $col3 ).

As Occam said: Entia non sunt multiplicanda praeter necessitatem.
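The single-pass approach described above might look roughly like this. It is only a sketch, not tested against the original poster's data: the sub name `filter_file`, the whitespace-separated columns, and the default threshold of 1 are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in a single pass, dropping every row
# where abs(column 2 - column 3) >= $threshold.
# Assumption: columns are separated by whitespace.
sub filter_file {
    my ($in, $out, $threshold) = @_;
    $threshold = 1 unless defined $threshold;

    open(my $in_fh,  '<', $in)  or die "Cannot read $in: $!";
    open(my $out_fh, '>', $out) or die "Cannot write $out: $!";

    my $kept = 0;
    while (my $line = <$in_fh>) {
        my @col = split ' ', $line;

        # Rows with fewer than three columns (e.g. blank lines) pass through.
        next if @col >= 3 && abs($col[1] - $col[2]) >= $threshold;

        print {$out_fh} $line;    # acceptable row: copy it unchanged
        $kept++;
    }

    close($in_fh);
    close($out_fh) or die "Cannot close $out: $!";
    return $kept;                 # number of rows written
}

# Run as: perl filter.pl infile outfile
filter_file(@ARGV[0, 1]) if @ARGV == 2;
```

However many rows are dropped, the input is read once and the output is written once.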
Re^2: Remove row if the absolute difference between two columns is greater than a threshold by Renyulb28 (Novice) on Feb 15, 2011 at 17:22 UTC
Thank you for the reply. I do mean abs(column 2 - column 3). I would like the script to remove those rows in which that absolute value is greater than or equal to 1.
Re: Remove row if the absolute difference between two columns is greater than a threshold by fidesachates (Monk) on Feb 15, 2011 at 17:30 UTC
The poster above has given you a very good logic flow and design for your program. I'll provide a little more on the functions you might want to use.

    open();    # look up the proper syntax for using the open function to open the file
    while (<FILEHANDLE>) {
        my $line = $_;    # I always prefer to copy $_ into an actual named variable.
                          # Personal preference. Some other monk please correct me
                          # if there is a best practice for this.
    }

With the variable `$line`, you will want to look at the `split()` function. This will help you separate out the columns in each line. Also take a look at `chomp` if one of the columns is at the end of the line. Once you have the columns, `abs` will help with retrieving the absolute values. Finally, if the row matches your criteria, just print the variable `$line`. Afterwards, just run your program and redirect its output to the text file of your choice. Happy coding!

N.B. The code I posted has not been tested and is thus prone to typos.
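Putting `chomp`, `split`, and `abs` together, the per-line test could be sketched as below. The sub name `keep_row` and the whitespace delimiter are assumptions for illustration; adjust the `split` pattern to match the real file.

```perl
use strict;
use warnings;

# Hypothetical predicate: true if the row should be kept, i.e. if
# abs(column 2 - column 3) is below the threshold.
sub keep_row {
    my ($line, $threshold) = @_;

    # Strip the trailing newline so the last column is clean. This matters
    # most when splitting on an explicit delimiter such as /\t/.
    chomp $line;

    # Assumption: columns are separated by whitespace.
    my @cols = split ' ', $line;

    return abs($cols[1] - $cols[2]) < $threshold;
}

# When file names are given on the command line, print the surviving rows.
# Run as: perl keep.pl infile > outfile
if (@ARGV) {
    while (my $line = <>) {
        print $line if keep_row($line, 1);
    }
}
```

Reading with `while (my $line = <>)` puts each line straight into a named variable, so there is no need to copy `$_` by hand.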
Re: Remove row if the absolute difference between two columns is greater than a threshold by BrowserUk (Patriarch) on Feb 15, 2011 at 18:57 UTC
This should do it. See perlrun for the details: `perl -anle"$F[1]==$F[2] and print" infile > outfile`

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
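One caveat about the equality test in the one-liner above: it keeps the same rows as the requested "difference below 1" rule only when both columns hold integers. The sketch below compares the two tests on made-up sample rows; the sub names `keep_if_equal` and `keep_if_close` are hypothetical.

```perl
use strict;
use warnings;

# Made-up sample rows: name, column 2, column 3.
my @rows = ("a 1 1", "b 1 2", "c 1.0 1.4", "d 3 2.5");

# Keep rows whose columns 2 and 3 are numerically equal.
sub keep_if_equal {
    return grep { my @F = split ' '; $F[1] == $F[2] } @_;
}

# Keep rows where abs(column 2 - column 3) is below the threshold.
sub keep_if_close {
    my ($threshold, @lines) = @_;
    return grep { my @F = split ' '; abs($F[1] - $F[2]) < $threshold } @lines;
}

print "equal: $_\n" for keep_if_equal(@rows);
print "close: $_\n" for keep_if_close(1, @rows);
```

With these rows the equality test keeps only row a, while the threshold test keeps rows a, c, and d. If the data can contain non-integers, the one-liner's condition can be replaced with `abs($F[1]-$F[2]) < 1`.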
Re: Remove row if the absolute difference between two columns is greater than a threshold by ack (Deacon) on Feb 15, 2011 at 17:48 UTC
Here's a short script that I think does what you're after. I didn't spend any time optimizing it, so it is just there to show you a quick and dirty strategy. It uses Perl references to create 2-dimensional matrices and the arrow notation to simplify and clarify what is going on. The subroutine, `printMatrix()`, is just for convenience so that you can better see what the 'before' and 'after' situation looks like. Read more... (1494 Bytes) The output from the little script is: Read more... (466 Bytes) Good luck; welcome to Perl. ack Albuquerque, NM
Re: Remove row if the absolute difference between two columns is greater than a threshold by suhailck (Friar) on Feb 16, 2011 at 07:05 UTC
`perl -lane 'print unless abs($F[1] - $F[2]) >= 1' infile > outfile`
Re: Remove row if the absolute difference between two columns is greater than a threshold by locked_user sundialsvc4 (Abbot) on Feb 15, 2011 at 18:36 UTC
There is, in fact, a `grep` function. See: `perldoc perlfunc`, `perldoc -f grep`, and `perldoc -f map`.

Incidentally, since lists usually contain references to the things that they “contain,” I often design filtering routines so that they scan through the input list, selecting what they want to keep and pushing those onto an output list, which is then returned. Since we’re only moving references around, we aren’t burning up memory. And the process is non-destructive: at the end of the day, we have the output list, but the input list hasn’t actually been touched. We can now, if we choose, discard the one and keep the other, or we can keep both.
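A non-destructive filter of that kind could be sketched with Perl's built-in `grep` as follows. The sample rows and the sub name `filter_rows` are made up for illustration; each row is assumed to be an array reference holding a name and two numeric columns.

```perl
use strict;
use warnings;

# Made-up sample data: each row is an array reference.
my @rows = ( ['a', 1, 1.5], ['b', 2, 3], ['c', 5, 3], ['d', 4, 4] );

# Hypothetical helper: return the rows to keep. The input list is left
# untouched, and the output list holds the same references, not copies.
sub filter_rows {
    my ($threshold, @in) = @_;
    return grep { abs($_->[1] - $_->[2]) < $threshold } @in;
}

my @kept = filter_rows(1, @rows);

print join(' ', @$_), "\n" for @kept;
printf "%d of %d rows kept\n", scalar @kept, scalar @rows;
```

Here `@kept` ends up with rows a and d, while `@rows` still holds all four, so either list can be discarded or both can be kept.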