in reply to finding intermediate range values from two file columns

Hi utpalmtbi,

First, I recommend you have a look at the Basic debugging checklist, especially the first two points. For example, if you had turned on warnings, you'd see some hints which might help in figuring out what's going on. The warning

    Argument "40-42" isn't numeric in numeric ge (>=)

means you're trying to use a string like "40-42" as a number. Your current split regex /\s+/ only splits on whitespace. It looks to me like you expect it to split on dashes as well, but for that, you'd have to write /[\s\-]+/. Or, you can split a string like "40-42" into its components using something like

    my ($from,$to) = split /-/, $range, 2;
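To make the two-step split concrete, here is a minimal sketch. The helper name parse_line is made up for illustration, and it assumes each line looks like "name lo-hi" with non-negative integers (a negative bound would confuse the split on the dash).

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper, not part of the original code.
# It turns a line such as "c 40-45" into (name, from, to).
sub parse_line {
    my ($line) = @_;
    chomp $line;                                # drop the trailing newline, if any
    my ($name, $range) = split ' ', $line, 2;   # first split on whitespace
    my ($from, $to)    = split /-/, $range, 2;  # then split the range on the dash
    return ($name, $from, $to);
}

my ($name, $from, $to) = parse_line("c 40-45\n");

# $from and $to now hold plain numbers, so numeric comparisons
# no longer trigger the "isn't numeric" warning.
print "$name spans $from to $to\n" if $from <= $to;
```

Splitting in two steps keeps the name intact even if it ever contains a dash, which the single regex /[\s\-]+/ would not.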

Second, note that the code my $secondLine = <$second>; will read a line from the file each time it is called. Since you're doing this inside the loop over the lines of the first file, that means that every time you read a line from the first file via <$first>, you'll also read a line from the second file, so that you'll only ever be comparing "file 1 line 1" with "file 2 line 1", then "file 1 line 2" with "file 2 line 2", and so on.

There was a recent thread with discussion on how to compare each line of one file with each line of another file, and some of the solutions there might help you out: Simple comparison of 2 files. The first approach is to loop over both files and compare each line with each other line; however this is inefficient and won't fare well with large files. The second approach is to load one of the two files into a data structure in memory, for example into a hash (Update: or array, as choroba demonstrated), and then loop over the lines of the other file and compare them to the data in memory. Since you're dealing with intervals, instead of a hash/array, an Interval tree might be helpful. There are modules on CPAN such as Set::IntervalTree (disclaimer: I haven't used this) that might be useful.
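As a rough sketch of the second approach, the ranges from the second file can be read once into an array and then reused for every line of the first file. The helper load_ranges and the in-memory strings standing in for the two files are assumptions made for illustration; the sample data is borrowed from this thread.

```perl
use strict;
use warnings;

# Stand-ins for the two files, so the sketch runs without touching disk.
my $file1_text = "a 11-23\nb 33-39\nc 40-45\nd 48-58\n";
my $file2_text = "33-39\n40-42\n43-46\n51-52\n";

# load_ranges is a hypothetical helper: it reads every "lo-hi" line
# from a filehandle once and returns a list of [lo, hi] pairs.
sub load_ranges {
    my ($fh) = @_;
    my @ranges;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        push @ranges, [ split /-/, $line, 2 ];
    }
    return @ranges;
}

open my $second, '<', \$file2_text or die "Can't open file2 data: $!";
my @ranges = load_ranges($second);              # the second file is read only once
close $second;

open my $first, '<', \$file1_text or die "Can't open file1 data: $!";
my $pairs = 0;
while (my $line = <$first>) {
    chomp $line;
    my ($name, $lo1, $hi1) = split /[\s\-]+/, $line;
    for my $r (@ranges) {
        my ($lo2, $hi2) = @$r;
        # your comparison logic here
        $pairs++;
    }
}
close $first;
print "compared $pairs pairs\n";
```

This still compares every pair, so it is no faster in the worst case, but each file is read only once and the second file's lines are parsed only once.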

Here's a skeleton (based on my post Re: Simple comparison of 2 files) using the aforementioned inefficient approach of comparing each line in the first file with each line in the second file, which might still be good enough for your purposes if one or both of the files is small. (If one file is small and the other is not, make file1 be the large one and file2 the small one.)

    use warnings;
    use strict;
    use Tie::File;

    tie my @file1, 'Tie::File', '/tmp/file1.txt' or die $!;
    tie my @file2, 'Tie::File', '/tmp/file2.txt' or die $!;

    for (@file1) {
        my ($name,$lo1,$hi1) = split /[\s\-]+/;
        for (@file2) {
            my ($lo2,$hi2) = split /-/;
            # your comparison logic here
            print "name=$name, lo1=$lo1, hi1=$hi1; lo2=$lo2, hi2=$hi2\n";
        }
    }

Output:

    name=a, lo1=11, hi1=23; lo2=33, hi2=39
    name=a, lo1=11, hi1=23; lo2=40, hi2=42
    name=a, lo1=11, hi1=23; lo2=43, hi2=46
    name=a, lo1=11, hi1=23; lo2=51, hi2=52
    name=b, lo1=33, hi1=39; lo2=33, hi2=39
    name=b, lo1=33, hi1=39; lo2=40, hi2=42
    name=b, lo1=33, hi1=39; lo2=43, hi2=46
    name=b, lo1=33, hi1=39; lo2=51, hi2=52
    name=c, lo1=40, hi1=45; lo2=33, hi2=39
    name=c, lo1=40, hi1=45; lo2=40, hi2=42
    name=c, lo1=40, hi1=45; lo2=43, hi2=46
    name=c, lo1=40, hi1=45; lo2=51, hi2=52
    name=d, lo1=48, hi1=58; lo2=33, hi2=39
    name=d, lo1=48, hi1=58; lo2=40, hi2=42
    name=d, lo1=48, hi1=58; lo2=43, hi2=46
    name=d, lo1=48, hi1=58; lo2=51, hi2=52
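What goes in place of the "your comparison logic here" comment depends on what you actually need. As one possibility, assuming you want the part of each range that both files share, a sketch might look like the following. The helper overlap is hypothetical and treats both ranges as inclusive integer intervals.

```perl
use strict;
use warnings;

# overlap is a hypothetical helper. Given two inclusive ranges, it returns
# the (lo, hi) of their intersection, or an empty list if they don't overlap.
sub overlap {
    my ($lo1, $hi1, $lo2, $hi2) = @_;
    my $lo = $lo1 > $lo2 ? $lo1 : $lo2;   # larger of the two lower bounds
    my $hi = $hi1 < $hi2 ? $hi1 : $hi2;   # smaller of the two upper bounds
    return $lo <= $hi ? ($lo, $hi) : ();
}

# The list assignment is true only when overlap() returned something.
if (my ($lo, $hi) = overlap(40, 45, 43, 46)) {
    print "overlap: $lo-$hi\n";
}
```

If you need something other than the intersection (for example, only ranges fully contained in another), the same two comparisons can be rearranged accordingly.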

Hope this helps,
-- Hauke D

Update 2: Fixed thinko in regex.

Re^2: finding intermediate range values from two file columns
by Anonymous Monk on Aug 09, 2016 at 23:45 UTC

    Interval sets provide for a concise solution. Here's a demonstration using choroba's data.

    #! /usr/bin/perl -wl
    use Set::IntSpan;

    ($",$/) = (',','');
    my ($f1, $f2) = map [ m/(\S+)/g ], <DATA>;

    while (my ($k, $v) = splice(@$f1, 0, 2)) {
        my @r = map Set::IntSpan->new($_)->I($v), @$f2;
        print "$k: @{[map $_->run_list, grep $_->size, @r]}";
    }

    __DATA__
    a 11-23
    b 33-39
    c 40-45
    d 48-58

    1-34
    35-39
    40-42
    43-49
    51-59
    62-90