utpalmtbi has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to find the intermediate (overlapping) range values between columns of two files.

My two input files are like this:

(file1.txt)

a 11-23

b 33-39

c 40-45

d 48-58

and (file2.txt)

33-39

40-42

43-46

51-52

I want to match the ranges in file2 against the ranges in the 2nd column of file1 and report the intermediate ranges where they overlap. I want the output to look like this:

b 33-39

c 40-42, 43-45

d 45-46, 51-52

I have tried:

@ARGV or die "No input file specified";
open my $first, '<', $ARGV[0] or die "Unable to open input file: $!";
open my $second, '<', $ARGV[1] or die "Unable to open input file: $!";
print scalar(<$first>);
my $secondHeader = <$second>;
while (<$first>) {
    @cols = split /\s+/;
    $p1 = $cols[1];
    $p2 = $cols[2];
    my $secondLine = <$second>;
    if ( defined $secondLine ) {
        @sec = split( /\s+/, $secondLine );
        print join( "\t", @cols ), "\n"
            if ( $p1>=$sec[0] && $p2<=$sec[1] || $p1<=$sec[0] && $p2>=$sec[1] );
    }
}

Which gives me wrong output:

a 11-23

b 33-39

c 40-45

d 46-58

I am stuck on what condition I should use in the last line to get the proper output. Please help.

Replies are listed 'Best First'.
Re: finding intermediate range values from two file columns
by choroba (Cardinal) on Aug 09, 2016 at 21:03 UTC
    As perldigious noted, your sample input and output don't make sense. I modified the input in the following way:

    a 11-23
    b 33-39
    c 40-45
    d 48-58

    1-34
    35-39
    40-42
    43-49
    51-59
    62-90

    And got the following result:

    a 11-23
    b 33-34
    b 35-39
    c 40-42
    c 43-45
    d 48-49
    d 51-58

    from the following code:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use constant {
        FROM => 0,
        TO   => 1,
        NAME => 2,
    };

    my @ranges;
    open my $F1, '<', shift or die $!;
    while (<$F1>) {
        my ($name, $range) = split;
        my ($from, $to) = split /-/, $range;
        push @ranges, [$from, $to, $name];
    }

    my $range_idx = 0;
    open my $F2, '<',shift or die $!;
    while (<$F2>) {
        chomp;
        my ($from, $to) = split /-/;
        my $end;
        do {
            ++$range_idx until $range_idx > $#ranges
                            || $ranges[$range_idx][FROM] <= $to
                            && $ranges[$range_idx][TO]   >= $from;
            last if $range_idx > $#ranges;

            my $start = $from > $ranges[$range_idx][FROM]
                      ? $from
                      : $ranges[$range_idx][FROM];
            $end = $to > $ranges[$range_idx][TO]
                 ? $ranges[$range_idx][TO]
                 : $to;
            say $ranges[$range_idx][NAME], " $start-$end";
        } while $to > ($from = 1 + $end);
    }

    Explanation: @ranges is an array of arrays that stores the ranges from the first file. While processing the second file, the code remembers the index of the last file-1 range used ( $range_idx ). When a file-2 range spreads over more than one file-1 range, it moves $from past the part already reported and tries again.

    Joining the b's and c's left as an exercise for the reader :-)
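
    The joining step left as an exercise might look something like the sketch below. It assumes the fragments arrive as "name from-to" strings in the order printed above; the sub name group_fragments is hypothetical and not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse "name range" lines into one line per name,
# keeping the names in order of first appearance.
sub group_fragments {
    my @lines = @_;
    my (@order, %by_name);
    for my $line (@lines) {
        my ($name, $range) = split ' ', $line;
        push @order, $name unless exists $by_name{$name};
        push @{ $by_name{$name} }, $range;
    }
    return map { "$_ " . join(', ', @{ $by_name{$_} }) } @order;
}

print "$_\n" for group_fragments(
    'a 11-23', 'b 33-34', 'b 35-39', 'c 40-42', 'c 43-45', 'd 48-49', 'd 51-58',
);
```

    Fed the seven lines of output shown above, this prints four lines: "a 11-23", "b 33-34, 35-39", "c 40-42, 43-45" and "d 48-49, 51-58".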

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: finding intermediate range values from two file columns
by haukex (Archbishop) on Aug 09, 2016 at 21:07 UTC

    Hi utpalmtbi,

    First, I recommend you have a look at the Basic debugging checklist, especially the first two points. For example, if you had turned on warnings, you'd see some hints which might help in figuring out what's going on: Argument "40-42" isn't numeric in numeric ge (>=) means you're trying to use a string like "40-42" as a number. Your current split regex /\s+/ only splits on whitespace. It looks to me like you expect it to split on dashes as well, but for that, you'd have to write /[\s\-]+/. Or, you can split a string like "40-42" into its components using something like my ($from,$to) = split /-/, $range, 2;
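
    To make the splitting advice concrete, here is a small sketch using one line in the format of file1.txt. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "c 40-45";

# Option 1: split on runs of whitespace and/or dashes in one go.
my ($name, $lo, $hi) = split /[\s\-]+/, $line;
print "name=$name lo=$lo hi=$hi\n";    # name=c lo=40 hi=45

# Option 2: split off the range first, then split the range on the dash.
my ($name2, $range) = split ' ', $line;
my ($from, $to) = split /-/, $range, 2;
print "from=$from to=$to\n";           # from=40 to=45
```

    Either way the bounds end up in separate scalars, so numeric comparisons such as >= no longer see a string like "40-45".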

    Second, note that the code my $secondLine = <$second>; will read a line from the file each time it is called. Since you're doing this inside the loop over the lines of the first file, that means that every time you read a line from the first file via <$first>, you'll also read a line from the second file, so that you'll only ever be comparing "file 1 line 1" with "file 2 line 1", then "file 1 line 2" with "file 2 line 2", and so on.

    There was a recent thread with discussion on how to compare each line of one file with each line of another file, and some of the solutions there might help you out: Simple comparison of 2 files. The first approach is to loop over both files and compare each line with each other line; however this is inefficient and won't fare well with large files. The second approach is to load one of the two files into a data structure in memory, for example into a hash (Update: or array, as choroba demonstrated), and then loop over the lines of the other file and compare them to the data in memory. Since you're dealing with intervals, instead of a hash/array, an Interval tree might be helpful. There are modules on CPAN such as Set::IntervalTree (disclaimer: I haven't used this) that might be useful.
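
    As a rough sketch of the in-memory idea, the code below holds the ranges in plain arrays (standing in for the two files) and intersects every file-1 range with every file-2 range. It is a simple nested scan, not an interval tree, and the sub name overlaps is hypothetical.

```perl
use strict;
use warnings;

# Sample data from the question; a real script would read these from the files.
my @file1 = ('a 11-23', 'b 33-39', 'c 40-45', 'd 48-58');
my @file2 = ('33-39', '40-42', '43-46', '51-52');

# Hypothetical helper: for each named range, list its overlaps with the other ranges.
sub overlaps {
    my ($named, $others) = @_;
    my @out;
    for my $entry (@$named) {
        my ($name, $lo1, $hi1) = split /[\s\-]+/, $entry;
        my @parts;
        for my $range (@$others) {
            my ($lo2, $hi2) = split /-/, $range, 2;
            # The intersection runs from the larger start to the smaller end.
            my $lo = $lo1 > $lo2 ? $lo1 : $lo2;
            my $hi = $hi1 < $hi2 ? $hi1 : $hi2;
            push @parts, "$lo-$hi" if $lo <= $hi;
        }
        push @out, "$name " . join(', ', @parts) if @parts;
    }
    return @out;
}

print "$_\n" for overlaps(\@file1, \@file2);
```

    With the sample data this prints "b 33-39", "c 40-42, 43-45" and "d 51-52". The "45-46" entry from the requested output does not appear, because 48-58 and 43-46 do not overlap.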

    Here's a skeleton (based on my post Re: Simple comparison of 2 files) using the aforementioned inefficient approach of comparing each line in the first file with each line in the second file, which might still be good enough for your purposes if one or both of the files is small. (If one file is small and the other is not, make file1 be the large one and file2 the small one.)

    use warnings;
    use strict;
    use Tie::File;

    tie my @file1, 'Tie::File', '/tmp/file1.txt' or die $!;
    tie my @file2, 'Tie::File', '/tmp/file2.txt' or die $!;

    for (@file1) {
        my ($name,$lo1,$hi1) = split /[\s\-]+/;
        for (@file2) {
            my ($lo2,$hi2) = split /-/;
            # your comparison logic here
            print "name=$name, lo1=$lo1, hi1=$hi1; lo2=$lo2, hi2=$hi2\n";
        }
    }

    Output:

    Hope this helps,
    -- Hauke D

    Update 2: Fixed thinko in regex.

      Interval sets provide for a concise solution. Here's a demonstration using choroba's data.

      #! /usr/bin/perl -wl
      use Set::IntSpan;

      ($",$/) = (',','');
      my ($f1, $f2) = map [ m/(\S+)/g ], <DATA>;

      while (my ($k, $v) = splice(@$f1, 0, 2)) {
          my @r = map Set::IntSpan->new($_)->I($v), @$f2;
          print "$k: @{[map $_->run_list, grep $_->size, @r]}";
      }

      __DATA__
      a 11-23
      b 33-39
      c 40-45
      d 48-58

      1-34
      35-39
      40-42
      43-49
      51-59
      62-90

Re: finding intermediate range values from two file columns
by perldigious (Priest) on Aug 09, 2016 at 19:59 UTC

    Well I, and I'm sure a lot of others, do love little code puzzles like this to solve.

    However, I'm confused about why the line for "d" in your desired output includes "45-46" before "51-52". That's the only part of the pattern I can't wrap my brain around. Could you elaborate on that please?

    EDIT: The only way that makes sense to me is if your original "file1" input had line "d" as "45-58" instead of "48-58". That or like I said, I'm missing something.

    I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
    I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious

      Rats! You beat me to the punch (post?) again, choroba and haukex. You guys' code-foo is strong.

      UPDATE: Ah, but mine does keep his specified formatting, so I'm claiming the bonus points :-).

      Well, here is my solution anyway. I'm a lot more long-winded with my code than the other monks here...

      #!/usr/bin/perl
      use warnings;
      use strict;

      @ARGV or die "No input file specified";
      open my $first, '<', $ARGV[0] or die "Unable to open input file: $!";
      open my $second, '<', $ARGV[1] or die "Unable to open input file: $!";
      chomp(my @first_lines = <$first>);
      chomp(my @second_lines = <$second>);
      close $first;
      close $second;

      foreach (@first_lines) {
          my ($line_letter, $range) = split;
          my ($range1_low, $range1_high) = split /-/, $range;
          my $output_line;
          foreach (@second_lines) {
              my ($range2_low, $range2_high) = split /-/, $_;
              my ($current_match, $first_match);
              foreach my $range2_value ($range2_low..$range2_high) {
                  last if ($range2_value > $range1_high);
                  next if ($range2_value < $range1_low);
                  if (($range2_value >= $range1_low) && ($range2_value <= $range1_high)) {
                      $first_match = $range2_value if (!defined $current_match);
                      $current_match = $range2_value;
                  }
              }
              $output_line .= "$first_match-$current_match, "
                  if ((defined $current_match) && ($current_match != $first_match));
              $output_line .= "$first_match, "
                  if ((defined $first_match) && ($current_match == $first_match));
          }
          if(defined $output_line) {
              substr($output_line, length($output_line)-2, 2) = "";
              print "$line_letter $output_line\n";
          }
      }

Re: finding intermediate range values from two file columns
by Anonymous Monk on Aug 10, 2016 at 14:19 UTC

    TIMTOWTDI, using choroba's data

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1169432
    use strict;
    use warnings;

    my ($answer, $letters);
    while(<DATA>) {
        my ( $let, $start, $end) = /(?:(\w) )?(\d+)-(\d+)/;
        if( $let ) {
            $letters |= "\0" x $start . $let x ($end - $start + 1);
        }
        else {
            $_ = $letters & "\0" x $start . "\xff" x ($end - $start + 1);
            $answer .= "$1 $-[0]-" . ($+[0] - 1) . "\n" while /(\w)\1*/g;
        }
    }
    1 while $answer =~ s/^(\w).*\K\n\1 /,/gm;
    print $answer;
    __END__
    a 11-23
    b 33-39
    c 40-45
    d 48-58
    1-34
    35-39
    40-42
    43-49
    51-59
    62-90

    produces

    a 11-23
    b 33-34,35-39
    c 40-42,43-45
    d 48-49,51-58