BhariD has asked for the wisdom of the Perl Monks concerning the following question:

I have the following data file

string length start end count
ACNGYNDHNG 10 1333 1343 1152
AVDVHVHNGG 10 209 219 2916
ACNGYNDHNGARRT 14 1333 1347 4608
GNDNNVNNRHNNNNVMNNVNNT 22 1589 1611 6291456

I want to pair them based on the values in the third (start) and fourth (end) columns. The pairing is true if the difference between end of one string and start value of the other string is greater than 300. If not, no pairing occurs.

So for the above data, this is what I am trying to get:

Pair1:
AVDVHVHNGG 10 209 219 2916
ACNGYNDHNG 10 1333 1343 1152
Distance: 1114
Pair2:
AVDVHVHNGG 10 209 219 2916
ACNGYNDHNGARRT 14 1333 1347 4608
Distance: 1114
Pair3:
AVDVHVHNGG 10 209 219 2916
GNDNNVNNRHNNNNVMNNVNNT 22 1589 1611 6291456
Distance: 1370

Can anyone please help me get started on this?

Thanks!!

Re: how to pair strings based on positional differences
by almut (Canon) on Mar 26, 2010 at 20:02 UTC
    The pairing is true if the difference between end of one string and start value of the other string is greater than 300.

    Just turn your words into code :)

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data;
    while (<DATA>) {
        push @data, [ $_, (split ' ')[2,3] ];  # [ input-line, start, end ]
    }

    my $cnt;
    for my $d1 (@data) {
        for my $d2 (@data) {
            my $dist = $d1->[1] - $d2->[2];  # start of one - end of the other
            if ($dist > 300) {
                print "Pair", ++$cnt, ":\n";
                print " ", $d2->[0];
                print " ", $d1->[0];
                print "Distance: $dist\n";
            }
        }
    }

    __DATA__
    ACNGYNDHNG 10 1333 1343 1152
    AVDVHVHNGG 10 209 219 2916
    ACNGYNDHNGARRT 14 1333 1347 4608
    GNDNNVNNRHNNNNVMNNVNNT 22 1589 1611 6291456

    Output:

    Pair1:
     AVDVHVHNGG 10 209 219 2916
     ACNGYNDHNG 10 1333 1343 1152
    Distance: 1114
    Pair2:
     AVDVHVHNGG 10 209 219 2916
     ACNGYNDHNGARRT 14 1333 1347 4608
    Distance: 1114
    Pair3:
     AVDVHVHNGG 10 209 219 2916
     GNDNNVNNRHNNNNVMNNVNNT 22 1589 1611 6291456
    Distance: 1370
Re: how to pair strings based on positional differences
by keszler (Priest) on Mar 26, 2010 at 20:12 UTC

    One approach would be to open the data file, then use a while loop to read it in and split (on whitespace) to obtain the values. Store those values in an AoA (see perllol). Once the data has been read in, close the datafile.

    Next, use another while loop to shift a set of values from the AoA, then grep the remainder of the AoA with your distance calculation. Save and/or print results. The loop terminates once the AoA is empty.
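
The shift-and-grep steps described above could be sketched roughly as follows. This is only one possible reading of that outline, not code posted in the thread: the helper names `gap` and `find_pairs` are made up for illustration, the data is read from an in-memory here-doc rather than an opened file, and the header line is assumed to be the only line whose third field is not a number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: order two rows by start position and return
# (distance, upstream row, downstream row), where distance is the
# downstream start minus the upstream end.
sub gap {
    my ($x, $y) = @_;
    my ($first, $second) = $x->[2] <= $y->[2] ? ($x, $y) : ($y, $x);
    return ($second->[2] - $first->[3], $first, $second);
}

# Hypothetical helper: shift one row at a time off the AoA and grep the
# remaining rows for those farther away than $min_gap.
sub find_pairs {
    my ($min_gap, @rows) = @_;
    my @pairs;
    while (my $row = shift @rows) {
        for my $other (grep { (gap($row, $_))[0] > $min_gap } @rows) {
            my ($dist, $first, $second) = gap($row, $other);
            push @pairs, [ $first, $second, $dist ];
        }
    }
    return @pairs;
}

# Stand-in for the data file (assumption: same layout as in the question).
my $input = <<'END';
string length start end count
ACNGYNDHNG 10 1333 1343 1152
AVDVHVHNGG 10 209 219 2916
ACNGYNDHNGARRT 14 1333 1347 4608
GNDNNVNNRHNNNNVMNNVNNT 22 1589 1611 6291456
END

# Build the AoA: one [string, length, start, end, count] row per line,
# skipping any line (such as the header) whose start field is not numeric.
my @rows = grep { defined $_->[2] && $_->[2] =~ /^\d+$/ }
           map  { [ split ' ' ] }
           split /\n/, $input;

my $n = 0;
for my $pair (find_pairs(300, @rows)) {
    my ($first, $second, $dist) = @$pair;
    printf "Pair%d:\n %s\n %s\nDistance: %d\n",
        ++$n, "@$first", "@$second", $dist;
}
```

Because each row is shifted off before the grep, every unordered pair is tested only once, so `gap` has to work out which of the two rows comes first. With a real file, the here-doc would be replaced by an `open` and a `while (<$fh>)` loop.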