in reply to Repeats exclusion

Use a hash of arrays, as below:

# perl -T
my %hash;
while (<DATA>) {
    my ($cord, $dist) = split;
    push @{ $hash{$cord} }, $dist;
}
print join "\n",
    map { $_ . "\t\t" . (sort { $a <=> $b } @{ $hash{$_} })[0] }
    sort { $a <=> $b } keys %hash;
__DATA__
567 344
1345 567
2346 78
3456 67
3456 789
4678 45
5349 6
6700 124
6700 50
8964 560

Output:

567         344
1345        567
2346        78
3456        67
4678        45
5349        6
6700        50
8964        560
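For what it's worth, the same hash of arrays can be reduced with `min` from the core List::Util module instead of sorting each list. This is only a sketch: the sub name `min_dist_by_coord` is made up for illustration, and it assumes whitespace-separated numeric coord/dist pairs like the sample data.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not part of the original script): group every dist
# value under its coord, then keep only the smallest one per coord.
sub min_dist_by_coord {
    my @lines = @_;
    my %hash;    # coord => [ dist, dist, ... ]
    for my $line (@lines) {
        my ($cord, $dist) = split ' ', $line;
        next unless defined $dist;    # skip blank or malformed lines
        push @{ $hash{$cord} }, $dist;
    }
    # One [coord, smallest dist] pair per coord, in ascending coord order.
    return map { [ $_, min @{ $hash{$_} } ] } sort { $a <=> $b } keys %hash;
}

print join("\t\t", @$_), "\n"
    for min_dist_by_coord("3456 67", "3456 789", "567 344");
```

Sorting each list and taking element 0 works just as well on data this small; `min` simply avoids the sort when a coordinate has many distances.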

Re^2: Repeats exclusion
by Grig (Novice) on Sep 12, 2010 at 16:07 UTC

    Thank you very much!

    How could this script be changed if it is necessary to include additional variables in the hash?

    For example, the chromosome number, the number of certain elements, and so on. Let them be the variables $chr, $exons and $palindromes. At the same time, lines with a repeated coord value should still be excluded according to the distance value (only the line with the smallest distance value should remain), as your script already does.

    coord  dist  chr  exons  palindromes
    567    344   5    7      8
    1345   567   5    8      123
    2346   78    12   1      567
    3456   67    10   1      5
    3456   789   10   3      6
    4678   45    6    2      0
    5349   6     8    2      14
    6700   124   13   8      56
    6700   50    13   1      4
    8964   560   2    18     8

    So the output will be something like this:

    coord  dist  chr  exons  palindromes
    567    344   5    7      8
    1345   567   5    8      123
    2346   78    12   1      567
    3456   67    10   1      5
    4678   45    6    2      0
    5349   6     8    2      14
    6700   50    13   1      4
    8964   560   2    18     8

    Thank you once more!

      You know, first asking "how do I do X", and then after getting the answer coming back with "yeah, but I didn't really want X, I want Y, Z and W" doesn't win you many friends.

      Next time, ask for what you want in the first place. Then you may get useful answers. Now you've just wasted someone's time, and you still don't have a solution.

      Here is the code, modified for your new requirement:

      use strict;
      use warnings;

      my %hash;
      while (<DATA>) {
          my ($cord, @others) = split;
          push @{ $hash{$cord} }, \@others;    # every row seen for this coord
      }

      print map {
          my $cord = $_;
          # Pair each row index with its dist, sort by dist, keep the smallest.
          my ($best) = sort { $a->[1] <=> $b->[1] }
                       map  { [ $_, $hash{$cord}[$_][0] ] } 0 .. $#{ $hash{$cord} };
          ($cord, " @{ $hash{$cord}[ $best->[0] ] }", "\n");
      } sort { $a <=> $b } keys %hash;

      __DATA__
      567 344 5 7 8
      1345 567 5 8 123
      2346 78 12 1 567
      3456 67 10 1 5
      3456 789 10 3 6
      4678 45 6 2 0
      5349 6 8 2 14
      6700 124 13 8 56
      6700 50 13 1 4
      8964 560 2 18 8
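If the extra columns only need to travel along with the winning line, a single pass that remembers the best row per coordinate may be easier to read than sorting index pairs. A rough sketch, not the code from the reply above: the sub name `best_rows` is invented here, and it assumes the first field is a numeric coord and the second a numeric dist, so that a header line such as "coord dist chr exons palindromes" is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: for each coord keep the whole row with the smallest dist.
sub best_rows {
    my @lines = @_;
    my %best;    # coord => array ref holding all fields of the best row so far
    for my $line (@lines) {
        my @fields = split ' ', $line;
        # Skip the header, blank lines, and anything without a numeric coord.
        next unless @fields >= 2 && $fields[0] =~ /^\d+$/;
        my ($cord, $dist) = @fields;
        # Strict "<" means the first row seen wins when distances tie.
        $best{$cord} = \@fields
            if !exists $best{$cord} || $dist < $best{$cord}[1];
    }
    return map { join ' ', @{ $best{$_} } } sort { $a <=> $b } keys %best;
}

print "$_\n" for best_rows(
    "coord dist chr exons palindromes",
    "6700 124 13 8 56",
    "6700 50 13 1 4",
    "567 344 5 7 8",
);
```

Because every row is stored as a whole, adding more columns later needs no change to the selection logic; only the second field is ever compared.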