in reply to Re^3: clustering pairs
in thread clustering pairs

Thanks a lot, that was a great help, especially the comments explaining everything in detail. I just wondered why it shouldn't be possible with plain arrays. Check out the program below :)
#!/usr/bin/perl
open FILE,"sampledata";
@arr = <FILE>;
chomp @arr;
close(FILE);
local $,="\n";
while(@arr) {
    my @reslt;
    @str = shift @arr;
    push(@reslt,$str[0]);
    while (@str) {
        $flag = 0;
        $str = shift @str;
        $s1,$s2,$flag) = split(/ /,$str);
        my $count = -1;
        my $acount = 0; #to arrange o/p
        foreach(@arr) {
            $count++;
            if($_ =~ /$s1|$s2/) {
                $acount++;
                if($acount == 2 || $flag == 1) {
                    unshift(@reslt,$_);
                    unshift(@str,$_." 1");
                }
                else {
                    push(@reslt,$_);
                    push(@str,$_);
                }
                splice(@arr,$count,1);
            }
        }
    }
    print @reslt,"\n";
}

Re^5: clustering pairs
by GrandFather (Saint) on Dec 03, 2008 at 02:27 UTC

    Sadly there are a few issues with that code. Some are transcription errors, some are coding convention related, some are bugs and one is at odds with a PerlMonks' convention.

    First the PerlMonks' convention: runnable stand alone code. By removing the file handling and using DATA instead it's easy to make your code stand alone.

    Coding conventions: always use strictures (use strict; use warnings;). Use a consistent indentation style (Perl tends toward K&R with 4 character indents). Use the three parameter version of open and test for errors (open ... or die "... $!\n" by convention). Don't slurp (my @arr = <DATA>;). Use blank lines to break your code up into "paragraphs". Comment tricky stuff (your use of $flag and the appended 1 for example).
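
    A minimal sketch of those conventions, for illustration only: strictures, three-argument open with an error check, and reading one line at a time instead of slurping. The $sample string and its contents are made up for the example; it stands in for a file on disk so the snippet runs on its own, and @pairs is just an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a file on disk so the sketch is self-contained.
my $sample = "a b\nb c\nd e\n";

# Three-argument open with an error check. Passing \$sample opens the
# string as an in-memory file; a real script would pass a file name here.
open my $in, '<', \$sample or die "Can't open sample data: $!\n";

my @pairs;

# Read one line at a time rather than slurping the whole file.
while (my $line = <$in>) {
    chomp $line;
    next unless length $line;           # skip empty lines
    push @pairs, [split ' ', $line];    # store each pair as [id1, id2]
}

close $in;

print scalar(@pairs), " pairs read\n";
```

    With a real input file, only the third argument to open changes; the loop stays the same.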

    Bugs: it doesn't work! I get two one row clusters then everything else in one cluster. With strictures on there are "Use of uninitialized value" warnings.


    Perl's payment curve coincides with its learning curve.
      Well, I can tell you the reason. There is a small change in my input data (sorry for not mentioning that earlier): matching IDs are now identical in full, whereas before they matched only in the part after the '.'. So I am now splitting each line on the space. You can try running the program with the input data given below:
      SID5141.C1665 SID5141.C2448
      SID5141.C1253 SID5141.C2039
      SID5141.C1596 SID5144.C1956
      SID5141.C1906 SID5144.C2149
      SID5142.C1221 SID5144.C1956
      SID5144.C2149 SID5141.C2386
      SID5141.C2039 SID5142.C1221
      SID5141.C5887 SID5141.C7685
      SID5141.C1005 SID5142.C2808
      SID5141.C1046 SID5141.C1596
      SID5141.C2386 SID5141.C4990
      SID5141.C7685 SID5141.C4888
      Apart from this, I have noted your other suggestions. I will work on improving and take care of everything you have mentioned :)
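
      For comparison, here is a rough sketch of one way to cluster such pairs: treat every ID as a node, every line as a link, and merge linked nodes with a small union-find. This is not the approach used in the code earlier in the thread, and it assumes the data above is one space-separated pair per line. The names cluster_pairs, $find and %parent are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group IDs into clusters: two IDs share a cluster when a chain of pairs
# connects them. Takes a list of [id1, id2] array refs and returns a list
# of array refs, largest cluster first.
sub cluster_pairs {
    my (@pairs) = @_;
    my %parent;    # union-find forest: id => parent id

    # Find the root of an ID's tree, shortening the path along the way.
    my $find = sub {
        my ($id) = @_;
        $parent{$id} = $id unless exists $parent{$id};
        while ($parent{$id} ne $id) {
            $parent{$id} = $parent{ $parent{$id} };
            $id = $parent{$id};
        }
        return $id;
    };

    # Merge the two trees that each pair links together.
    for my $pair (@pairs) {
        my ($root_a, $root_b) = map { $find->($_) } @$pair;
        $parent{$root_a} = $root_b if $root_a ne $root_b;
    }

    # Collect the members of each tree under its root.
    my %members;
    push @{ $members{ $find->($_) } }, $_ for sort keys %parent;

    return sort { @$b <=> @$a or $a->[0] cmp $b->[0] } values %members;
}

# Sample data from the post, assumed to be one pair per line.
my @pairs = map { [ split ' ' ] } split /\n/, <<'END_PAIRS';
SID5141.C1665 SID5141.C2448
SID5141.C1253 SID5141.C2039
SID5141.C1596 SID5144.C1956
SID5141.C1906 SID5144.C2149
SID5142.C1221 SID5144.C1956
SID5144.C2149 SID5141.C2386
SID5141.C2039 SID5142.C1221
SID5141.C5887 SID5141.C7685
SID5141.C1005 SID5142.C2808
SID5141.C1046 SID5141.C1596
SID5141.C2386 SID5141.C4990
SID5141.C7685 SID5141.C4888
END_PAIRS

# Print one cluster per line, IDs separated by spaces.
print join(' ', @$_), "\n" for cluster_pairs(@pairs);
```

      This prints the member IDs of each cluster rather than the original rows, so the output layout differs from what the array-based program aims for.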