
Sadly there are a few issues with that code. Some are transcription errors, some relate to coding conventions, some are bugs, and one is at odds with a PerlMonks convention.

First, the PerlMonks convention: post runnable, stand-alone code. Removing the file handling and reading from DATA instead makes it easy to make your code stand alone.

Coding conventions: always use strictures (use strict; use warnings;). Use a consistent indentation style (Perl tends toward K&R with four-character indents). Use the three-parameter version of open and test for errors (open ... or die "... $!\n" by convention). Don't slurp (my @arr = <DATA>;) when you can process the input a line at a time. Use blank lines to break your code up into "paragraphs". Comment tricky stuff (your use of $flag and the appended 1, for example).
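
As a rough illustration of those conventions (not a fix for the clustering logic itself), here is a minimal stand-alone sketch. The sub name read_pairs, the variable names, and the assumption that the input holds one space-separated pair of IDs per line are mine, not taken from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: read whitespace-separated ID pairs, one pair per
# line, from an already opened filehandle.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;

    while (my $line = <$fh>) {    # one line at a time, no slurping
        chomp $line;
        next if $line !~ /\S/;    # skip blank lines

        my ($left, $right) = split ' ', $line;
        if (!defined $right) {
            warn "Skipping malformed line: '$line'\n";
            next;
        }

        push @pairs, [$left, $right];
    }

    return @pairs;
}

# With a real file, the three-parameter open and error check would be:
#   open my $in, '<', $fileName or die "Can't open $fileName: $!\n";
# Reading from DATA instead keeps the posted code stand alone.
my @pairs = read_pairs(\*DATA);

print "$_->[0] <-> $_->[1]\n" for @pairs;

__DATA__
SID5141.C1665 SID5141.C2448
SID5141.C1253 SID5141.C2039
```

Putting the reading loop in a sub keeps the parsing separate from whatever clustering code follows, and the same sub works unchanged on a file opened with the three-parameter open.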

Bugs: it doesn't work! I get two one-row clusters, then everything else in one cluster. With strictures on there are also "Use of uninitialized value" warnings.


Perl's payment curve coincides with its learning curve.

Re^6: clustering pairs
by sugar (Beadle) on Dec 03, 2008 at 02:56 UTC
    Well, I will tell you the reason. There is a small change in my input data (sorry for not mentioning it earlier), so now the IDs are similar throughout. Previously they were similar only after the '.'. So I am splitting on spaces now. You can try running the program with the input data given below:
    SID5141.C1665 SID5141.C2448
    SID5141.C1253 SID5141.C2039
    SID5141.C1596 SID5144.C1956
    SID5141.C1906 SID5144.C2149
    SID5142.C1221 SID5144.C1956
    SID5144.C2149 SID5141.C2386
    SID5141.C2039 SID5142.C1221
    SID5141.C5887 SID5141.C7685
    SID5141.C1005 SID5142.C2808
    SID5141.C1046 SID5141.C1596
    SID5141.C2386 SID5141.C4990
    SID5141.C7685 SID5141.C4888
    Apart from this, I have noted your other suggestions. I am sure I will improve and take care of everything you have mentioned :)