I see. Your file is just a list of services rendered and you must "cluster" these into different categories. It is possible to do so, but it will take some work.

Do you have some kind of "dictionary" that tells you which category or categories each type of service belongs to? If so, you just have to read each service and look it up in the dictionary to find its category or categories. Once you have done that, count the number and type of categories for each client and feed those counts into some kind of "scoring" formula to find the most appropriate category.
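A minimal sketch of the dictionary approach in core Perl. The dictionary contents, the client data, and the sub name `best_category` are all hypothetical. The "scoring" formula here is simply "the most frequent category wins, ties broken alphabetically"; a real formula would probably weight categories differently.

```perl
use strict;
use warnings;

# Hypothetical services-to-categories dictionary: each service maps to
# one or more categories. Replace with your real dictionary.
my %category_of = (
    'oil change'      => ['automotive'],
    'tire rotation'   => ['automotive'],
    'tax return'      => ['finance'],
    'payroll'         => ['finance', 'business'],
    'website hosting' => ['business', 'it'],
);

# Hypothetical input: client => list of services rendered.
my %services_for = (
    alice => ['oil change', 'tire rotation', 'payroll'],
    bob   => ['tax return', 'payroll', 'website hosting'],
);

# Simplest possible "scoring" formula: count how often each category
# appears among a client's services and return the most frequent one.
# Ties are broken alphabetically so the result is deterministic.
# Returns undef if none of the services is in the dictionary.
sub best_category {
    my ($services) = @_;
    my %score;
    for my $service (@$services) {
        my $cats = $category_of{$service} or next;   # skip unknown services
        $score{$_}++ for @$cats;
    }
    return unless %score;
    my ($best) = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
    return $best;
}

for my $client (sort keys %services_for) {
    my $cat = best_category($services_for{$client}) // 'uncategorized';
    print "$client: $cat\n";
}
```

With the sample data this prints `alice: automotive` and `bob: business` (bob has a 2-2 tie between finance and business, which the alphabetical tie-break resolves).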

If you do not have a "services-to-categories" dictionary, things become much more difficult, and I do not have a good, simple solution. I once applied Bayesian statistics to a similar problem (though with only a few broad categories to sort the records into) and it worked "somewhat": about 80% of the records were categorized correctly (and thus 20% were totally wrong), but that was enough for my purpose. If I had trained the algorithm a bit more, I might have gotten better results. Modules such as Algorithm::NaiveBayes or AI::Categorizer::Learner::NaiveBayes are worth a look.
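To show roughly what such a module does under the hood, here is a hand-rolled multinomial naive Bayes classifier with add-one (Laplace) smoothing in core Perl. This is NOT the API of Algorithm::NaiveBayes or AI::Categorizer; it is an illustrative sketch of the same technique. The training records and the sub name `classify` are hypothetical, and it assumes each service description can be split into words and that you have some records whose category is already known.

```perl
use strict;
use warnings;

# Hypothetical training data: words from a service description plus
# the category that record is known to belong to.
my @training = (
    [ [qw(oil change filter)],     'automotive' ],
    [ [qw(tire rotation balance)], 'automotive' ],
    [ [qw(tax return filing)],     'finance'    ],
    [ [qw(payroll tax quarterly)], 'finance'    ],
);

# Training: count documents per category, word occurrences per category,
# total words per category, and the overall vocabulary.
my (%doc_count, %word_count, %total_words, %vocab);
for my $example (@training) {
    my ($words, $label) = @$example;
    $doc_count{$label}++;
    for my $w (@$words) {
        $word_count{$label}{$w}++;
        $total_words{$label}++;
        $vocab{$w} = 1;
    }
}
my $n_docs = @training;       # number of training records
my $v_size = keys %vocab;     # number of distinct words

# Classification in log space to avoid underflow:
#   score(c) = log P(c) + sum over words w of log P(w | c)
# where P(w | c) = (count(w, c) + 1) / (total_words(c) + vocabulary size).
sub classify {
    my (@words) = @_;
    my ($best, $best_score);
    for my $label (sort keys %doc_count) {
        my $score = log($doc_count{$label} / $n_docs);
        for my $w (@words) {
            my $count = $word_count{$label}{$w} // 0;
            $score += log(($count + 1) / ($total_words{$label} + $v_size));
        }
        if (!defined $best_score || $score > $best_score) {
            ($best, $best_score) = ($label, $score);
        }
    }
    return $best;
}

print classify(qw(tax filing)), "\n";   # prints "finance"
```

With only a handful of training records the results are fragile; as noted above, more training data is what tends to improve the hit rate.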

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

In reply to Re^3: segmentation and grouping by CountZero
in thread segmentation and grouping by vkkan
