I'm trying to add a feature to my site that let's users see a list of items the system believes they may like based on their prior selected items. I have no mathematical background so I'm sure there are far superior ways to accomplish this with great accuracy than what I'm planning on trying. If you have a site or node that I should familiarize myself with, please guide me to it. I haven't found much to help me with ideas or algorithms for this so far.

My plan, however, is to do something like this:



For example, if we have an item called "really ugly petite blue pants", we would compare each word against the user's records (that we've pulled out and stuffed into a hash). We find that petite and pants match in the top of the user's records (petite and pants are some of that most frequently encountered words in the titles of items the user has selected in the past). Then we check to see if the category that the item is in matches any categories the user has previously selected items from. If it does, we add to its score. We do this for each item and then the top N items on the entire site are displayed on a page for the user to look at.

I think this would work. I'm not sure how well, but... it would work. My biggest fear is that this is a hell of a lot of processing to do! Especially if you're talking about users who may have hundreds, thousands or even tens of thousands of items in their history and a site with easily 5,000 items to match against. Imagine having to do the above steps 5,000 times and storing it all in a hash temporarily while you pick through what the top items are and build up the scores!

So I'm looking for improvements, alternatives... most any suggestions whatsoever.

Added <readmore> at author's request - dvergin 2003-06-27


In reply to Fuzzy matching to user's tastes? by Seumas

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.