OT: predicting related content

gwhite has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of 4k-5k items, a short survey (5-6 questions), and a lot of users taking the survey.

I am trying to wrap my brain around the best way to show the items most likely to get a response from the user, based on the results of the survey. For instance, a 21-year-old male who is active is likely to be interested in workout equipment, but a 51-year-old male who is active is likely to be interested in ibuprofen.

Right now, I can only come up with brute-force testing: run enough iterations, then declare a winner for each response set. But would it be better to do a first pass based on Male to get a subset and then test that against Male+Age, or should I test all the individual response possibilities first and then start testing those first-round winners against combinations of responses?

How do updates to the item list get tested out?

The buzzwords "multivariate" and "predictive modeling" get thrown around in discussions on this, but searching CPAN for those terms did not turn up any modules that do what I want (or they were so far over my head that I didn't realize they do what I want). Is there a better term for what I am trying to do?

g_White

Re: OT: predicting related content
by roboticus (Chancellor) on Jan 03, 2014 at 16:59 UTC

gwhite:

I'd start simple by having a table of "score modifiers" for questions based on demographic group type. Then for a respondent, sort the questions by score and pick your questions from near the front of the list.

create table questions (
   QID number constraint questions_pk primary key,
   question varchar2(1000)
);

insert into questions (qid, question)
values (1, 'Duz ya likes teh exersize equips?');
insert into questions (qid, question)
values (2, 'Would you like fast relief from muscle aches?');

create table group_attributes (
   GAID number constraint groups_attributes_pk primary key,
   gr_attr varchar2(30)
);

insert into group_attributes (gaid, gr_attr) values (1, 'Male');
insert into group_attributes (gaid, gr_attr) values (2, 'Female');
insert into group_attributes (gaid, gr_attr) values (3, 'Under 15');
insert into group_attributes (gaid, gr_attr) values (4, '16-20');
insert into group_attributes (gaid, gr_attr) values (5, '20-35');
insert into group_attributes (gaid, gr_attr) values (6, '35-65');

create table question_bonuses (
   QID number constraint question_bonuses_FK1
              references questions(QID),
   GAID number constraint question_bonuses_FK2
               references group_attributes(GAID),
   bonus number
);
create index question_bonuses_pk on question_bonuses(QID,GAID);

-- people from 16-35 are more interested in exercise equip
insert into question_bonuses (qid, gaid, bonus)
values (1, 4, .2);
insert into question_bonuses (qid, gaid, bonus)
values (1, 5, .2);

-- A slight preference for males?
insert into question_bonuses (qid, gaid, bonus)
values (1, 1, .05);

-- Older people are much less interested...
insert into question_bonuses (qid, gaid, bonus)
values (1, 6, -.5);

-- But pain relief is more welcome in this case...
insert into question_bonuses (qid, gaid, bonus)
values (2, 6, .3);

Then, when you have a respondent, sum the bonuses for each question over the respondent's attributes, sort the questions by score, and choose from the best-scoring ones.

select qid, question, (
       select sum(bonus)
          from question_bonuses qb
          join group_attributes ga on ga.gaid=qb.gaid
          where qb.qid = q.qid
          and ga.gr_attr in (...respondent attribs...)
       ) as score
from questions q
-- questions with no matching bonus get a NULL score; sort them last
order by score desc nulls last
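If you'd rather prototype the scoring outside the database first, here is a minimal Python sketch of the same idea. The dictionaries are just hypothetical in-memory copies of the sample rows above, and `rank_questions` is a name I made up for illustration. Unlike the SQL, it treats a question with no matching bonus as scoring 0 rather than NULL.

```python
# Hypothetical in-memory copy of the "questions" table.
QUESTIONS = {
    1: "Duz ya likes teh exersize equips?",
    2: "Would you like fast relief from muscle aches?",
}

# (qid, group attribute) -> bonus: question_bonuses joined to group_attributes.
BONUSES = {
    (1, "16-20"): 0.2,
    (1, "20-35"): 0.2,
    (1, "Male"): 0.05,
    (1, "35-65"): -0.5,
    (2, "35-65"): 0.3,
}


def rank_questions(respondent_attrs):
    """Return (qid, score) pairs, best score first.

    Assumption: a question with no matching bonus scores 0.0.
    """
    attrs = set(respondent_attrs)
    scores = {qid: 0.0 for qid in QUESTIONS}
    for (qid, attr), bonus in BONUSES.items():
        if attr in attrs:
            scores[qid] += bonus
    # sorted() is stable, so tied questions keep their original order.
    ranked = sorted(QUESTIONS, key=lambda qid: scores[qid], reverse=True)
    return [(qid, scores[qid]) for qid in ranked]


if __name__ == "__main__":
    for attrs in (["Male", "20-35"], ["Male", "35-65"]):
        print(attrs, rank_questions(attrs))
```

With the sample bonuses, a 20-35 male gets the exercise question ranked first, while a 35-65 male gets the pain relief question first.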

...roboticus

When your only tool is a hammer, all problems look like your thumb.
