in reply to Identifying fraudulent users, by comparing values in database. with a hash..?

Observation 1
Comparing username/firstname/lastname is probably not essential. An abuser is not going to use the same username (or a real name) twice, right?

Observation 2
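Comparing passwords could score highly (+++). The abuser might reuse the same password for different accounts. Note that this only works if the stored values are directly comparable (plaintext or unsalted hashes); salted hashes of the same password will not match.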

Observation 3
Secret question/answer - could work if the answers were typed by hand and not selected from a drop-down list. Again, if the answers from two accounts match, there is a good chance that one is a duplicate.

Observation 4
Age/zipcode/dob etc. are probably irrelevant, as the abuser will almost certainly conceal his/her identity.

Observation 5
Remote host - is this an IP address? It might be a good idea to compare this when two accounts already have a similar/same secret answer or a similar/same password. See if the two accounts come from the same subnet, etc.
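
As a rough illustration of the subnet check, here is a minimal sketch in Python (not Perl, purely for illustration) using the standard ipaddress module. The function name same_subnet and the default /24 prefix are assumptions made for the example; the right prefix length depends on how the ISPs in question hand out addresses.

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix_len=24):
    """Return True if both IPv4 addresses fall in the same /prefix_len network.

    prefix_len=24 is an arbitrary assumption for this sketch.
    """
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a

print(same_subnet("192.0.2.17", "192.0.2.200"))   # True
print(same_subnet("192.0.2.17", "198.51.100.4"))  # False
```

A match here is only a weak signal on its own (shared proxies, dial-up pools), which is why it is best combined with the password and secret-answer checks.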

Suggestions
Check out the CPAN module String::Similarity for scoring how similar two strings are.
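
To give a feel for what a similarity score buys you, here is a minimal sketch in Python using difflib.SequenceMatcher from the standard library. It is a stand-in, not String::Similarity itself: the algorithm differs, although both yield a score between 0 and 1. The helper names and the 0.8 threshold are assumptions for the example and would need tuning against real data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Score two strings from 0.0 (nothing in common) to 1.0 (identical),
    ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_similar(a, b, threshold=0.8):
    """Flag two hand-typed answers as suspiciously close.

    threshold=0.8 is an arbitrary assumption for this sketch.
    """
    return similarity(a, b) >= threshold

print(looks_similar("Fluffy the cat", "fluffy the cat!"))  # True
print(looks_similar("rover", "fluffy"))                    # False
```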

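Efficiency-wise, comparing every account against every other takes about n(n-1)/2 comparisons at best, i.e. O(n^2). Most likely O(n^3) if each comparison does additional table lookups. In other words, it's going to be processing-intensive. So it would be a good idea to load all the data into memory before comparing. Also try to avoid hashes keyed by field name for the per-record values, because they are slower to look up than array elements. If you want more speed, store each record as a plain array indexed by constants. (Pseudo-hashes were meant for this, but they are deprecated as of Perl 5.8 and were removed in 5.10.)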
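
For fields where only an exact match matters, the pairwise cost can be sidestepped by bucketing accounts on the field value in a single pass, which is roughly the "with a hash" idea from the thread title. Below is a minimal sketch in Python (for illustration only). The record layout, the field name password_hash, and both function names are hypothetical, and it assumes the stored values are directly comparable (plaintext or unsalted hashes).

```python
from collections import defaultdict

def pair_count(n):
    """Number of distinct pairs among n accounts: n(n-1)/2."""
    return n * (n - 1) // 2

def group_by_field(accounts, field):
    """Bucket account ids by the value of one field in a single pass,
    keeping only values shared by more than one account."""
    buckets = defaultdict(list)
    for account in accounts:
        buckets[account[field]].append(account["id"])
    return {value: ids for value, ids in buckets.items() if len(ids) > 1}

# Hypothetical sample records.
accounts = [
    {"id": 1, "password_hash": "aaa"},
    {"id": 2, "password_hash": "bbb"},
    {"id": 3, "password_hash": "aaa"},
]

print(pair_count(10_000))                         # 49995000
print(group_by_field(accounts, "password_hash"))  # {'aaa': [1, 3]}
```

The fuzzy comparisons (similar answers, same subnet) can then be limited to accounts that already share a bucket, or run pairwise only on the fields where exact matching is not enough.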

And most of all, good luck!

Re: Identifying fraudulent users, by comparing values in database. with a hash..?
by jonadab (Parson) on Sep 24, 2003 at 03:00 UTC
    Most likely O(n^3)

    Oh, O(n^k) isn't so bad. Sure, it's a bit slow, but for reasonable values of n the process *will* finish. I presume this is only being run once per user, though n will be the number of users, which could be a bit on the high side. Still... might not be too bad. Computers are pretty fast these days.

    OTOH, the other day I wrote a script that was horrible. I knew it was brute force when I wrote it, but I only needed to run it a couple of times, and n was never going to exceed 15 or so, so I didn't worry about efficiency. For n=3 it ran fine. Took a minute, but I knew it was an inefficient implementation. So I set n to 4...

    Some of you may know where this is going. Windows Me told me I was running low on disk space, so I stopped the process and discovered that the swapfile was over 180GB. (It was a recursive algorithm...) So I analysed the algorithm, and it turned out that it was approximately O((n^2)!), and using an amount of RAM proportional to running time. Yeah, that's a factorial. Guess I have to come up with a slightly more clever algorithm.
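
To put numbers on that growth, here is a small Python sketch (Python rather than Perl, purely for illustration). It assumes a hypothetical algorithm whose work is exactly (n^2)! steps, which is a simplification of the "approximately O((n^2)!)" estimate above; the function name steps is made up for the example.

```python
import math

def steps(n):
    """Work done by a hypothetical algorithm costing exactly (n^2)! steps."""
    return math.factorial(n * n)

ratio = steps(4) // steps(3)

print(steps(3))  # 362880
print(steps(4))  # 20922789888000
print(ratio)     # 57657600
```

Under that assumption, if n=3 takes about a minute, n=4 takes about 57.6 million minutes, which is on the order of a century.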


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/