Braindead_One has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I'm currently writing many online-gaming related scripts and thought it would be nice to have a routine that extracts clantags out of an array of playernames.

Clantags normally consist of 2 or more characters which are (mostly) at the beginning or end of a name but could also be somewhere in the middle.

Normally all players of one team should have the tag in their names, so I basically have to find it by looking for the similarities and extracting them.

My problem is: I have no real idea how to do that ;)
I already searched CPAN and found modules like String::Similarity and Algorithm::Diff, but none of them seem to help with my problem, since I have to match more than 2 strings in order to find the right tag (some players might wear no tag or a different one).

I already thought of splitting the string into substrings (with String::Substrings) and comparing the resulting arrays but that seems to be the solution with the most overhead.
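For reference, the substring idea described above could look roughly like the sketch below. It uses only core Perl rather than String::Substrings, and the helper name `guess_tag_by_substrings`, the minimum length argument, and the untagged player `nobody` are assumptions made for illustration. It counts how many names contain each substring and prefers the most widely shared one, then the longest.

```perl
use strict;
use warnings;

# Hypothetical helper: for every substring of at least $min_len characters,
# count how many names contain it, then pick the most widely shared one.
sub guess_tag_by_substrings {
    my ($min_len, @names) = @_;
    my %seen_in;    # substring => number of names containing it
    for my $name (@names) {
        my %in_this_name;    # avoids counting a substring twice per name
        for my $start (0 .. length($name) - $min_len) {
            for my $len ($min_len .. length($name) - $start) {
                $in_this_name{ substr $name, $start, $len } = 1;
            }
        }
        $seen_in{$_}++ for keys %in_this_name;
    }
    # Prefer substrings found in more names, then longer ones, then
    # alphabetical order so the result is deterministic.
    my ($best) = sort {
           $seen_in{$b} <=> $seen_in{$a}
        || length($b)   <=> length($a)
        || $a cmp $b
    } grep { $seen_in{$_} > 1 } keys %seen_in;
    return $best;    # undef if nothing is shared by two or more names
}

print guess_tag_by_substrings(2, qw(jP|Azrael jP|Blade jP|Henry nobody)), "\n";  # jP|
print guess_tag_by_substrings(2, qw(Jeff.ocr pr!me.ocr Lokren.ocr)), "\n";       # .ocr
```

The number of substrings grows quadratically with the name length, which is the overhead mentioned above. Because the count is ranked before the length, a short fragment shared by many names can beat the real tag.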

I hope one of you can point me to a more efficient solution.

Some example player names could be:
jP|Azrael
jP|Blade
jP|Henry
(Clantag: jP|)

Jeff.ocr
pr!me.ocr
Lokren.ocr
(Clantag: .ocr)

Thanks in advance,
Braindead_One

Replies are listed 'Best First'.
Re: String similarity extraction
by Hofmator (Curate) on Jan 11, 2003 at 17:11 UTC

    The following code should do what you want, provided that the tags always start at the same letter (from the beginning or end - see the DATA section for what I mean by that).

    Instead of just creating the simple pairs you might be able to shift the player names characterwise against each other. Then you might get to a solution that doesn't contain the above stated restriction.

    use strict;
    use warnings;
    use List::Util qw/reduce/;

    sub _extract {
        my @names = @_;
        my @pairs;
        # create all xor'd pairs
        for (my $i=0; $i<@names; $i++) {
            for (my $j=$i+1; $j<@names; $j++) {
                push @pairs, $names[$i] ^ $names[$j];
            }
        }
        no warnings 'once';
        my $or = reduce { $a | $b } @pairs;
        if ( $or =~ /\0+/ ) {
            my $index = $-[0];
            my $length = $+[0] - $-[0];
            return substr $names[0], $index, $length;
        } else {
            # match not successful, so return undef
            return;
        }
    }

    sub extract_tag {
        my @names = @_;
        my $tag;
        $tag = _extract(@names);
        # try matching with reversed @names
        # this finds common substrs at the end
        $tag = reverse _extract(map {scalar reverse $_} @names) unless defined $tag;
        return $tag;
    }

    my @names;
    local $, = ':';
    local $\ = "\n";
    while (<DATA>) {
        chomp;
        print(extract_tag(@names)), @names = (), next unless /\S/;
        push @names, $_;
    }
    print extract_tag(@names);

    __DATA__
    jP|Azrael
    jP|Blade
    jP|Henry

    Jeff.ocr
    pr!me.ocr
    Lokren.ocr

    woRUTtan
    hiRUTfango
    biRUTff

    salTAGo
    blasTAGi
    RipTAGu
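The idea of shifting the names character-wise against each other might be sketched as follows. This is an untested-in-production illustration, not part of the code above: the helper names `common_runs` and `extract_tag_shifted`, the minimum run length, and the sample names `RUTwolf` and `foxRUT` are all assumptions. It XORs the first two names at every relative offset, collects the runs of null bytes, and keeps only the pieces that also occur in every remaining name.

```perl
use strict;
use warnings;

# Hypothetical helper: every substring of at least $min_len characters that
# two strings share at some relative shift, found with the XOR trick.
sub common_runs {
    my ($x, $y, $min_len) = @_;
    my %runs;
    for my $shift (-(length($y) - 1) .. length($x) - 1) {
        # Cut both strings down to the part that overlaps at this shift.
        my $xs  = $shift > 0 ? substr($x, $shift)  : $x;
        my $ys  = $shift < 0 ? substr($y, -$shift) : $y;
        my $len = length($xs) < length($ys) ? length($xs) : length($ys);
        my $xor = substr($xs, 0, $len) ^ substr($ys, 0, $len);
        while ($xor =~ /\0{$min_len,}/g) {
            my $run = substr $xs, $-[0], $+[0] - $-[0];
            # Record the run and every piece of it that is still long
            # enough, because a third name may share only part of the run.
            for my $start (0 .. length($run) - $min_len) {
                $runs{ substr $run, $start, $_ } = 1
                    for $min_len .. length($run) - $start;
            }
        }
    }
    return keys %runs;
}

# Hypothetical helper: longest piece shared by the first two names that
# also occurs somewhere in all remaining names.
sub extract_tag_shifted {
    my ($min_len, @names) = @_;
    return unless @names >= 2;
    my ($first, $second, @rest) = @names;
    my @candidates = grep {
        my $run = $_;
        !grep { index($_, $run) < 0 } @rest;
    } common_runs($first, $second, $min_len);
    # Longest candidate wins; ties are broken alphabetically.
    my ($best) = sort { length($b) <=> length($a) || $a cmp $b } @candidates;
    return $best;
}

print extract_tag_shifted(2, qw(RUTwolf foxRUT biRUTff)), "\n";    # RUT
print extract_tag_shifted(2, qw(salTAGo blasTAGi RipTAGu)), "\n";  # TAG
```

Unlike the code above, this sketch requires every name to carry the tag, so a single untagged player makes it return nothing.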

    -- Hofmator

      Thank you very much!
      This is exactly what I was looking for :)
      I think this will catch at least 95% of the common tags.

      But I am still astonished that there is no module for this problem.
Re: String similarity extraction
by Zaxo (Archbishop) on Jan 11, 2003 at 16:49 UTC

    The examples you give use a non-word character as a delimiter. Is that always the case? If so,

    my @name_parts = split /(\W)/, $name;
    my $clantag = exists $clanhash{$name_parts[0]}
        ? $name_parts[0] . $name_parts[1]
        : $name_parts[1] . $name_parts[2];
    You may want to make a hash of the clantags in any case. You could then use index to find them in names.
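A minimal sketch of the hash-plus-index suggestion, assuming the tags are already known. The hash `%known_tags`, its contents, and the helper `find_known_tag` are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical set of already-known tags, stored with their delimiters.
my %known_tags = map { $_ => 1 } ('jP|', '.ocr');

# Return the first known tag found anywhere in $name, or undef if none fits.
sub find_known_tag {
    my ($name) = @_;
    # Try longer tags first so that a short tag cannot mask a longer one.
    for my $tag (sort { length($b) <=> length($a) || $a cmp $b } keys %known_tags) {
        return $tag if index($name, $tag) >= 0;
    }
    return;
}

print find_known_tag('pr!me.ocr'), "\n";    # .ocr
print find_known_tag('jP|Blade'), "\n";     # jP|
```

Since index matches anywhere in the string, this works for tags at the beginning, the end, or the middle of a name, but it only helps once a tag has been identified some other way.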

    After Compline,
    Zaxo

      Sadly it isn't that easy.
      You'd be surprised how creative people can be when it comes to inventing a clantag ;)
      But nevertheless: Thank you
Re: String similarity extraction
by theorbtwo (Prior) on Jan 11, 2003 at 23:05 UTC

    Careful; what you want to do can't be done 100% heuristically. Consider a clan which has two members, SC|Adam and SC|Alex. The clantag isn't SC|A, even though that's the longest common substring; it's SC|, and they both happen to have names beginning with an A.

    Also, consider SCSuperBot, SCMegaBot, and SCWackoBot. The clantag is SC, not Bot, or both, but the longest common substring is Bot.
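Both cases can be reproduced with a brute-force longest-common-substring search. The function `longest_common_substring` below is a hypothetical helper written for this illustration, not something from a module.

```perl
use strict;
use warnings;

# Longest substring of the first name that occurs in every other name
# (brute force; ties go to the earliest position in the first name).
sub longest_common_substring {
    my ($first, @others) = @_;
    for my $len (reverse 1 .. length $first) {
        for my $start (0 .. length($first) - $len) {
            my $candidate = substr $first, $start, $len;
            return $candidate
                unless grep { index($_, $candidate) < 0 } @others;
        }
    }
    return '';    # the names share no character at all
}

print longest_common_substring(qw(SC|Adam SC|Alex)), "\n";                  # SC|A
print longest_common_substring(qw(SCSuperBot SCMegaBot SCWackoBot)), "\n";  # Bot
```

Neither result is the real tag, which is why any purely automatic answer should be treated as a guess.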

    Anyway, don't let this discourage you too much, I just wanted to note it.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: String similarity extraction
by PodMaster (Abbot) on Jan 11, 2003 at 16:13 UTC
    Please control your interface: restrict tags to the beginning or the end of a name. Otherwise live with the overhead (and if you're concerned about overhead, cache your results).


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      The problem is that I have no control over the interface. The data is generated by players on online game servers. The names are chosen by the clans, and all I can do is read them from log files or via UDP.