infantcoder has asked for the wisdom of the Perl Monks concerning the following question:
Hi, Monks. I want to extract person names from a string using a name dictionary.
My approach is as follows:
1. Build a trie (using Tree::Trie) that stores the person names from the name dictionary.
2. Tokenize the string using the name dictionary (which includes frequencies).
3. Look up each token to check whether it exists in the trie.
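A minimal sketch of steps 1 and 3 in core Perl, assuming no CPAN modules are available. It deliberately swaps in a plain nested-hash trie for Tree::Trie, so it does not show Tree::Trie's own API; `build_trie` and `find_names` are hypothetical helper names. Instead of tokenizing first and then looking tokens up, it walks the trie from each position in the string and keeps the longest complete name, so an entry such as "Alex Fong" is never split into "Alex" and "Fong".

```perl
use strict;
use warnings;
use utf8;                                   # source contains non-ASCII names
binmode STDOUT, ':encoding(UTF-8)';

# Build a character-level trie as nested hashes.
# The key '' marks the end of a complete name (split // never yields '').
sub build_trie {
    my %trie;
    for my $name (@_) {
        my $node = \%trie;
        for my $ch (split //, $name) {
            $node = $node->{$ch} //= {};
        }
        $node->{''} = $name;                # end marker stores the full name
    }
    return \%trie;
}

# Scan left to right; from each position follow the trie as far as it goes
# and remember the longest complete name seen (longest match wins).
sub find_names {
    my ($trie, $text) = @_;
    my @chars = split //, $text;
    my @found;
    my $i = 0;
    while ($i < @chars) {
        my ($node, $best, $best_end) = ($trie, undef, $i);
        for (my $j = $i; $j < @chars; $j++) {
            $node = $node->{ $chars[$j] } or last;   # no branch: stop
            ($best, $best_end) = ($node->{''}, $j + 1) if exists $node->{''};
        }
        if (defined $best) {
            push @found, $best;
            $i = $best_end;                 # skip past the matched name
        }
        else {
            $i++;
        }
    }
    return @found;
}

my $trie = build_trie('Alex Fong', 'Fong', 'Joe Smith', '周杰倫', '纯ちゃん');
print "$_\n" for find_names($trie, 'You know Alex Fong believe what Fong said ?');
# prints "Alex Fong" then "Fong"
```

The work done at each start position is bounded by the length of the longest name, not by the size of the dictionary.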
Name dictionary (name => frequency pairs):
Alex Fong => 100
Fong => 100
Ferenc Kállai => 96
Joe Smith => 95
Sándor Pécsi => 90
John Doe => 89
Sándor Tompa => 62
周杰倫 => 57
纯ちゃん => 2
... ...
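A sketch of loading the dictionary into a Perl hash, assuming each entry is on its own line in exactly the "Name => frequency" form shown above. `parse_dictionary` is a hypothetical helper name; lines that do not fit the pattern, such as the "... ..." elision, are skipped.

```perl
use strict;
use warnings;
use utf8;                                   # literals below contain non-ASCII names
binmode STDOUT, ':encoding(UTF-8)';

# Parse "Name => frequency" lines into a hash reference (name => frequency).
sub parse_dictionary {
    my %freq;
    for my $line (@_) {
        next unless $line =~ /^\s*(.+?)\s*=>\s*(\d+)\s*$/;
        $freq{$1} = $2;
    }
    return \%freq;
}

my $dict = parse_dictionary(
    'Alex Fong => 100',
    'Fong => 100',
    'Ferenc Kállai => 96',
    '周杰倫 => 57',
    '... ...',
);

# List names by descending frequency, ties broken alphabetically.
for my $name (sort { $dict->{$b} <=> $dict->{$a} || $a cmp $b } keys %$dict) {
    print "$name: $dict->{$name}\n";
}
```

To read the entries from a file instead, open it with `open my $fh, '<:encoding(UTF-8)', $path or die $!;` and call `parse_dictionary(<$fh>)`, where `$path` is a placeholder; the trailing newline on each line is absorbed by the final `\s*`.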
Example 1:
Input string: "Esther Kwan, 纯ちゃん | Alex Fong (Hong Kong) / Joe Smith ; Fong 周杰倫 Ferenc Kállai"
Output (order not important):
"Alex Fong"
"周杰倫"
"纯ちゃん"
"Joe Smith"
"Fong"
"Ferenc Kállai"
Example 2:
Input string: "You know Alex Fong believe what Fong said ?"
Output (order not important):
"Alex Fong"
"Fong"
Question:
Step 2: How can I tokenize a string using a custom dictionary?
That is, each entry in the dictionary must stay together as a single token; it must NOT be split.
Are there any Perl modules or toolkits available for this?
Thank you, Monks.
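One common way to handle step 2 in Perl, offered as a sketch: build a single alternation regex from all dictionary names, longest names first, and let the regex engine scan the string. At each position the engine tries "Alex Fong" before "Fong", so multi-word names stay whole. `name_regex` and `extract_names` are hypothetical names, and reporting each name only once is an assumption based on the expected outputs reading like sets.

```perl
use strict;
use warnings;
use utf8;                                   # source contains non-ASCII names
binmode STDOUT, ':encoding(UTF-8)';

# Build one alternation from the dictionary. Longer names come first so the
# longest entry wins at a given position; quotemeta escapes metacharacters.
sub name_regex {
    my @names = sort { length($b) <=> length($a) || $a cmp $b } @_;
    my $alt = join '|', map { quotemeta } @names;
    return qr/(?:$alt)/;
}

# Return the dictionary names found in $text, each reported once
# (assumption: duplicates are not wanted in the output).
sub extract_names {
    my ($re, $text) = @_;
    my (%seen, @found);
    while ($text =~ /($re)/g) {
        push @found, $1 unless $seen{$1}++;
    }
    return @found;
}

my @dict = ('Alex Fong', 'Fong', 'Ferenc Kállai', 'Joe Smith',
            'Sándor Pécsi', 'John Doe', 'Sándor Tompa', '周杰倫', '纯ちゃん');
my $re = name_regex(@dict);

my $s1 = 'Esther Kwan, 纯ちゃん | Alex Fong (Hong Kong) / Joe Smith ; Fong 周杰倫 Ferenc Kállai';
print "$_\n" for extract_names($re, $s1);
```

Perl 5.10 and later compile an alternation of literal strings like this into an internal trie, so with this approach the trie from step 1 does not have to be built by hand. The pattern has no word-boundary checks: wrapping the alternation in `(?<!\w)` and `(?!\w)` would stop "Fong" from matching inside "Fongs", but it would also reject names embedded in CJK text written without spaces.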
Re: How to tokenize string by custom dictionary? (+code)
by LanX (Saint) on Nov 05, 2013 at 15:14 UTC
by infantcoder (Novice) on Nov 06, 2013 at 03:06 UTC
by LanX (Saint) on Nov 06, 2013 at 18:02 UTC
by infantcoder (Novice) on Nov 07, 2013 at 03:30 UTC
by LanX (Saint) on Nov 08, 2013 at 19:48 UTC