Hi, Monks. I want to extract person names from a string using a name dictionary.
My plan is as follows:
1. Build a trie (using Tree::Trie) to store the person names from the name dictionary.
2. Tokenize the string using the name dictionary (which includes frequencies).
3. Check whether each token exists in the trie.
Name dictionary (name => frequency pairs):
Alex Fong => 100
Fong => 100
Ferenc Kállai => 96
Joe Smith => 95
Sándor Pécsi => 90
John Doe => 89
Sándor Tompa => 62
周杰倫 => 57
纯ちゃん => 2
... ...
Example 1:
Input string: "Esther Kwan, 纯ちゃん | Alex Fong (Hong Kong) / Joe Smith ; Fong 周杰倫 Ferenc Kállai"
Output (order not important):
"Alex Fong"
"周杰倫"
"纯ちゃん"
"Joe Smith"
"Fong"
"Ferenc Kállai"
Example 2:
Input string: "You know Alex Fong believe what Fong said ?"
Output (order not important):
"Alex Fong"
"Fong"
Question:
For step 2: how can I tokenize a string using a custom dictionary?
That is, tokens listed in the dictionary must NOT be split.
Are there any Perl modules or toolkits available for this?
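To make the requirement concrete, here is a minimal sketch of one possible approach that needs no extra module. It is not a known module API. It builds one regex alternation from the dictionary keys, sorted longest first, so that "Alex Fong" is tried before "Fong" and is never split. The hash %freq and the helper find_names are hypothetical names for illustration. The frequencies are not used here, and no word-boundary check is done (CJK names have none), so a short name could also match inside a longer unrelated word.

```perl
use strict;
use warnings;
use utf8;    # the source file itself contains UTF-8 name literals

# Hypothetical dictionary (name => frequency), copied from the sample above.
# Only the keys are used in this sketch; the frequencies are ignored.
my %freq = (
    'Alex Fong'     => 100,
    'Fong'          => 100,
    'Ferenc Kállai' => 96,
    'Joe Smith'     => 95,
    'Sándor Pécsi'  => 90,
    'John Doe'      => 89,
    'Sándor Tompa'  => 62,
    '周杰倫'        => 57,
    '纯ちゃん'      => 2,
);

# Longest names first: Perl tries alternatives left to right, so at any
# position "Alex Fong" is tried before "Fong" and is kept whole.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %freq;
my $name_re = qr/($alternation)/;

# Hypothetical helper: returns every dictionary name found in $text,
# each name once, in order of first appearance.
sub find_names {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $text =~ /$name_re/g;
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for find_names('You know Alex Fong believe what Fong said ?');
# prints:
#   Alex Fong
#   Fong
```

Because every match is a dictionary key by construction, steps 2 and 3 collapse into one pass in this sketch. Input read from a file or STDIN would need to be decoded to characters first (for example with an :encoding(UTF-8) layer) for the non-ASCII names to match.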
Thank you, Monks.