Hello friends,
I am converting a lexical processor I wrote in Java to Perl. Perl is supposed to be very good at text processing, and I'm using this as an opportunity to learn the language. My initial version produces the right output, but it runs around 70 times slower than my Java implementation, which used a home-made trie. According to Devel::NYTProf, the bottleneck is in _walk_tree of Tree::Trie. That brings me to my question: what is a time-efficient way to match words and/or phrases against a target sentence, where each match also returns (or allows access to) supplementary data on the matched item?
Here is the algorithm I need to implement efficiently:
    my $lexicon = $csv->parse;   # words to match against, and supplementary data to go with matches
    foreach <tweet> {
        foreach <word_in_tweet> {
            if ($lexicon includes <word_in_tweet>) {
                save match.supplementary_data TO tweet.result_data;
            }
        }
    }
Supplementary data includes topics and sentiment values corresponding to each word/phrase in my dictionary. In the end, I need to know all the topics that match in each tweet.
Important caveat: The dictionary may include multi-word entries, so these need to be matched as well and preferred over shorter matches.
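To make the multi-word requirement concrete, here is a minimal sketch of a hash-based, longest-match-first scan. It is not my current code: the lexicon entries, the field names (topic, sentiment), and the sub name match_tweet are all made up for illustration, and tokenization is plain whitespace splitting with no punctuation handling.

```perl
use strict;
use warnings;

# Hypothetical lexicon: lowercased word or phrase => supplementary data.
my %lexicon = (
    'new york' => { topic => 'places',  sentiment => 0 },
    'new'      => { topic => 'novelty', sentiment => 1 },
    'love'     => { topic => 'emotion', sentiment => 3 },
);

# Length (in words) of the longest entry, so the scan knows how far to look ahead.
my $max_words = 1;
for my $entry (keys %lexicon) {
    my @parts = split ' ', $entry;
    $max_words = @parts if @parts > $max_words;
}

# Return one hashref per match, trying the longest candidate phrase first
# at each position so multi-word entries win over shorter ones.
sub match_tweet {
    my ($tweet) = @_;
    my @words = split ' ', lc $tweet;   # assumption: whitespace tokenization
    my @hits;
    my $i = 0;
    while ($i < @words) {
        my $longest = @words - $i;
        $longest = $max_words if $longest > $max_words;
        my $matched = 0;
        for (my $len = $longest; $len >= 1; $len--) {
            my $phrase = join ' ', @words[$i .. $i + $len - 1];
            if (exists $lexicon{$phrase}) {
                push @hits, { phrase => $phrase, data => $lexicon{$phrase} };
                $i += $len;             # skip past the words just consumed
                $matched = 1;
                last;
            }
        }
        $i++ unless $matched;
    }
    return @hits;
}
```

With this sketch, match_tweet('I love New York') would report 'love' and 'new york', but not the shorter entry 'new'. Each position costs at most $max_words hash lookups, so the work per tweet grows with the tweet length times the longest entry length rather than with the size of the lexicon.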
Is there a more efficient tree implementation? Is Perl's internal hash implementation likely to offer sufficiently efficient alternatives? Can you think of something I'm missing?
Thank you very much for your help!
In reply to Efficient matching with accompanying data by Endless