@parts is a list of the 'words' in one protein name to be matched. The code builds %proteinLU as chains of nested hash keys, one level per word. The value for each key is another hash, except for _name_ keys, whose value is a complete protein name.
$parent = $parent->{$part} ||= {}; sets the value of a new key to an empty hash, leaves an existing key's hash untouched, and in either case moves $parent down to that hash. Using ||= in that way avoids an explicit if ! exists $parent->{$part} test.
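As a minimal sketch of the build step described above (the sample names in @names are made up for illustration, and splitting on whitespace is an assumption about how @parts is produced):

```perl
use strict;
use warnings;

# Hypothetical sample data; the real list of protein names is not shown here.
my @names = ('protein kinase C', 'protein kinase', 'insulin receptor');

my %proteinLU;
for my $name (@names) {
    my @parts  = split ' ', $name;    # the 'words' of this name
    my $parent = \%proteinLU;         # start at the top of the lookup

    for my $part (@parts) {
        # Create the nested hash if it is missing, then step down into it.
        $parent = $parent->{$part} ||= {};
    }

    # Mark the end of a complete name at this depth.
    $parent->{_name_} = $name;
}
```

With that data, $proteinLU{protein}{kinase} holds both a _name_ key ('protein kinase') and a C key whose hash carries its own _name_ ('protein kinase C'), so a shorter name and a longer one can share the same chain.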
The match code works by 'walking' down a chain of nested hash keys. Each time a new key is matched, its value becomes the next 'parent'. The assignment to @best 'remembers' the last protein name that matched, so when the walk stops the longest matching name is the one that is kept. @best is an array because two values need to be remembered for the match: the protein name ($parent->{_name_}) and the number of words to remove ($wIndex).
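The walk can be sketched roughly as follows. This is an illustration rather than the code from the earlier node: longest_match is a hypothetical helper, the lookup is written out by hand with made-up names, and $wIndex is used here as a position in the word list, with the word count derived from it.

```perl
use strict;
use warnings;

# Hand-built lookup in the shape described above (sample data is hypothetical).
my %proteinLU = (
    protein => {
        kinase => {
            _name_ => 'protein kinase',
            C      => { _name_ => 'protein kinase C' },
        },
    },
);

# Return (name, number of words) for the longest protein name that starts
# at $words->[$start], or an empty list if nothing matches.
sub longest_match {
    my ($words, $start) = @_;
    my $parent = \%proteinLU;
    my @best;

    for my $wIndex ($start .. $#$words) {
        my $next = $parent->{ $words->[$wIndex] };
        last if ref $next ne 'HASH';    # chain ends: no such key (or a _name_ string)
        $parent = $next;                # matched key's value becomes the next parent

        # Remember the most recent complete name and how many words it spans.
        @best = ($parent->{_name_}, $wIndex - $start + 1)
            if exists $parent->{_name_};
    }
    return @best;
}

my @words = split ' ', 'activated protein kinase C rapidly';
my ($name, $count) = longest_match(\@words, 1);
print "$name ($count words)\n";    # prints "protein kinase C (3 words)"
```

Because @best is only overwritten when a deeper _name_ is found, a sentence containing 'protein kinase D' would still yield 'protein kinase' with a count of 2.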
In reply to Re^4: Tag protein names in sentences by GrandFather, in thread Tag protein names in sentences by sinlam