@parts is a list of the 'words' in a protein name to be matched. The code builds %proteinLU as chains of nested keys, one level per word. The value for each key is another hash, except for the special _name_ key, whose value is the complete protein name.

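$parent = $parent->{$part} ||= {}; does two things: if $part has no value in the current hash yet it is given an empty hash, and then that hash (new or already existing) becomes the next $parent. Using ||= in that way avoids an explicit if ! exists $parent->{$part} test.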
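
To make the build step concrete, here is a minimal sketch of how such a lookup could be populated. The three names in @names are made up for illustration, and the keys are stored exactly as written (no case folding), which may differ from the code earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real list comes from your own input.
my @names = ('protein kinase C', 'protein kinase', 'heat shock protein 70');

my %proteinLU;

for my $name (@names) {
    my @parts  = split ' ', $name;    # the 'words' of this protein name
    my $parent = \%proteinLU;

    # Step down one level per word, creating an empty hash where needed.
    for my $part (@parts) {
        $parent = $parent->{$part} ||= {};
    }

    # Mark the point where a complete name ends.
    $parent->{_name_} = $name;
}
```

After this runs, $proteinLU{protein}{kinase} holds both a _name_ entry for 'protein kinase' and a C key leading on to 'protein kinase C', so a shorter name and a longer one that starts with it can share the same chain.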

The match code works by 'walking' down a chain of nested hash keys. Each time a new key is matched its value becomes the next 'parent'. The assignment to @best 'remembers' the last protein name that matched. @best is an array because two values need to be remembered for the match: the protein name ($parent->{_name_}) and the number of words to remove ($wIndex).
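
The walk can be sketched roughly as below. This is not the code from earlier in the thread: longest_match is a hypothetical helper, the small lookup is built by hand, and here the second element of @best is a count of matched words rather than the index used in the original.

```perl
use strict;
use warnings;

# Hand-built lookup in the same shape as %proteinLU (illustrative data only).
my %proteinLU = (
    protein => {
        kinase => {
            _name_ => 'protein kinase',
            C      => { _name_ => 'protein kinase C' },
        },
    },
);

# Return (name, number of words matched) for the longest protein name that
# starts at $words->[$start], or an empty list if nothing matches there.
sub longest_match {
    my ($lookup, $words, $start) = @_;
    my $parent = $lookup;
    my @best;

    for my $wIndex ($start .. $#$words) {
        my $next = $parent->{ $words->[$wIndex] };
        last if ref($next) ne 'HASH';    # chain ends: no such key below here

        $parent = $next;                 # the matched key's value is the next parent
        @best = ($parent->{_name_}, $wIndex - $start + 1)
            if exists $parent->{_name_}; # remember the longest complete name so far
    }

    return @best;
}

my @words = split ' ', 'the protein kinase C pathway';
my @best  = longest_match(\%proteinLU, \@words, 1);
print "matched '$best[0]' using $best[1] words\n" if @best;
```

Because @best is only overwritten when a node has a _name_ key, a walk that runs on past 'protein kinase' and then fails still reports the shorter name.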


True laziness is hard work

Re^5: Tag protein names in sentences
by sinlam (Novice) on Feb 18, 2010 at 19:34 UTC
    Hi, thanks for the explanation, I understand the code better now. Sorry, I have another question, as I want to use this data structure in future. Is it possible to use a loop to traverse the whole %proteinLU hash of hashes? I spent some time trying to do it, but I could not traverse it successfully. Thanks for your help!

      %proteinLU is a nested structure that is best traversed using recursive code. If you can't figure out how to achieve what you need when you get to the future, come back with a specific question to get a specific answer.
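
For what it is worth, a recursive walk might look something like the sketch below. collect_names is a hypothetical helper and the lookup is a small hand-built stand-in; it assumes every value other than a _name_ entry is a hash reference.

```perl
use strict;
use warnings;

# Hand-built stand-in for %proteinLU (illustrative data only).
my %proteinLU = (
    protein => {
        kinase => {
            _name_ => 'protein kinase',
            C      => { _name_ => 'protein kinase C' },
        },
    },
);

# Return every complete protein name stored at or below $node.
sub collect_names {
    my ($node) = @_;
    my @names;

    for my $key (sort keys %$node) {
        if ($key eq '_name_') {
            push @names, $node->{$key};                  # a complete name ends here
        }
        else {
            push @names, collect_names($node->{$key});   # descend one level
        }
    }

    return @names;
}

print "$_\n" for collect_names(\%proteinLU);
```

A plain loop cannot know in advance how deep each chain goes, which is why the sub calls itself once per nested hash.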

      In the meantime, don't worry about it! By the time you get to writing that code you will have a little more Perl understanding and, with a little luck, a solution will present itself. It is good to gain understanding from an exercise like this, but avoid spending too much time solving future problems that may never arise.
