Thanks a lot for your help. I will try to explain below what I am trying to do in the subroutine "all".
Purpose:
- to find ALL common word groups and return them in descending order of word count; if two groups have the same word count, the group that shows up earlier in the string is preferred.
Algorithm:
- I am using the dynamic programming algorithm for the longest common substring (LCSS) problem, modified to return all common substrings and to remove the ones that overlap (so that the returned common strings plus the remaining, unselected strings together make up the original sentence).
- the matrix keeps track of occurrences; an explanation of the dynamic programming algorithm is at http://en.wikipedia.org/wiki/Longest_common_substring_problem
- while I am filling in the matrix of occurrences, I record in the hash %substrings the start and end index of each detected common substring. This allows me to eliminate the overlapping substrings later on.
- so, the first two values of each %substrings entry are for the occurrence in the first string, and the last two are for the occurrence in the second string.
- now comes the foreach statement:
* map1 and map2 are mask arrays for the substrings: they mark which positions of each string are already covered by a selected substring.
* if I already have one substring at indexes 4-10 and then get another substring at 2-5 (overlapping the longer one), I have to trim the latter to 2-3. If trimming leaves it empty, I simply reject it.
* reason for "sort": I have to start from the substrings with the biggest word count to make sure I do not select a substring that is contained in a bigger one.
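
To make the steps above concrete, here is a minimal sketch in Python (not my Perl subroutine "all"). The function names `common_word_groups` and `free_run` are made up for illustration; `map1` and `map2` mirror the mask arrays described above. It assumes words are separated by whitespace, that only runs which cannot be extended any further are recorded, that ties are broken by position in the first string, and that when a candidate is partly covered, only its first uncovered run of words is kept.

```python
def free_run(start1, start2, length, map1, map2):
    """Return offsets [begin, stop) of the first run of words in the
    candidate that is still uncovered in BOTH strings."""
    k = 0
    while k < length and (map1[start1 + k] or map2[start2 + k]):
        k += 1
    begin = k
    while k < length and not (map1[start1 + k] or map2[start2 + k]):
        k += 1
    return begin, k


def common_word_groups(s1, s2):
    a, b = s1.split(), s2.split()
    # run[i][j] = number of consecutive matching words ending at a[i-1], b[j-1]
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    spans = []  # (start1, end1, start2, end2), inclusive word indexes
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                # Record the run only when it cannot be extended further.
                at_edge = i == len(a) or j == len(b)
                if at_edge or a[i] != b[j]:
                    n = run[i][j]
                    spans.append((i - n, i - 1, j - n, j - 1))

    # Biggest word count first; ties go to the earlier position.
    spans.sort(key=lambda s: (-(s[1] - s[0] + 1), s[0], s[2]))

    map1 = [False] * len(a)  # words of the first string already covered
    map2 = [False] * len(b)  # words of the second string already covered
    kept = []
    for start1, end1, start2, end2 in spans:
        length = end1 - start1 + 1
        begin, stop = free_run(start1, start2, length, map1, map2)
        if begin == stop:
            continue  # nothing left after trimming: reject
        for k in range(begin, stop):
            map1[start1 + k] = True
            map2[start2 + k] = True
        kept.append((start1 + begin, a[start1 + begin:start1 + stop]))

    # Trimming can shorten groups, so order the final result again.
    kept.sort(key=lambda g: (-len(g[1]), g[0]))
    return [" ".join(words) for _, words in kept]


print(common_word_groups("a b c d", "a b c x b c d"))
```

In this call the second candidate ("b c d", word indexes 1-3 of the first string) overlaps the already selected "a b c" (indexes 0-2), so it gets trimmed to just "d".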
I am not sure I was very clear, so please let me know what I should explain in more detail.
In reply to Re^2: **reopened**Re: weird subroutine behavior
by flaviusm
in thread weird subroutine behavior
by flaviusm