Thanks a lot for your help. Below I will try to explain what I am trying to do in the subroutine "all".

Purpose:
- to find ALL common word groups and return them in descending order of word count; if two groups have the same word count, the one that appears earlier in the string comes first.
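A small Python illustration of this ordering rule, on hypothetical data. `order_groups` is a made-up helper, and "earlier in the string" is assumed here to mean the group's starting word index in the first sentence:

```python
def order_groups(groups):
    """Sort (text, start) pairs: most words first; on a tie, earlier start first.

    Assumption: `start` is the group's word index in the first sentence.
    """
    return sorted(groups, key=lambda g: (-len(g[0].split()), g[1]))


# Hypothetical groups taken from "the quick brown fox jumps over the lazy dog".
groups = [("the lazy dog", 6), ("fox", 3), ("quick brown", 1), ("jumps over", 4)]
print(order_groups(groups))
# → [('the lazy dog', 6), ('quick brown', 1), ('jumps over', 4), ('fox', 3)]
```

The negated word count sorts the biggest groups first, and the start index breaks ties in favor of the earlier group.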

Algorithm:
- I am using the dynamic-programming algorithm for the longest common substring (LCSS), modified to return all common substrings and to drop the ones that overlap, so that the returned (common) strings plus the remaining (unselected) words make up the original sentence.
- the matrix keeps track of the occurrences; there is an explanation of the dynamic-programming algorithm here: http://en.wikipedia.org/wiki/Longest_common_substring_problem
- while filling in the matrix of occurrences, I record the start and end indexes of each detected common substring in the hash %substrings. This lets me eliminate the overlapping substrings later on.
- so, of the values stored for each entry in %substrings, the first two are the start and end indexes in the first string and the last two are the start and end indexes in the second string.
- then comes the foreach loop:
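* map1 and map2 are mask arrays that mark which positions are already covered by accepted substrings (map1 for the first string, map2 for the second).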
* if I have already accepted a substring at indexes 4-10 and then get another substring at 2-5 (overlapping the longer one), I have to trim the new one to 2-3. If trimming leaves it empty, I simply reject it.
* reason for the "sort": I have to start with the substrings that have the biggest word count, to make sure I do not keep a substring that is contained in a bigger one.
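The steps above can be sketched in Python as follows. This is not the Perl subroutine "all" (which isn't shown here), just a minimal illustration under several assumptions: words are split on whitespace and compared exactly (case-sensitive); the function name `common_word_groups` is hypothetical; %substrings is modeled as a list of (start1, end1, start2, end2) tuples of inclusive word indexes; only runs that cannot be extended further are recorded; and an overlapping candidate is trimmed to its longest stretch that is still free in both strings, so the 4-10 / 2-5 case above yields 2-3.

```python
def common_word_groups(s1, s2):
    """Sketch: all non-overlapping common word groups of s1 and s2,
    longest first; ties go to the group that starts earlier in s1."""
    w1, w2 = s1.split(), s2.split()  # assumption: whitespace-separated words
    n, m = len(w1), len(w2)

    # LCSS matrix: dp[i][j] = length (in words) of the common run that
    # ends at w1[i-1] and w2[j-1], or 0 if those two words differ.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    spans = []  # stands in for %substrings: (start1, end1, start2, end2)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i - 1] == w2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                # Assumption: record a run only where it cannot be extended.
                if i == n or j == m or w1[i] != w2[j]:
                    k = dp[i][j]
                    spans.append((i - k, i - 1, j - k, j - 1))

    # Biggest word count first; ties broken by the start index in s1.
    spans.sort(key=lambda s: (-(s[1] - s[0] + 1), s[0]))

    map1 = [False] * n  # mask: word of s1 already used by an accepted group
    map2 = [False] * m  # mask: word of s2 already used by an accepted group
    groups = []
    for start1, end1, start2, end2 in spans:
        # Assumption: keep the longest stretch free in BOTH masks
        # (the earliest such stretch wins a tie).
        best_off, best_len, run_off, run_len = 0, 0, 0, 0
        for off in range(end1 - start1 + 1):
            if not map1[start1 + off] and not map2[start2 + off]:
                if run_len == 0:
                    run_off = off
                run_len += 1
                if run_len > best_len:
                    best_off, best_len = run_off, run_len
            else:
                run_len = 0
        if best_len == 0:
            continue  # trimmed to nothing: reject it
        for off in range(best_off, best_off + best_len):
            map1[start1 + off] = True
            map2[start2 + off] = True
        first = start1 + best_off
        groups.append((best_len, first, " ".join(w1[first:first + best_len])))

    # Trimming can shorten a group, so restore the required order at the end.
    groups.sort(key=lambda g: (-g[0], g[1]))
    return [text for _, _, text in groups]
```

For example, `common_word_groups("a b c d e", "a b c x c d e")` accepts "a b c" first and then trims the overlapping run "c d e" to "d e", mirroring the trimming described above. The words of each sentence that stay unmarked in map1/map2 are the "not selected" remainder.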

I am not sure I was very clear, so please let me know what I should explain in more detail.


In reply to Re^2: **reopened**Re: weird subroutine behavior by flaviusm
in thread weird subroutine behavior by flaviusm