Here's a fairly simple approach:

sub score {
    my($str, $array) = @_;

    # ensure that longer string comes before its prefix
    my @substring = sort { $b cmp $a } @$array;

    my $re = join '|', map "(?=($_))", @substring;
    my($count, $next) = (0, 0);
    while ($str =~ /$re/g) {
        # $-[0] is the position at which we matched
        # @- describes the matched captures, so $#- is the actual capture matched
        my($start, $which) = ($-[0], $#- - 1);
        my $end = $start + length($substring[$which]);
        $next = $start if $next < $start;
        next if $end < $next;
        $count += $end - $next;
        $next = $end;
    }
    return $count;
}

This assumes that you may want substrings of varying lengths within the array, and that you may even have one substring in a set that is an exact prefix of another - if you don't need to allow for one or both of those possibilities, the code could be simplified a bit further.

It does assume, however, that the substrings are simple strings to match directly rather than regexps in their own right; otherwise it would need a different approach to discovering the length of each match.
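If the literal strings might contain regexp metacharacters, one possible adjustment (my assumption, not something the code above does) is to escape each string with quotemeta before wrapping it in the lookahead. The helper name build_lookahead_re and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the same alternation of capturing
# lookaheads, but escapes each string so it is matched literally.
sub build_lookahead_re {
    my @substring = @_;
    return join '|', map { '(?=(' . quotemeta($_) . '))' } @substring;
}

# Sample strings chosen only to show the escaping.
my $re = build_lookahead_re('a.c', 'a+');
print "$re\n";    # prints (?=(a\.c))|(?=(a\+))
```

Because the escaped pattern still matches exactly the original string, length($substring[$which]) remains the right match length.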

The idea is to construct a regexp that will match any of the strings at any position by turning each into a lookahead; and to make each substring a capture so that we know which matched, and can therefore work out the length of the match.
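To see what the constructed regexp reports, here is a small sketch that lists each match position together with the substring chosen there. The helper name match_positions and the sample data are illustrative assumptions, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [position, substring] pairs for every
# position at which one of the substrings matches.
sub match_positions {
    my ($str, $array) = @_;

    # descending sort puts a longer string ahead of its own prefix
    my @substring = sort { $b cmp $a } @$array;
    my $re = join '|', map "(?=($_))", @substring;

    my @found;
    while ($str =~ /$re/g) {
        # capture group N belongs to $substring[N - 1]; $#- is the
        # number of the last group that took part in the match
        my ($start, $which) = ($-[0], $#- - 1);
        push @found, [ $start, $substring[$which] ];
    }
    return @found;
}

# Sample input for illustration only.
printf "%d: %s\n", @$_ for match_positions('abcd', [qw(ab abc cd)]);
# prints:
# 0: abc
# 2: cd
```

At position 0 both "ab" and "abc" would match, but the sort order means "abc" is tried first, so the longer one is reported.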

The rest of the code remembers what positions have already been catered for to avoid double-counting.
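As a cross-check on the double-counting logic, here is a brute-force sketch that uses a different technique: it marks every character covered by any occurrence and counts the marks. It assumes the intended score is the number of characters covered at least once, which is my reading of the description above. The name covered_length and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical cross-check, not the regexp approach described above:
# flag each covered character, then count the flags.
sub covered_length {
    my ($str, $array) = @_;
    my @covered = (0) x length($str);
    for my $sub (@$array) {
        next unless length $sub;
        my $pos = 0;
        # index() returns -1 once there are no further occurrences
        while (($pos = index($str, $sub, $pos)) >= 0) {
            @covered[ $pos .. $pos + length($sub) - 1 ] = (1) x length($sub);
            $pos++;    # advance by one so overlapping occurrences are seen
        }
    }
    return scalar grep { $_ } @covered;
}

# Sample input for illustration only.
print covered_length('abcdxab', [qw(ab abc cd)]), "\n";    # prints 6
```

In this sample the raw match lengths add up to 9 ("abc", "ab", "cd", "ab"), but only 6 distinct characters are covered, which is the overlap the $next bookkeeping is there to discount.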

Hugo


In reply to Re: Substring Distance Problem by hv
in thread Measuring Substrings Problem by monkfan
