in reply to Find substring based on words and not in charachters

Update: Added minor optimisation. Update2: Rolled the optimisation into the while loop.

Something like this?:

#! perl -slw
use strict;

my $seg1 = "The man who likes reading books and writing poems.";
my $seg2 = "The man who likes reading big books and poems.";

my $best = '';
# Stop once what is left of $seg1 is too short to beat the current best.
while( length( $seg1 ) > length( $best ) ) {
    # At each non-whitespace start position, capture a word-bounded run.
    while( $seg1 =~ m[(?!\s)(?=(\b.+\b)(?!\s))]g ) {
        my $bit = $1;
        $best = $bit if $seg2 =~ m[\Q$bit] and length( $bit ) > length( $best );
    }
    # Drop the last word from $seg1 and try again.
    $seg1 =~ s[(?:\s|^)\S+$][];
}
print $best;
__END__
[17:04:21.46] C:\test>junk39
The man who likes reading

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Find substring based on words and not in charachters (Updated.)
by choroba (Cardinal) on Dec 02, 2014 at 17:16 UTC
    I'd compare word count instead of length.
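
A minimal sketch of what ranking by word count might look like. This is not the code from the parent post: `best_by_word_count` is a hypothetical helper, and it assumes a "word" is simply a run of `\w` characters, so punctuation is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: the run of whole words from $seg1 with the highest
# word count that also occurs, as whole words, in $seg2.
# Assumption: a "word" is a run of \w characters.
sub best_by_word_count {
    my( $seg1, $seg2 ) = @_;
    my @words = $seg1 =~ m[(\w+)]g;
    my @other = $seg2 =~ m[(\w+)]g;

    # Pad with spaces so index() can only match on whole words.
    my $hay = join ' ', '', @other, '';

    my @best;
    for my $start ( 0 .. $#words ) {
        # Only runs longer than the current best are worth testing.
        for my $end ( $start + @best .. $#words ) {
            my $needle = join ' ', '', @words[ $start .. $end ], '';
            last if index( $hay, $needle ) < 0;
            @best = @words[ $start .. $end ];
        }
    }
    return "@best";
}

print best_by_word_count(
    "The man who likes reading books and writing poems.",
    "The man who likes reading big books and poems.",
), "\n";
```

For the two sample segments this picks the same phrase as the length-based version; the two approaches only differ when a run of many short words competes with a run of fewer long ones.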
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      The OP certainly could go that way, but I see two problems with it:

      1. Deciding upon a definition for a "word".
      2. Are several short words more meaningful than fewer, longer ones?

        E.g. "on the way to" versus "the Reichstag Bureau"?
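
To make the trade-off concrete, here is a small sketch that ranks the two example phrases both ways. The `word_count` helper is hypothetical and assumes one possible definition of a word (a run of non-whitespace); a different definition could change the ranking.

```perl
use strict;
use warnings;

# Assumed definition for this sketch: a word is a run of non-whitespace.
sub word_count {
    my @w = split ' ', shift;
    return scalar @w;
}

my @candidates = ( 'on the way to', 'the Reichstag Bureau' );

# Pick the top candidate under each metric.
my( $by_words ) = sort { word_count( $b ) <=> word_count( $a ) } @candidates;
my( $by_chars ) = sort { length( $b )     <=> length( $a )     } @candidates;

print "most words:      $by_words\n";
print "most characters: $by_chars\n";
```

The first phrase has four words in 13 characters, the second three words in 20, so the two metrics pick different winners.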



      The OP states he is looking for the longest string; he wants to avoid getting parts of words, such as the trailing 'b' that the LCSS algorithm finds.
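
The partial-word problem can be reproduced with a plain character-based longest-common-substring routine. The sketch below is a textbook dynamic-programming version written for illustration (`lcs_chars` is a hypothetical name, not whatever module the OP used), run on the two segments from the parent post.

```perl
use strict;
use warnings;

# Character-based longest common substring (simple dynamic programming).
# Illustration only; not necessarily the algorithm the OP was using.
sub lcs_chars {
    my( $s, $t ) = @_;
    my( $best_len, $best_end ) = ( 0, 0 );
    my @prev = ( 0 ) x ( length( $t ) + 1 );
    for my $i ( 1 .. length( $s ) ) {
        my @cur = ( 0 ) x ( length( $t ) + 1 );
        for my $j ( 1 .. length( $t ) ) {
            next if substr( $s, $i - 1, 1 ) ne substr( $t, $j - 1, 1 );
            # Extend the common run that ended at the previous characters.
            $cur[ $j ] = $prev[ $j - 1 ] + 1;
            ( $best_len, $best_end ) = ( $cur[ $j ], $i )
                if $cur[ $j ] > $best_len;
        }
        @prev = @cur;
    }
    return substr( $s, $best_end - $best_len, $best_len );
}

my $seg1 = "The man who likes reading books and writing poems.";
my $seg2 = "The man who likes reading big books and poems.";

print '[', lcs_chars( $seg1, $seg2 ), "]\n";
```

The result ends in " b", because "books" and "big" share their first letter; that stray character is what a word-based match avoids.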

      1 Peter 4:10
Re^2: Find substring based on words and not in charachters (Updated.)
by Anonymous Monk on Dec 03, 2014 at 09:20 UTC

    Hi,

    Thanks a lot for your code! It works perfectly :)