Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to build an LCP array: get the number of matching words of two strings, counting from the beginning of each.
Example:
This is a sentence
This is another sentence.

The match is "This is", so the number 2 should be returned.
Thanks for helping out

Replies are listed 'Best First'.
Re: Number of matching words
by BrowserUk (Patriarch) on Aug 14, 2004 at 16:16 UTC
    #! perl -slw
    use strict;

    sub nWordsMatch {
        my( $str1, $str2 ) = @_;
        my $p = 0;
        $p++ while substr( $str1, $p, 1 ) eq substr( $str2, $p, 1 );
        my @words = substr( $str1, 0, $p-1 ) =~ m[\b(\w+)\b\W+]g;
        return @words;
    }

    my @matchWords = nWordsMatch(
        'This is a sentence',
        'This is another sentence.'
    );
    print "@matchWords";
    my $count = @matchWords;
    print $count;

    @matchWords = nWordsMatch(
        'The quick brown fox jumps over the lazy dog',
        'The quick brown fox jumps over teh lazy dog',
    );
    print "@matchWords";
    print scalar @matchWords;

    __END__
    c:\Perl\test>test2
    This is
    2
    The quick brown fox jumps over
    6

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Number of matching words
by Limbic~Region (Chancellor) on Aug 14, 2004 at 16:06 UTC
    Anonymous Monk,
    I wrote a tutorial on this topic. This looks like a job for grep:
    my @sentences = (
        'This is a sentence',
        'This is another sentence',
    );
    my $matches = grep /\bthis is\b/i , @sentences;
    Ok, after re-reading your question several times I am not sure this answers your question. Are you asking for:
    • The number of words that are the same in each sentence
    • How many words are the same starting at the beginning of each sentence
    • How many sentences does the phrase "This is" appear in (my first solution)
    • Something else entirely
    Assuming you only have two sentences and assuming you are looking for the second one:
    print Start_In_Common(
        'This is a sentence',
        'This is another sentence'
    );

    sub Start_In_Common {
        my ($first, $second) = @_;
        ($first, $second) = ($second, $first) if length $first > length $second;
        my @word = ( [split /\s+/, $first ], [split /\s+/, $second ] );
        my @fragment;
        for ( 0 .. $#{ $word[0] } ) {
            if ( $word[0][$_] eq $word[1][$_] ) {
                push @fragment, $word[1][$_];
            }
            else {
                last;
            }
        }
        # Returning after the loop also covers the case where every word matched.
        return join ' ', @fragment, scalar @fragment;
    }

    Cheers - L~R

Re: Number of matching words
by johnnywang (Priest) on Aug 14, 2004 at 22:39 UTC
    Just another try:
    use strict;

    my $a = "This is a sentence";
    my $b = "This is another sentecne";
    my @matches = match_words($a,$b);
    print join(" ",@matches),"\n";
    print scalar(@matches);

    sub match_words{
        my @word_list1 = split(/\s+/,$_[0]);
        my @word_list2 = split(/\s+/,$_[1]);
        my $i = 0;
        while(defined($word_list1[$i]) && defined ($word_list2[$i])
              && $word_list1[$i] eq $word_list2[$i]){
            ++$i;
        }
        return splice @word_list1, 0, $i;
    }
    __END__
    This is
    2
    Update: match_words() needs to be called in list context. In scalar context it returns the last word matched ("is" in this example), because that is how splice behaves in scalar context. To avoid this, you may want to replace the return statement in the sub with:
    my @result = splice @word_list1,0,$i;
    return @result;
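The context behaviour described in the update can be seen in isolation with a small sketch. The helper names prefix_via_splice and prefix_via_array are hypothetical and exist only for this illustration, and the prefix length is hard-coded to 2 to keep the example short:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the splice directly, so the caller's
# context (list or scalar) is passed straight through to splice.
sub prefix_via_splice {
    my @words = @_;
    return splice @words, 0, 2;
}

# Hypothetical helper: copies the spliced words into an array first,
# so in scalar context the caller gets the element count.
sub prefix_via_array {
    my @words  = @_;
    my @result = splice @words, 0, 2;
    return @result;
}

my @list   = prefix_via_splice(qw(This is a sentence));  # list context
my $scalar = prefix_via_splice(qw(This is a sentence));  # scalar context
my $count  = prefix_via_array(qw(This is a sentence));   # scalar context

print "@list\n";    # This is
print "$scalar\n";  # is
print "$count\n";   # 2
```

Returning a named array keeps the sub usable both as a word list and, in scalar context, as a word count.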
Re: Number of matching words
by tachyon (Chancellor) on Aug 15, 2004 at 02:00 UTC
Re: Number of matching words
by Anonymous Monk on Aug 14, 2004 at 19:47 UTC
    Thanks a lot for your help, both solutions work great.
    Btw, excellent tutorial, Limbic~Region.

    Cheers