snape has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Can someone suggest how to compare two rows using a fixed window size? For example, if I have two rows with 100 columns, I would like to compare columns 1 to 10 of each row, then 2 to 11, then 3 to 12, and so on until 91 to 100. How should I do that?

Thanks.

Re: Comparison of the row by using window size
by BrowserUk (Patriarch) on Jan 27, 2010 at 01:02 UTC

    Assuming your rows are packed into strings:

    my $row1 = pack 'C*', map 1+int( rand 2 ), 1..100;
    my $row2 = pack 'C*', map 1+int( rand 2 ), 1..100;

    substr( $row1, $_, 10 ) eq substr( $row2, $_, 10 )
        and print "Found match at offset: $_\n"
            for 0 .. length( $row1 ) - 10;
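    A minimal sketch wrapping the same substr comparison in a reusable function. The name matching_windows, its window-size parameter and the sample rows are hypothetical; it assumes one character per column and stops at the shorter row so substr never reads past the end.

```perl
use strict;
use warnings;
use List::Util qw( min );

# Hypothetical helper: return every zero-based offset at which the
# $width-character windows of the two rows are identical.
sub matching_windows {
    my ( $row1, $row2, $width ) = @_;
    # Only compare positions that exist in both rows.
    my $len = min( length $row1, length $row2 );
    return if $len < $width;
    return grep {
        substr( $row1, $_, $width ) eq substr( $row2, $_, $width )
    } 0 .. $len - $width;
}

# Made-up 20-column rows that differ only at offset 12 (column 13).
my $rowa = '01101001110100101101';
my $rowb = '01101001110110101101';
my @hits = matching_windows( $rowa, $rowb, 10 );
print "Windows match at offsets: @hits\n";    # prints 0 1 2
```

    Only the windows starting at offsets 0 to 2 avoid column 13, so those are the only matches.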

      Hi Browser,

      Can you please explain the code, and the difference between these two lines? They look like the same statement:

      my $row1 = pack 'C*', map 1+int( rand 2 ), 1..100;
      my $row2 = pack 'C*', map 1+int( rand 2 ), 1..100;

      Thanks.

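        The first line builds $row1 and the second builds $row2. Because rand is called afresh for every element, they are two different random strings for the algorithm to search through.
        In each line, map 1+int( rand 2 ), 1..100 yields 100 values that are each 1 or 2, and pack 'C*' stores each value as one byte, giving a 100-character string with one character per column.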
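        To see what one of those lines produces, a small sketch can unpack a generated row back into numbers. It only uses the built-in pack/unpack with the 'C*' template from the reply; the variable names are illustrative.

```perl
use strict;
use warnings;

# Build one random row exactly as in the reply above.
my $row = pack 'C*', map 1 + int( rand 2 ), 1 .. 100;

# unpack 'C*' reverses the pack, giving back one number per column.
my @columns = unpack 'C*', $row;

printf "length: %d bytes\n", length $row;
printf "first ten columns: %s\n", join ' ', @columns[ 0 .. 9 ];

# Every column should be either 1 or 2.
my $ok = !grep { $_ != 1 && $_ != 2 } @columns;
print $ok ? "all values are 1 or 2\n" : "unexpected value\n";
```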
Re: Comparison of the row by using window size
by ikegami (Patriarch) on Jan 26, 2010 at 23:18 UTC
    use List::Util qw( max );
    my @rowa = $rowa =~ /.{1,10}/g;
    my @rowb = $rowb =~ /.{1,10}/g;
    for my $i ( 0 .. max( 0+@rowa, 0+@rowb ) - 1 ) { ... }

    I misread the question.

      The OP wanted to compare 10 columns at a time with a sliding window, i.e. columns 1 - 10 then 2 - 11 etc. I think your code will give columns 1 - 10 then 11 - 20 etc.

      $ perl -E '
      > $rowTxt = ( q{0123456789} x 2 ) . q{0123};
      > @groups = $rowTxt =~ m{.{1,10}}g;
      > say for @groups;'
      0123456789
      0123456789
      0123
      $

      Perhaps a look-ahead would give the sliding window the OP wanted.

      $ perl -E '
      > $rowTxt = ( q{0123456789} x 2 ) . q{0123};
      > @groups = $rowTxt =~ m{(?=(.{10}))}g;
      > say for @groups;'
      0123456789
      1234567890
      2345678901
      3456789012
      4567890123
      5678901234
      6789012345
      7890123456
      8901234567
      9012345678
      0123456789
      1234567890
      2345678901
      3456789012
      4567890123
      $
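      Applying that look-ahead to the original problem, a sketch could extract every 10-column window from each row and compare the windows pairwise. The zero-width (?=(.{10})) captures a window at each position without consuming it, so successive matches step forward one column at a time. It assumes both rows have the same length; the rows here are made-up test data.

```perl
use strict;
use warnings;

# Made-up rows of equal length; they differ at offset 14 only.
my $rowa = ( q{0123456789} x 2 ) . q{0123};
my $rowb = $rowa;
substr( $rowb, 14, 1 ) = 'X';

# Zero-width look-ahead: capture 10 characters at every position.
my @wina = $rowa =~ m{(?=(.{10}))}g;
my @winb = $rowb =~ m{(?=(.{10}))}g;

# Compare window i of one row with window i of the other,
# reporting one-based column ranges as in the question.
for my $i ( 0 .. $#wina ) {
    my $status = $wina[$i] eq $winb[$i] ? 'same' : 'differs';
    printf "columns %2d-%2d: %s\n", $i + 1, $i + 10, $status;
}
```

      Every window that covers column 15 (windows starting at columns 6 to 15) is reported as differing.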

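      I'm not sure why the OP is going about the comparison this way. Perhaps s/he wants to pin down exactly where the strings differ. If so, XOR'ing the rows and finding the positions of the non-zero bytes might be more efficient, since identical characters XOR to "\0". In the example below the differing pairs (c/b and f/g) happen to XOR to "\x01", so matching ones is enough there.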

      $ perl -E '
      > $sa = q{abcdefg};
      > $sb = q{abbdegg};
      > $df = $sa ^ $sb;
      > say q{Zero-based counting};
      > 1 while $df =~ m{\x01(?{ say pos( $df ) - 1 })}g;
      > say q{One-based counting as per OP example};
      > 1 while $df =~ m{\x01(?{ say pos $df })}g;'
      Zero-based counting
      2
      5
      One-based counting as per OP example
      3
      6
      $
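      A slightly more general variant of the XOR idea, as a sketch: matching any byte other than "\0" finds every differing column whatever the two characters are. The \x01 pattern only catches pairs whose codes differ in the lowest bit alone, such as c/b, and would miss, for example, the bytes 1 and 2 from the packed rows in the first reply, which XOR to 3. Positions are read from $-[0], the start offset of the last match, instead of from a code block. The helper name diff_positions is hypothetical, and both strings are assumed to be byte strings of the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: return the zero-based offsets where two
# equal-length strings differ, using string XOR.
sub diff_positions {
    my ( $sa, $sb ) = @_;
    my $df = $sa ^ $sb;      # "\0" wherever the characters are equal
    my @pos;
    while ( $df =~ m{[^\x00]}g ) {
        push @pos, $-[0];    # start offset of the byte just matched
    }
    return @pos;
}

my @letters = diff_positions( q{abcdefg}, q{abbdegg} );
print "Zero-based: @letters\n";
print 'One-based:  ', join( ' ', map { $_ + 1 } @letters ), "\n";

# Packed rows like those in the first reply: 1 ^ 2 is 3, not 1.
my @bytes = diff_positions( pack( 'C*', 1, 2, 2, 1 ), pack( 'C*', 1, 1, 2, 2 ) );
print "Packed rows differ at: @bytes\n";
```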

      I hope this is of interest.

      Cheers,

      JohnGG

        Thanks for showing me an interesting way to handle the sliding window. Yes, I want to pin down exactly where the strings do not match, and using XOR should be efficient for that.
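        Combining the two ideas from this thread, a possible sketch XORs the rows once and then slides a 10-column window over the result, counting the non-zero bytes in each window with tr/\x00//c (with /c and an empty replacement list, tr counts the characters not in the list and leaves the string unchanged). A count of 0 means the window matches. The helper name window_mismatches and the sample rows are hypothetical, and equal-length rows are assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: number of differing columns in each
# $width-column window of two equal-length rows.
sub window_mismatches {
    my ( $rowa, $rowb, $width ) = @_;
    my $df = $rowa ^ $rowb;    # "\0" where the columns agree
    my @counts;
    for my $i ( 0 .. length($df) - $width ) {
        my $chunk = substr( $df, $i, $width );
        # Count the bytes that are NOT "\0" in this window.
        push @counts, ( $chunk =~ tr/\x00//c );
    }
    return @counts;
}

# Made-up rows differing at offsets 2 and 11.
my @counts = window_mismatches( 'abcdefghijkl', 'abXdefghijkY', 10 );
printf "columns %d-%d: %d mismatch(es)\n", $_ + 1, $_ + 10, $counts[$_]
    for 0 .. $#counts;
```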