in reply to Re^2: CPAN Module to determing overlap of 2 lists?
in thread CPAN Module to determing overlap of 2 lists?

3 suggestions. I haven't checked the last point, since performance might not be your biggest issue.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^4: CPAN Module to determing overlap of 2 lists? (updated)
by LanX (Saint) on Aug 12, 2020 at 21:08 UTC
    > grow from right to left instead of shrinking from left to right

    This might be much faster if the overlap is considerably smaller than the files themselves.

    And it avoids any semipredicate problem with $marker.°

    (Not heavily tested, please check edge-cases)

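    use strict;
    use warnings;

    my $file1 = join "\n", qw( a b c d c );
    my $file2 = join "\n", qw( c d c x );

    # Put $file2 first so the capture is anchored at its start.
    my $content = "$file2\n$file1";

    # $1 = longest prefix of $content that is followed by a newline
    # and recurs at the very end of $content.
    $content =~ /^(.*)\n.*\1$/s;

    # Replace that leading chunk of $file2 with all of $file1.
    (substr $file2, 0, length $1) = $file1;
    print $file2;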

    a
    b
    c
    d
    c
    x

    Cheers Rolf

    °) unfortunately it doesn't, proof left to the interested reader
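
    One input that appears to trip the regex above; whether this is the case the footnote has in mind is an assumption. With a one-line $file2, the capture runs past the join point into $file1, so the reported "overlap" is longer than $file2 itself. A minimal check, with $captured and $crosses as illustrative names:

    ```perl
    use strict;
    use warnings;

    my $file1 = join "\n", qw( b a b );
    my $file2 = join "\n", qw( a );

    # Same construction and regex as in the post above.
    my $content = "$file2\n$file1";
    my ($captured) = $content =~ /^(.*)\n.*\1$/s;

    # A genuine overlap can never be longer than $file2.
    my $crosses = defined $captured && length($captured) > length($file2);

    printf "captured %d chars, file2 has %d\n",
        length($captured), length($file2);
    print $crosses ? "capture crosses the file boundary\n" : "overlap ok\n";
    ```

    A length test like this only detects the problem. It does not recover a shorter, valid overlap that the greedy match may have skipped.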

Re^4: CPAN Module to determing overlap of 2 lists?
by wazat (Monk) on Aug 13, 2020 at 00:50 UTC

    I added the line start anchor as I wanted to match whole lines.

    Agreed, assuming text files, a more "binary" marker is better.

    Currently I feel the regex solution is interesting, but still not my first choice. I'll dig deeper if I start profiling.

      > Currently I feel the regex solution is interesting, but still not my first choice. I'll dig deeper if I start profiling.

      For completeness.

      Another approach, if your priorities tend towards performance, would be to combine a binary search with plain string equality (eq).

      Of course you could change to a division factor other than 0.5 depending on heuristics.
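
      A minimal sketch of the string-equality idea on lists of lines. It deliberately swaps the binary search for a plain scan from the largest candidate downwards: an overlap at one size says nothing about the next (a b a against a b a x overlaps at 1 and 3 lines, but not at 2), so it is unclear how bisection would stay correct. overlap_count and merge_lists are hypothetical names, not taken from any CPAN module.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: how many trailing elements of @$left equal the
      # leading elements of @$right (longest such run, 0 if none).
      sub overlap_count {
          my ($left, $right) = @_;
          my $max = @$left < @$right ? @$left : @$right;

          # Largest candidate first; every size is tested because a match
          # at one size implies nothing about its neighbours.
          SIZE: for my $k (reverse 1 .. $max) {
              for my $i (0 .. $k - 1) {
                  next SIZE if $left->[ @$left - $k + $i ] ne $right->[$i];
              }
              return $k;
          }
          return 0;
      }

      # Hypothetical helper: @$left followed by the rest of @$right
      # that is not part of the overlap.
      sub merge_lists {
          my ($left, $right) = @_;
          my $k = overlap_count($left, $right);
          return (@$left, @{$right}[ $k .. $#$right ]);
      }

      # Same data as the regex example earlier in the thread.
      print join(' ', merge_lists([qw( a b c d c )], [qw( c d c x )])), "\n";
      # prints: a b c d c x
      ```

      Comparing element by element stops at the first mismatch, so most wrong candidates are rejected after a single ne. The worst case is still quadratic in the number of lines.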

      Cheers Rolf