Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!
I have 10 files with names in them (more than 200 in each file), and I need to find the subset of names common to all the files.
Is there a quicker way than using diff or comparing each file to every other one?
Thanks!

Re: Find common lines in multiple files?
by Corion (Patriarch) on Dec 15, 2010 at 13:13 UTC

    Yes.

    The most relevant thing to realize about the intersection of n sets is that intersection is an associative (and also commutative) operation, so to compute

    A ∩ B ∩ C ∩ D ...

    you can simply compute it iteratively:

    (((A ∩ B) ∩ C) ∩ D) ...

    So all that's needed is a way to find the intersection of two sets. This is a FAQ; see perlfaq4, "How do I compute the intersection of two arrays?".
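    A minimal Perl sketch of that approach, assuming each set is already in memory as an array reference. `intersect_two` and `intersect_all` are hypothetical helper names. The two-set step uses a hash lookup in the spirit of the perlfaq4 answer, and `reduce` from the core List::Util module performs the left fold (((A ∩ B) ∩ C) ∩ D) ...

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: intersection of two array refs via a hash lookup.
# Duplicates in the second list are dropped so each name appears once.
sub intersect_two {
    my ($x, $y) = @_;
    my %in_x = map { $_ => 1 } @$x;
    my %seen;
    return [ grep { $in_x{$_} && !$seen{$_}++ } @$y ];
}

# Hypothetical helper: fold the pairwise intersection over all sets,
# i.e. intersect A with B, then the result with C, and so on.
sub intersect_all {
    my @sets = @_;
    return reduce { intersect_two($a, $b) } @sets;
}

my $common = intersect_all(
    [qw(alice bob carol dave)],
    [qw(bob carol erin)],
    [qw(carol bob frank)],
);
print join(' ', sort @$common), "\n";    # prints "bob carol"
```

    Because intersection is associative and commutative, the order in which the sets are folded does not change which names survive. It only affects the order of the returned list, which is why the example sorts before printing.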

    Reading the information into memory and outputting the result is left as an exercise to the reader.
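    One way to do that exercise is sketched below, assuming each file holds one name per line. Rather than folding pairwise, it counts how many distinct files contain each name and keeps the names whose count equals the number of files. This counting technique is a swapped-in alternative, not Corion's code, and `common_names` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names present in every file given.
# Assumes one name per line; leading/trailing whitespace (including a
# trailing \r from Windows line endings) is ignored.
sub common_names {
    my @files = @_;
    my %count;    # name => number of files containing it
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my %seen_here;    # count each name at most once per file
        while (my $line = <$fh>) {
            chomp $line;
            $line =~ s/^\s+|\s+$//g;
            next if $line eq '';
            $count{$line}++ unless $seen_here{$line}++;
        }
        close $fh;
    }
    my @common = grep { $count{$_} == @files } keys %count;
    return sort @common;
}

if (@ARGV) {
    print "$_\n" for common_names(@ARGV);
}
```

    Saved as, say, common.pl (a hypothetical file name), it would be run as `perl common.pl 1.dat 2.dat 3.dat` with the remaining file names appended. Each file is read once, so the work grows with the total number of lines rather than with the number of file pairs.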

Re: Find common lines in multiple files?
by BrowserUk (Patriarch) on Dec 15, 2010 at 13:32 UTC

    It could be as simple as the following one-liner:

    perl -nle"push @{$h{$_}},$ARGV}{print qq[$_ found in @{$h{$_}}] for keys %h" 1.dat 2.dat 3.dat 4.dat 5.dat 6.dat 7.dat 8.dat 9.dat 10.dat

    This produces a list like the following:

    name074 found in 1.dat 6.dat
    name027 found in 5.dat
    name002 found in 1.dat 2.dat 3.dat
    name117 found in 2.dat 3.dat 4.dat
    name110 found in 6.dat
    name160 found in 5.dat 9.dat
    name079 found in 5.dat 7.dat
    name051 found in 1.dat
    name022 found in 2.dat 3.dat
    name100 found in 2.dat 7.dat
    name061 found in 6.dat
    ...
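    The `}{` in the one-liner works because `-n` wraps the code in a `while (<>) { ... }` loop: `}{` closes that loop early and opens a block that runs once after all input has been read, while `-l` chomps each line. The sketch below spells that logic out (running the one-liner under `perl -MO=Deparse` shows the exact code perl generates), wrapped in hypothetical subs `files_by_name` and `in_all_files`. The filter that keeps only names present in every file is an addition, not part of BrowserUk's one-liner. It also de-duplicates files, in case a name repeats within one file, which the one-liner would report as e.g. "found in 1.dat 1.dat".

```perl
use strict;
use warnings;

# Roughly the one-liner's logic: -n wraps the code in a while (<>) loop,
# -l chomps each line (and appends "\n" to print), and `}{` ends that loop
# so the code after it runs once, after all input has been read.
# Wrapped in a hypothetical sub here so it can be reused.
sub files_by_name {
    my @files = @_;
    return {} unless @files;    # avoid falling back to reading STDIN
    local @ARGV = @files;       # <> reads the files named in @ARGV
    my %h;                      # name => files the name was read from
    while (my $name = <>) {
        chomp $name;
        push @{ $h{$name} }, $ARGV;    # $ARGV: file <> is currently reading
    }
    return \%h;
}

# Added filter (not in the one-liner): names present in every file.
# Files are de-duplicated in case a name repeats within one file.
sub in_all_files {
    my @files = @_;
    my $h = files_by_name(@files);
    my @common;
    for my $name (sort keys %$h) {
        my %uniq = map { $_ => 1 } @{ $h->{$name} };
        push @common, $name if scalar(keys %uniq) == @files;
    }
    return @common;
}

if (@ARGV) {
    my $h = files_by_name(@ARGV);
    print "$_ found in @{ $h->{$_} }\n" for sort keys %$h;
    print "In all files: @{[ in_all_files(@ARGV) ]}\n";
}
```

    Unlike the one-liner, which walks `keys %h` in Perl's unspecified hash order, this version sorts the names so repeated runs print in the same order.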

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      How do I run your code? Is it possible to put this in a .pl file form?
        How do I run your code?

        If you're on Windows:

        c:\>perl -nle"push @{$h{$_}},$ARGV}{print qq[$_ found in @{$h{$_}}] for keys %h" 0.dat 1.dat 2.dat 3.dat 4.dat 5.dat 6.dat 7.dat 8.dat 9.dat

        On *nix this might work:

        $ perl -nle'push @{$h{$_}},$ARGV}{print "$_ found in @{$h{$_}}" for keys %h' ?.dat
        Is it possible to put this in a .pl file form?

        Yes:

        #! perl -nl
        push @{ $h{ $_ } }, $ARGV }{ print qq[$_ found in @{ $h{ $_ } }] for keys %h;

        This assumes your files are named 0.dat, 1.dat, etc.

        On Windows:

        c:\>theScript.pl 0.dat 1.dat 2.dat 3.dat 4.dat 5.dat 6.dat 7.dat 8.dat 9.dat

        On *nix, something like:

        $ perl theScript.pl ?.dat
