in reply to find common data in multiple files
Hello mao9856,
Since you are not telling us what the problem is, e.g. whether the script does not run or whether it runs but does not produce the desired output, we cannot assist you at a quick glance.
A similar question, parse multiple text files keep unique lines only, was asked in the past; many Monks tackled it elegantly, and you may find a possible solution to your problem there.
Update: I just tried to execute your sample of code, and it does not run. It looks like you found the code somewhere, pasted it here, and asked someone to solve it for you. Can you show a minimum amount of effort to resolve it yourself first, and at least make the script executable?
Update 2: I had some time to kill, so I put together this script that more or less does what you want. It reads all files from @ARGV, processes every line, and then keeps only the lines that occur more than once. The assumption is that matching lines are always identical and there are no combinations. By "no combinations" I mean that you only want to detect duplicated lines as a whole, not partial matches.
Sample of code:
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use List::MoreUtils 'duplicates';

my @lines;

while (<>) {
    next if /^\s*$/;    # skip empty lines
    chomp;
    push @lines, $_;
}
continue {
    close ARGV if eof;  # Not eof()!
}

my @duplicatedLines = duplicates @lines;

print Dumper \@lines, \@duplicatedLines;

__END__
$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11',
          'ID181 ABC31',
          'ID962 YTS27',
          'ID567 POH70',
          'ID921 BAMD80',
          'ID121 ABC14',
          'ID612 FLOW12',
          'ID122 EFG87',
          'ID745 KIDP36',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
$VAR2 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
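One caveat of the approach above: duplicates over the concatenated input will also report a line that happens to repeat inside a single file. If you strictly want lines common to every file, the same idea can be sketched with core Perl only. The common_lines helper below and the inline data are my own illustration (the lines are taken from the File1.txt / File3.txt sample above), not code from the original poster:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch (core Perl only): a line counts as "common" only if it
# appears in every file, counting each file at most once per line.
sub common_lines {
    my %files = @_;    # filename => arrayref of lines
    my %seen;          # line => { filename => 1 }
    while ( my ( $name, $lines ) = each %files ) {
        $seen{$_}{$name} = 1 for grep { !/^\s*$/ } @$lines;
    }
    my $nfiles = keys %files;
    return sort grep { keys %{ $seen{$_} } == $nfiles } keys %seen;
}

# Data from the File1.txt / File3.txt sample in this thread:
my @common = common_lines(
    'File1.txt' => [
        'ID121 ABC14', 'ID122 EFG87', 'ID145 XYZ43',
        'ID157 TSR11', 'ID181 ABC31', 'ID962 YTS27',
        'ID567 POH70',
    ],
    'File3.txt' => [
        'ID921 BAMD80', 'ID121 ABC14', 'ID612 FLOW12',
        'ID122 EFG87',  'ID745 KIDP36', 'ID145 XYZ43',
        'ID157 TSR11',
    ],
);
print "$_\n" for @common;    # the four IDs shared by both files
```

When reading from @ARGV instead of in-memory data, the same bookkeeping works with `$seen{$_}{$ARGV} = 1` inside the while (<>) loop, as long as you save `my $nfiles = @ARGV;` before the loop, because the magic readline consumes @ARGV.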
Update 2 continued: In case you want to detect lines where only the $key or only the $value is duplicated, you can easily do it like this.
Sample of code:
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use List::MoreUtils 'duplicates';

my ( @keys, @values );

while (<>) {
    next if /^\s*$/;    # skip empty lines
    chomp;
    my ( $key, $value ) = split /\s+/;
    push @keys,   $key;
    push @values, $value;
}
continue {
    close ARGV if eof;  # Not eof()!
}

my @duplicatedKeys   = duplicates @keys;
my @duplicatedValues = duplicates @values;

print Dumper \@keys, \@values, \@duplicatedKeys, \@duplicatedValues;

__END__
$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121', 'ID122', 'ID145', 'ID157', 'ID181', 'ID962', 'ID567',
          'ID921', 'ID121', 'ID612', 'ID122', 'ID745', 'ID145', 'ID157'
        ];
$VAR2 = [
          'ABC14', 'EFG87', 'XYZ43', 'TSR11', 'ABC31', 'YTS27', 'POH70',
          'BAMD80', 'ABC14', 'FLOW12', 'EFG87', 'KIDP36', 'XYZ43', 'TSR11'
        ];
$VAR3 = [
          'ID121', 'ID122', 'ID145', 'ID157'
        ];
$VAR4 = [
          'ABC14', 'EFG87', 'XYZ43', 'TSR11'
        ];
Update 2 continued: I used the module List::MoreUtils, and more specifically its duplicates function, which "Returns a new list by stripping values in LIST occurring less than twice." The data I used come from the sample DATA files that you provided us.
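If installing List::MoreUtils is not an option, the documented behaviour of duplicates is easy to reproduce with a plain hash. The my_duplicates sub below is my own core-Perl sketch of that behaviour, not part of any module:

```perl
use strict;
use warnings;

# Core-Perl equivalent of List::MoreUtils' duplicates: return one
# copy of every value that occurs at least twice, in order of each
# value's first appearance in the input list.
sub my_duplicates {
    my %count;
    $count{$_}++ for @_;
    my %emitted;
    return grep { $count{$_} > 1 && !$emitted{$_}++ } @_;
}

my @dups = my_duplicates(qw(ID121 ID122 ID121 ID145 ID122));
print "@dups\n";    # prints "ID121 ID122"
```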
Hope this helps, BR.
Replies are listed 'Best First'.

Re^2: find common data in multiple files
  by mao9856 (Sexton) on Dec 29, 2017 at 10:38 UTC
  by thanos1983 (Parson) on Dec 29, 2017 at 14:12 UTC
  by mao9856 (Sexton) on Dec 31, 2017 at 06:18 UTC
  by afoken (Chancellor) on Dec 31, 2017 at 11:57 UTC
  by thanos1983 (Parson) on Jan 02, 2018 at 09:49 UTC
  by mao9856 (Sexton) on Jan 03, 2018 at 06:49 UTC