in reply to find common data in multiple files
Hello mao9856,
Since you are not telling us what the problem is, e.g. whether the script does not run or whether it runs but does not produce the desired output, we cannot assist you at a quick glance.
A similar question, parse multiple text files keep unique lines only, was asked in the past; many Monks tackled it elegantly, and you may find a possible solution to your problem there.
Update: I just tried to execute your sample of code, and it does not run. It looks like you found the code somewhere, pasted it here, and asked someone to solve it for you. Can you show a minimum amount of effort to resolve it yourself first, and at least make the script executable?
Update 2: I had some time to kill, so I put together this script that more or less does what you want. It reads all files from @ARGV, processes every line, and then keeps only the lines that occur more than once. The assumption is that matching lines are always identical and there are no combinations. By "no combinations" I mean that you only want to detect duplicated lines as a whole, not partial matches.
Sample of code:
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use List::MoreUtils 'duplicates';

my @lines;

while (<>) {
    next if /^\s*$/;    # skip empty lines
    chomp;
    push @lines, $_;
}
continue {
    close ARGV if eof;  # Not eof()!
}

my @duplicatedLines = duplicates @lines;

print Dumper \@lines, \@duplicatedLines;

__END__
$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11',
          'ID181 ABC31',
          'ID962 YTS27',
          'ID567 POH70',
          'ID921 BAMD80',
          'ID121 ABC14',
          'ID612 FLOW12',
          'ID122 EFG87',
          'ID745 KIDP36',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
$VAR2 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
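One caveat of the approach above: duplicates over the concatenated input will also report a line that happens to repeat inside a single file. If you strictly want lines common to every file, the same idea can be sketched with core Perl only. The common_lines helper below and the inline data are my own illustration (the lines are taken from the File1.txt / File3.txt sample above), not code from the original poster:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch (core Perl only): a line counts as "common" only if it
# appears in every file, counting each file at most once per line.
sub common_lines {
    my %files = @_;    # filename => arrayref of lines
    my %seen;          # line => { filename => 1 }
    while ( my ( $name, $lines ) = each %files ) {
        $seen{$_}{$name} = 1 for grep { !/^\s*$/ } @$lines;
    }
    my $nfiles = keys %files;
    return sort grep { keys %{ $seen{$_} } == $nfiles } keys %seen;
}

# Data from the File1.txt / File3.txt sample in this thread:
my @common = common_lines(
    'File1.txt' => [
        'ID121 ABC14', 'ID122 EFG87', 'ID145 XYZ43',
        'ID157 TSR11', 'ID181 ABC31', 'ID962 YTS27',
        'ID567 POH70',
    ],
    'File3.txt' => [
        'ID921 BAMD80', 'ID121 ABC14', 'ID612 FLOW12',
        'ID122 EFG87',  'ID745 KIDP36', 'ID145 XYZ43',
        'ID157 TSR11',
    ],
);
print "$_\n" for @common;    # the four IDs shared by both files
```

When reading from @ARGV instead of in-memory data, the same bookkeeping works with `$seen{$_}{$ARGV} = 1` inside the while (<>) loop, as long as you save `my $nfiles = @ARGV;` before the loop, because the magic readline consumes @ARGV.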
Update 2 continued: In case you want to detect lines where only the $key or only the $value is duplicated, you can easily do it like this.
Sample of code:
#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use List::MoreUtils 'duplicates';

my ( @keys, @values );

while (<>) {
    next if /^\s*$/;    # skip empty lines
    chomp;
    my ( $key, $value ) = split /\s+/;
    push @keys,   $key;
    push @values, $value;
}
continue {
    close ARGV if eof;  # Not eof()!
}

my @duplicatedKeys   = duplicates @keys;
my @duplicatedValues = duplicates @values;

print Dumper \@keys, \@values, \@duplicatedKeys, \@duplicatedValues;

__END__
$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121', 'ID122', 'ID145', 'ID157', 'ID181', 'ID962', 'ID567',
          'ID921', 'ID121', 'ID612', 'ID122', 'ID745', 'ID145', 'ID157'
        ];
$VAR2 = [
          'ABC14', 'EFG87', 'XYZ43', 'TSR11', 'ABC31', 'YTS27', 'POH70',
          'BAMD80', 'ABC14', 'FLOW12', 'EFG87', 'KIDP36', 'XYZ43', 'TSR11'
        ];
$VAR3 = [
          'ID121', 'ID122', 'ID145', 'ID157'
        ];
$VAR4 = [
          'ABC14', 'EFG87', 'XYZ43', 'TSR11'
        ];
Update 2 continued: I used the module List::MoreUtils, and more specifically its duplicates function, which "Returns a new list by stripping values in LIST occurring less than twice." The data I used come from the sample DATA files that you provided us.
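If installing List::MoreUtils is not an option, the documented behaviour of duplicates is easy to reproduce with a plain hash. The my_duplicates sub below is my own core-Perl sketch of that behaviour, not part of any module:

```perl
use strict;
use warnings;

# Core-Perl equivalent of List::MoreUtils' duplicates: return one
# copy of every value that occurs at least twice, in order of each
# value's first appearance in the input list.
sub my_duplicates {
    my %count;
    $count{$_}++ for @_;
    my %emitted;
    return grep { $count{$_} > 1 && !$emitted{$_}++ } @_;
}

my @dups = my_duplicates(qw(ID121 ID122 ID121 ID145 ID122));
print "@dups\n";    # prints "ID121 ID122"
```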
Hope this helps, BR.
Replies are listed 'Best First'.

Re^2: find common data in multiple files
  by mao9856 (Sexton) on Dec 29, 2017 at 10:38 UTC
  by thanos1983 (Parson) on Dec 29, 2017 at 14:12 UTC
  by mao9856 (Sexton) on Dec 31, 2017 at 06:18 UTC
  by afoken (Chancellor) on Dec 31, 2017 at 11:57 UTC
  by thanos1983 (Parson) on Jan 02, 2018 at 09:49 UTC
  by mao9856 (Sexton) on Jan 03, 2018 at 06:49 UTC