In reply to "find common data in multiple files"
G'day mao9856,
I'd read through one file and store all of its data in a hash; then read through the remaining files, removing hash entries that don't also appear (with the same value) in the file being processed. Given these files, using data from your OP:
$ cat pm_1206312_in1
ID121 ABC14
ID122 EFG87
ID145 XYZ43
ID157 TSR11
ID181 ABC31
ID962 YTS27
ID567 POH70
ID921 BAMD80

$ cat pm_1206312_in2
ID111 RET61
ID157 TSR11
ID181 ABC31
ID962 YTS27
ID452 FYU098
ID121 ABC14
ID122 EFG87

$ cat pm_1206312_in3
ID121 ABC14
ID612 FLOW12
ID122 EFG87
ID745 KIDP36
ID145 XYZ43
ID157 TSR11

$ cat pm_1206312_in25
ID122 EFG87
ID809 EYE24
ID157 TSR11
ID921 BAMD80
ID389 TOP30
ID121 ABC14
This code:
#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my @files = glob 'pm_1206312_in*';

# Seed the hash with all key/value pairs from the first file.
my %uniq;
{
    open my $fh, '<', shift @files;
    while (<$fh>) {
        my ($k, $v) = split;
        $uniq{$k} = $v;
    }
}

# For each remaining file, drop entries that aren't present
# with the same value in that file.
for my $file (@files) {
    my %data;
    open my $fh, '<', $file;
    while (<$fh>) {
        my ($k, $v) = split;
        $data{$k} = $v;
    }
    for (keys %uniq) {
        delete $uniq{$_}
            unless exists $data{$_} and $uniq{$_} eq $data{$_};
    }
}

printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;
Produces this output:
ID121 ABC14
ID122 EFG87
ID157 TSR11
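For comparison, here's a minimal alternative sketch that counts how many files each "ID VALUE" pair appears in, rather than whittling down a hash; a pair seen in every file is common to all of them. The %seen guard is an assumption on my part, so that a pair duplicated within a single file only counts once:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my @files = glob 'pm_1206312_in*';

# Count, per pair, the number of files it appears in.
my %count;
for my $file (@files) {
    open my $fh, '<', $file;
    my %seen;    # assumed guard: count each pair once per file
    while (<$fh>) {
        my ($k, $v) = split;
        $count{"$k $v"}++ unless $seen{"$k $v"}++;
    }
}

# A pair is common if its count equals the number of files.
print "$_\n" for sort grep { $count{$_} == @files } keys %count;

That produces the same three lines of output for the files above.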
— Ken