Extracting common keys present in multiple files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have multiple files that are set up as:

name,number

I want to search all the files and print only the names that are present at least once in each of the files AND have at least 25 matches across all of the files.

I am new to coding and not sure how best to go about this. I was thinking of making a hash using the names as the keys, but I'm not sure how to make sure that a name is present in all four files and has 25 matches among all of them.

Any help would be greatly appreciated.


Re: Extracting common keys present in multiple files by GrandFather (Saint) on Oct 27, 2015 at 03:01 UTC
Show us what you have tried and we'll help you sort out the problem areas. It helps us a lot to help you if you provide a small test script we can run without having to mock up your test environment. See "I know what I mean. Why don't you?" for tips on how to generate any files you might need for the test script. Note that to test your solution you can probably get away with a couple of short files and only require 2 (instead of 25) "matches among all of them".
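A minimal sketch of the kind of self-contained test script suggested above, assuming you keep each test "file" in a string and open it as an in-memory filehandle (core Perl since 5.8) instead of creating real files. The names `%test_files` and `@records` and the sample lines are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical test data: one string per "file", in name,number format.
my %test_files = (
    file1 => "A12345,23\nA10923,89\n",
    file2 => "A12345,99\nA78663,100\n",
);

my @records;    # [file, name, number] for every line read

for my $file (sort keys %test_files) {
    # Passing a scalar reference to open() reads from the string itself,
    # so no files need to exist on disk.
    open my $fh, '<', \$test_files{$file}
        or die "Couldn't open in-memory '$file': $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($name, $number) = split /,/, $line;
        push @records, [ $file, $name, $number ];
    }
    close $fh;
}

print join(',', @$_), "\n" for @records;
```

Replacing `\$test_files{$file}` with a real file name turns the same loop into one that reads from disk, so the test script and the real script can share the parsing code.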
Re^2: Extracting common keys present in multiple files by Anonymous Monk on Oct 27, 2015 at 03:31 UTC
This is what I have so far; I know it is a mess. I wasn't sure of the best way to include the test files, so I've put them at the bottom.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @files = @ARGV;    # file names given on the command line
my %names;            # name => total number of occurrences

foreach my $file (@files) {
    open my $fh, '<', $file or die "Couldn't read '$file': $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                  # skip blank lines
        my ($name, $number) = split /,/, $line;    # "name,number"
        $names{$name}++;
        # Not sure how to also check that $name is present in every file
    }
    close $fh;
}
```

First test file:

```
A12345,23
A22334,100
A22789,44
A10923,89
```

Second test file:

```
A89224,88
A12345,99
A78663,100
A10923,89
```

Expected output (requiring 2 matches in total and at least one occurrence in each file):

```
A10923
A12345
```
Re: Extracting common keys present in multiple files by GotToBTru (Prior) on Oct 27, 2015 at 13:27 UTC
Somebody's in the same class as this guy, or at least the problems are remarkably similar. I suggest you check that thread as well for some ideas. Applefritter has just about written your homework for you.
Re: Extracting common keys present in multiple files by tangent (Parson) on Oct 27, 2015 at 20:29 UTC
Here is a way to do it using two hashes: %n_count to keep track of the number of times each name appears, and %f_count to keep track of the files in which each name occurs.

```perl
use strict;
use warnings;

my @files = @ARGV;    # the files to search, from the command line
my (%f_count, %n_count);
for my $file (@files) {
    open( my $fh, '<', $file ) or die "Couldn't read '$file': $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line;
        my ($name, $number) = split(',', $line);
        $n_count{$name}++;           # total occurrences across all files
        $f_count{$name}{$file}++;    # occurrences per file
    }
}
```

Then you need to extract the names: first get the names that appear at least 25 times in the %n_count hash, then, for each of those candidate names, keep only the ones that appear in all files.

```perl
my $num_of_files = scalar @files;
my $min = 25;
my @candidates = grep { $n_count{$_} >= $min } keys %n_count;
for my $name (@candidates) {
    my $in_files = scalar keys %{ $f_count{$name} };
    next unless $in_files == $num_of_files;
    print "$name\n";
}
```

The only tricky bit here is `scalar keys %{ $f_count{$name} }`. `$f_count{$name}` is a hash reference in which each key is a file name. We can get at the keys by dereferencing it with `%{...}` and counting how many there are. If that count equals the number of files, then that name is in every file.
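For comparison, here is a sketch of a variant (not tangent's code): instead of a hash of hashes, it keeps a per-file `%seen` hash so that each name adds at most 1 to `%file_count` for each file. The logic is wrapped in a hypothetical sub `common_names`, and, as an assumption made so the script runs without any files on disk, the sample data from the question is read through in-memory filehandles with the threshold lowered to 2.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes a minimum total count and a list of files
# (file names or references to strings), and returns, sorted, the names
# that occur in every file and at least $min times overall.
sub common_names {
    my ($min, @files) = @_;
    my (%n_count, %file_count);
    for my $file (@files) {
        open my $fh, '<', $file or die "Couldn't read '$file': $!";
        my %seen;    # names already counted for *this* file
        while (my $line = <$fh>) {
            chomp $line;
            next unless length $line;
            my ($name) = split /,/, $line;
            $n_count{$name}++;                           # total occurrences
            $file_count{$name}++ unless $seen{$name}++;  # once per file
        }
        close $fh;
    }
    my @hits = grep {
        $n_count{$_} >= $min && $file_count{$_} == @files
    } keys %n_count;
    return sort @hits;
}

# Sample data from the question, as in-memory files.
my $data1 = "A12345,23\nA22334,100\nA22789,44\nA10923,89\n";
my $data2 = "A89224,88\nA12345,99\nA78663,100\nA10923,89\n";

my @result = common_names(2, \$data1, \$data2);
print "$_\n" for @result;
```

With the question's sample data this prints A10923 and A12345, matching the expected output. For the real files you would call something like `common_names(25, @ARGV)`. The `%seen` approach only needs a count of files per name, so it avoids dereferencing a nested hash, at the cost of no longer recording which files each name appeared in.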