Extract common lines from 2 files

great_riyaz has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,
I am a QA engineer trying to use Perl to achieve the following:
I have 2 files. Both contain test names (lines containing .tst) plus some more lines with other details for each test. I want to find the test names common to both files and store them in an array to be used later.
I can do this on Unix using grep -Fx, but it fails on Windows because Hamilton grep does not have this option. I would like to do it in Perl so that it runs everywhere.
code snippet:

$file1 = "./data/list1.txt";
$file2 = "./data/list2.txt";
@common_lines = `grep -Fx -f $file1 $file2 | grep .tst`;

I need help with Perl code to fill in @common_lines.

Contents of list1 could look like:

test1.tst
not_run
test3.tst
time:2 sec
test6.tst
time:2 sec

Contents of list2 could look like:

test2.tst
time:3 sec
test3.tst
time:6 sec
test1.tst
time:5 sec

...up to 100000 entries.
So @common_lines should then contain {test1.tst, test3.tst, ...}
Any help or suggestion is appreciated!


Replies are listed 'Best First'.
Re: Extract common lines from 2 files by jwkrahn (Abbot) on May 10, 2012 at 08:09 UTC
my @common_lines = do {
    local @ARGV = ( 'list1.txt', 'list2.txt' );
    my %data;
    while ( <> ) {
        next unless /\.tst$/;
        if ( @ARGV ) {
            $data{ $_ } |= 1;
        }
        else {
            $data{ $_ } |= 2;
        }
    }
    grep $data{ $_ } == 3, keys %data;
};
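A note on the snippet above: lines read with `<>` keep their trailing newline, so the names it returns still end in "\n", and `keys` returns them in no particular order. The following is a minimal sketch of the same bit-flag idea with the line ending stripped and the result sorted. The helper name `common_tests` and the explicit file handles are assumptions for illustration, not part of jwkrahn's reply.

```perl
use strict;
use warnings;

# Hypothetical helper: return the .tst lines that occur in both files.
sub common_tests {
    my @files = @_;    # exactly two file names are expected
    my %seen;
    for my $i ( 0 .. 1 ) {
        open my $fh, '<', $files[$i] or die "Cannot open $files[$i]: $!";
        while ( my $line = <$fh> ) {
            $line =~ s/\r?\n\z//;            # strip LF or CRLF line endings
            next unless $line =~ /\.tst\z/;  # keep test-name lines only
            $seen{$line} |= 1 << $i;         # bit 0 = first file, bit 1 = second
        }
        close $fh;
    }
    # A value of 3 means both bits are set, i.e. the name was in both files.
    my @common = grep { $seen{$_} == 3 } keys %seen;
    return sort @common;
}

# Run as: perl common.pl list1.txt list2.txt
if ( @ARGV == 2 ) {
    my @common_lines = common_tests(@ARGV);
    print "$_\n" for @common_lines;
}
```

Stripping an optional "\r" as well as "\n" is meant to let the same script handle files written on either Unix or Windows.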
Re: Extract common lines from 2 files by Marshall (Canon) on May 10, 2012 at 10:05 UTC
A bit different....

#!/usr/bin/perl -w
use strict;

my $file1=<<END;
test1.tst
not_run
test3.tst
time:2 sec
test6.tst
time:2 sec
test10009.tst
timeL 39 sec
END

my $file2=<<END;
test2.tst
time:3 sec
test3.tst
time:6 sec
test1.tst
time:5 sec
END

# open the strings as in-memory files
open FILE1, '<', \$file1 or die "cannot open file1 $!";
open FILE2, '<', \$file2 or die "cannot open file2 $!";

my %seenFile1;
while (<FILE1>)
{
   my ($file_name) = $_ =~ (/^\s*(test\d+\.tst)\s*$/);
   $seenFile1{$file_name} = 1 if $file_name;
}
close FILE1;

my @common;
while (<FILE2>)
{
   my ($file_name) = $_ =~ (/^\s*(test\d+\.tst)\s*$/);
   #as OP wants, save common names for other uses....
   #
   push @common, $file_name if $file_name and $seenFile1{$file_name};
}
close FILE2;

#one use is sort
#
foreach (sort {my ($Anum) = $a =~ /(\d+)/;
               my ($Bnum) = $b =~ /(\d+)/;
               $Anum <=> $Bnum
              }@common )
{
   print "$_\n";
}
__END__
test1.tst
test3.tst
Re: Extract common lines from 2 files by great_riyaz (Initiate) on May 11, 2012 at 09:38 UTC
The solution by jwkrahn worked for me. Thanks to everyone for the replies!
Re: Extract common lines from 2 files by Anonymous Monk on May 10, 2012 at 20:16 UTC
"I can do this in Unix using grep -Fx. But it fails in windows as hamilton grep does not have this option."

Ditch "hamilton grep" ASAP and use GNU grep:
http://gnuwin32.sourceforge.net/packages/grep.htm
http://sourceforge.net/projects/unxutils/files/
Re^2: Extract common lines from 2 files by Anonymous Monk on May 10, 2012 at 20:23 UTC
There is even a grep written in Perl: http://search.cpan.org/~cwest/ppt-0.14/bin/grep
Re^3: Extract common lines from 2 files by Hellhound4 (Novice) on May 11, 2012 at 00:02 UTC
I had a very similar question the other day. You can find my complete code in the last post of this thread:
http://www.perlmonks.org/?node_id=968493
The CPAN utility I used is here (though there is also a way to do it with grep):
http://search.cpan.org/dist/Array-Utils/Utils.pm
Basically, you need to open each file and read its contents into a separate array, then cycle through the arrays and find the matches.
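The approach described above (read each file into its own array, then look for matches) can be sketched with core Perl only. The helper names `read_tests` and `common_names` are assumptions for illustration. Array::Utils exports an `intersect` function that could stand in for the hash lookup, but this sketch avoids the non-core dependency.

```perl
use strict;
use warnings;

# Hypothetical helper: read one file and return its test-name lines.
sub read_tests {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @tests;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;                      # strip LF or CRLF
        push @tests, $line if $line =~ /\.tst\z/;  # skip the detail lines
    }
    close $fh;
    return @tests;
}

# Hypothetical helper: names from the first list that also occur in the
# second, in first-list order and without duplicates.
sub common_names {
    my ( $first, $second ) = @_;
    my %in_second = map { $_ => 1 } @$second;
    my %emitted;
    return grep { $in_second{$_} && !$emitted{$_}++ } @$first;
}

# Run as: perl common.pl list1.txt list2.txt
if ( @ARGV == 2 ) {
    my @list1 = read_tests( $ARGV[0] );
    my @list2 = read_tests( $ARGV[1] );
    my @common_lines = common_names( \@list1, \@list2 );
    print "$_\n" for @common_lines;
}
```

Looking names up in a hash, rather than comparing every element of one array against every element of the other, keeps the work roughly linear in the number of lines, which matters at the 100000-entry scale mentioned in the question.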