in reply to Re: Matching complementary base pairs from 2 different files
in thread Matching complementary base pairs from 2 different files

Yes, you are absolutely right!

no the asterisks are not significant, they were just to highlight the area of focus

and yes "the 6 results is all I want!!" I shall work on the idea and be careful about non-biologists.
  • Comment on Re^2: Matching complementary base pairs from 2 different files

Replies are listed 'Best First'.
Re^3: Matching complementary base pairs from 2 different files
by Athanasius (Archbishop) on Jan 01, 2018 at 13:16 UTC

    In that case, the following should be close to what you’re looking for:

    use strict; use warnings; open(my $f1, '<', $ARGV[0]) or die "Could not open '$ARGV[0]' for reading, stopped"; open(my $f2, '<', $ARGV[1]) or die "Could not open '$ARGV[1]' for reading, stopped"; my @arr1 = <$f1>; my @arr2 = <$f2>; close $f1 or die "Could not close '$ARGV[0]', stopped"; close $f2 or die "Could not close '$ARGV[1]', stopped"; my %hash; for my $element (@arr2) { push @{ $hash{$1} }, $2 if $element =~ m{ ^ \s* (rs\d+) \s+ \** (\w{2}) }x; } my $i = 0; for my $element (@arr1) { if (my ($rsid, $base1, $description) = $element =~ m{ ^ \s* (rs\d+) \s+ \** (\w{2}) \** \s+ (.*) $ }x +) { if (exists $hash{$1}) { for my $base2 (@{ $hash{$1} }) { print "$rsid $base1 $base2 $description\n"; ++$i; } } } } print "Found $i matches\n";

    Output:

    23:14 >perl 1852_SoPW.pl file1 file2 rs492602 CC GG Vitamin B12 deficiency FUT2 Higher levels of vita +min B12 rs492602 CC CC Vitamin B12 deficiency FUT2 Higher levels of vita +min B12 rs492602 CT GG Vitamin B12 deficiency FUT2 Normal levels of vita +min B12 rs492602 CT CC Vitamin B12 deficiency FUT2 Normal levels of vita +min B12 rs492602 TT GG Vitamin B12 deficiency FUT2 Normal levels of vita +min B12 rs492602 TT CC Vitamin B12 deficiency FUT2 Normal levels of vita +min B12 Found 6 matches 23:14 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,