Hello again corcra,

Looks like a number of things have changed in your specification. First, the input file headers (but not the data) have lost their square brackets and quotation marks. Second, the criteria for generating the output have changed. From the original post:

I am trying to write a code which prints out file 1 again but if the sample value is not 'REF', looks up file 2. If the corresponding file 2 value is 'REF' then print the original value appearing in file 1. If the corresponding value in file 1 is not 'REF' then print the value we find in file 2.

But now you say:

the same letter at the same position in File 1 and File 2 should output REF, and if REF is found at that position in file 2, just output letter from File 1

Third, the numbering of input file column headers no longer begins at 1. Will these numbers always increase in value from left to right? Probably safer to assume not, and to formulate a more general solution. In the following, I have simplified the input file formats by removing the square brackets and commas from the data as well as the headers:

File 1:

CHROM POS  REF ALT VARIANT_LIST 209T-D 459T-D 644T-D 94T-D 99T1-D 99T2-D 99T3-D 99T4-D 99T5-D
MT    1010 G   A   A            REF    A      A      REF   A      A      A      A      A
MT    2962 C   T   T            REF    T      T      T     T      T      T      T      T

File 2:

CHROM POS  REF ALT VARIANT_LIST 209H-D 459H-D 644H-D 94H-D 99H-D
MT    1010 G   A   A            REF    REF    REF    REF   REF
MT    2962 C   T   T            REF    REF    T      REF   T

The script:

#! perl
use strict;
use warnings;

my $file1 = 'File_1b.txt';      # shift;
my $file2 = 'File_2b.txt';      # shift;

open(my $in1, '<', $file1)
    or die "Cannot open file '$file1' for reading: $!";

open(my $in2, '<', $file2)
    or die "Cannot open file '$file2' for reading: $!";

chomp(my $header1 = <$in1>);
chomp(my $header2 = <$in2>);

my @heads1  = split /\s+/, $header1;
my @heads2  = split /\s+/, $header2;
my $low_idx = 5;
my %index_map;

for my $i ($low_idx .. $#heads1)
{
    $index_map{$i} = undef;
    my ($num1)     = $heads1[$i] =~ /^(\d+)/;
    next unless defined $num1;

    for my $j ($low_idx .. $#heads2)
    {
        my ($num2) = $heads2[$j] =~ /^(\d+)/;

        if (defined $num2 && $num1 == $num2)
        {
            $index_map{$i} = $j;
            last;
        }
    }
}

print $header1, "\n";

while (my $line1 = <$in1>)
{
    chomp $line1;
    my @fields1 = split /\s+/, $line1;

    # Every data line in file 1 must have a partner line in file 2
    defined(my $line2 = <$in2>)
        or die "Data missing in file '$file2'";

    chomp $line2;
    my @fields2 = split /\s+/, $line2;
    my @out     = @fields1;

    for my $i ($low_idx .. $#fields1)
    {
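        my $j = $index_map{$i};
        next unless defined $j;    # no matching column in file 2

        # $i indexes file 1's columns, $j the matching column in file 2
        if (       $fields1[$i] ne 'REF' &&
            exists $fields2[$j]          &&
                   $fields2[$j] ne 'REF')
        {
            if ($fields1[$i] eq $fields2[$j])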
            {
                $out[$i] = 'REF';
            }
            else
            {
                $out[$i] = $fields2[$j];
            }
        }
    }

    print join(' ', @out), "\n";
}

close $in2
    or die "Cannot close file '$file2': $!";

close $in1
    or die "Cannot close file '$file1': $!";

Output:

23:33 >perl 959b_SoPW.pl
CHROM POS  REF ALT VARIANT_LIST 209T-D 459T-D 644T-D 94T-D 99T1-D 99T2-D 99T3-D 99T4-D 99T5-D
MT 1010 G A A REF A A REF A A A A A
MT 2962 C T T REF T REF T REF REF REF REF REF

23:33 >
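To see why the last five columns of the second data row all come out as REF, it may help to look at the header matching on its own. The following is only a sketch of that step: the sub name build_index_map is made up for illustration and does not appear in the script, and the header lists are copied from the sample files above. Columns are paired by their leading number, so 99T1-D through 99T5-D should all map to the single 99H-D column.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): for each sample column
# in the first header list, find the first column in the second list that
# starts with the same number. Unmatched columns map to undef.
sub build_index_map
{
    my ($heads1, $heads2, $low_idx) = @_;
    my %map;

    for my $i ($low_idx .. $#$heads1)
    {
        $map{$i}   = undef;
        my ($num1) = $heads1->[$i] =~ /^(\d+)/;
        next unless defined $num1;

        for my $j ($low_idx .. $#$heads2)
        {
            my ($num2) = $heads2->[$j] =~ /^(\d+)/;
            next unless defined $num2 && $num1 == $num2;

            $map{$i} = $j;
            last;
        }
    }

    return \%map;
}

my @heads1 = qw(CHROM POS REF ALT VARIANT_LIST 209T-D 459T-D 644T-D 94T-D
                99T1-D 99T2-D 99T3-D 99T4-D 99T5-D);
my @heads2 = qw(CHROM POS REF ALT VARIANT_LIST 209H-D 459H-D 644H-D 94H-D
                99H-D);

my $map = build_index_map(\@heads1, \@heads2, 5);

# Print each file 1 sample column next to its partner in file 2
for my $i (sort { $a <=> $b } keys %$map)
{
    my $j = $map->{$i};
    printf "%-7s -> %s\n", $heads1[$i],
        defined $j ? $heads2[$j] : '(no match)';
}
```

Because the lookup goes by number rather than by position, it should keep working if the columns of File 2 appear in a different order from those of File 1.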

Now some general advice:

In programming, nailing down the specification is half the battle. And by “nailing down” I mean specifying precisely. If you can’t write down exactly what the programme is supposed to do, and how it is supposed to do it, you don’t properly understand what you are doing, which makes it (a) unlikely that you will succeed, and (b) harder to get help. Prepare a table of inputs and expected outputs, then go over all the possible input permutations and ensure that your table specifies the correct outcome for every case.

I hope the above script is useful, but it won’t help you much in the long run if you don’t make an effort to understand what the code is doing, and where and why it has changed to meet the changed requirements. In the future, it will be better if you ask specific questions about the code, after showing that you’ve made a concerted effort to solve your own problem first.
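As an illustration of the kind of table I mean, here is a minimal sketch that writes the per-cell rule down as data and checks it. The sub name merged_value is hypothetical, and the expected values reflect my reading of the requirements as implemented in the script above. In particular, the "different letters" row follows the original post, since the revised description does not cover that case. Please correct the table if your intended rule differs.

```perl
use strict;
use warnings;

# Hypothetical helper: the rule for a single cell, as I have understood it.
# $v1 is the value from File 1; $v2 is the value from the matching column
# of File 2, or undef if File 2 has no matching column.
sub merged_value
{
    my ($v1, $v2) = @_;

    return $v1   if $v1 eq 'REF';                  # File 1 is REF: keep it
    return $v1   if !defined $v2 || $v2 eq 'REF';  # File 2 is REF or absent
    return 'REF' if $v1 eq $v2;                    # same letter in both files
    return $v2;                                    # different letters (assumed)
}

# Each row: File 1 value, File 2 value, expected output
my @cases =
(
    [ 'REF', 'REF', 'REF' ],
    [ 'REF', 'A',   'REF' ],
    [ 'A',   'REF', 'A'   ],
    [ 'A',   'A',   'REF' ],
    [ 'A',   'T',   'T'   ],
    [ 'A',   undef, 'A'   ],
);

for my $case (@cases)
{
    my ($v1, $v2, $expected) = @$case;
    my $got = merged_value($v1, $v2);

    printf "%-4s %-7s -> %-4s %s\n",
        $v1,
        defined $v2 ? $v2 : '(none)',
        $got,
        $got eq $expected ? 'ok' : "NOT ok (expected $expected)";
}
```

Once every row says what you want it to say, the table doubles as a test: any later change to the rule that breaks a row shows up immediately.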

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re^5: Can't access data stored in Hash - help! by Athanasius
in thread Can't access data stored in Hash - help! by corcra
