anonym has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to match the first three columns of one file against columns 0, 3, and 4 of a second file using Perl. My code so far is:

#!usr/bin/perl
use strict;
use warnings;

my $infile1 = $ARGV[0];
my $infile2 = $ARGV[1];
my $outfile = $ARGV[2];

open (INFILE1, "<", $infile1) || die "Cannot open $infile1:$!\n";
open (INFILE2, "<", $infile2) || die "Cannot open $infile2:$!\n";
open (OUTFILE, ">", $outfile) || die "Cannot open $outfile:$!\n";

my @array1;
my @array2;
my @array3;
my @array4;
my $_;

while (<INFILE1>) {
    chomp;
    @array1 = split (' ', $_);
    push (@array2, "@array1\n");
    #print "@array2\n";
}

while (<INFILE2>) {
    chomp;
    @array3 = split (' ', $_);
    push (@array4, "@array3\n");
    #print "@array4\n";
}

#print "@array2\n";
#print "@array4\n";

foreach my $array2 (@array2) {
    my @line = split(/\s+/, $array2);
    my $chr1 = $line[0];
    my $start1 = $line[1];
    my $end1 = $line[2];
    #print "$line[0]\n";
    foreach my $array4 (@array4) {
        my @values = split(/\s+/, $array4);
        my $chr2 = $values[0];
        my $start2 = $values[3];
        my $end2 = $values[4];
        if (($chr1 eq $chr2) && ($start1 eq $start2) && ($end1 eq $end2)) {
            #print "$start2\n";
            print "$chr2\t$start2\t$end2\n";
        }
    }
}

Please help me with this code. Thanks.


Replies are listed 'Best First'.
Re: Compare three columns of one file with three columns of another file in perl
by NetWallah (Canon) on May 26, 2015 at 01:07 UTC
    Here is some code to show you Perl idioms for approaching this problem.

    #!usr/bin/perl
    use strict;
    use warnings;

    my ($infile1, $infile2) = @ARGV;
    open (my $in1, "<", $infile1) or die "Cannot open '$infile1':$!\n";
    open (my $in2, "<", $infile2) or die "Cannot open '$infile2':$!\n";

    my @values;
    while ( <$in1> ) {
        chomp;
        my @one = split;
        next unless scalar(@one) >= 3; # Must have at least 3
        push @values, [@one[0..2]];
    }
    close $in1;

    while ( <$in2> ) {
        chomp;
        my @two = split;
        next unless scalar(@two) >= 5; # Must have at least 5
        next unless grep { $_->[0] eq $two[0]
                       and $_->[1] eq $two[3]
                       and $_->[3] eq $two[4] } @values;
        print join("\t", @two[0,3,4]), "\n"
    }
    close $in2;

    You will need to understand "array slices", and references.
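
    To make the two terms above concrete, here is a minimal sketch of an array slice and an anonymous array reference. The sample line is made up for illustration and only shaped like the second file's records; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical sample line, shaped like a record from the second file.
my $line   = 'chr1 mm10_knownGene exon 3205904 3207317 0.000000 - .';
my @fields = split ' ', $line;

# Array slice: pick several elements at once by index.
my @key = @fields[0, 3, 4];

# Anonymous array reference: [ ... ] copies the slice into a new array
# and returns a reference to it, which can be stored in another array.
my @records;
push @records, [ @fields[0, 3, 4] ];

# Dereference with the arrow operator to reach a single element.
my $chr   = $records[0]->[0];
my $start = $records[0]->[1];
my $end   = $records[0]->[2];

print join("\t", $chr, $start, $end), "\n";
```

    A record stored this way has exactly three elements, at indices 0 through 2, so asking for `$records[0]->[3]` yields an undefined value.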

            "You're only given one little spark of madness. You mustn't lose it."         - Robin Williams

      Thanks but it does not print any output

        This line in my code has a typo and should be corrected to meet your requirements:
        $_->[3] eq $two[4] # should be: $_->[2] eq $two[4]
        The code was intended as a style and structure guide, not necessarily a complete solution.

Re: Compare three columns of one file with three columns of another file in perl
by BillKSmith (Monsignor) on May 26, 2015 at 00:10 UTC
    Your code looks more like C than Perl. Nevertheless, with only minor changes, it could be made to work if the fields in your files consist of single characters separated by single spaces. I am not going to suggest such a solution, because you would miss the opportunity to learn how to use character strings and arrays-of-arrays in Perl. Perl's handling of character strings is probably its greatest advantage over other languages.

    For small data files, I would recommend reading each file into an array of records. Each record would be formed by splitting a line into an array of fields (much as you do already) and storing a reference to that array. You could then use nested loops to compare the arrays. The biggest advantage over your existing code is that you would be comparing whole strings (there is no need to split them into characters).

    Learn to use Perl's built-in documentation. Refer to perldata for information on strings, and to perldsc (and its references) for information on arrays-of-arrays.
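
    The array-of-records approach described above might look roughly like this. It is only a sketch: the sub names `read_records` and `matching_records` are invented for illustration, and the data is hypothetical in-memory text standing in for the two input files.

```perl
use strict;
use warnings;

# Read whitespace-separated lines from a filehandle into an array of
# array references, one reference per non-empty record.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split ' ', $line;
        push @records, \@fields if @fields;
    }
    return \@records;
}

# Nested loops: collect [chr, start, end] wherever columns 0,1,2 of a
# first-file record equal columns 0,3,4 of a second-file record.
sub matching_records {
    my ($first, $second) = @_;
    my @matches;
    for my $r1 (@$first) {
        next if @$r1 < 3;
        for my $r2 (@$second) {
            next if @$r2 < 5;
            push @matches, [ @{$r2}[0, 3, 4] ]
                if  $r1->[0] eq $r2->[0]
                and $r1->[1] eq $r2->[3]
                and $r1->[2] eq $r2->[4];
        }
    }
    return \@matches;
}

# Hypothetical data, read through in-memory filehandles.
my $text1 = "chr1 100 200\nchr2 300 400\n";
my $text2 = "chr1 src exon 100 200 0.0 - .\nchr3 src exon 300 400 0.0 - .\n";
open my $in1, '<', \$text1 or die "Cannot open in-memory file: $!";
open my $in2, '<', \$text2 or die "Cannot open in-memory file: $!";

my $matches = matching_records(read_records($in1), read_records($in2));
print join("\t", @$_), "\n" for @$matches;
```

    With real files, the two `open` calls would take file names instead of scalar references. The nested loops compare every record of one file against every record of the other, which is fine for small files, as noted above.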
    Bill

      Hey, thanks. But it does not print any output.

Re: Compare three columns of one file with three columns of another file in perl
by aaron_baugher (Curate) on May 26, 2015 at 01:50 UTC

    Your requirements aren't clear to me. How do you want to compare the columns? Do all three have to match, and do they have to match respectively, or can they match in any order? (In other words, can "a, b, c" match "b, c, a"?) Or is it okay if only one matches? Can any line from one file match any line in the other file (I think this is your intention)? If so, what should it do when it finds a match? What if a line in one file matches multiple lines in the other file? Is that possible, and if so, what should be done?

    Work out what your requirements actually are and explain them as clearly as you can, preferably with some sample input and output data, and it will be easier for people to help you.

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Hi Aaron, thanks. They have to match in order: all three of chr, start, end in one file should match the chr, start, end of the second file.

        Aaron, a few lines from the first infile are:

        chr10 40095550 40096075
        chr10 40102275 40102575

        A few lines from the second infile are:

        chr1 mm10_knownGene exon 3205904 3207317 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";
        chr1 mm10_knownGene exon 3213439 3215632 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";

        The output should be the chr, start, end of the lines where the first file matches the second file.
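
Given that requirement (all three values must match, in order), one possible alternative to nested loops is a hash lookup keyed on the joined chr, start, end. This is a swapped-in technique, not something proposed in the thread, and the sub names `build_index` and `find_matches` are invented for illustration. The first-file lines are the samples posted above; the second line of `@file2` is made up so that the sketch has something to match.

```perl
use strict;
use warnings;

# Build a lookup of "chr\tstart\tend" keys from the first file's lines.
sub build_index {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next if @f < 3;
        $seen{ join("\t", @f[0 .. 2]) } = 1;
    }
    return \%seen;
}

# Return the "chr\tstart\tend" key of every second-file line whose
# columns 0, 3, 4 appear in the index.
sub find_matches {
    my ($seen, @lines) = @_;
    my @out;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next if @f < 5;
        my $key = join("\t", @f[0, 3, 4]);
        push @out, $key if $seen->{$key};
    }
    return @out;
}

my @file1 = (
    'chr10 40095550 40096075',
    'chr10 40102275 40102575',
);
my @file2 = (
    'chr1 mm10_knownGene exon 3205904 3207317 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";',
    # Hypothetical line, invented so that one record matches.
    'chr10 mm10_knownGene exon 40095550 40096075 0.000000 - . gene_id "example"; transcript_id "example";',
);

my @matches = find_matches(build_index(@file1), @file2);
print "$_\n" for @matches;
```

Note that the sample lines actually posted share no chr, start, end triple (chr10 in the first file, chr1 in the second), so on those lines alone a correct program prints nothing.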