Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This is my code for comparing 2 files and checking whether the lines match:
#!usr/bin/perl -w
use strict;
use autodie;
my @human = 'C:\strawberry\perl\bin\humanpathway.txt';
open(HUMAN, "<", @human);
my @hum = <HUMAN>;
my @output = 'C:\strawberry\perl\bin\output.txt';
open(OUTPUT, ">", @output);
my @output = <OUTPUT>;
my @bact = 'C:\strawberry\perl\bin\bactpathway.txt';
open(BACT, "<", @bact);
my @bact = <BACT>;
my $hlindex = $#hum;
my $blindex = $#bact;
for (my $i=0; $i<=$hlindex; $i++) {
    chomp $hum[$i];
    my @arr1 = split ("\t",$hum[$i]);
    my $flag=0;
    for (my $j=0; $j<=$blindex; $j++) {
        chomp $bact[$j];
        my @arr2 = split ("\t",$bact[$j]);
        if ( $arr1[1] eq $arr2[1] && $arr1[2] eq $arr2[2] && $arr1[3] eq $arr2[3]) {
            if ($flag==0) {
                print OUTPUT "$hum[$i]\n";
                my $flag=1;
            }
            print OUTPUT "$bact[$j]\n";
        }
    }
}
close HUMAN;
close BACT;
close OUTPUT;
The error I get while running is:
Use of uninitialized value in string eq at (filepath) <BACT> line 17
I can't seem to figure out what the real problem is. Can someone help?

Replies are listed 'Best First'.
Re: comparing data btwn 2 files
by anonymized user 468275 (Curate) on May 11, 2011 at 10:47 UTC
    I would start by testing the return value of your open statement. Is there a reason for passing an array for the filepath to open?
    open(HUMAN, "<", @human) or die "$!: $human[0]";
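
A minimal sketch of that check, assuming the path is kept in a plain scalar and a lexical filehandle is used. The helper name read_lines and the relative file name are made up for illustration. Note that with use autodie, as in the original post, a failed open already dies on its own, so the explicit "or die" mostly matters when autodie is not loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list of chomped lines,
# dying with the path and the OS error message if the open fails.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Assumed file name; substitute the real location of the data.
my $human_file = 'humanpathway.txt';
my @hum = eval { read_lines($human_file) };
if ($@) {
    print "open failed: $@";
} else {
    print scalar(@hum), " lines read\n";
}
```

Wrapping the call in eval here is only so the sketch reports the failure instead of stopping; in a real script letting it die is usually fine.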

    One world, one people

Re: comparing data btwn 2 files
by Anonymous Monk on May 11, 2011 at 11:05 UTC
    C:\strawberry\perl\bin\*.txt

    Using C:\strawberry\perl\bin\ to store input/output files is like using C:\Program Files or C:\WINDOWS for the same purpose: a bad idea all around.

    You should use "My Documents" or another dedicated directory.

    That you're able to write to C:\strawberry... hints that you are logged in as an administrator. As a rule, you should do your daily work from a limited account (I'm assuming you're not using a no-login setup).

Re: comparing data btwn 2 files
by bart (Canon) on May 11, 2011 at 10:46 UTC
    A quick note:
    my $flag=1;
    You now have 2 variables with the name $flag: the inner my declares a new one that exists only inside that if block, so the outer $flag stays 0. That'll probably not do what you want. (Hint: drop the my)
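
To make the scoping point concrete, here is a small sketch that is not taken from the original script: the sub names with_my and without_my and the counter $printed are invented for the demonstration. Each sub counts how often the "print the header once" branch runs over three simulated matches.

```perl
use strict;
use warnings;

# With 'my' inside the block, a second, inner $flag is created and
# thrown away at the closing brace, so the outer one never changes.
sub with_my {
    my $flag    = 0;
    my $printed = 0;
    for my $match (1 .. 3) {
        if ($flag == 0) {
            $printed++;
            my $flag = 1;    # new variable that shadows the outer $flag
        }
    }
    return $printed;
}

# Without 'my', the assignment updates the outer $flag as intended.
sub without_my {
    my $flag    = 0;
    my $printed = 0;
    for my $match (1 .. 3) {
        if ($flag == 0) {
            $printed++;
            $flag = 1;       # same variable as the one tested above
        }
    }
    return $printed;
}

print "with my: ",    with_my(),    " header lines\n";   # 3
print "without my: ", without_my(), " header line\n";    # 1
```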

    And assuming line 17 is

    if ( $arr1[1] eq $arr2[1] && $arr1[2] eq $arr2[2] && $arr1[3] eq $arr2[3])
    I'm guessing your strings don't actually have 4 tab-separated parts (at least for some lines; or perhaps the files don't have the same length).
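
One way to test that guess is to check the field count before comparing. The sketch below is only an illustration: key_fields is a hypothetical helper and the sample lines are assumptions about what the files might contain (a proper tab-separated line, a line that uses spaces instead of tabs, and a blank line).

```perl
use strict;
use warnings;

# Hypothetical helper: return the comparison key (columns 2-4, i.e.
# indexes 1..3) or undef when the line has fewer than four fields.
sub key_fields {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return undef if @f < 4;
    return join "\t", @f[1 .. 3];
}

my @samples = (
    "Homosapiens\tC00267\t2.7.1.2\tC00668\n",   # four tab-separated fields
    "Homosapiens C00267 2.7.1.2 C00668\n",      # spaces, not tabs: one field
    "\n",                                       # blank line: no fields
);
for my $line (@samples) {
    my $key = key_fields($line);
    if (defined $key) {
        print "ok: $key\n";
    } else {
        print "skipped: fewer than 4 tab-separated fields\n";
    }
}
```

Lines that come back undef are the ones that would trigger the "uninitialized value" warning in the eq comparison.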

    "4"? Yes, array indexes start at 0, so you're skipping the first array item, with index 0.

Re: comparing data btwn 2 files
by tospo (Hermit) on May 11, 2011 at 12:09 UTC
    Why do you split this on tabs? If you want to compare the whole line, you can compare the lines directly, as in "if ($line_from_hum eq $line_from_bact){...}" or something similar. If you need to extract only a part of the line (say the first three fields), you could use a regex, as in:
    my ($part_of_line) = ($line=~/^((?:\S+\t){3})/);
    One problem in the logic of your code might be that you are comparing lines only from "the point of view" of the human file. Lines in bact that are not in human will not be captured at the moment, which may or may not be what you wanted.
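
A quick sketch of how that regex behaves, assuming tab-separated input. The helper first_three and the second pattern are not from the thread: the variant uses [^\t]+ instead of \S+ so that a field may contain spaces, which matters if an organism name such as "Bacteroides thetaiotaomicron" is a single field.

```perl
use strict;
use warnings;

# The pattern from the reply: three runs of non-whitespace, each
# followed by a tab, anchored at the start of the line.
my $re_nonspace = qr/^((?:\S+\t){3})/;
# Assumed variant: "anything but a tab", so fields may contain spaces.
my $re_nontab   = qr/^((?:[^\t]+\t){3})/;

# Hypothetical helper: return the captured prefix, or undef if no match.
sub first_three {
    my ($line, $re) = @_;
    my ($part) = $line =~ $re;
    return $part;
}

my $plain  = "Homosapiens\tC00267\t2.7.1.2\tC00668";
my $spaced = "Bacteroides thetaiotaomicron\tC00267\t5.3.1.9\tC00668";

for my $line ($plain, $spaced) {
    for my $re ($re_nonspace, $re_nontab) {
        my $part = first_three($line, $re);
        print defined $part ? "matched\n" : "no match\n";
    }
}
```

Both patterns capture the first three fields including the trailing tab, so the result can be compared with eq but is not the same as the second to fourth columns.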
      ^^my data in the .txt file is something like this:
      Bacteroides thetaiotaomicron C00267 5.3.1.9 C00668
      Bacteroides thetaiotaomicron C00221 5.3.1.9 C01172
      Bacteroides thetaiotaomicron C00074 5.3.1.9 C00022
      Bacteroides thetaiotaomicron C00221 5.3.1.1 C00267
      Parabacteroides Distasonis C02876 2.7.2.1 C00163

      Homosapiens C00267 2.7.1.2 C00668
      Homosapiens C00267 5.1.3.3 C00221
      Homosapiens C00221 5.1.3.3 C00267
      Homosapiens C00221 2.7.1.2 C01172
      Homosapiens C00221 2.7.1.1 C01172
      Homosapiens C00668 5.3.1.9 C01172
      Homosapiens C01172 5.3.1.9 C00668
      Homosapiens C00668 5.3.1.9 C05345
      Homosapiens C05345 5.3.1.9 C00668
      I want to compare the second, third and fourth columns in the 2 files and find the matches.
        OK, if you only need the matches and don't need to know which elements are only in the bacterium or only in human, then your approach will be sufficient.
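
For what it's worth, the same matches can also be found with a hash lookup instead of the nested loops. This is a different technique from the one in the original post, offered only as a sketch: key_of and find_matches are hypothetical names, the input is assumed to be tab-separated, and the second bacterial row is invented so that the example contains one match.

```perl
use strict;
use warnings;

# Key on columns 2-4 (indexes 1..3); undef for lines with too few fields.
sub key_of {
    my ($line) = @_;
    my @f = split /\t/, $line;
    return @f >= 4 ? join("\t", @f[1 .. 3]) : undef;
}

# Returns lines in the order the original script aims to print them:
# each human line that has a match, followed by its bacterial matches.
sub find_matches {
    my ($hum, $bact) = @_;    # two array refs of chomped lines
    my %bact_by_key;
    for my $line (@$bact) {
        my $key = key_of($line);
        push @{ $bact_by_key{$key} }, $line if defined $key;
    }
    my @out;
    for my $line (@$hum) {
        my $key = key_of($line);
        next unless defined $key && $bact_by_key{$key};
        push @out, $line, @{ $bact_by_key{$key} };
    }
    return @out;
}

my @hum = (
    "Homosapiens\tC00267\t2.7.1.2\tC00668",
    "Homosapiens\tC00668\t5.3.1.9\tC01172",
);
my @bact = (
    "Bacteroides thetaiotaomicron\tC00267\t5.3.1.9\tC00668",
    "Examplebacter inventus\tC00668\t5.3.1.9\tC01172",   # invented row
);
print "$_\n" for find_matches(\@hum, \@bact);
```

Each file is read through once, and lines with fewer than four fields are skipped rather than compared.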