A few points:
- You need parentheses around the whole condition of the "if" statement, as hippo pointed out.
- Your code will run very slowly because it reads the complete file2 again and again for
every line in file1. If file2 is big, this will make a significant difference.
- Consider reading one of the files into memory to prevent a lot of slow file system I/O. A hash based data structure for that memory data will also speed things up considerably vs a linear search.
- Consider splitting on /\s+/ or ' ' instead of \t. Either one splits on a sequence of one or more
white space characters. Those include the \t, \n and actual space characters, so a chomp is not
needed after a split like that. Also, if you get a file that has actual spaces instead of tabs,
the code will still work.
- Consider using "use strict" and "my" variables. That will give additional compile error
info that is helpful. But the code "as is" produces the appropriate error message.
- Consider indenting the code to show the "levels" better. What you have is hard to read.
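To make the split advice concrete, here is a minimal sketch. The sample line is made up for the demonstration (it is not taken from your files), and the variable names are just placeholders.

```perl
use strict;
use warnings;

# Hypothetical input line: tab separated, still carrying its newline.
my $line = "chr1\t66953622\t66953654\n";

# Splitting on a tab leaves the newline glued to the last field,
# which is why a chomp would be needed first.
my @by_tab = split /\t/, $line;

# Splitting on ' ' (a single-space string) splits on any run of
# white space and also ignores leading white space, so the newline
# at the end simply disappears.
my @by_ws = split ' ', $line;

print "split on tab, last field: [$by_tab[-1]]\n";
print "split on ' ', last field: [$by_ws[-1]]\n";
```

With the tab split, the closing bracket ends up on the next line of output because the last field still contains "\n".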
Update:
You say '"last;" is also not working here..'. I don't see any "last;" statement in the code; it should work fine if put in the right place. Step 1: get your code to compile.
I suggest you read the chomp documentation to understand what chomp() does. If you insist on splitting on \t, chomp the input line first, as the docs suggest.
Update2: Some code:
I can tell that you are a beginner at Perl, and because this doesn't look like
homework, I wrote some code for you that incorporates my advice
above. I hope some actual code is easier to understand than general advice. Please
play with this and adapt it to your needs.
It is possible to make an "in memory" data structure of either file1 or file2.
In this case, I picked file 1 and generated a hash table from it. The keys of this
"file 1 hash table" are like "chr17:69112551" and the value of each key like that
is set to "1" although that "1" value is never used in my code.
From looking at your code, it appears that the desired output is one line for each
line in file 2.
In your code, if ($ary[0] eq $any[0] and $ary[1] == $any[1])
has been transformed into: if ( exists ($file1_hash{"$any[0]:$any[1]"})). Using
a combined hash key like this expresses the "and" function. Then there is another
condition for the "or" function.
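As a small side-by-side sketch of that transformation: the rows and the @any values here are made up for illustration, and @file1_rows is a hypothetical stand-in for "all lines of file 1 already split into columns". The first half is the linear search (including where a "last;" would go), the second half is the combined-key hash look-up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed lines of file 1.
my @file1_rows = ( [ 'chr1', 67058869 ], [ 'chr7', 151046672 ] );

# Hypothetical parsed line of file 2.
my @any = ( 'chr1', 67058869, 67058880 );

# Linear search: compare against every row until one matches.
my $found_linear = 0;
for my $ary (@file1_rows)
{
    if ( $ary->[0] eq $any[0] and $ary->[1] == $any[1] )
    {
        $found_linear = 1;
        last;    # stop at the first match
    }
}

# Hash look-up: build the combined "chr:position" keys once...
my %file1_hash;
$file1_hash{"$_->[0]:$_->[1]"} = 1 for @file1_rows;

# ...then a single exists() answers the same "and" question.
my $found_hash = exists $file1_hash{"$any[0]:$any[1]"} ? 1 : 0;

print "linear: $found_linear, hash: $found_hash\n";
```

Both approaches give the same answer; the difference is that the hash version does not get slower per look-up as file 1 grows.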
The net effect of code like this is that each line in file 1 or file 2 is only
read once. File I/O is "expensive" in terms of run time. Every line in file 1 is
read and a hash table created. Then for each line in file 2, the line is read, parsed and
a decision is reached based upon the result of 2 look-up statements into the file 1
hash. These hash look-ups are very efficient and scale to very big files.
I couldn't see any way to get an "E" with your test data, so I added some extra
data to my test cases. In the future, it is best if you can provide an example
"desired output" that demonstrates the basic decisions which need to be made.
Have fun. Ask questions if you don't understand. If I made a mistake and didn't
understand something, asking about that is fine too.
Oops, just had a thought: did you want an output line per line of file 1 instead of
file 2? In that case the code changes; possibly build a HoA (Hash of Arrays) out of
file 2 to start off.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $file1 = <<END;
chr17 69112551
chr1 67058869
chr7 151046672
chr7 151047369
chr1 66953654
END
my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr1 67058887 67058895
END
open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";
### create memory structure of file 1:
### so that we only have to read file1 once!
#
my %file1_hash;
while (my $line = <$infile1>)
{
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)

    # use better "names"; I have no idea of what a chr col means
    my ($key, $value) = split /\s+/, $line;
    $file1_hash{"$key:$value"} = 1;
}
close $infile1;    # file handle closure is optional, but I'd do it.
### process each line in file2:
### If a line "matches" with any line in file1, then "E", else "M"
### I don't know what these numbers mean, come up with a better comment.
while (my $line = <$infile2>)
{
    chomp $line;                 # so that output with E or M can be on same line
    next if $line =~ /^\s*$/;    # skip blank lines (a common infile goof)

    my ($chr, $val1, $val2) = split /\s+/, $line;
    if ( exists $file1_hash{"$chr:$val1"} or
         exists $file1_hash{"$chr:$val2"} )
    {
        print "$line\tE\n";      # match exists with file 1
    }
    else
    {
        print "$line\tM\n";      # match does NOT exist with file 1
    }
}
__END__
Prints the following:
chr1 66953622 66953654 E
chr1 67200451 67200472 M
chr1 67200475 67200478 M
chr1 67058869 67058880 E
chr1 67058881 67058885 M
chr1 67058887 67058895 M
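
If the output should instead be one line per line of file 1, as I wondered above, a rough sketch of the Hash of Arrays idea could look like this. It is only my guess at what you might want: the names %file2_hoa and @report are made up, the sample data is the same as in the script above, and I am assuming that a file 1 position "matches" when it equals either number on a file 2 line.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $file1 = <<END;
chr17 69112551
chr1 67058869
chr7 151046672
chr7 151047369
chr1 66953654
END

my $file2 = <<END;
chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr1 67058887 67058895
END

open my $infile1, '<', \$file1 or die "unable to open first file $!";
open my $infile2, '<', \$file2 or die "unable to open 2nd file $!";

### Hash of Arrays: "chr:position" => [ every file2 line using that position ]
my %file2_hoa;
while (my $line = <$infile2>)
{
    next if $line =~ /^\s*$/;    # skip blank lines
    chomp $line;
    my ($chr, $val1, $val2) = split ' ', $line;
    push @{ $file2_hoa{"$chr:$val1"} }, $line;
    push @{ $file2_hoa{"$chr:$val2"} }, $line unless $val2 == $val1;
}
close $infile2;

### one output line per line of file1
my @report;
while (my $line = <$infile1>)
{
    next if $line =~ /^\s*$/;    # skip blank lines
    chomp $line;
    my ($chr, $pos) = split ' ', $line;
    my $flag = exists $file2_hoa{"$chr:$pos"} ? 'E' : 'M';
    push @report, "$line\t$flag";
}
close $infile1;

print "$_\n" for @report;
```

The array stored under each key holds the matching file 2 lines, so they are available if you want to print them next to the "E" instead of only the flag.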