goingcrazy has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,
I am an amateur in Perl trying to write a script for matching elements of an array, but there is a slight glitch. Please help me.
I have one file consisting of ids, and another file consisting of ids with a sequence of letters associated with each id.
eg. File A : 1DWK 2RFK 4ERH and so on...
File B: 1DWK
HRSDKKDAHJKLSDLDLLJDGHDFJJE
4ERH
DFSKFHADFSBVHFWIHFWJBFS
2RFK
DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB
I have written a script to match ids in both the files and return the ids with sequence as result:
#! usr/bin/perl
#this script is to match the two files and print the matched content in the two files
$p="FILEA";
open (FILE1,$p);
@array =<FILE1>;
#print "@array";
$p1="FILEB";
open (FILE2,$p1);
@new = <FILE2>;
#print "@new";
foreach $line (@array)
{
    chomp $line;
    open (OUT,">nrset.txt");
    for($i=0;$i<@new;$i++)
    {
        chomp $new[$i];
        # print "$new[$i]\n";
        if ($new[$i] =~ /$line/i)
        {
            $pos = $i;
            print "$pos\n";
            print OUT "$new[$i]\n$new[$i+1]\n";
            # print "$new[$i]\n$new[$i+1]\n";
        }
    }
}
The script runs and creates nrset.txt, but the file is empty and no matches are shown. I suspected that my file of ids with sequences was too large, so I tried a few small check files, and those gave the results I expected from the script. Help me, fellas!
Thnx in adv.

Replies are listed 'Best First'.
Re: Matching elements in two arrays and printing the element next to the match.
by Corion (Patriarch) on Mar 22, 2010 at 08:49 UTC

    You don't do any error checking, and you don't use warnings, which would let Perl alert you to unopened files. When opening files, use the following:

    ... open (FILE1,$p) or die "Couldn't open file '$p': $!"; ...

    Also, your approach will be very (very) slow, because for each entry in the first file you will go through all lines of the second file. It will likely be better to store the entries from the smaller file in a hash and then go through the larger file once and compare the entries to it.

      Barewords for file handles? Two arg form of open()? How about some modern perl:

      open my $ID_FILE, '<', 'ids.txt' or die "Couldn't open ids.txt: $!";
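To make the difference concrete, here is a minimal sketch of reading a file through a lexical filehandle opened with the three-argument form. The helper name `read_lines` and the scratch file name are hypothetical, chosen only so the example runs on its own; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a file through a lexical
# filehandle opened with the three-argument form of open().
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Couldn't open '$path': $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# Write a small scratch file so the sketch is self-contained.
my $path = "ids_demo_$$.txt";
open my $out, '>', $path or die "Couldn't write '$path': $!";
print {$out} "1DWK\n2RFK\n4ERH\n";
close $out or die "Couldn't close '$path': $!";

my @ids = read_lines($path);
print "Read ", scalar(@ids), " ids: @ids\n";
unlink $path;
```

Because the mode is a separate argument, a file name that happens to begin with `>` or `|` cannot change what open() does, and the lexical handle is closed automatically when it goes out of scope.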
        You forgot the smiley
Re: Matching elements in two arrays and printing the element next to the match.
by murugaperumal (Sexton) on Mar 22, 2010 at 08:53 UTC
    use strict;
    use warnings;
    #this script is to match the two files and print the matched content in the two files
    open (FILE1,"<A") or die "Can't Open $!";
    my $var=<FILE1>;
    my @array=split(' ',$var);
    open (FILE2,"<B") or die "Can't Open $!";
    my @new;
    while(<FILE2>)
    {
        chomp;
        push(@new, $_);
    }
    my $i;
    my $line;
    open(OUT,">nrset.txt");
    foreach $line (@array)
    {
        chomp $line;
        for($i=0;$i<$#new;$i++)
        {
            chomp $new[$i];
            if ($new[$i]=~/$line/i)
            {
                print OUT "$new[$i]\n$new[$i+1]\n";
            }
        }
    }
Re: Matching elements in two arrays and printing the element next to the match.
by spazm (Monk) on Mar 22, 2010 at 15:34 UTC

    Caveats:

    I am not certain of your file formats. Here are my assumptions:
    1. For FILE A, it is unclear if you have multiple ID strings on one line, or one per line over multiple lines. I'll make this solution work with either.
    2. For FILE B, are your lines "key <whitespace> letters" or "key <newline> letters <newline>", ie key and values on separate lines?

    Solution:

    Perl hashes are very robust and often a great solution for simple to medium complexity problems. For this solution I'll read all the entries from the first file, parse out the IDs, and insert each ID into a hash. Then we will parse each entry in fileB and check if that ID is in the hash we built while walking fileA. In the case of a match, we print the ID and LETTERS joined with a <tab> character.

    With this solution, we look at each line of fileA and fileB exactly once, and we use a hash lookup on IDs which is fast. This reduces our complexity from O(n^2)+ from the previous solution to something closer to O(n log n), possibly close to O(n) if we're lucky with our ID hashing.

    The Code

    #!/usr/bin/perl
    use warnings;
    use strict;

    # open filea and parse all id strings.
    # Add id strings as keys to the %wanted hash.
    my %wanted;
    {
        # 'or' (not '||') so that die applies to the result of open
        open my $file, '<', "filea" or die "failed to open filea : $!";
        while( <$file>) {
            chomp;
            my @ids = split( /\s+/, $_);
            $wanted{ $_ }++ for @ids;
        }
        close $file;
    }

    #read fileb, parse lines of the form "id <whitespace> letters"
    #and print lines that match the id strings from filea.
    {
        open my $file, '<', 'fileb' or die "failed to open fileb : $!";
        while (<$file>) {
            chomp;
            my ($id, $letters) = split( /\s+/, $_);
            print "$id\t$letters\n" if $wanted{$id};
        }
    }

    #OR

    #read fileb, parse lines of the form "id <newline> letters"
    #and print lines that match the id strings from filea.
    {
        open my $file, '<', 'fileb' or die "failed to open fileb : $!";
        while (<$file>) {
            my $id = $_;
            my $letters = <$file>;
            last unless defined $letters;   # id on the last line with no sequence after it
            chomp($id);
            chomp($letters);
            print "$id\t$letters\n" if $wanted{$id};
        }
    }
    __END__
    FileA:
    1DWK 2RFK 4ERH

    FileB:
    1DWK HRSDKKDAHJKLSDLDLLJDGHDFJJE
    4ERH DFSKFHADFSBVHFWIHFWJBFS
    2RFK DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB

    FileB (alternate):
    1DWK
    HRSDKKDAHJKLSDLDLLJDGHDFJJE
    4ERH
    DFSKFHADFSBVHFWIHFWJBFS
    2RFK
    DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB
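One detail worth checking in any `open ... die` line written without parentheses is operator precedence: `||` binds more tightly than the commas of the argument list, so it attaches to the file name rather than to the result of open(), while `or` has lower precedence and applies to the whole call. A minimal sketch, in which the helper `open_dies` and the file name are hypothetical and the file is assumed not to exist:

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory.
my $missing = "no_such_file_$$.txt";

# Hypothetical helper: run a code block, return 1 if it died, 0 otherwise.
sub open_dies {
    my ($code) = @_;
    return eval { $code->(); 1 } ? 0 : 1;
}

# Parsed as open(my $fh, '<', ($missing || die ...)).
# The file name is a true string, so die is never evaluated.
my $with_pipes = open_dies(
    sub { open my $fh, '<', $missing || die "failed to open: $!\n" }
);

# Parsed as open(my $fh, '<', $missing) or die ...
# open fails, so die runs.
my $with_or = open_dies(
    sub { open my $fh, '<', $missing or die "failed to open: $!\n" }
);

print "|| version died: $with_pipes\n";
print "or version died: $with_or\n";
```

Writing `open(my $fh, '<', $name) || die ...` with explicit parentheses around the arguments would also work; the low-precedence `or` simply makes the parentheses optional.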
Re: Matching elements in two arrays and printing the element next to the match.
by grizzley (Chaplain) on Mar 22, 2010 at 08:51 UTC
    You must split the row read from the first file before you can go on. If there is only one line in this file, you can do something like this:
    @array = split / /, <FILE1>; # reads only one line!
    instead of
    @array = <FILE1>;
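The choice of split pattern matters here. The following sketch compares `split / /` with the special-case `split ' '` on a made-up row (the double space and trailing newline are assumptions added to show the difference, not taken from the real data):

```perl
use strict;
use warnings;

# Made-up row: note the double space and the trailing newline.
my $row = "1DWK 2RFK  4ERH\n";

# A literal single-space pattern splits at every space, so the double
# space yields an empty field and the newline stays on the last id.
my @single_space = split / /, $row;

# The special pattern ' ' skips leading whitespace and splits on any
# run of whitespace, so the newline is consumed as a separator.
my @any_whitespace = split ' ', $row;

print "split / /: ", scalar(@single_space),   " fields\n";
print "split ' ': ", scalar(@any_whitespace), " fields\n";
```

With `split / /` the last element still carries its newline, so it needs a chomp before it is compared against lines from the other file; `split ' '` avoids that step.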
Re: Matching elements in two arrays and printing the element next to the match.
by thillai_selvan (Initiate) on Mar 22, 2010 at 08:53 UTC
    Your FILEA has all the individual ids on the same line, i.e. the ids are not delimited by newlines. When I print the $line variable, it shows all the values on one line, as follows: line : 1DWK 2RFK 4ERH. The script then tries to match this whole value against FILEB, and it won't match, because in FILEB each id is on its own line. So you need to split the line, using the space character as the delimiter.
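The effect can be reproduced with the sample data from the question. This is a sketch only: the `\Q...\E` quoting and the `^...$` anchors are additions for safety and are not in the original script, which used a plain `/$line/i`.

```perl
use strict;
use warnings;

# File A as the original script sees it: one row holding every id.
my $row = "1DWK 2RFK 4ERH";

# File B, already chomped, with ids and sequences on alternating lines.
my @file_b = (
    "1DWK", "HRSDKKDAHJKLSDLDLLJDGHDFJJE",
    "4ERH", "DFSKFHADFSBVHFWIHFWJBFS",
    "2RFK", "DADUHRQWERKBNJAIJDLAJDKAKDNAKDJKSADJKAHDJASHRWEUB",
);

# Using the whole row as the pattern: no line of file B contains it.
my @whole_row_hits = grep { /\Q$row\E/i } @file_b;

# Splitting the row first gives one pattern per id.
my @per_id_hits;
for my $id ( split ' ', $row ) {
    push @per_id_hits, grep { /^\Q$id\E$/i } @file_b;
}

print "whole row: ", scalar(@whole_row_hits), " matches\n";
print "per id:    ", scalar(@per_id_hits), " matches (@per_id_hits)\n";
```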