nicol004@uwp.edu has asked for the wisdom of the Perl Monks concerning the following question:

Hello, new user and new to Perl. Thanks ahead for the help.
I have an array that looks like this (but with way more lines of sequence):

>memar0003 aminotransferase, class I and II (2259..3419)
LRDFVSKRARAIPPSGIRKFFDIAQTMEDVISLGVGEPDFVTPWCVCEAS
IYSIEQGSTAYTSNKGTPRLRAAISRYLDTRFSTHYDPEAEIIVTCGVSE
AADIAIRAVTDPGDEILVAEPCYVSYNPCVSLAGGTPVPVLCRAEDEFRL

I have a second array that is identical except at one letter
position: a letter that is present in one array is missing
from the other. (I don't know which array has the extra letter.)
I have put the two arrays into a hash to compare the values and
return the first value that differs. My goal is to identify
the position in the sequence where the two differ (the line of
sequence containing the aberration is good enough).
This is what I tried:

$first_data = "first.txt";
open(DAT,$first_data) || die("Could not open file!");
@seq1 = <DAT>;
$second_file = "second.txt";
open (RF,$second_file) || die("Could not open file!");
@seq2 = <RF>;
my %hash = ( array1=>[@seq1], array2=>[@seq2] );
foreach my $value (%hash) {
    if($seq1 =~ $seq2){next;}
    else {print $value; exit;}
}

I feel like I'm close but missing something?

Replies are listed 'Best First'.
Re: Beginner Hash Element Comparison
by CountZero (Bishop) on Feb 27, 2011 at 22:39 UTC
    It can be as simple as this:
    use Modern::Perl;
    use IO::All -strict;
    use List::MoreUtils qw /each_array/;
    my @first_array = io('./first.txt')->chomp->slurp;
    my @second_array= io('./second.txt')->chomp->slurp;
    my $ea = each_array( @first_array, @second_array );
    while ( my ( $first, $second ) = $ea->() ) {
        if ( $first ne $second ) {
            say 'Arrays differ at line ', $ea->('index');
            say "First: $first";
            say "Second: $second";
            last;
        }
    }
    It is simple and fast but not memory efficient as it reads in the whole files before starting to compare them.

    This one is more memory efficient, but might be slower (depending on the length of the files and where the first difference is found):

    use Modern::Perl;
    use IO::All;
    my $first_file = io('./first.txt')->chomp or die $!;
    my $second_file = io('./second.txt')->chomp or die $!;
    my $index = 0;
    # defined() keeps an empty line or a line containing only "0" from ending the loop early
    while ( defined( my $first_line = $first_file->getline ) ) {
        my $second_line = $second_file->getline;
        if ( $first_line ne $second_line ) {
            say "Files differ at line $index";
            say "First: $first_line";
            say "Second: $second_line";
            last;
        }
        $index++;
    }
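
    For anyone who cannot install IO::All or Modern::Perl, the same line-by-line idea can be sketched with core Perl only. This is a rough sketch rather than the code above: the sub name first_difference and the in-memory sample sequences are made up for illustration, and it assumes plain single-byte text with no NUL characters. It also reports the column of the first differing character, which the original question asked about.

```perl
use strict;
use warnings;

# Compare two open filehandles line by line.
# Returns (line_number, column), both 1-based, for the first difference,
# or an empty list when the two inputs are identical.
sub first_difference {
    my ($fh_a, $fh_b) = @_;
    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last if !defined $line_a && !defined $line_b;   # both inputs exhausted
        $line_no++;
        # Treat a missing line (one input is shorter) as an empty string.
        for ($line_a, $line_b) {
            $_ = '' unless defined;
            chomp;
        }
        next if $line_a eq $line_b;
        # String XOR yields "\0" wherever the characters agree, so the
        # run of leading NULs is as long as the common prefix.
        my ($common) = ($line_a ^ $line_b) =~ /^(\0*)/;
        return ($line_no, length($common) + 1);
    }
    return;
}

# Hypothetical sample data: the second text is missing one letter on line 3.
my $text_a = ">seq\nLRDFVSKRAR\nIYSIEQGSTA\n";
my $text_b = ">seq\nLRDFVSKRAR\nIYSIQGSTA\n";
open my $fh_a, '<', \$text_a or die "Cannot open in-memory file: $!";
open my $fh_b, '<', \$text_b or die "Cannot open in-memory file: $!";

if ( my ($line, $col) = first_difference($fh_a, $fh_b) ) {
    print "First difference at line $line, column $col\n";
}
else {
    print "No difference found\n";
}
```

    To use it on real files, open first.txt and second.txt with three-argument open and pass those handles instead of the in-memory ones.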

    Update 1: added a more memory efficient version.

    Update 2: used the slurp method in the first version.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Sir, I imagine that you are busy, so I'll try to keep this short.
      I spent the better part of yesterday finding, understanding, and downloading
      the two modules used in the second code you posted (IO::All and Modern::Perl).
      After some nmake problems, I was able to run it.
      There appears to be a problem with the "io" file opening portion.
      I then wrote a bit of code to open the files more explicitly, and checked to see if it would print their contents.
      It does that, but when I do it that way I get no output in the command line.
      Do you know what might be going on there?
      I very much like the second code you posted; it's very simple.
      I just want to get it working now.
      Thank you for your kind replies.

        "There appears to be a problem with the "io" file opening portion. I then wrote a bit of code to more explicitly open the files, and checked to see if it would print their contents. It does that but when I do it that way I get no output in the command line. I am just wondering if you knew what might be going on there?"

        You may want to edit your post and include the code you're describing above. Although CountZero helped with a solution, he may not be the person who continues to help with your problem. (What if he doesn't check this site for one or two weeks?) If you show the code you wrote after trying his solution, another smart member can come along, see both CountZero's work and yours, and be in a much better position to help. :-)


        "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
        There appears to be a problem with the "io" file opening portion.
        What error messages did you get?

        I then wrote a bit of code to more explicitly open the files, and checked to see if it would print their contents. It does that but when I do it that way I get no output in the command line.
        Does it or doesn't it print out anything? Perhaps you can try adding (in the first script) a say "@first_array"; say "@second_array"; just before the loop so you are sure the arrays indeed contain the correct data.
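
        A rough sketch of that sanity check in core Perl, for anyone who wants to take IO::All out of the picture while debugging. The helper name read_lines and the throwaway temp file are assumptions made for illustration; the idea is only to report the real open error via $! and to print each line in brackets so stray whitespace or carriage returns become visible.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into a list of chomped lines, dying with the
# operating system's error message if the file cannot be opened.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open '$path': $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# Demo on a throwaway file so the sketch runs anywhere.
my ($tmp_fh, $tmp_path) = tempfile( UNLINK => 1 );
print {$tmp_fh} ">header\nLRDFVSKRAR\nIYSIEQGSTA\n";
close $tmp_fh;

my @first_array = read_lines($tmp_path);
print scalar(@first_array), " lines read\n";
print "[$_]\n" for @first_array;    # brackets expose stray whitespace or \r
```

        In the real script, replace the temp file path with './first.txt' and './second.txt' and run the check once for each file before the comparison loop.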

        CountZero

Re: Beginner Hash Element Comparison
by umasuresh (Hermit) on Feb 27, 2011 at 22:06 UTC
Re: Beginner Hash Element Comparison
by JavaFan (Canon) on Feb 28, 2011 at 01:22 UTC
    If all you care about is the lines that differ, why not just:
    $ diff first.txt second.txt
Re: Beginner Hash Element Comparison
by JavaFan (Canon) on Feb 27, 2011 at 22:26 UTC
    I have an array that looks like this (but with way more lines of sequence):

    >memar0003 aminotransferase, class I and II (2259..3419)
    LRDFVSKRARAIPPSGIRKFFDIAQTMEDVISLGVGEPDFVTPWCVCEAS
    IYSIEQGSTAYTSNKGTPRLRAAISRYLDTRFSTHYDPEAEIIVTCGVSE
    AADIAIRAVTDPGDEILVAEPCYVSYNPCVSLAGGTPVPVLCRAEDEFRL
    That's not an array.
      How would you indicate to others that there are line breaks
      there at the end of each line?
      When the file is read in, the newlines at the ends of the lines split it into separate
      elements of the array.
      So should I put the HTML tag for a line break there in a post?
        How would you indicate to others that there are line breaks there at the end of each line?
        What's that got to do with it?

        You posted a blob of text. Blobs of text aren't arrays. Perhaps the blob of text is what the array elements consist of. But as presented, it's a blob of text. It could be just a single array element, each character could be an array element, or it may be something in between.

        @this_is_an_array = ("look, text", "fragments, each inside", "quotes", ", and separated by commas");
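
        To make the distinction concrete, here is a small sketch of three ways the same blob of text could end up in an array. The variable names and the shortened sample sequence are invented for illustration; which of these layouts the original poster actually has depends on how the file was read.

```perl
use strict;
use warnings;

# One multi-line string: a single scalar, not an array.
my $blob = ">memar0003 example header\nLRDFVSKRAR\nIYSIEQGSTA\n";

my @one_element = ($blob);               # the whole text as a single element
my @lines       = split /\n/, $blob;     # one element per line
my @characters  = split //, $lines[1];   # one element per character of one line

printf "%d element, %d lines, %d characters\n",
    scalar @one_element, scalar @lines, scalar @characters;
```

        Reading a file with @seq1 = <DAT>; gives the middle layout, one element per line, except that each element still ends in its newline until it is chomped.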
        For the case of "others" where "others" are Monks in the Monastery, please enclose your data snippets in code tags ... the way you formatted your script.

        For broader exposure -- the .html you mentioned -- you might want to use break tags or pre tags.