annel has asked for the wisdom of the Perl Monks concerning the following question:

Hello all!

Actually, I'm not sure how to put my question into a specific title, so I will try to explain it in more detail here.

Text 1: Entry 1 1 N 2 D EOE
Text 2: Entry 3 6 N 7 EOE Entry 5 4 D 9 D EOE

Text 1 has one entry while Text 2 has two entries. What I wish to do is compare the contents entry by entry between the files. In the example above, both files have a first entry, so the comparison proceeds for it. But Text 1 does not contain a second entry as Text 2 does, so a warning message should be output: "Text 1 has missing content".

use strict;
use warnings;

my (@data1, @data2);
my (@array, @array2);

open my $fh, '<', 'Text1' or die "Could not open file to read: $!";
while (<$fh>) {
    # collect every line from an "Entry" heading up to the matching "EOE"
    if (/^Entry (\w+)/ .. /EOE/) {
        push @array, $_;
        if (/^(Entry)/) {
            push @data1, $1;
        }
    }
}

open $fh, '<', 'Text2' or die "Could not open file to read: $!";
while (<$fh>) {
    if (/^Entry (\w+)/ .. /EOE/) {
        push @array2, $_;
        if (/^(Entry)/) {
            push @data2, $1;
        }
    }
}
print @array2;

my $size   = @data1 < @data2 ? @data1 : @data2;
my $result = @data1 < @data2 ? "File 1 has missing data" : "File 2 has missing data";
my $ori    = @data1 == @data2 ? 1 : 0;
my $i;
for ($i = 0; $i < $size; $i++) {
    if ($data1[$i] ne $data2[$i]) {
        printf "%s is mismatch with %s\n", $data1[$i], $data2[$i];
    }
    else {
        &compare();
    }
}
print "$result\n" if (!$ori);

I do not include the subroutine here. I would like to ask whether there is a way to store the content of each entry separately to ease the comparison work. I have tried an array, but it stores the content of all entries as is: instead of accessing only the content of entry 3, it accesses all the content of Text 2. Is a hash able to do this? Any suggestion is much appreciated. I am not expecting any code, but suggestions on which functions or methods to use would really be a big help. I am trying to write the Perl myself. Cheers.

Replies are listed 'Best First'.
Re: Methods to store content
by Laurent_R (Canon) on Dec 24, 2013 at 10:30 UTC

    Hi,

    it seems to me you basically need to read file 1 and store the contents into a hash in which the heading ("Entry 1" or simply 1 if that's sufficient) is the key and the content of the entry the value; depending on what exactly you want to do afterwards, the content of the entry may be stored as a scalar string or as an array reference or even a hash reference. Then, you read file 2 and, for each entry, you check if that entry is in the hash (i.e. was in file 1) and, if it exists, you can just proceed with the comparison you need to do between the content of the entry with what was stored in the hash. A simplified version of what you need might look like this:

    # open your file 1 as $IN1...
    my %hash_file_1;
    while (<$IN1>) {
        if (/^Entry/) {
            my $key = $_;
            my $value = "";
            while (<$IN1>) {
                last if /EOE/;
                $value .= $_;
            }
            $hash_file_1{$key} = $value;
        }
    }

    # open file 2 as $IN2
    while (<$IN2>) {
        if (/^Entry/) {
            my $key = $_;
            if (exists $hash_file_1{$key}) {
                my $value2 = "";
                while (<$IN2>) {
                    last if /EOE/;
                    $value2 .= $_;
                }
                # do the comparison between $value2 and $hash_file_1{$key}
                # and print what you need
            }
        }
    }
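The hash approach matches entries by their heading. The example in the question (Entry 1 in Text 1 against Entry 3 in Text 2) suggests the entries may instead need to be matched by position. Under that assumption, a minimal sketch is to keep one record per entry in an array and walk both arrays by index. The helper name `read_entries`, the record layout, and the one-item-per-line sample data are all hypothetical, not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read every "Entry ... EOE" block from a filehandle
# and return one record per entry, in file order.
sub read_entries {
    my ($fh) = @_;
    my @entries;
    my $current;    # record being filled; undef while outside an entry
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^Entry\s+(\w+)/ ) {
            $current = { id => $1, lines => [] };
        }
        elsif ( $line =~ /^EOE/ ) {
            push @entries, $current if $current;
            undef $current;
        }
        elsif ($current) {
            push @{ $current->{lines} }, $line;
        }
    }
    return @entries;
}

# In-memory stand-ins for Text1 and Text2 (assumed layout: one item per line).
my $text1 = "Entry 1\n1 N\n2 D\nEOE\n";
my $text2 = "Entry 3\n6 N\n7\nEOE\nEntry 5\n4 D\n9 D\nEOE\n";

open my $in1, '<', \$text1 or die "Cannot open in-memory file 1: $!";
open my $in2, '<', \$text2 or die "Cannot open in-memory file 2: $!";
my @file1 = read_entries($in1);
my @file2 = read_entries($in2);

my @messages;
my $max = @file1 > @file2 ? @file1 : @file2;
for my $i ( 0 .. $max - 1 ) {
    my $n = $i + 1;
    if ( !defined $file1[$i] ) {
        push @messages, "Text 1 has missing content (entry $n)";
    }
    elsif ( !defined $file2[$i] ) {
        push @messages, "Text 2 has missing content (entry $n)";
    }
    else {
        # Placeholder comparison: the real compare() was not shown in the post.
        my $a = join "\n", @{ $file1[$i]{lines} };
        my $b = join "\n", @{ $file2[$i]{lines} };
        push @messages, "Entry $n differs between Text 1 and Text 2" if $a ne $b;
    }
}
print "$_\n" for @messages;
```

Because each record holds only its own lines, `$file2[0]{lines}` gives the content of the first entry of Text 2 and nothing else, which addresses the "array stores everything" problem from the question.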

    Update: 1. I had not seen Anonymous Monk's last post when I wrote this; I might not have written this if I had seen it, although the approaches are quite different. 2. The inner while loop is duplicated code and could go in a function.

    The advantage is that if you want to change the data structure from a simple hash to, for example, a hash of arrays, the change needs to be in only one place instead of two, in the sub, which would have to return an array ref instead of a string.
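The function mentioned in the update did not survive in this copy of the thread, so the following is only a sketch of what such a refactoring could look like, not the author's code. The names `read_entry_body` and `read_entry_lines` are hypothetical; the second one shows the array-reference variant described above.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines up to the next EOE marker and
# return them concatenated into one string.
sub read_entry_body {
    my ($fh) = @_;
    my $value = "";
    while ( my $line = <$fh> ) {
        last if $line =~ /EOE/;
        $value .= $line;
    }
    return $value;
}

# Same loop, but returning an array reference (for a hash of arrays).
sub read_entry_lines {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        last if $line =~ /EOE/;
        chomp $line;
        push @lines, $line;
    }
    return \@lines;
}

# In-memory sample standing in for a real file (assumed layout).
my $sample = "Entry 3\n6 N\n7\nEOE\nEntry 5\n4 D\n9 D\nEOE\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my %hash_file;
while ( my $line = <$in> ) {
    next unless $line =~ /^Entry/;
    chomp( my $key = $line );
    $hash_file{$key} = read_entry_body($in);    # the only line to change
}
print "$_ => $hash_file{$_}" for sort keys %hash_file;
```

Switching the data structure then means replacing the single call to `read_entry_body` with `read_entry_lines`; the two reading loops for file 1 and file 2 stay untouched.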

Re: Comparison between data range
by Anonymous Monk on Dec 24, 2013 at 08:14 UTC
      Thanks for your suggestion. Would you be kind enough to share some of your idea about this thread?

        Ok, you asked for some of my ideas

        sounds like one of the same type of ... homework ... the same type of solutions would apply

        talks about code not working but shows no code ...

        asks questions about code and if something is possible, but of the many questions, none are detailed enough to be answerable ... sure everything is possible ... what was the question?

        Maybe you'd like to focus on one specific question about one specific part of one specific problem?

        Start super small and simple and show your goals, your plan for achieving those goals, your code attempt to implement the plan, and explain how it's not doing what you want it to do

        :)

Re: Methods to store content (choosing a data structure for comparing entries from two different files , hash)
by Anonymous Monk on Dec 24, 2013 at 10:19 UTC

    It's easier to focus on each problem if you name each problem :) ... argument passing is how to do subroutines

    use Data::Dump qw/ dd /;

    ## this? (letter as key, number as value)
    {
        my( %RedText, %DerText );
        $RedText{'Entry 1'}{N}=1;
        $RedText{'Entry 1'}{D}=2;
        $DerText{'Entry 3'}{N}=6;
        $DerText{'Entry 3'}{""}=7; ## uh oh, something fishy
        $DerText{'Entry 5'}{D}=4;  ## uh oh, duplicates overwrite, no good
        $DerText{'Entry 5'}{D}=9;
        dd( \%RedText, \%DerText );
    }

    ## maybe this? (number as key, letter as value)
    {
        my( %RedText, %DerText );
        $RedText{'Entry 1'}{1}='N';
        $RedText{'Entry 1'}{2}='D';
        $DerText{'Entry 3'}{6}='N';
        $DerText{'Entry 3'}{7}='';
        $DerText{'Entry 5'}{4}='D';
        $DerText{'Entry 5'}{9}='D';
        dd( \%RedText, \%DerText );
    }

    So, if the number values are what's unique for each entry (no duplicates, no repeats), this hash of hashes:

    "entry_id", "first_number", "second_letter"
    $hash{"entry_id"}{"first_number"} = "second_letter";
    $bothfiles{first_file}{"entry_id"}{"first_number"} = "second_letter";

    does this make sense to you?
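To make the hash-of-hashes layout concrete, here is a rough sketch that builds `%bothfiles` from the two sample texts in the question and then lists the entry ids that appear only in the second file. The function name `parse_entries` is hypothetical, and the item format (a number optionally followed by a single letter) is an assumption read off the sample data, not something the original post specifies.

```perl
use strict;
use warnings;

# Hypothetical parser: turns "Entry 3 6 N 7 EOE ..." into
# { entry_id => { first_number => second_letter } }.
sub parse_entries {
    my ($text) = @_;
    my %entries;
    while ( $text =~ /Entry\s+(\w+)\s+(.*?)\s*EOE/gs ) {
        my ( $id, $body ) = ( $1, $2 );    # copy before the inner match resets $1/$2
        # Assumed item format: a number, optionally followed by one letter.
        while ( $body =~ /(\d+)(?:\s+([A-Za-z]))?/g ) {
            $entries{$id}{$1} = defined $2 ? $2 : '';
        }
    }
    return \%entries;
}

my %bothfiles = (
    first_file  => parse_entries("Entry 1 1 N 2 D EOE"),
    second_file => parse_entries("Entry 3 6 N 7 EOE Entry 5 4 D 9 D EOE"),
);

# Entry ids present in the second file but absent from the first.
my @missing = grep { !exists $bothfiles{first_file}{$_} }
              sort keys %{ $bothfiles{second_file} };
print "Text 1 has no entry $_\n" for @missing;
```

With this layout a single lookup such as `$bothfiles{second_file}{3}` returns only the content of entry 3, and `exists` answers whether a given entry or number is present without scanning the whole file again.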