in reply to Comparing 2 C files
in thread Comparing files

The code posted by etcshadow in the other thread seems to me like it ought to do what you want, or, at least, it ought to do what you say you want, namely, compare two files and get rid of lines in the one that occur in the other.

Read on for an analysis of your code...

    #!/user/bin/perl -w
    #note: %hash1 contains modified functions,
    #which has been extracted from a C source file(Test1.c).
    #this hash is used for referencing purposes
    @array = %hash1;

For testing purposes, did you print out the contents of %hash1 to make sure it contains what you think it does? Also, why exactly are you assigning both the keys and the values from the hash to an array?

    #initiate a loop counter
    $loop = 0;
    #using for loop to get rid of new line in the array
    for (@array){
        $array[$loop] =~ s/\n//;
        $loop++
    }

I had to read this twice to even understand what it does, even though what it does is very simple. It's not clear at a glance that the element being modified is the current one being iterated over by the loop. Why not just chomp for @array or @array = map {s/\n//; $_} @array or something similarly straightforward? Even better, why not remove the newlines in the first place, when you read them from the file, before you store each one in the hash?
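
As a minimal sketch of the chomp suggestion, assuming @array holds lines that still carry their trailing newline (the sample strings below are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for lines read from a file.
my @array = ("function1\n", "function2\n", "function3\n");

# chomp accepts a list and strips the trailing newline from each element in place.
chomp @array;

print "[$_]\n" for @array;
```

Because chomp works on the whole list at once, no loop counter is needed.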

    #opening file for writing
    open (Done, ">sim.c") or die "Can't open sim.c :$!\n";

That part's good.

    #initiating a new hash and a string for later use
    %hash2;
    $done;

What "initiating" do you think this accomplishes? You don't set them to any value, and you don't scope them, and those are usually the only reasons to initialise a variable.
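
For comparison, here is a small sketch of what scoping those two variables could look like. It assumes you are willing to run under strict, which your script does not currently do:

```perl
use strict;
use warnings;

# 'my' gives each variable a lexical scope. A fresh hash is empty and a
# fresh scalar is undef, so no explicit value has to be assigned.
my %hash2;
my $done;

print "hash2 has ", scalar(keys %hash2), " keys\n";
print "done is ", (defined $done ? "defined" : "undefined"), "\n";
```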

    #a for loop to help popping all the elements in array.
    for (0..6){
        $fish = pop @array;
        $fish = pop @array;

Okay, so you pop off a value, discard it, then pop off a corresponding key. It now seems very odd, since you are throwing away the values, that your earlier assignment was @array=%hash1. If you'd directly done @array = keys %hash1, you could have saved yourself a pop here, to say nothing of confusion. Even better, why not skip @array altogether, and change your for (0..6) to read for $fish (keys %hash1)? If you remove the newlines when you read the values into the hash like I suggested earlier, you can not only get rid of the whole loop for removing the newlines, but now you can also get rid of @array and make it more clear what you're doing here.
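
A rough sketch of that last suggestion, with made-up contents for %hash1 (I'm assuming the keys are function names whose newlines have already been removed):

```perl
use strict;
use warnings;

# Hypothetical contents for %hash1: function names as keys, placeholder values.
my %hash1 = (function1 => 1, function2 => 1);

# Loop over the keys directly; no @array and no pop needed.
# The keys are sorted here only so that the output order is predictable.
my @seen;
for my $fish (sort keys %hash1) {
    push @seen, $fish;
    print "reference function: $fish\n";
}
```

The loop also runs exactly once per key, so there is no hard-coded 0..6 to keep in sync with the size of the hash.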

        #opening of working file, which is to be compared with
        #the reference array(%hash1)
        open (Local, "stub.c") or die "Can't open stub.c :$!\n";
        for $local(<Local>){
            $local =~ s/\n//;
            #this is used to compare the 2 variable
            #if there is no match, assign it as a key to a hash
            unless ($fish eq $local){
                $hash2{$local}++;
            }
            close Local;
        }
    }

The logic here is a little hard for me to follow, but as I understand it, what you're doing is counting how many of the keys from %hash1 don't exactly match each of the lines in the local file. That is, for each line in stub.c, %hash2 holds a count of how many of the keys from %hash1 are different from this line in stub.c. Is that what you intended? It's certainly not what you described in your question up above.
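
To make that concrete, here is a simplified sketch of the same counting logic. It uses in-memory lists instead of the files, and the strings in both lists are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the keys of %hash1 and the lines of stub.c.
my @reference  = ("function1", "function2");
my @stub_lines = ("function1", "stubfunction1", "function2", "stubfunction2");

my %hash2;
for my $fish (@reference) {
    for my $local (@stub_lines) {
        # Same test as in the quoted code: count every pairing that differs.
        $hash2{$local}++ unless $fish eq $local;
    }
}

print "$_ => $hash2{$_}\n" for sort keys %hash2;
```

With this data every stub line ends up as a key of %hash2, because each line differs from at least one of the reference strings.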

#print each key of the hash to verify result foreach $done ( keys %hash2){ print "$done\n"; }

For debugging purposes, until you get it working the way you want, you really ought to print the values, as well as the keys:

print join "\n", map {"$_ => $hash2{$_}"} keys %hash2;

$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/

Re: Re: Comparing 2 C files
by Anonymous Monk on Oct 16, 2003 at 04:52 UTC
    hi,
    I really deserved to be shot and hanged for not indenting my code!! A million apologies!

    Thanks for the analysis of my code. I have learned quite a bit from it (notably the use of chomp to get rid of newlines and assigning just the keys to the array; I didn't think of that before!).

    However, for the last part, I wish to clarify something. What I really want is for the functions that don't match to be printed out, not a count of the functions that don't match.

    I did try what you suggested, and I traced the problem to the following code:

        unless ($fish eq $local) { $hash2{$local}++; }

    Because a hash keeps only unique keys for whatever is assigned to it, every time I tried to change it, this would happen:

    Test1.c:
        function1
        function2

    stub.c:
        function1
        stubfunction1
        function2
        stubfunction2

    Now we have 2 files to work with, and the actual output is:

        function1
        stubfunction1
        function2
        stubfunction2

    when what I actually wanted is:

        stubfunction1
        stubfunction2

    So is there anything else that I can try to get the output that I want?
      hi, I really deserved to be shot and hanged for not indenting my code!! A million apologies!

      That was other people complaining about that. For me, it doesn't matter, partly because I spend entirely too much time looking at stuff like this, partly because I'm an auditory thinker (so visual layout has less impact on me than on most people), and partly because I use Emacs, so if I wanted your code indented, a couple of keystrokes would indent it for me automatically. However, you might find that indenting makes it easier for you to keep track of what's going on, especially if you're a visual thinker.

      About chomp: I'm not sure if I was clear. It only removes a newline from the _end_ of a string. I guessed that in this code that's where the newlines are, because each string is a line that you read from a file. Those are the cases where you usually use chomp. However, if you ever need to remove newlines from the middle or beginning of a string, you'd want to use s/\n//g instead.
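
      A small sketch of the difference, using a made-up string that has a newline in the middle as well as at the end:

      ```perl
      use strict;
      use warnings;

      # A made-up string with a newline in the middle and another at the end.
      my $text = "line one\nline two\n";

      # chomp removes only the trailing newline.
      my $chomped = $text;
      chomp $chomped;

      # s/\n//g removes every newline, wherever it occurs.
      my $stripped = $text;
      $stripped =~ s/\n//g;

      print "chomp:    [$chomped]\n";
      print "s/\\n//g: [$stripped]\n";
      ```

      The chomped copy still contains the newline between the two lines; the substituted copy has none left.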

      However, for the last part, I wish to clarify something. What I really want is for the functions that don't match to be printed out, not a count of the functions that don't match.

      Yes, I was guessing that the code didn't do exactly what you really wanted. (That's why you posted here, after all, isn't it?)

      I did try what you suggested, and I traced the problem to the following code:

          unless ($fish eq $local) { $hash2{$local}++; }

      Right. This is the code that adds one to the count each time the line doesn't match. However, I don't think this is your entire problem...

      Because a hash keeps only unique keys for whatever is assigned to it,

      Well, the keys are unique, but you're adding one to the value possibly multiple times.

      Test1.c:
          function1
          function2

      stub.c:
          function1
          stubfunction1
          function2
          stubfunction2

      Now we have 2 files to work with, and the actual output is:

          function1
          stubfunction1
          function2
          stubfunction2

      when what I actually wanted is:

          stubfunction1
          stubfunction2

      So is there anything else that I can try to get the output that I want?

      Yes, but you'll need to restructure your approach a little. I believe your problem is your approach to the loop. Here is what you currently have:

          for (0..6) {
              $fish = pop @array;
              $fish = pop @array;
              #opening of working file, which is to be compared with
              #the reference array(%hash1)
              open (Local, "stub.c") or die "Can't open stub.c :$!\n";
              for $local(<Local>) {
                  $local =~ s/\n//;
                  #this is used to compare the 2 variable
                  #if there is no match, assign it as a key to a hash
                  unless ($fish eq $local){
                      $hash2{$local}++;
                  }
                  close Local;
              }
          }

      This loop reads stub.c seven times (0..6 is seven iterations; incidentally, why seven?), each time taking a different string from @array and counting the lines in stub.c that don't match it. That is not what you want. What you actually want is to read stub.c only once, check each line to see whether it matches any of your strings, and print it if it doesn't:

          open STUB, "stub.c" or die "Cannot open stub.c : $!\n";
          while (<STUB>) {
              chomp;
              # $_ is now a line from stub.c,
              # and we have to decide whether to print it or not.
              # If it's a key in %hash1 we don't want to print it;
              # otherwise, we do:
              print "$_\n" if not exists $hash1{$_};
          }
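
      If you want to try the idea without touching your real files, here is a self-contained sketch of the same approach. It assumes the keys of %hash1 are the function names from your Test1.c example with their newlines already removed, and it reads the stub.c example from an in-memory string rather than from disk:

      ```perl
      use strict;
      use warnings;

      # Keys of %hash1 taken from the Test1.c example in this thread;
      # the values are placeholders.
      my %hash1 = map { $_ => 1 } qw(function1 function2);

      # In-memory stand-in for stub.c so the sketch runs without any files.
      my $stub_contents = "function1\nstubfunction1\nfunction2\nstubfunction2\n";
      open my $stub, '<', \$stub_contents or die "Cannot open in-memory stub: $!\n";

      my @unmatched;
      while (my $line = <$stub>) {
          chomp $line;
          # Keep only the lines that are not keys of %hash1.
          push @unmatched, $line unless exists $hash1{$line};
      }
      close $stub;

      print "$_\n" for @unmatched;
      ```

      With the sample data from your post, this prints stubfunction1 and stubfunction2, which is the output you said you wanted. The hash lookup replaces the inner comparison loop, so the input is read only once.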
