in reply to Tagging the differencies in both files

Hi, the manual way of doing this is to stuff the lines of both files into a hash as keys, counting occurrences: a key with a count of 1 appears in only one file, while a count greater than 1 means the line is shared. Below is the basic outline of such a script, but you would need to expand the hashes to record which file each line came from, and possibly its line number, which you can get from the special variable $. while reading:
my $line_number = $.;    # line number of the most recently read input line
Then you would need to rewrite your File1 and File2. Some methods for that are shown in Search Replace String Not Working on text file: either seek and truncate, or reopen the filehandle with >.
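As a minimal sketch of the seek-and-truncate approach: the script below creates a sample File1.txt so it is self-contained, then rewrites it in place. The @keep array is hypothetical; in practice it would hold the lines the hash comparison decided to keep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create a sample File1.txt so this sketch is self-contained.
open(my $out, '>', 'File1.txt') or die "Unable to create File1.txt: $!";
print $out "old line $_\n" for 1 .. 5;
close($out);

# Hypothetical filtered content we want to keep (normally built
# by the hash-comparison script).
my @keep = ('kept line 1', 'kept line 2');

# Seek-and-truncate: open read/write, rewind, overwrite the
# contents, then cut off any leftover old bytes.
open(my $fh, '+<', 'File1.txt')
    or die "Unable to open File1.txt for read/write: $!";
seek($fh, 0, 0);                 # rewind to the start of the file
print $fh "$_\n" for @keep;      # write the replacement content
truncate($fh, tell($fh));        # discard whatever followed
close($fh);
```

The truncate call is the important part: without it, when the new content is shorter than the old, the tail of the original file would survive past the last byte written.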
#!/usr/bin/perl
use strict;
use warnings;

open (FILE1, '<', 'File1.txt') or die "Unable to open File1.txt for reading : $!";
open (FILE2, '<', 'File2.txt') or die "Unable to open File2.txt for reading : $!";

my %lines;
while ( <FILE1> ) { chomp; $lines{$_}++ }
while ( <FILE2> ) { chomp; $lines{$_}++ }

open (FILE3, '>', 'File3.txt') or die "Unable to open File3.txt for writing : $!";

for ( keys %lines ) {
    next if $lines{$_} > 1;    # skip lines present in both files
    print FILE3 "$_\n";
}

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

Replies are listed 'Best First'.
Re^2: Tagging the differencies in both files
by CountZero (Bishop) on Jun 17, 2012 at 16:52 UTC
    And how would you handle two files which have the same lines but in a different sequence? Or when the files have multiple identical lines?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      That's why I left it to the OP, :-) , and remarked that you would need to expand the hashes to contain all the line data per file. Just as an initial brainstorm: in addition to the hash where you count duplicates, you would have two other hashes with each line as key and $filename:linenumber as value. Then, in a relatively complex logic loop, you would first find duplicates, then re-loop through each file, testing each line for duplicates and comparing $filename:linenumber values via hash key lookups. I'm sure with enough diligence it can be done, because all the information is available in the three hashes. Of course, that's just my first thoughts; someone else may know a sweeter way involving less logic. You could also look at tkdiff; it isn't Perl/Tk, but it does color highlighting the way you desire.
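      The brainstorm above can be sketched roughly as follows. The file names and in-memory line data here are made up for illustration; a real script would read them from filehandles, and %location only remembers the first place each line was seen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
my %files = (
    'File1.txt' => [ 'apple', 'banana', 'cherry' ],
    'File2.txt' => [ 'banana', 'date' ],
);

my %count;      # line text => number of occurrences overall
my %location;   # line text => "filename:linenumber" of first sighting
for my $name (sort keys %files) {
    my $lineno = 0;
    for my $line (@{ $files{$name} }) {
        $lineno++;
        $count{$line}++;
        $location{$line} //= "$name:$lineno";
    }
}

# Lines seen exactly once are the differences; %location says where.
my @unique;
for my $line (sort keys %count) {
    next if $count{$line} > 1;
    push @unique, "$location{$line}: $line";
}
print "$_\n" for @unique;
```

      With the extra %location hash, each unique line can be tagged back to its file and line number, which is the missing piece the basic counting script does not keep.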

        Thank you for your responses, but the question was answered in another forum. I also asked it here: http://stackoverflow.com/questions/11070174/tagging-the-differencies-in-both-files and got a good answer. The code below does the task that I want:
        diff --old-line-format "<Diff></Diff>%c'\012'" \
             --new-line-format "<Diff>%l</Diff>%c'\012'" \
             File1.txt File2.txt > NewFile1.txt

        diff --old-line-format "<Diff>%l</Diff>%c'\012'" \
             --new-line-format "<Diff></Diff>%c'\012'" \
             File1.txt File2.txt > NewFile2.txt