in reply to comparing any two text files and writing the difference to a third file

This is not a Perl solution, but I use my programming text editor (TextPad) for comparisons. There are other freeware/shareware solutions. Any good editor designed for writing code will have a compare function, usually with options like ignore indentation (or not), ignore case (or not), etc. The exact features, and how the differences are displayed, vary. Using the editor is nice because I've got three panes (the two files and the diff view) already open and can cut-n-paste or whatever to get a combined version that I am happy with.

Anyway, I'm just suggesting that you try free trial versions of a few different programming editors. You might find something that you really like without writing any code at all. Mileage varies in how "smart" the comparison is and how well the comparison algorithm re-synchronizes after a block of deletions or insertions.
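If you ever want editor-style options like "ignore case" or "ignore indentation" in your own Perl script, one simple approach is to normalize each line before comparing it. This is a sketch, not how TextPad or any other editor actually implements it; the sub names normalize and lines_match and the option names ignore_case and ignore_indent are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical option names: ignore_case, ignore_indent.
sub normalize {
    my ($line, %opt) = @_;
    $line =~ s/^\s+// if $opt{ignore_indent};   # drop leading whitespace
    $line = lc $line  if $opt{ignore_case};     # fold upper/lower case
    return $line;
}

# True if the two lines are equal after normalization.
sub lines_match {
    my ($x, $y, %opt) = @_;
    return normalize($x, %opt) eq normalize($y, %opt);
}

my %opt = (ignore_case => 1, ignore_indent => 1);
printf "%s\n", lines_match("    Foo Bar", "foo bar", %opt) ? 'match' : 'differ';   # match
printf "%s\n", lines_match("    Foo Bar", "foo bar")       ? 'match' : 'differ';   # differ
```

Keeping the comparison itself as a plain eq and pushing all the options into one normalization step makes it easy to add more options later (for example, collapsing runs of internal whitespace).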

Re^2: comparing any two text files and writing the difference to a third file
by afoken (Chancellor) on Jan 23, 2019 at 08:19 UTC

    Unix systems generally come with diff preinstalled; if not, it should be easy to install (https://xkcd.com/1654/). If everything else fails, you can install it from source.
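    From Perl you can also just call diff and save its output in the third file. A minimal sketch, assuming a Unix-like system with a diff executable on the PATH (the list form of a piped open needs a platform with a real fork()); the sub name diff_to_file and the demo file names are hypothetical:

```perl
use strict;
use warnings;

# Run diff(1) on two files and save its output in a third file.
# Assumes a Unix-like system with a diff executable on the PATH.
# diff exits with 0 = identical, 1 = different, 2 = trouble.
sub diff_to_file {
    my ($f1, $f2, $out) = @_;
    open my $pipe, '-|', 'diff', '-u', $f1, $f2    # list form: no shell involved
        or die "cannot run diff: $!";
    my @lines = <$pipe>;
    close $pipe;            # returns false for a non-zero exit; the status is in $?
    my $status = $? >> 8;
    die "diff failed with status $status\n" if $status > 1;

    open my $fh, '>', $out or die "cannot open $out: $!";
    print {$fh} @lines;
    close $fh or die "cannot close $out: $!";
    return $status;         # 0 = same, 1 = different
}

# Demo with two small throwaway files.
for my $file ([ 'a.txt', "one\ntwo\n" ], [ 'b.txt', "one\n2\n" ]) {
    my ($name, $text) = @$file;
    open my $fh, '>', $name or die "cannot open $name: $!";
    print {$fh} $text;
    close $fh or die "cannot close $name: $!";
}
my $result = diff_to_file('a.txt', 'b.txt', 'differ.txt');
printf "%s\n", $result ? 'files differ, see differ.txt' : 'files are identical';
```

Because diff exits with 1 whenever the files differ, the sub inspects $? after closing the pipe instead of treating every non-zero exit as a failure.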

    On Windows, TortoiseSVN comes with TortoiseMerge, a very nice tool that can not only diff two files but also merge two versions of a file. A handy little trick is that you can edit files while using TortoiseMerge in diff mode. TortoiseGit comes with a very similar tool named TortoiseGitMerge. I have never used the latter, but I guess it's the same tool except for the interface to the VCS (SVN vs. git).

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: comparing any two text files and writing the difference to a third file
by balanunni (Novice) on Jan 23, 2019 at 05:46 UTC
    Hi, I am familiar with tools like 'Beyond Compare' and have used them. But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program. Perl being a software mostly used to generate reports(that's what I read) and automate, I thought this would be a great start. Coming from LabVIEW, where we always had some small examples to demonstrate a particular use case, I find it a little bit hard to find exact examples for perl. I just can't find a script comparing two texts and printing the difference. Once I understand one, I will build on it and learn better.

      Hi,

      You said: "Perl being a software mostly used to generate reports(that's what I read)" ...

      To clarify: the statement was *somewhat* true from about 1987 through about 1994. Beginning with Perl 5.0, the language began to support lexical variables, references, objects and loading non-core modules. Since 1995, CPAN has hosted thousands of tested, peer-reviewed, third-party extensions to the language that enable programmers to tackle almost any task you can name without having to reinvent wheels. Beginning with Perl 5.004 in 1997, Perl became the dominant language for building web applications, and while other languages have since been developed and are more widely implemented today, Perl is still one of the top programming environments for some of the most advanced and heavily trafficked web APIs in use and under development today. Meanwhile, Perl has established itself as the language of choice in many fields that could be described as forms of "generating reports," bioinformatics perhaps foremost among them. It is also widely used to prototype computation-intensive tasks, e.g. financial analyses, when the final products are to be deployed using faster implementations. Etcetera ... I cannot complete the list as I have to get back to work using Perl to build interfaces between big commercial APIs that cannot talk to each other ;-)

      Characterizing Perl as being limited to, or even most useful for, any one thing is belied by both the reality on the ground and by Perl's mottoes/nicknames "There Is More Than One Way To Do It," "The Duct Tape of the Internet," and "The Swiss Army Chainsaw" of programming, among others. It's an *old* chestnut, like nearly 25 years out of date!

      Hope this helps!


      The way forward always starts with a minimal test.
        Hi 1nickt, as I said, I am new to the platform. I will learn about its wider applications in the coming days, months or years. Anyway, thanks for the brief intro. Good to know there are people I can rely on when I stumble upon an error.

      Ah, yes, reinventing the wheel is a classic learning exercise. Unfortunately, the diff algorithm isn't exactly a simple algorithm to re-invent, especially if you want to handle multi-line changes and the like. When learning a new language, I like to pick an algorithm that I know how to implement in some other language, and then re-implement it in the new language. For your first few attempts, it might not turn out very "perlish", but it gets you learning.
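      To give an idea of what the core of a diff looks like, here is a minimal sketch based on the longest common subsequence (LCS) of the two lists of lines. This is the textbook technique, not necessarily what hippo's code, Algorithm::Diff or GNU diff do internally, and the sub name lcs_diff is made up. It builds an O(n*m) table, so it is only meant for small files.

```perl
use strict;
use warnings;

# Minimal LCS-based diff of two arrays of lines. Returns a list of
# [ op, line ] pairs: ' ' = in both, '-' = only in the first array,
# '+' = only in the second. O(n*m) time and memory: small inputs only.
sub lcs_diff {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);

    # $len[$i][$j] = length of the LCS of @$x[$i..$n-1] and @$y[$j..$m-1]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            $len[$i][$j] = $x->[$i] eq $y->[$j]             ? $len[$i+1][$j+1] + 1
                         : $len[$i+1][$j] >= $len[$i][$j+1] ? $len[$i+1][$j]
                         :                                    $len[$i][$j+1];
        }
    }

    # Walk the table from the start of both arrays, emitting edits in order.
    my ($i, $j, @out) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($x->[$i] eq $y->[$j]) {
            push @out, [ ' ', $x->[$i] ];
            $i++; $j++;
        } elsif ($len[$i+1][$j] >= $len[$i][$j+1]) {
            push @out, [ '-', $x->[$i++] ];
        } else {
            push @out, [ '+', $y->[$j++] ];
        }
    }
    push @out, map { [ '-', $_ ] } @$x[$i .. $n - 1];   # leftovers, if any
    push @out, map { [ '+', $_ ] } @$y[$j .. $m - 1];
    return @out;
}

my @old = ('This is line one', 'This is second line', 'This is third line');
my @new = ('This is line one', 'A brand new line', 'This is second line', 'This is third line');
printf "%s %s\n", @$_ for lcs_diff(\@old, \@new);
```

      For that input it reports the first line as common, "+ A brand new line" as an addition, and the last two lines as common again. The LCS lets it re-synchronize after the inserted line, whereas comparing line 1 with line 1, line 2 with line 2, and so on would flag every following line as different.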

      Once you have something implemented, you could make a post here like "while learning perl, I am trying to convert this algorithm I implemented in somenonperllanguage: <code> .... </code>, and I've successfully re-implemented it in perl here: <code>....</code>. Do you have any suggestions for how to make it more perlish?". Or, if you had problems getting it to work, show us the code you tried, and the expected output vs the output you got. (See also How to ask better questions using Test::More and sample data). Unfortunately, what you provided us was "here's the file-reading code I was able to figure out; now write my diff-algorithm for me", which is less likely to garner detailed answers; even saying "here is the algorithm I'd like to do (....), but I don't know how to implement it in perl" would have likely gotten more help.

      Going back to your original post, here are some comments on the file access code you've written. First, use warnings; use strict;: these will help enforce things that will make your code better in the long run. For open (FH1,$F1)||die "cannot open $F1.\n";, there are four things I would comment on:
      1) Generally, modern Perl uses or die "..." rather than || die "..." because of precedence issues (which will come up momentarily).
      2) It's usually best to use the 3-argument form of open, which is open my $fh1, '<', $F1. (You don't need the parentheses here if you use the or form: open my $fh1, '<', $F1 or die "...".)
      3) You may have noticed I used my $fh1 instead of FH1: this gives it lexical scope (not cluttering the global namespace with FH1 filehandles), and has the added benefit that when $fh1 drops out of scope, the file is closed automatically.
      4) If you add use autodie; to the use lines at the beginning, you don't need the || die / or die construct at all.
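      Here is a small sketch putting those four points together. It reuses the file name version1.txt from your post and writes that file first so it runs on its own. Including $! in the die message (your original message did not) tells you why an open failed.

```perl
use strict;
use warnings;

my $F1 = 'version1.txt';   # file name taken from the original post

# Write a small file first so this example runs on its own.
open my $out, '>', $F1 or die "cannot open $F1 for writing: $!";
print {$out} "This is line one\nThis is second line\n";
close $out or die "cannot close $F1: $!";

# 3-argument open + lexical filehandle + low-precedence 'or die'.
# $! holds the operating system's reason for the failure.
open my $fh1, '<', $F1 or die "cannot open $F1: $!";
my $first = <$fh1>;
close $fh1;

# Pitfall: without parentheses, '||' binds to $F1, not to open(), so
#     open my $fh, '<', $F1 || die "...";
# only dies if $F1 itself is false; a failed open goes unnoticed.

my $count;
{
    use autodie;            # lexically scoped: applies to this block only
    open my $fh, '<', $F1;  # throws an exception with a descriptive message on failure
    my @lines = <$fh>;
    $count = @lines;        # array in scalar context = number of lines
}                           # $fh goes out of scope here, closing the file

print "first line: $first";
print "line count: $count\n";
```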

        Here is a simplistic algorithm that will just compare each line, one at a time, and show whether the individual lines match or not.

        use warnings;
        use strict;
        use autodie;
        use File::Compare;

        my $F1="version1.txt";
        my $F2="count.txt";
        my $F3="differ.txt";

        #############################
        #### adding these sections to create the files for me
        {
            open my $fh, '>', $F1;
            print {$fh} <<EOT;
This is line one
This is second line
This is third line
EOT
        }   # automatically closes file when leaving scope
        {
            open my $fh, '>', $F2;
            print {$fh} <<EOT;
This is line 1
This is secont line
This is third line
EOT
        }   # automatically closes file when leaving scope
        #############################

        my $cmp = compare($F1,$F2);   # [pryrt]: if they're big, it's better to compare once, rather than three times
        #the addition of the USE line is important for this function to work.
        if($cmp==0) {print"they are the same\n";}
        elsif($cmp==1) {
            print"they are different\n";
            #opening/creating all three
            open my $fh1, '<', $F1;
            open my $fh2, '<', $F2;
            open my $fh3, '>', $F3;
            while (<$fh1>) {
                last if eof($fh2);     # stop at the end of the shorter file (extra lines in the longer one are not reported)
                my $content2=<$fh2>;   # reads the NEXT LINE of $F2 (scalar context), not the complete file   # pryrt: add 'my' for lexical scope
                chomp($_, $content2);
                #haven't yet figured out how to pull out the difference data.
                #### [pryrt]: here's a simple comparison, just highlighting which lines are different
                if($_ eq $content2) {  # this line is the same
                    printf $fh3 "%-20s= %s\n", 'MATCH', $_;
                } else {               # this line is different
                    printf $fh3 "%-20s> %s\n", 'DIFFERENT', '';
                    printf $fh3 "%20s< %s\n", $F1, $_;
                    printf $fh3 "%20s> %s\n", $F2, $content2;
                }
                #### [/pryrt]
            }
        }   # $fh1, $fh2 and $fh3 go out of scope here, so $F3 is closed before it is displayed below
        elsif($cmp==-1) {print"error\n";}
        else {print"something wrong\n";}

        print"Please check the file $F3\n";   # pryrt: don't repeat yourself: use the filename variable

        ### pryrt: I want to display F3 without external help
        {
            print "\n===== $F3 =====\n";
            open my $fh, '<', $F3;
            print while(<$fh>);   # uses perlish postfix, and perlish default of print using $_
            print "\n===============\n";
        }

        with output:

        they are different
        Please check the file differ.txt

        ===== differ.txt =====
        DIFFERENT           >
                version1.txt< This is line one
                   count.txt> This is line 1
        DIFFERENT           >
                version1.txt< This is second line
                   count.txt> This is secont line
        MATCH               = This is third line

        ===============

        Oh, right, aside from the comments I made, I also added newlines to your print statements.
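        One limitation of the script above is that it stops at the end of the shorter file, so extra lines in the longer file are never reported. Here is a sketch of one way to handle that. It works on arrays of lines (for example, from chomp(my @lines = <$fh>);) rather than on filehandles, and the sub name compare_lines and the report labels are made up:

```perl
use strict;
use warnings;

# Compare two arrays of lines position by position, like the script above,
# but also report lines that exist in only one of them.
sub compare_lines {
    my ($lines1, $lines2) = @_;
    my $max = @$lines1 > @$lines2 ? scalar @$lines1 : scalar @$lines2;
    my @report;
    for my $i (0 .. $max - 1) {
        my ($l1, $l2) = ($lines1->[$i], $lines2->[$i]);
        if    (!defined $l2) { push @report, "ONLY IN FILE 1: $l1" }
        elsif (!defined $l1) { push @report, "ONLY IN FILE 2: $l2" }
        elsif ($l1 eq $l2)   { push @report, "MATCH: $l1" }
        else                 { push @report, "DIFFERENT: $l1 | $l2" }
    }
    return @report;
}

print "$_\n" for compare_lines(
    [ 'This is line one', 'This is third line', 'An extra line' ],
    [ 'This is line 1',   'This is third line' ],
);
```

        This still compares line i with line i, so a single inserted line makes every following line show up as DIFFERENT. Re-synchronizing after an insertion is exactly the part that needs a real diff algorithm.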

      But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program.
      I think you've picked a rather challenging problem to solve in the general sense. This is the sort of thing that sounds easy when viewed from 30,000 feet, but from 50 feet, a lot of details emerge. Hey, those things that looked like ants are really people...etc...

      I briefly perused the code presented by Hippo at Re: comparing any two text files and writing the difference to a third file. It appears to be well structured and well commented. A lot could be learned by intense study and experimentation with this code. I would add print statements to try to understand what each part is doing. There are some constructs that I would not expect a beginner to come up with, but study of this code would be instructive.

      When faced with a complex problem, one idea is to simplify and solve a related, but less complex problem. One of the issues with file level diff is re-synchronization after a different block of lines is detected. How about just working with pairs of lines (and words within those lines) to start with?

      1. Bob went to the store.
      2. Bob and Mary went to the store.
      Sentence (2) has "and Mary " as an addition.
      If the sentences are swapped, then sentence (2) has "and Mary " as a deletion.
      Work out some way to represent that.
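      One way to represent that is to split both sentences into words, strip the longest common prefix and suffix, and report what is left in the middle: words left over from the first sentence count as deleted, words left over from the second as added. This is a sketch, not the only approach; the sub name word_change and the hash keys at, deleted and added are made up.

```perl
use strict;
use warnings;

# Find a single contiguous change between two sentences by trimming the
# longest common prefix and suffix (word by word). Several separate
# changes would be lumped together into one.
sub word_change {
    my ($s1, $s2) = @_;
    my @w1 = split ' ', $s1;
    my @w2 = split ' ', $s2;

    my $pre = 0;    # number of equal words at the start
    $pre++ while $pre < @w1 && $pre < @w2 && $w1[$pre] eq $w2[$pre];

    my $suf = 0;    # number of equal words at the end (never overlapping the prefix)
    $suf++ while $suf < @w1 - $pre && $suf < @w2 - $pre
              && $w1[-1 - $suf] eq $w2[-1 - $suf];

    return {
        at      => $pre,                               # word index where the change starts
        deleted => [ @w1[ $pre .. $#w1 - $suf ] ],     # only in the first sentence
        added   => [ @w2[ $pre .. $#w2 - $suf ] ],     # only in the second sentence
    };
}

for my $pair ([ 'Bob went to the store.', 'Bob and Mary went to the store.' ],
              [ 'Bob and Mary went to the store.', 'Bob went to the store.' ]) {
    my $c = word_change(@$pair);
    printf "at word %d: deleted [%s] added [%s]\n",
        $c->{at}, "@{ $c->{deleted} }", "@{ $c->{added} }";
}
```

      This prints "at word 1: deleted [] added [and Mary]" for the first pair and the reverse for the swapped pair. A substitution such as "store." vs "movies." comes out with both lists non-empty, i.e. as a deletion plus an addition. Note that split ' ' keeps punctuation attached to the words.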

      What about a "substitution"?
      This might have to be represented as a deletion and an addition?

      1. Bob went to the store.
      2. Bob went to the movies.
      Or perhaps:
      1. Bob went to the store and then to the bar for happy hour drinks.
      2. Bob and Mary went to the movies and then to the bar.
      Anyway, the idea is to work on a more constrained problem that illustrates at least some of the difficulties of the more general problem. My examples might not be the best, but I hope my idea is clear.
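      Trimming a common prefix and suffix can only find one change per sentence pair. For a pair like the last one, which has several separate changes, a word-level version of the longest common subsequence (LCS) idea can report each change separately. Here is a sketch; the sub name word_diff and the '=', '-' and '+' markers are made up for this example.

```perl
use strict;
use warnings;

# Word-level diff using the longest common subsequence (LCS). Consecutive
# words with the same marker are merged into one run:
#   '=' common, '-' only in the first sentence, '+' only in the second.
# Sketch only: O(n*m) table, words split on whitespace, punctuation kept.
sub word_diff {
    my ($s1, $s2) = @_;
    my @x = split ' ', $s1;
    my @y = split ' ', $s2;

    # $len[$i][$j] = LCS length of @x[$i..$#x] and @y[$j..$#y]
    my @len = map { [ (0) x (@y + 1) ] } 0 .. @x;
    for my $i (reverse 0 .. $#x) {
        for my $j (reverse 0 .. $#y) {
            $len[$i][$j] = $x[$i] eq $y[$j]                 ? $len[$i+1][$j+1] + 1
                         : $len[$i+1][$j] >= $len[$i][$j+1] ? $len[$i+1][$j]
                         :                                    $len[$i][$j+1];
        }
    }

    my @out;
    my $emit = sub {    # append a word, merging it into the previous run if the marker matches
        my ($op, $word) = @_;
        if (@out && $out[-1][0] eq $op) { $out[-1][1] .= " $word" }
        else                            { push @out, [ $op, $word ] }
    };

    my ($i, $j) = (0, 0);
    while ($i < @x || $j < @y) {
        if ($i < @x && $j < @y && $x[$i] eq $y[$j]) {
            $emit->('=', $x[$i]); $i++; $j++;
        } elsif ($j >= @y || ($i < @x && $len[$i+1][$j] >= $len[$i][$j+1])) {
            $emit->('-', $x[$i++]);
        } else {
            $emit->('+', $y[$j++]);
        }
    }
    return @out;
}

printf "%s %s\n", @$_ for word_diff(
    'Bob went to the store and then to the bar for happy hour drinks.',
    'Bob and Mary went to the movies and then to the bar.',
);
```

      For the last pair this prints:

      = Bob
      + and Mary
      = went to the
      - store
      + movies
      = and then to the
      - bar for happy hour drinks.
      + bar.

      The substitution of "movies" for "store" shows up as a deletion followed by an addition. Because the full stop stays attached to "bar.", the final "bar" is not recognized as common; splitting punctuation into separate tokens would fix that.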