in reply to Re: comparing any two text files and writing the difference to a third file
in thread comparing any two text files and writing the difference to a third file

Hi, I am familiar with tools like Beyond Compare and have used them. But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program. Perl being a software mostly used to generate reports(that's what I read) and automate, I thought this would be a great start. Coming from LabVIEW, where we always had some small examples to demonstrate a particular use case, it is a little bit hard to find exact examples for perl. I just can't find a script comparing two texts and printing the difference. Once I understand one, I will improvise and learn better.

Re^3: comparing any two text files and writing the difference to a third file
by 1nickt (Canon) on Jan 23, 2019 at 14:14 UTC

    Hi,

    You said: "Perl being a software mostly used to generate reports(that's what I read)" ...

    To clarify: the statement was *somewhat* true from about 1987 through about 1994. Beginning with Perl 5.0 the language began to support lexical variables, references, objects and loading non-core modules. Beginning in 1995 the CPAN has hosted thousands of tested, peer-reviewed, third-party extensions to the language that enable programmers to tackle almost any task you can name without having to reinvent wheels. Beginning with Perl 5.004 in 1997 Perl became the dominant language for building web applications, and while other languages have since been developed and are more widely implemented today, Perl is still one of the top programming environments for some of the most advanced and heavily trafficked web APIs in use and under development today. Meanwhile Perl has established itself as the language of choice in many fields that could be described as forms of "generating reports," bioinformatics perhaps foremost among them. It is also widely used to prototype computation-intensive tasks, e.g. financial analyses, when the final products are to be deployed using faster implementations. Et cetera ... I cannot complete the list as I have to get back to work using Perl to build interfaces between big commercial APIs that cannot talk to each other ;-)

    Characterizing Perl as being limited to, or even most useful for, any one thing is belied by both the reality on the ground and by Perl's mottoes/nicknames "There Is More Than One Way To Do It," "The Duct Tape of the Internet," and "The Swiss Army Chainsaw" of programming, among others. It's an *old* chestnut, like nearly 25 years out of date!

    Hope this helps!


    The way forward always starts with a minimal test.
      Hi 1nickt, As I said, I am new to the platform. I will learn the wider applications in the coming days, months or years. Anyway, thanks for giving a brief intro. Good to know there are people whom I can rely upon, when I stumble upon any error.
Re^3: comparing any two text files and writing the difference to a third file
by pryrt (Abbot) on Jan 23, 2019 at 15:16 UTC

    Ah, yes, reinventing the wheel is a classic learning exercise. Unfortunately, the diff algorithm isn't exactly a simple algorithm to re-invent, especially if you want to handle multi-line changes and the like. When learning a new language, I like to pick an algorithm that I know how to implement in some other language, and then re-implement it in the new language. For your first few attempts, it might not turn out very "perlish", but it gets you learning.

    Once you have something implemented, you could make a post here like "while learning perl, I am trying to convert this algorithm I implemented in somenonperllanguage: <code> .... </code>, and I've successfully re-implemented it in perl here: <code>....</code>. Do you have any suggestions for how to make it more perlish?". Or, if you had problems getting it to work, show us the code you tried, and the expected output vs the output you got. (See also How to ask better questions using Test::More and sample data). Unfortunately, what you provided us was "here's the file-reading code I was able to figure out; now write my diff-algorithm for me", which is less likely to garner detailed answers; even saying "here is the algorithm I'd like to do (....), but I don't know how to implement it in perl" would have likely gotten more help.

    Going back to your original post, here are some comments on the file access code you've written. First, add use warnings; use strict; at the top: this will help enforce things that will make your code better in the long run. Next, there are four things I would comment on in open (FH1,$F1)||die "cannot open $F1.\n";
    1) Generally, modern perl uses or die "..." rather than || die "..." because of precedence issues (which will come up momentarily).
    2) It's usually best to use the 3-argument form of open, which is open my $fh1, '<', $F1. (You don't need the parentheses here if you use the or form: open my $fh1, '<', $F1 or die "...".)
    3) You may have noticed I used my $fh1 instead of FH1: this gives it lexical scope (not cluttering the global namespace with FH1 filehandles), and gives the added benefit that when $fh1 drops out of scope, perl will automatically close the file for you.
    4) If you add use autodie; alongside the other use lines at the beginning, you don't need the || die / or die construct at all.
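    To make the precedence point concrete, here is a minimal sketch (not taken from the original post). The file name is a made-up placeholder that is assumed not to exist, and the variable names are only for illustration. Without parentheses, || attaches to the last argument of open, so the die can never fire; or tests the return value of open itself.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed NOT to exist in the current directory.
my $missing = 'no-such-file-for-demo.txt';

# Without parentheses, || binds tighter than the argument list, so this parses as
#   open(my $fh_a, '<', ($missing || die ...))
# $missing is a true value, so die is never evaluated, even though open fails.
my $ok_high = open my $fh_a, '<', $missing || die "never reached\n";

# 'or' has lower precedence than the whole open call, so it checks open's result.
my $msg = '';
eval {
    open my $fh_b, '<', $missing or die "cannot open $missing: $!\n";
    1;
} or $msg = $@;

print "|| form: open returned ", ($ok_high ? 'true' : 'false'), " and nothing died\n";
print "or form: died with: $msg";
```

    With use autodie; in effect, a plain open my $fh, '<', $missing; would throw an exception on its own, which is why neither form is needed there.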

      Here is a simplistic algorithm that will just compare each line, one at a time, and show whether the individual lines match or not.

          use warnings;
          use strict;
          use autodie;
          use File::Compare;

          my $F1="version1.txt";
          my $F2="count.txt";
          my $F3="differ.txt";

          #############################
          #### adding these sections to create the files for me
          {
              open my $fh, '>', $F1;
              print {$fh} <<EOT;
          This is line one
          This is second line
          This is third line
          EOT
          }   # automatically closes file when leaving scope
          {
              open my $fh, '>', $F2;
              print {$fh} <<EOT;
          This is line 1
          This is secont line
          This is third line
          EOT
          }   # automatically closes file when leaving scope
          #############################

          my $cmp = compare($F1,$F2);   # [pryrt]: if they're big, it's better to compare once, rather than three times
          #the addition of the USE line is important for this function to work.
          if($cmp==0) {print"they are the same\n";}
          elsif($cmp==1) {print"they are different\n";
              #opening/creating all three
              open my $fh1, '<', $F1;
              open my $fh2, '<', $F2;
              open my $fh3, '>', $F3;
              while (<$fh1>) {
                  last if eof($fh2);
                  my $content2=<$fh2>;   #reads the next line of $F2 into content2 (scalar context reads one line, not the whole file)   # pryrt: add 'my' for lexical scope
                  chomp($_, $content2);
                  #haven't yet figured out how to pull out the difference data.
                  #### [pryrt]: here's a simple comparison, just highlighting which lines are different
                  if($_ eq $content2) {
                      # this line is the same
                      printf $fh3 "%-20s= %s\n", 'MATCH', $_;
                  } else {
                      # this line is different
                      printf $fh3 "%-20s> %s\n", 'DIFFERENT', '';
                      printf $fh3 "%20s< %s\n", $F1, $_;
                      printf $fh3 "%20s> %s\n", $F2, $content2;
                  }
                  #### [/pryrt]
              }
          }
          elsif($cmp==-1) {print"error\n";}
          else {print"something wrong\n";}
          print"Please check the file $F3\n";   # pryrt: don't repeat yourself: use the filename variable

          ### pryrt: I want to display F3 without external help
          {
              print "\n===== $F3 =====\n";
              open my $fh, '<', $F3;
              print while(<$fh>);   # uses perlish postfix, and perlish default of print using $_
              print "\n===============\n";
          }

      with output:

          they are different
          Please check the file differ.txt

          ===== differ.txt =====
          DIFFERENT           >
                  version1.txt< This is line one
                     count.txt> This is line 1
          DIFFERENT           >
                  version1.txt< This is second line
                     count.txt> This is secont line
          MATCH               = This is third line

          ===============

      Oh, right, aside from the comments I made, I also added newlines to your print statements.

Re^3: comparing any two text files and writing the difference to a third file
by Marshall (Canon) on Jan 25, 2019 at 21:31 UTC
    But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program.
    I think you've picked a rather challenging problem to solve in the general sense. This is the sort of thing that sounds easy when viewed from 30,000 feet, but from 50 feet, a lot of details emerge. Hey, those things that looked like ants are really people...etc...

    I briefly perused the code presented by Hippo at Re: comparing any two text files and writing the difference to a third file. It appears to be well structured and well commented. A lot could be learned by intense study and experimentation with this code. I would add print statements to try to understand what each part is doing. There are some constructs that I would not expect a beginner to come up with, but study of this code would be instructive.

    When faced with a complex problem, one idea is to simplify and solve a related, but less complex problem. One of the issues with file level diff is re-synchronization after a different block of lines is detected. How about just working with pairs of lines (and words within those lines) to start with?

    1. Bob went to the store.
    2. Bob and Mary went to the store.
    Sentence (2) has "and Mary " as an addition.
    If the sentences are swapped, then sentence (2) has "and Mary " as a deletion.
    Work out some way to represent that.

    What about a "substitution"?
    This might have to be represented as a deletion and an addition?

    1. Bob went to the store.
    2. Bob went to the movies.
    Or perhaps:
    1. Bob went to the store and then to the bar for happy hour drinks.
    2. Bob and Mary went to the movies and then to the bar.
    Anyway, the idea is to work on a more constrained problem that illustrates at least some of the difficulties of the more general problem. My examples might not be the best, but I hope my idea is clear.
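
    As one possible way to represent the additions and deletions in the sentence pairs above, here is a small sketch that compares two sentences word by word using a longest-common-subsequence table. The helper names word_diff and format_diff and the [+word] / [-word] notation are assumptions made up for this illustration, not an established format. A substitution simply shows up as a deletion followed by an addition.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [op, word] pairs, where op is
# '=' (word kept), '-' (only in the first sentence) or '+' (only in the second).
sub word_diff {
    my ($old, $new) = @_;
    my @a = split ' ', $old;
    my @b = split ' ', $new;

    # $lcs[$i][$j] = length of the longest common subsequence of
    # @a[$i..$#a] and @b[$j..$#b]; the extra last row and column stay 0.
    my @lcs = map { [ (0) x (@b + 1) ] } 0 .. @a;
    for my $i (reverse 0 .. $#a) {
        for my $j (reverse 0 .. $#b) {
            if ($a[$i] eq $b[$j]) {
                $lcs[$i][$j] = $lcs[$i+1][$j+1] + 1;
            } else {
                my ($skip_a, $skip_b) = ($lcs[$i+1][$j], $lcs[$i][$j+1]);
                $lcs[$i][$j] = $skip_a >= $skip_b ? $skip_a : $skip_b;
            }
        }
    }

    # Walk both word lists from the start, emitting one op per word.
    my ($i, $j, @ops) = (0, 0);
    while ($i < @a && $j < @b) {
        if ($a[$i] eq $b[$j]) {
            push @ops, ['=', $a[$i]];
            $i++; $j++;
        } elsif ($lcs[$i+1][$j] >= $lcs[$i][$j+1]) {
            push @ops, ['-', $a[$i++]];   # dropping a word from the first sentence loses nothing
        } else {
            push @ops, ['+', $b[$j++]];   # otherwise treat the second sentence's word as added
        }
    }
    push @ops, ['-', $a[$i++]] while $i < @a;   # leftover words were deleted
    push @ops, ['+', $b[$j++]] while $j < @b;   # leftover words were added
    return @ops;
}

# Hypothetical formatter: unchanged words as-is, changes as [+word] or [-word].
sub format_diff {
    return join ' ', map { $_->[0] eq '=' ? $_->[1] : "[$_->[0]$_->[1]]" } @_;
}

print format_diff(word_diff('Bob went to the store.', 'Bob and Mary went to the store.')), "\n";
# prints: Bob [+and] [+Mary] went to the store.
print format_diff(word_diff('Bob went to the store.', 'Bob went to the movies.')), "\n";
# prints: Bob went to the [-store.] [+movies.]
```

    Splitting on whitespace keeps punctuation attached to words ("store." is one token), which is one of the details that only shows up from 50 feet. The same table-and-walk approach would apply to lines of a file instead of words of a sentence.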