Re: comparing any two text files and writing the difference to a third file
by pryrt (Abbot) on Jan 22, 2019 at 14:34 UTC
|
File::Compare appears to exist solely to tell you that there is a difference, not what the differences are. You might want to use something like Text::Diff to extract the actual differences between the files. (I haven't used either, but both functions in File::Compare say they just stop as soon as a line is different [1].) For Text::Diff, the documentation implies it returns a string similar to the output of the Linux diff command, so you could then write that string to your third file.
Aside: if you edit your post (How do I change/delete my post?) to include <code> tags (see also Perl Monks Approved HTML tags) around your program, it will make it easier for future readers of this thread to understand what you were asking, and provide more insight than I have.
[1] Edit, clarifying: only compare_text specifies stopping at the first line that is different; but both functions return only whether the files are the same (0) or different (1), not what the differences are.
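A minimal sketch of that suggestion, assuming Text::Diff is installed from CPAN (File::Compare ships with Perl). The helper name write_diff is hypothetical and not part of either module; it only writes the third file when File::Compare reports a difference.

```perl
use strict;
use warnings;
use File::Compare qw(compare);   # core: only answers "same or different?"
use Text::Diff qw(diff);         # CPAN: produces diff-style text

# write_diff is a hypothetical helper name used for illustration.
# Returns 0 if the files are identical, 1 if a diff was written to $out.
sub write_diff {
    my ($old, $new, $out) = @_;

    my $status = compare($old, $new);   # 0 = same, 1 = different, -1 = error
    die "cannot compare '$old' and '$new': $!\n" if $status < 0;
    return 0 if $status == 0;

    # Plain strings are treated as file names; the result is one big string.
    my $text = diff($old, $new, { STYLE => 'Unified' });

    open my $fh, '>', $out or die "cannot write '$out': $!\n";
    print {$fh} $text;
    close $fh or die "cannot close '$out': $!\n";
    return 1;
}
```

It could be called as write_diff('old.txt', 'new.txt', 'changes.txt'), where the three file names are placeholders.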
Re: comparing any two text files and writing the difference to a third file
by Your Mother (Archbishop) on Jan 22, 2019 at 16:13 UTC
|
I use Algorithm::Diff’s diff/sdiff for this kind of thing.
I started to clean up some of my code for an example, but I'm really not sure what you think the output file should look like. I only do text diffs for HTML display, so I can use things like <span class="added">some</span> <span class="changed">test</span>. I don't know what that looks like without markup, or whether there is a formalized "diff" markup in *nix other than the >|< or patch output stuff. That doesn't seem appropriate here; and if it is appropriate here, perfectly good tools already exist to generate it.
The approach with the Algorithm::Diff routines would be to turn the files into arrays, split by line, word, or character; whatever granularity you want.
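As a rough illustration of that approach without HTML markup, here is a sketch using sdiff from Algorithm::Diff (assumed to be installed from CPAN). The helper name diff_report and the "- " / "+ " prefixes are my own choices for illustration, not anything the module prescribes.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(sdiff);

# diff_report is a hypothetical helper: it takes two array refs (lines,
# words or characters) and returns plain-text report lines.
sub diff_report {
    my ($old, $new) = @_;
    my @report;
    for my $hunk ( sdiff($old, $new) ) {
        # Each hunk is [flag, old item, new item];
        # flag is 'u' (unchanged), '-' (removed), '+' (added) or 'c' (changed).
        my ($flag, $was, $now) = @$hunk;
        if    ($flag eq 'u') { push @report, "  $was" }
        elsif ($flag eq '-') { push @report, "- $was" }
        elsif ($flag eq '+') { push @report, "+ $now" }
        else                 { push @report, "- $was", "+ $now" }
    }
    return @report;
}

print "$_\n" for diff_report( [qw(a b c)], [qw(a x c d)] );
```

For a line-wise diff of two files, the arrays would simply hold the chomped lines of each file.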
Re: comparing any two text files and writing the difference to a third file
by tybalt89 (Monsignor) on Jan 23, 2019 at 17:08 UTC
|
Here's an example of Algorithm::Diff. Since you didn't show what the output should look like,
I am only printing to STDOUT. Items only in the first file are shown in red, and items only in
the second file are shown in green, where "item" is either a line, a word (non-whitespace), or a character
based on the switch provided.
Note that most of the code is just handling options.
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1228795
use strict; # color diff
use warnings;
use Algorithm::Diff qw(traverse_sequences);
use Term::ANSIColor;
use Getopt::Long;

GetOptions
    'lines' => \(my $lines),
    'chars' => \(my $chars),
    'words' => \(my $words),
    'help'  => \(my $help),
    or die help("bad option");

# print the given message followed by the option summary, then exit
sub help
{
    print <<END;
@_
--lines diff by lines (default)
--chars diff by characters
--words diff by words (non-whitespace)
--help help
END
    exit;
}

$help and help("options");
@ARGV == 2 or die "usage: $0 -h -l -w -c oldfilename newfilename\n";

# pick the item granularity; .+ (not .*) avoids an empty extra match per line
my $regex = $chars ? qr/./s : $words ? qr/\S+|\h+|\n/ : qr/.+/s;

# read each file and split every line into items
my @from = do { local @ARGV = shift; map /$regex/g, <> };
my @to   = do { local @ARGV = shift; map /$regex/g, <> };

# callbacks receive (index into @from, index into @to)
traverse_sequences( \@from, \@to,
    {
        MATCH     => sub {print $from[shift()]},
        DISCARD_A => sub {print color('red'), $from[shift()], color 'reset'},
        DISCARD_B => sub {print color('green'), $to[pop()], color 'reset'},
    } );
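To see what "item" means for each switch, here is a small sketch that needs no CPAN modules. The helper name tokenize and the %pattern table are hypothetical; the patterns mirror the ones in the script, except that the line pattern is written as .+ because .* also yields an empty trailing match under /g.

```perl
use strict;
use warnings;

# Hypothetical table of patterns, one per granularity.
my %pattern = (
    chars => qr/./s,           # every character, including newlines
    words => qr/\S+|\h+|\n/,   # words, runs of horizontal whitespace, newlines
    lines => qr/.+/s,          # the whole line, newline included
);

# tokenize is a hypothetical helper: split input lines into items.
sub tokenize {
    my ($mode, @input_lines) = @_;
    my $re = $pattern{$mode} or die "unknown mode '$mode'\n";
    return map { /$re/g } @input_lines;
}

my @lines = ("Bob went\n", "to the store.\n");
for my $mode (qw(lines words chars)) {
    my @items = tokenize($mode, @lines);
    printf "%-5s => %d items\n", $mode, scalar @items;
}
```

Because whitespace and newlines are kept as items of their own, joining the items back together reproduces the input, which is why the script can print matched and discarded items in sequence and still get readable output.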
Re: comparing any two text files and writing the difference to a third file
by Marshall (Canon) on Jan 22, 2019 at 20:52 UTC
|
This is not a Perl solution, but I use my programming text editor (TextPad) for comparisons. There are other freeware/shareware solutions. Any good editor designed for writing code will have a compare function, and there will be options like: ignore indentation (or not), ignore case (or not), etc. Exact features and how the differences are displayed vary. Using the editor is nice because I've got 3 panes (the 2 files and the diff representation) already open and can cut-n-paste or whatever to get a combined version that I am happy with.
Anyway, just suggesting that you try some free trial versions of a few different program editors. You might find something that you really like without writing any code at all. Mileage varies on "how smart" the comparison is and how the comparison algorithm re-synchronizes after a block of deletions or insertions.
|
|
Unix generally comes with diff preinstalled. If not, it should be easy to install (https://xkcd.com/1654/). If everything else fails, you can install from source.
On Windows, TortoiseSVN comes with TortoiseMerge, a very nice tool that can not only diff two files, but can also merge two versions of a file. A handy little trick is that you can edit files while using TortoiseMerge in diff mode. TortoiseGit comes with a very similar tool named TortoiseGitMerge. I never used the latter, but I guess it's the same tool except for the interface to the VCS (SVN vs. git).
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
|
|
Hi,
I am familiar with freeware tools like 'Beyond Compare'. I have used those. But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program. Perl being a software mostly used to generate reports(that's what I read) and automate, I thought this would be a great start. Coming from LabVIEW, where we always had some small examples to demonstrate a particular use case, it is a little bit hard to find exact examples for Perl. I just can't find a script comparing two texts and printing the difference. Once I understand one, I will improvise and learn better.
|
|
Hi,
You said: "Perl being a software mostly used to generate reports(that's what I read)" ...
To clarify: the statement was *somewhat* true from about 1987 through about 1994. Beginning with Perl 5.0 the language began to support lexical variables, references, objects and loading non-core modules. Beginning in 1995 the CPAN has hosted thousands of tested, peer-reviewed, third-party extensions to the language that enable programmers to tackle almost any task you can name without having to reinvent wheels. Beginning with Perl 5.004 in 1997 Perl became the dominant language for building web applications, and while other languages have since been developed and are more widely implemented today, Perl is still one of the top programming environments for some of the most advanced and heavily trafficked web APIs in use and under development today. Meanwhile Perl has established itself as the language of choice in many fields that could be described as forms of "generating reports," bioinformatics perhaps foremost among them. It is also widely used to prototype computation-intensive tasks, e.g. financial analyses, when the final products are to be deployed using faster implementations. Etcetera ... I cannot complete the list as I have to get back to work using Perl to build interfaces between big commercial APIs that cannot talk to each other ;-)
Characterizing Perl as being limited to, or even most useful for, any one thing is belied by both the reality on the ground and by Perl's mottoes/nicknames "There Is More Than One Way To Do It," "The Duct Tape of the Internet," and "The Swiss Army Chainsaw" of programming, among others. It's an *old* chestnut, like nearly 25 years out of date! Hope this helps!
The way forward always starts with a minimal test.
|
|
|
|
Ah, yes, reinventing the wheel is a classic learning exercise. Unfortunately, the diff algorithm isn't exactly a simple algorithm to re-invent, especially if you want to handle multi-line changes and the like. When learning a new language, I like to pick an algorithm that I know how to implement in some other language, and then re-implement it in the new language. For your first few attempts, it might not turn out very "perlish", but it gets you learning.
Once you have something implemented, you could make a post here like "while learning perl, I am trying to convert this algorithm I implemented in somenonperllanguage: <code> .... </code>, and I've successfully re-implemented it in perl here: <code>....</code>. Do you have any suggestions for how to make it more perlish?". Or, if you had problems getting it to work, show us the code you tried, and the expected output vs the output you got. (See also How to ask better questions using Test::More and sample data). Unfortunately, what you provided us was "here's the file-reading code I was able to figure out; now write my diff-algorithm for me", which is less likely to garner detailed answers; even saying "here is the algorithm I'd like to do (....), but I don't know how to implement it in perl" would have likely gotten more help.
Going back to your original post, here are some comments on the file access code you've written. First, add use warnings; use strict; at the top: this will help enforce things that will make your code better in the long run. Second, regarding open (FH1,$F1)||die "cannot open $F1.\n";, there are four things I would comment on:
1) Generally, modern perl uses or die "..." rather than || die "..." because of precedence issues: || binds more tightly than the comma, so without the parentheses it would apply to $F1 rather than to the result of open.
2) It's usually best to use the 3-argument form of open, which is open my $fh1, '<', $F1. (You don't need the parentheses if you use the or form: open my $fh1, '<', $F1 or die "...".)
3) You may have noticed I used my $fh1 instead of FH1: this gives the filehandle lexical scope (not cluttering the global namespace with FH1 filehandles), with the added benefit that when $fh1 drops out of scope, the file is closed for you automatically.
4) If you add use autodie; alongside the other modules at the beginning, you don't need the || die / or die construct at all.
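Put together, those four points might look like the following sketch. The sub names read_lines and read_lines_autodie are hypothetical, chosen only for this illustration; both use core Perl only.

```perl
use strict;
use warnings;

# read_lines is a hypothetical helper: return all lines of a file.
sub read_lines {
    my ($path) = @_;
    # 3-argument open, lexical filehandle, low-precedence "or"
    open my $fh, '<', $path or die "cannot open '$path': $!\n";
    my @lines = <$fh>;
    return @lines;   # $fh goes out of scope here, which closes the file
}

# The same thing with autodie: a failed open throws an exception by itself.
sub read_lines_autodie {
    my ($path) = @_;
    use autodie;     # lexically scoped, so it only affects this sub
    open my $fh, '<', $path;
    my @lines = <$fh>;
    return @lines;
}
```

Either sub can then feed a comparison, for example my @from = read_lines($F1);, where $F1 is the file name variable from the original post.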
|
|
|
|
But currently I am learning perl, a total beginner. I did my basic lessons and was trying to give myself a challenge with this program.
I think you've picked a rather challenging problem to solve in the general sense. This is the sort of thing that sounds easy when viewed from 30,000 feet, but from 50 feet, a lot of details emerge. Hey, those things that looked like ants are really people...etc...
I briefly perused the code presented by Hippo at Re: comparing any two text files and writing the difference to a third file. It appears to be well structured and well commented. A lot could be learned by intense study and experimentation with this code. I would add print statements to try to understand what each part is doing. There are some constructs that I would not expect a beginner to come up with, but study of this code would be instructive.
When faced with a complex problem, one idea is to simplify and solve a related, but less complex problem. One of the issues with file level diff is re-synchronization after a different block of lines is detected. How about just working with pairs of lines (and words within those lines) to start with?
1. Bob went to the store.
2. Bob and Mary went to the store.
Sentence (2) has "and Mary " as an addition.
If the sentences are swapped, then sentence (2) has "and Mary " as a deletion.
Work out some way to represent that.
What about a "substitution"?
This might have to be represented as a deletion and an addition?
1. Bob went to the store.
2. Bob went to the movies.
Or perhaps:
1. Bob went to the store and then to the bar for happy hour drinks.
2. Bob and Mary went to the movies and then to the bar.
Anyway, the idea is to work on a more constrained problem that illustrates at least some of the difficulties of the more general problem. My examples might not be the best, but I hope my idea is clear.
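One possible way to attack that constrained problem, sketched here with a longest-common-subsequence (LCS) table over the words of the two sentences. This is my own illustration using core Perl only, not the algorithm of diff or of any CPAN module; the helper name word_diff and the [+word] / [-word] notation are hypothetical.

```perl
use strict;
use warnings;

# word_diff is a hypothetical helper: returns [tag, word] pairs where tag is
# ' ' (kept), '-' (only in the first sentence) or '+' (only in the second).
sub word_diff {
    my ($first, $second) = @_;
    my @x = split ' ', $first;
    my @y = split ' ', $second;

    # $lcs[$i][$j] = length of the LCS of @x[$i..$#x] and @y[$j..$#y]
    my @lcs = map { [ (0) x (@y + 1) ] } 0 .. scalar(@x);
    for my $i (reverse 0 .. $#x) {
        for my $j (reverse 0 .. $#y) {
            my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
            $lcs[$i][$j] = $x[$i] eq $y[$j] ? $lcs[$i + 1][$j + 1] + 1
                         : $down >= $right  ? $down
                         :                    $right;
        }
    }

    # Walk the table from the start, emitting kept, deleted and added words.
    my ($i, $j, @out) = (0, 0);
    while ($i < @x && $j < @y) {
        if    ($x[$i] eq $y[$j])                     { push @out, [' ', $x[$i]]; $i++; $j++ }
        elsif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) { push @out, ['-', $x[$i++]] }
        else                                         { push @out, ['+', $y[$j++]] }
    }
    push @out, map { ['-', $_] } @x[$i .. $#x];   # leftover words from the first
    push @out, map { ['+', $_] } @y[$j .. $#y];   # leftover words from the second
    return @out;
}

# prints: Bob [+and] [+Mary] went to the store.
print join(' ', map { $_->[0] eq ' ' ? $_->[1] : "[$_->[0]$_->[1]]" }
    word_diff('Bob went to the store.', 'Bob and Mary went to the store.')), "\n";
```

In this representation a substitution shows up as a deletion followed by an addition, so the store/movies pair comes out as [-store.] [+movies.].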
Re: comparing any two text files and writing the difference to a third file
by hippo (Archbishop) on Jan 23, 2019 at 23:36 UTC