in reply to Re^2: Need to have perl script to compare two txt files and print difference along with under which segment the difference is
in thread Need to have perl script to compare two txt files and print difference along with under which segment the difference is

Without sight of the files you are comparing, it is difficult to provide a solution. You mention segment names, so is it safe to assume that each file contains the same segments but that the contents of each segment may differ between files? If so, a better approach would be to break each file into segments and compare those, e.g. compare segment EFGH of file test1 with segment EFGH of file test2, rather than comparing the whole files. That way you can keep track of which segments differ.
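
A minimal sketch of that idea follows. It assumes (this is a guess, since no data has been posted yet) that a segment name starts in column one and that each record is indented beneath it. The sub names read_segments and only_in and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout (a guess): a segment name starts in column one and
# each record belonging to it is indented beneath it.
sub read_segments {
    my ($fh) = @_;
    my (%records, $segment);
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                 # drop newline and trailing blanks
        next unless length $line;          # skip empty lines
        if ($line =~ s/^\s+//) {           # indented line: a record
            die "record '$line' appears before any segment\n"
                unless defined $segment;
            $records{$segment}{$line} = 1;
        }
        else {                             # anything else: a segment name
            $segment = $line;
            $records{$segment} ||= {};
        }
    }
    return \%records;
}

# Return the records of one segment that are in $this but not in $that.
sub only_in {
    my ($this, $that, $segment) = @_;
    my $mine   = $this->{$segment} || {};
    my $theirs = $that->{$segment} || {};
    return grep { !exists $theirs->{$_} } sort keys %$mine;
}

# Hypothetical sample data, held in strings so the sketch runs on its own.
my $test1 = "ABCD\n  one\n  two\nEFGH\n  three\n  four\n";
my $test2 = "ABCD\n  one\n  two\nEFGH\n  three\n  five\n";

open my $fh1, '<', \$test1 or die "Cannot read test1 data: $!";
open my $fh2, '<', \$test2 or die "Cannot read test2 data: $!";
my $seg1 = read_segments($fh1);
my $seg2 = read_segments($fh2);

# Compare segment by segment rather than file by file.
my %all = (%$seg1, %$seg2);
for my $segment (sort keys %all) {
    print "$segment only in test1: $_\n" for only_in($seg1, $seg2, $segment);
    print "$segment only in test2: $_\n" for only_in($seg2, $seg1, $segment);
}
```

With the made-up data above this prints "EFGH only in test1: four" and "EFGH only in test2: five". To read real files, open them by name instead of opening the in-memory strings.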

I hope my guess is close and this is helpful. Please post small example data files so that we can give better advice.

Cheers,

JohnGG


Re^4: Need to have perl script to compare two txt files and print difference along with under which segment the difference is
by User_04271983 (Initiate) on Jan 24, 2019 at 12:08 UTC
    Thanks for the reply. You guessed right: both files will have the same segments, but the contents of each segment may differ between files. I am posting sample data for reference.
    Contents of tst1:
      lodv
        OIDSCRIPT
        LCLABCG
        NMESCRIPT
        FRSJFGHT
        IT
        RCNGHSGD
        CURINR
        CRDUSWD
        OPWO7GNOxuXVog
        ODXCP
        ODXHC
        ODXIT
        APSN
        EJHFG
      sdmd
        DUUPPY
        MDPSJN
        PCINKSJ
        FXDEMAIJSKL1
        FXCYYEJ
        EMCYOAK
        DLMDWJF
        IRRNKAJ
    Contents of tst2:
      lodv
        OIDSCRIPT
        LCLABCG
        NMESCRIPT
        FRSJFGHT
        IT
        RCNGHSGD
        CURINR
        CRDUSWD
        OPWO7GNOxuXVog
        ODXCP
        ODXHC
        ODXIT
        APSN
      sdmd
        DUUPPY
        MDPSJN
        PCINKSJ
        FXDEMAIJSKL1
        FXCYYEJ
        EMCYOAK
        DLMDWJF
        IRRNKAJ
        IJFH
        LAKJSK
    In tst1, segment "lodv" has an extra record, "EJHFG", on its last line. Likewise, in tst2, segment "sdmd" has the extra records "IJFH" and "LAKJSK". Can we get the differing records along with their segments? I hope this makes my question a bit clearer. Thanks in advance :)

      Try

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @file = ('tst1.txt','tst2.txt');
      my %compare = ();

      # inputs
      for my $n (0..$#file){
        parse_file($n);
      }

      # output diff
      for my $segment (sort keys %compare){
        for my $row (sort keys %{$compare{$segment}}){
          my $rec = $compare{$segment}{$row};
          if (defined $rec->[0] && defined $rec->[1]){
            # matched
          } else {
            printf "%s %s\n",$segment,$row;
          }
        }
      }

      sub parse_file {
        my ($n) = @_;
        my $filename = $file[$n];
        my $segment;
        open IN,'<', $filename or die "Could not open $filename : $!";
        while (<IN>){
          s/\s+$//; # trim trailing whitespace
          if (s/^\s+//){
            ++$compare{$segment}{$_}[$n];
          } else {
            $segment = $_;
          }
        }
        close IN;
      }
      poj
        Thank you so much, poj! It works. You guys are the best. Thanks to everyone who spent time on this.
        Hello poj. Is it possible to have the file name after the segment name, to identify which file each record comes from? The output looks like this:
          lodv EJHFG
          sdmd IJFH
          sdmd LAKJSK
        Thanks very much for your efforts on this.
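
        One possible way to do that, sketched as a variation on poj's script rather than a reply from poj: when a record was seen in only one file, check which slot of the counter array is defined and print the matching file name. The file contents are held in strings here (a shortened, assumed version of the sample data) so that the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# File names and contents are held in memory here so the sketch runs
# on its own; the data is a shortened, assumed version of the samples.
my @file = (
    [ 'tst1.txt', "lodv\n  APSN\n  EJHFG\nsdmd\n  IRRNKAJ\n" ],
    [ 'tst2.txt', "lodv\n  APSN\nsdmd\n  IRRNKAJ\n  IJFH\n  LAKJSK\n" ],
);
my %compare;

for my $n (0 .. $#file) {
    my ($name, $text) = @{ $file[$n] };
    open my $in, '<', \$text or die "Could not read $name : $!";
    my $segment;
    while (<$in>) {
        s/\s+$//;                          # trim trailing whitespace
        if (s/^\s+//) {                    # indented line: a record
            ++$compare{$segment}{$_}[$n];
        }
        else {                             # otherwise: a segment name
            $segment = $_;
        }
    }
    close $in;
}

# Collect "segment file record" for every record missing from one file.
my @report;
for my $segment (sort keys %compare) {
    for my $row (sort keys %{ $compare{$segment} }) {
        my $rec = $compare{$segment}{$row};
        next if defined $rec->[0] && defined $rec->[1];   # in both files
        my $n = defined $rec->[0] ? 0 : 1;                # which file has it
        push @report, sprintf("%s %s %s", $segment, $file[$n][0], $row);
    }
}
print "$_\n" for @report;
```

        With this data the output is "lodv tst1.txt EJHFG", "sdmd tst2.txt IJFH" and "sdmd tst2.txt LAKJSK", one per line. The same test on $rec->[0] could be dropped into the else branch of the original script, printing $file[$n] there.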
Re^4: Need to have perl script to compare two txt files and print difference along with under which segment the difference is
by User_04271983 (Initiate) on Jan 24, 2019 at 13:27 UTC
    Can someone please help?
      Please tell us more about your input files.
    • Do they both have EXACTLY the same segments? What do we do if not?
    • Are the segments always in the same order in both files?
    • Do you know the segment names (or even the number of segments) in advance?
    • How many differences do you expect to find in a pair of very large files?
    • Your sample data has very short records. Is this typical?
    • What is the typical (and max) number of records in one segment?
    • How can we always tell a segment name from a data record?
    • Is there anything else you know which might help us?
      Bill
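
      As an illustration of why the first question matters, here is a rough sketch that lists segments present in only one of the files. It assumes a segment name is any line that starts with a non-whitespace character. The helper segment_names, the segment "wxyz" and the data are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the segment names found in a file's text,
# assuming a segment name is any line starting with a non-blank character.
sub segment_names {
    my ($text) = @_;
    return grep { /^\S/ } split /\n/, $text;
}

# Made-up contents: tst2 lacks the "sdmd" segment and adds "wxyz".
my %text = (
    tst1 => "lodv\n  APSN\nsdmd\n  IRRNKAJ\n",
    tst2 => "lodv\n  APSN\nwxyz\n  IJFH\n",
);

# Remember which files mention each segment (each file counted once).
my %seen_in;
for my $file (sort keys %text) {
    my %unique;
    @unique{ segment_names($text{$file}) } = ();
    push @{ $seen_in{$_} }, $file for sort keys %unique;
}

# A segment is unmatched when fewer files mention it than were read.
my $file_count = keys %text;
my @unmatched;
for my $segment (sort keys %seen_in) {
    my @files = @{ $seen_in{$segment} };
    push @unmatched, "$segment only in @files" if @files < $file_count;
}
print "$_\n" for @unmatched;
```

      For the invented data this reports "sdmd only in tst1" and "wxyz only in tst2". Whether such segments should be reported, skipped, or treated as an error depends on the answers to the questions above.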
        Hi Bill,

        Thanks for your reply. As we are in the beginning phase of our work, it isn't possible for me to answer some of your questions yet. For now, one of the forum members has posted code that works for the requirements we have so far. We are expecting around 2000 to 3000 records in a pair of files. Thanks for your time. I will get back here if I need more help from you guys.

      Hello and welcome to the Monastery, User_04271983. Please be patient - many of us have jobs and are spread out across different timezones. At first glance it looks like you've provided a good amount of information, so I'm sure someone will get around to looking at it all.