
Hi, thanks for the reply!

Re^3: Need to have perl script to compare two txt files and print difference along with under which segment the difference is
by johngg (Canon) on Jan 24, 2019 at 11:43 UTC

    Without sight of the files you are comparing, it is difficult to provide a solution. You mention segment names, so is it safe to assume that each file contains the same segments but the contents of each segment may differ between files? If this is the case, a better approach would be to break each file into segments and compare those, e.g. file test1 segment EFGH compared to file test2 segment EFGH, rather than comparing the whole files. That way you can keep track of which segments differ.
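The segment-by-segment idea above might look roughly like the following. This is only a minimal sketch under assumptions that the thread has not confirmed: a segment name starts in column 0, its records are indented beneath it, and each line holds one record. The sub names `parse_segments` and `diff_segments` and the shortened in-memory sample data are made up for illustration; real code would read the two files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text into { segment => { record => 1 } }.
# Assumption: segment names are unindented, records are indented.
sub parse_segments {
    my ($text) = @_;
    my (%segments, $current);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                  # trim trailing whitespace
        next unless length $line;           # skip blank lines
        if ($line =~ s/^\s+//) {            # indented line: a record
            next unless defined $current;   # ignore records before any segment
            $segments{$current}{$line} = 1;
        }
        else {                              # unindented line: a new segment
            $current = $line;
            $segments{$current} ||= {};
        }
    }
    return \%segments;
}

# Compare two parsed files segment by segment and return report lines
# of the form "segment: record (only in NAME)".
sub diff_segments {
    my ($left, $right, $left_name, $right_name) = @_;
    my %all = map { ($_ => 1) } keys %$left, keys %$right;
    my @report;
    for my $segment (sort keys %all) {
        my $l = $left->{$segment}  || {};
        my $r = $right->{$segment} || {};
        push @report, "$segment: $_ (only in $left_name)"
            for grep { !exists $r->{$_} } sort keys %$l;
        push @report, "$segment: $_ (only in $right_name)"
            for grep { !exists $l->{$_} } sort keys %$r;
    }
    return @report;
}

# Shortened, hypothetical sample data (layout is assumed, see above).
my $tst1 = "lodv\n  APSN\n  EJHFG\nsdmd\n  IRRNKAJ\n";
my $tst2 = "lodv\n  APSN\nsdmd\n  IRRNKAJ\n  IJFH\n  LAKJSK\n";

print "$_\n"
    for diff_segments(parse_segments($tst1), parse_segments($tst2), 'tst1', 'tst2');
```

Keeping each segment in its own hash means every reported difference carries both its segment name and the file it came from. Note that the hash lookup ignores record order and duplicate records within a segment, which may or may not suit the real data.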

    I hope my guess is close and this is helpful. Please post small example data files so that we can give better advice.

    Cheers,

    JohnGG

      Thanks for the reply, and you guessed right. Both files will have the same segments, but the contents of each segment may differ between files. I am posting sample data for reference. Contents of tst1:
      lodv OIDSCRIPT LCLABCG NMESCRIPT FRSJFGHT IT RCNGHSGD CURINR CRDUSWD OPWO7GNOxuXVog ODXCP ODXHC ODXIT APSN EJHFG sdmd DUUPPY MDPSJN PCINKSJ FXDEMAIJSKL1 FXCYYEJ EMCYOAK DLMDWJF IRRNKAJ
      Contents of tst2:
      lodv OIDSCRIPT LCLABCG NMESCRIPT FRSJFGHT IT RCNGHSGD CURINR CRDUSWD OPWO7GNOxuXVog ODXCP ODXHC ODXIT APSN sdmd DUUPPY MDPSJN PCINKSJ FXDEMAIJSKL1 FXCYYEJ EMCYOAK DLMDWJF IRRNKAJ IJFH LAKJSK
      In tst1, under segment "lodv", there is an extra record on the last line: "EJHFG". Likewise, in tst2, under segment "sdmd", there are two extra records: "IJFH" and "LAKJSK". Can we get the differing records reported along with their segments? I hope this makes my question a bit clearer. Thanks in advance :)

        Try

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @file    = ('tst1.txt', 'tst2.txt');
        my %compare = ();

        # inputs
        for my $n (0 .. $#file) {
            parse_file($n);
        }

        # output diff: print records that are not present in both files
        for my $segment (sort keys %compare) {
            for my $row (sort keys %{ $compare{$segment} }) {
                my $rec = $compare{$segment}{$row};
                if (defined $rec->[0] && defined $rec->[1]) {
                    # matched
                }
                else {
                    printf "%s %s\n", $segment, $row;
                }
            }
        }

        sub parse_file {
            my ($n) = @_;
            my $filename = $file[$n];
            my $segment;
            open my $in, '<', $filename
                or die "Could not open $filename : $!";
            while (<$in>) {
                s/\s+$//;    # trim trailing whitespace
                if (s/^\s+//) {
                    # indented line: a record in the current segment
                    ++$compare{$segment}{$_}[$n];
                }
                else {
                    # unindented line: a segment name
                    $segment = $_;
                }
            }
            close $in;
        }
        poj
      Can someone please help?
        Please tell us more about your input files.
      • Do they both have EXACTLY the same segments? What do we do if not?
      • Are the segments always in the same order in both files?
      • Do you know the segment names (or even the number of segments) in advance?
      • How many differences do you expect to find in a pair of very large files?
      • Your sample data has very short records. Is this typical?
      • What is the typical (and max) number of records in one segment?
      • How can we always tell a segment name from a data record?
      • Is there anything else you know which might help us?
        Bill

        Hello and welcome to the Monastery, User_04271983. Please be patient - many of us have jobs and are spread out across different timezones. At first glance it looks like you've provided a good amount of information, so I'm sure someone will get around to looking at it all.