yasmarina has asked for the wisdom of the Perl Monks concerning the following question:

I am reading in two tab-delimited text files. I'd like to compare a value between the two files, then show the entire records wherever the values differ. I have two keys defined:

sub Read_File {
    my $FILE = shift ;
    open (my $fh, "<", $FILE) || die "$!" ;
    my %hRec ;
    while (<$fh>) {
        chomp;
        my @RecLine=split (/;/) ;
        my @RecLine=split ('\t') ;
        $CmpKey=$RecLine[$Key] ;
        $CmpKey2=$RecLine[$Key2] ;
        $CmpValue=$RecLine[$Cmp] ;
        $hRec{$CmpKey,";",$CmpKey2}=$CmpValue;
    }
    return (%hRec) ;
}

foreach my $key (sort {$a<=>$b} keys %hBase) { print "$key -> $hBase{$key}\n"; }
foreach my $key (sort {$a<=>$b} keys %hPost) { print "$key -> $hPost{$key}\n"; }

The above prints the keys and values, but I get this weird '^\' whitespace in them.

E.g.
NR^\;^\INVOICE_TXT -> CHARGE
1^\;^\Access Charge -> .083
1^\;^\Service Charge -> .042
2^\;^\Access Charge -> .083
2^\;^\Service Charge -> 0

How can I remove the '^\'? Thanks

Re: Printing hash key shows obscure values - can this be trimmed/removed?
by hdb (Monsignor) on Apr 28, 2015 at 13:08 UTC

    My guess is that you should replace

    $hRec{$CmpKey,";",$CmpKey2}=$CmpValue;

    with

    $hRec{$CmpKey.";".$CmpKey2}=$CmpValue;

    The reason is that if you supply a list as a hash key, Perl joins the list elements using $; as a separator, and $; has \034 as its default value; see perlvar.

      ... and octal \034 is a control-backslash or "^\" character.
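A small self-contained script illustrating the difference described above: a comma-separated list inside the hash subscript is joined with $; (default "\034"), whereas the . operator builds exactly the string you intended. The key values are made up for illustration.

```perl
use strict;
use warnings;

my ($id, $txt) = ('1', 'Access Charge');
my %h;

# A list in the subscript is joined with $; (default "\034"),
# so this stores the key "1\034;\034Access Charge".
$h{$id, ";", $txt} = '.083';

# The . operator builds exactly the key "1;Access Charge".
$h{$id . ";" . $txt} = '.083';

for my $key (sort keys %h) {
    # Show each \034 as ^\ so it is visible on the terminal
    (my $shown = $key) =~ s/\034/^\\/g;
    print "$shown -> $h{$key}\n";
}
# Prints:
# 1^\;^\Access Charge -> .083
# 1;Access Charge -> .083
```

The first line of output reproduces the '^\' pattern from the question; the second is the key you get after switching to concatenation.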


      Give a man a fish:  <%-(-(-(-<

      $hRec{$CmpKey.";".$CmpKey2}=$CmpValue;
      That works, thank you. Any suggestions on how to avoid calling the UNIX 'egrep' command to extract the records that do not match? For example:
      my ($recnos, $vINVOICE_TXT) = split (';', $key) ;
      ....
      ....
      ....
      my $Rec=`egrep ^$recnos"\t"\$recnos.*$vInv $ARGV[0]` ;
      Once the differences have been identified, I split the key into two variables, then try to grep the input file for the matching record so that I can display it in full.

        egrep has a -v flag...
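For comparison, the Perl counterpart of egrep -v is simply a negated match inside Perl's grep. A minimal sketch with made-up records:

```perl
use strict;
use warnings;

my @records = (
    "1\tAccess Charge\t.083",
    "2\tService Charge\t0",
    "3\tAccess Charge\t.042",
);

# Like egrep 'Access': keep the records that match
my @matching = grep { /Access/ } @records;

# Like egrep -v 'Access': keep the records that do NOT match
my @non_matching = grep { !/Access/ } @records;

print "$_\n" for @non_matching;    # only the "2 ... Service Charge" record
```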

        Sure. Perl's built-in grep function is very powerful, generally much more powerful than Unix egrep, in part because Perl's regexes are incredibly good (and go far beyond the "official" definition of regular expressions). To tell the truth, the two are not used in exactly the same way: Perl's grep works on a list (such as an array) in memory, whereas Unix egrep usually works on files or a data stream.

        In your case, I am fairly sure that Perl's grep function should do what you want much more efficiently than a system call to egrep, but I am not very clear on what you are really trying to achieve, so I can't help further for the time being. Please specify what you need.
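Assuming the goal is to pull out the full record for a given record number and invoice text (which is what the egrep call appears to attempt), a minimal sketch reads each file into an array once and then uses Perl's grep on it. slurp_lines and find_records are hypothetical helper names, and the sample lines are made up.

```perl
use strict;
use warnings;

# Read a file into memory once, so that it can be searched many
# times without starting an external egrep process for each lookup.
sub slurp_lines {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Hypothetical helper: the records whose first tab-separated field is
# exactly $recno and which contain $invoice_txt further along the line.
# \Q...\E stops regex metacharacters in the values from being special.
sub find_records {
    my ($lines, $recno, $invoice_txt) = @_;
    return grep { /^\Q$recno\E\t.*\Q$invoice_txt\E/ } @$lines;
}

# Usage with made-up lines; for a real file you would pass
# slurp_lines($ARGV[0]) instead of \@demo.
my @demo = (
    "1\tx\tAccess Charge",
    "1\tx\tService Charge",
    "10\tx\tAccess Charge",
);
my @hits = find_records(\@demo, '1', 'Access Charge');
print "$_\n" for @hits;    # only the first line matches
```

Anchoring on the tab after the record number keeps "1" from also matching "10".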

        Je suis Charlie.
Re: Printing hash key shows obscure values - can this be trimmed/removed?
by Corion (Patriarch) on Apr 28, 2015 at 11:08 UTC

    Please help us help you better and show representative input data.

    Also note that one of the two lines here makes no sense, because they both define a variable @RecLine:

    my @RecLine=split (/;/) ;
    my @RecLine=split ('\t') ;

    Also note that split uses a regular expression to split on, not a string.
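To illustrate: split ('\t') happens to work because the string is compiled as the pattern \t, but writing the pattern as /\t/ makes the intent clear. Note also that split drops trailing empty fields unless it is given a negative limit, which can matter when a key sits in a late column. The sample line is made up.

```perl
use strict;
use warnings;

# Made-up record: the last two fields are empty
my $line = "1\tfoo\t\t.083\t\t";

# The first argument to split is a pattern; /\t/ splits at every tab.
my @default = split /\t/, $line;        # trailing empty fields are dropped
my @all     = split /\t/, $line, -1;    # a negative limit keeps them
my @same    = split '\t', $line;        # the string is used as the pattern \t

print scalar(@default), " vs ", scalar(@all), "\n";    # prints "4 vs 6"
```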

    Your code, as shown, will never produce any output, because you never call Read_File.
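A minimal sketch of a Read_File that is actually called, using lexical variables under strict and passing the 0-based column indexes in as arguments. The values 0, 25 and 14 correspond to the 1st, 26th and 15th fields mentioned later in the thread; the argument list itself is an assumption, not the original interface.

```perl
use strict;
use warnings;

# Variant of Read_File with lexical variables; the column indexes
# (0-based) are passed in rather than read from globals.
sub Read_File {
    my ($file, $key, $key2, $cmp) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %hRec;
    while (my $line = <$fh>) {
        chomp $line;
        my @RecLine = split /\t/, $line, -1;
        # Concatenate the key parts so no $; separator ends up in the key
        $hRec{ $RecLine[$key] . ';' . $RecLine[$key2] } = $RecLine[$cmp];
    }
    close $fh;
    return %hRec;
}

# Nothing is printed unless the sub is actually called:
if (@ARGV == 2) {
    my %hBase = Read_File($ARGV[0], 0, 25, 14);
    my %hPost = Read_File($ARGV[1], 0, 25, 14);
    # String sort, since the keys are not plain numbers
    print "$_ -> $hBase{$_}\n" for sort keys %hBase;
    print "$_ -> $hPost{$_}\n" for sort keys %hPost;
}
```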

      I've commented out the first of the two split lines:

      #my @RecLine=split (/;/) ;
      my @RecLine=split ('\t') ;

      Here's the complete code:
      my $Key=$NR ;
      my $Cmp=$CHARGE ;
      my $Key2=$INVOICE_TXT;

      sub Read_File {
          my $FILE = shift ;
          open (my $fh, "<", $FILE) || die "$!" ;
          my %hRec ;
          while (<$fh>) {
              chomp;
              #my @RecLine=split (/;/) ;
              my @RecLine=split ('\t') ;
              $CmpKey=$RecLine[$Key] ;
              $CmpKey2=$RecLine[$Key2] ;
              $CmpValue=$RecLine[$Cmp] ;
              $hRec{$CmpKey,";",$CmpKey2}=$CmpValue;
          }
          return (%hRec) ;
      }

      my %hBase=Read_File($ARGV[0]) ;
      my %hPost=Read_File($ARGV[1]) ;

      my $FILE_DATE=`date "+%Y%M%d_%H%M_%S"` ;
      chomp ($FILE_DATE) ;
      my $OUT_FILE="/tmp/CMP_OUT_$FILE_DATE.txt" ;
      open ( OFILE, "> $OUT_FILE" ) or die "FATAL ERROR! Unable to write file(!)\n" ;

      ## Debug ##
      #chomp($key);
      #foreach my $key (sort {$a<=>$b} keys %hBase) { print "$key -> $hBase{$key}\n"; }
      #foreach my $key (sort {$a<=>$b} keys %hPost) { print "$key -> $hPost{$key}\n"; }

      foreach my $key (sort {$a<=>$b} keys %hPost) {
          $count=$count+1 ;
          if ($hBase{$key} ne $hPost{$key}) {
              #my $out_counter = "\n##Comparing _$ARGV[0] vs. $ARGV[1]_ differences(!) in record: $count##\n" ;
              my ($recnos, $vINVOICE_TXT) = split (';', $key) ;
              chop ($recnos) ;
              $vINVOICE_TXT=~s!^.!!;
              my $vInv= "\"$vINVOICE_TXT\"" ;
              my $Rec=`egrep ^$recnos"\t"\$recnos.*$vInv $ARGV[0]` ;
              my $out_baseline = "\n$ARGV[0]:\n$HEADER\n$Rec" ;
              my $Rec=`egrep ^$recnos"\t"\$recnos.*$vInv $ARGV[1]` ;
              my $out_compare="\n$ARGV[1]:\n$HEADER\n$Rec" ;
              my $out_result="\n$ARGV[0] [$key -> $hBase{$key}] do not match(!) with $ARGV[1] [$key -> $hPost{$key}] in record: $count\n" ;
              print OFILE "$out_result $out_baseline $out_compare " ;
              print "$out_result $out_baseline $out_compare " ;
          }
      }
      close ( $FILE ) ;
      close ( OFILE ) ;
      Note that I've added some code to trim off the obscure characters:
      my ($recnos, $vINVOICE_TXT) = split (';', $key) ;
      chop ($recnos) ;
      $vINVOICE_TXT=~s!^.!!;
      my $vInv= "\"$vINVOICE_TXT\"" ;
      I'm hoping to find a better way to do this. Secondly, I used the UNIX 'egrep' command to extract the record; I'm hoping to find a better way using Perl code. Thanks
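A minimal sketch of getting the two key parts back without chop or s///: with a key built by . concatenation, a plain split on ';' is enough, and even the original list-style key can be split on the $; separator. The sample values are made up.

```perl
use strict;
use warnings;

# Key built with the . operator: "1;Access Charge"
my $key = '1' . ';' . 'Access Charge';
# A limit of 2 splits only at the first ';', in case the text contains one
my ($recnos, $invoice_txt) = split /;/, $key, 2;

# Key built from a list, as in the original code: "1\034;\034Access Charge"
my %hRec;
$hRec{'1', ';', 'Access Charge'} = '.083';
my ($old_key) = keys %hRec;
my $sep = ($;);    # the subscript separator, "\034" by default
my ($old_recnos, $old_txt) = split /\Q$sep\E;\Q$sep\E/, $old_key, 2;

print "$recnos | $invoice_txt\n";        # 1 | Access Charge
print "$old_recnos | $old_txt\n";        # 1 | Access Charge
```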

      Test data (input file):

      1 1 11412 00353xxxxxxxxx 100 1017 00353xxxxxxxxx Voice - Nat Free Call FREE::VOICE 01-DEC-11 20-NOV-11 600 0 0 0 0 0 0 0 0 0,[ ],[ ],[ ]|0,[ ],[ ],[ ]|0,[ ],[ ],[ ] Non-FU 0 National and International Voice Calls
      It's tab-delimited. I'm only looking to extract the 1st, 15th and 26th fields. Two files will be read: the 1st and 26th fields form my key, and the 15th field is the value I'm comparing between the two files. Any differences will be reported.
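Putting the pieces together, a minimal sketch of the whole comparison without any egrep call: each file is read once, and every hash entry keeps both the compared value and the full original line, so differing records can be printed directly. The 0-based indexes 0, 25 and 14 are taken from the 1st, 26th and 15th fields described above; read_records and compare_records are hypothetical names, and keys present in only one file are simply ignored here.

```perl
use strict;
use warnings;

# 0-based indexes for the 1st, 26th (key parts) and 15th (value) fields
my ($KEY1, $KEY2, $VALUE) = (0, 25, 14);

# Read one tab-delimited file into a hash keyed by "field1;field26".
# Each entry keeps the compared value and the full original line.
sub read_records {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %rec;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\t/, $line, -1;
        next if @f <= $KEY2;                  # skip short/malformed lines
        my $key = "$f[$KEY1];$f[$KEY2]";
        $rec{$key} = { value => $f[$VALUE], line => $line };
    }
    close $fh;
    return \%rec;
}

# Build a report for every key whose value differs between the files,
# including the full record from each side.
sub compare_records {
    my ($base, $post) = @_;
    my $report = '';
    for my $key (sort keys %$post) {
        next unless exists $base->{$key};     # key missing from base: ignored
        next if $base->{$key}{value} eq $post->{$key}{value};
        $report .= "[$key] $base->{$key}{value} vs $post->{$key}{value}\n"
                 . "  base: $base->{$key}{line}\n"
                 . "  post: $post->{$key}{line}\n";
    }
    return $report;
}

if (@ARGV == 2) {
    print compare_records(read_records($ARGV[0]), read_records($ARGV[1]));
}
```

Keeping the full line alongside the value costs some memory, but it removes the need to go back to the input files to recover the record.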