Re: ignore duplicates and show unique values between 2 text files
by choroba (Cardinal) on Apr 29, 2013 at 15:04 UTC
while (my $line = <FILE>) {
    chomp $line;
    # ...
}
Hi Choroba,
I have updated the script; however, my output still does not show the desired results. It shows:
'def'
'xyx'
'def'
'abc'
You should use chomp for the second file handle, too.
Re: ignore duplicates and show unique values between 2 text files
by kennethk (Abbot) on Apr 29, 2013 at 15:05 UTC
Your issue appears to be that "'121'\n" and "'121'" are different strings. If you'd like to be newline-insensitive (which would also address the extra newlines in your output), use chomp:
use strict;
use warnings;

my $f2      = 'cat_mapping_in_A.txt';
my $f1      = 'cat_mapping_in_B.txt';
my $outfile = '1.txt';

my %results = ();

open FILE1, "$f1" or die "Could not open file: $!\n";
while (my $line = <FILE1>) {
    chomp $line;
    $results{$line} = 1;
}
close(FILE1);

open FILE2, "$f2" or die "Could not open file: $!\n";
while (my $line = <FILE2>) {
    chomp $line;
    $results{$line}++;
}
close(FILE2);

open(OUTFILE, ">$outfile") or die "Cannot open $outfile for writing\n";
foreach my $line (keys %results) {
    print OUTFILE "$line\n" if $results{$line} == 1;
}
close OUTFILE;
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Hi kennethk,
I am unsure on how to "reply to all"
But can the script be modified to take account of two columns, i.e.
FILE 1
261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'snow > equipment > helmets'
261293 'surf > accessories > books'
FILE 2
261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'surf > accessories > books'
OUTPUT
261293 'snow > equipment > helmets'
The two columns are separated by Tab, is this possible?
Thank you
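Since the script above uses the whole line as the hash key, it already handles two tab-separated columns unchanged: two lines compare equal only if both columns match. A minimal sketch (the filenames are assumed, not from the thread), which also shows how to split the columns apart if you ever need them separately:

```perl
use strict;
use warnings;

my %count;

# Hypothetical filenames; substitute your own.
for my $file ('cat_mapping_in_A.txt', 'cat_mapping_in_B.txt') {
    open my $fh, '<', $file or die "Could not open $file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        # The whole tab-separated line is the key, so a line is
        # "unique" only if the pair of columns differs.
        $count{$line}++;

        # If you need the columns separately:
        # my ($number, $category) = split /\t/, $line, 2;
    }
    close $fh;
}

# Lines seen exactly once appear in only one of the two files.
print "$_\n" for grep { $count{$_} == 1 } sort keys %count;
```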
Run
my ($number, $article) = split /\s+/, $line, 2;
for each input line and decide which part should be unique.
Learn to do it yourself with split.
Cheers Rolf
( addicted to the Perl Programming Language)
UPDATE
added the missing third parameter for split
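To see what that third argument to split does, here is a small sketch (the sample line is made up for illustration):

```perl
use strict;
use warnings;

my $line = "261293\t'snow > equipment > helmets'";

# The limit of 2 stops splitting after the first whitespace run,
# so the spaces inside the category string are preserved.
my ($number, $article) = split /\s+/, $line, 2;

print "$number\n";   # 261293
print "$article\n";  # 'snow > equipment > helmets'
```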
Thank you kennethk, it works perfectly.
Re: ignore duplicates and show unique values between 2 text files
by Khen1950fx (Canon) on Apr 29, 2013 at 17:06 UTC
#!/usr/bin/perl -l
use strict;
use warnings;
use List::Compare;

my @Llist = qw(abc def 121 xyz);
my @Rlist = qw(def 121);

my $lc = List::Compare->new( \@Llist, \@Rlist );
my @sdiff = $lc->get_symmetric_difference;
foreach my $sdiff (@sdiff) {
    print $sdiff;
}
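Note that get_symmetric_difference returns items that appear in exactly one of the two lists, regardless of which. If you only want the items from the first file that are missing from the second, List::Compare also provides get_unique; a sketch on the same sample data:

```perl
use strict;
use warnings;
use List::Compare;

my @Llist = qw(abc def 121 xyz);
my @Rlist = qw(def 121);

my $lc = List::Compare->new( \@Llist, \@Rlist );

# Items that are in the first list but not in the second.
my @only_left = $lc->get_unique;
print "$_\n" for @only_left;
```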
Re: ignore duplicates and show unique values between 2 text files (real data)
by LanX (Saint) on Apr 29, 2013 at 15:12 UTC
I've tested your code with the data you provided and it works 100%!
Maybe you should chomp and trim your data to avoid problems with "invisible" whitespace?
However, the output you are showing doesn't match the input you posted, since
'121'
'abc'
never appear on consecutive lines.
Please show the real data next time, at least out of courtesy to the people spending time to help you!
Cheers Rolf
( addicted to the Perl Programming Language)
Sorry Rolf, I apologize for the mistake; it won't happen again.
Re: ignore duplicates and show unique values between 2 text files
by hdb (Monsignor) on Apr 29, 2013 at 19:06 UTC
If you have TWO files, then you would do "++" on the first, and "--" on the second...
use strict;
use warnings;
my $file1 = <<FILE1;
261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'snow > equipment > helmets'
261293 'surf > accessories > books'
FILE1
my $file2 = <<FILE2;
261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'surf > accessories > books'
FILE2
my %uniq;
$uniq{$_}++ for split /\n/, $file1;
$uniq{$_}-- for split /\n/, $file2;
print join "\n", grep { $uniq{$_} } keys %uniq;
print "\n";