print lines which are not reverse duplicates

Maire has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: print lines which are not reverse duplicates by haukex (Archbishop) on Nov 25, 2018 at 14:24 UTC
What you've got sounds like a directed graph, and CPAN has the module Graph. I'm also using Text::CSV to make parsing and output more robust (also install Text::CSV_XS for speed). `input.csv`: `personA,personB,10 personB,personA,190 personA,personC,23 personA,personD,43 personE,personF,10` [download] Code: #!/usr/bin/env perl use warnings; use strict; use Text::CSV; use Graph; my $filename = 'input.csv'; my $g = Graph->new(directed=>1); open my $fh, '<', $filename or die "$filename: $!"; my $csv = Text::CSV->new({ binary=>1, auto_diag=>2, eol=>$/ }); while ( my $row = $csv->getline($fh) ) { my ($author1, $author2, $interactions) = @$row; $g->set_edge_attribute( $author1, $author2, # auto-creates edge 'interactions', $interactions ); } $csv->eof or $csv->error_diag; close $fh; for my $e ($g->edges) { my ($author1, $author2) = @$e; next if $g->has_edge($author2, $author1); my $interactions = $g->get_edge_attribute( $author1, $author2, 'interactions' ); $csv->print(select, [ $author1, $author2, $interactions ]); } [download] Output: `personE,personF,10 personA,personD,43 personA,personC,23` [download] Update: The above does not handle the case of duplicates in the input. Is that a concern for you, and if yes, what happens with the "interactions"? Are they supposed to be summed up?	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by Maire (Scribe) on Nov 26, 2018 at 07:58 UTC
Brilliant, thank you very much for this! Duplicates are not an issue for me this time, an earlier code summed them up and the values formed the interaction columns. Thanks again.	[reply]
Re: print lines which are not reverse duplicates by tybalt89 (Monsignor) on Nov 25, 2018 at 16:40 UTC
`#!/usr/bin/perl # https://perlmonks.org/?node_id=1226286 use strict; use warnings; my %keep; while( <DATA> ) { /(\w+),(\w+),/ or next; $keep{$2, $1} and delete $keep{$2, $1} or $keep{$1, $2} = $_; } print sort values %keep; __DATA__ personA,personB,10 personA,personC,23 personA,personD,43 personB,personA,190 personE,personF,10` [download] Or, of course, the regex version :) `#!/usr/bin/perl # https://perlmonks.org/?node_id=1226286 use strict; use warnings; local $_ = do { local $/, <DATA> }; 1 while s/^(\w+),(\w+),.\n((?:.\n)?)^\2,\1,.\n/$3/gm; print; __DATA__ personA,personB,10 personA,personC,23 personA,personD,43 personB,personA,190 personE,personF,10` [download]	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by hippo (Archbishop) on Nov 26, 2018 at 09:39 UTC
`1 while s/^(\w+),(\w+),.\n((?:.\n)?)^\2,\1,.\n/$3/gm;` Awesome, as usual. Thanks for all your regex examples which I for one invariably marvel at.	[reply] [d/l]
Re^2: print lines which are not reverse duplicates by Maire (Scribe) on Nov 26, 2018 at 07:59 UTC
Thank you for this!	[reply]
Re: print lines which are not reverse duplicates by hippo (Archbishop) on Nov 25, 2018 at 11:47 UTC
It should become pretty clear why the match fails if you add one diagnostic print to display both the search pattern and the string to be searched in. use strict; use warnings; while(<DATA>) { my $comment_author1; my $comment_author2; my $interactions; if( $_ =~ /(.?),(.?),(.?)$/gm) { $comment_author1 = $1; $comment_author2 = $2; $interactions = $3; print "DEBUG: Looking for '$comment_author2,$comment_author1' in $ +_\n"; if( $_ =~ "$comment_author2,$comment_author1") { print "no match\n"; } else { print "$comment_author1,$comment_author2,$interactions\n"; } } } __DATA__ personA,personB,10 personB,personA,190 personA,personC,23 personA,personD,43 personE,personF,10 [download] Do you see now where your logic has gone wrong? I think if you had used consistent indentation it might have been clear from the outset: use strict; use warnings; while (<DATA>) { my $comment_author1; my $comment_author2; my $interactions; if ($_ =~ /(.?),(.?),(.?)$/gm) { $comment_author1 = $1; $comment_author2 = $2; $interactions = $3; print "DEBUG: Looking for '$comment_author2,$comment_author1' +in $_\n"; if ($_ =~ "$comment_author2,$comment_author1") { print "no match\n"; } else { print "$comment_author1,$comment_author2,$interactions\n"; } } } __DATA__ personA,personB,10 personB,personA,190 personA,personC,23 personA,personD,43 personE,personF,10 [download]	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by LanX (Saint) on Nov 25, 2018 at 13:23 UTC
It seems like we interpreted the question differently. :) While you replied to the "code", I replied to the "data" When looking at the OP's code ... `$_ =~ "$comment_author2,$comment_author1"` it seems like the OP wants to exclude lines of the form `personA,personA,10` , i.e. 1. and 2. person are equal But in the description he asked to exclude the first two lines: `personA,personB,10 personB,personA,190` [download] where the relation was reversed. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by 1nickt (Canon) on Nov 25, 2018 at 11:56 UTC
`<puzzled>`That still prints all the lines. How should `$_` also contain the reverse of what it contains?`</puzzled>` The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re^3: print lines which are not reverse duplicates by hippo (Archbishop) on Nov 26, 2018 at 09:31 UTC
The code in my post is not intended to be a solution, 1nickt, as it is functionally no different from that in the OP. It's an example of debugging/diagnosing the problem and is intended to set Maire on the path. Your question here is precisely what I was hoping that Maire would ask.	[reply]
Re^2: print lines which are not reverse duplicates by Maire (Scribe) on Nov 26, 2018 at 08:01 UTC
Ah, I knew that I was doing something really silly, but I couldn't wrap my head around what was going wrong. Thank you for the clarification!	[reply]
Re: print lines which are not reverse duplicates by LanX (Saint) on Nov 25, 2018 at 10:23 UTC
Maybe try putting something like this into your loop `unless ($seen{$comment_author2}){ $seen{$comment_author1} = $comment_author2; } else { delete $seen{$comment_author2}; }` [download] at the end the hash should only contain unique pairs. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice} Update You can put any data into the hash, something like `=[$comment_author1,$interactions]` would keep more informations if needed.	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by Maire (Scribe) on Nov 26, 2018 at 08:03 UTC
Thanks!	[reply]
Re: print lines which are not reverse duplicates by 1nickt (Canon) on Nov 25, 2018 at 11:59 UTC
Hi, here's one way: `use strict; use warnings; use feature 'say'; my %keep; for my $line ( <DATA> ) { chomp $line; my ( $P1, $P2, $I ) = split /,/, $line; $keep{"$P1$P2"} = $line; if ( $keep{"$P2$P1"} ) { delete $keep{"$P2$P1"}; delete $keep{"$P1$P2"}; } } say for values %keep; __DATA__ personG,personH,42 personA,personB,10 personB,personA,190 personA,personC,23 personA,personD,43 personE,personF,10` [download] Output: `$ perl 1226286.pl personG,personH,42 personE,personF,10 personA,personD,43 personA,personC,23` [download] ... add sorting etc. as you wish ... Hope this helps! (Disclaimer: pre-coffee!) The way forward always starts with a minimal test.	[reply] [d/l] [select]
Re^2: print lines which are not reverse duplicates by Maire (Scribe) on Nov 26, 2018 at 08:02 UTC
Thank you very much!	[reply]

Update