Re: print lines which are not reverse duplicates
by haukex (Archbishop) on Nov 25, 2018 at 14:24 UTC
|
personA,personB,10
personB,personA,190
personA,personC,23
personA,personD,43
personE,personF,10
Code:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV;
use Graph;
my $filename = 'input.csv';
my $g = Graph->new(directed=>1);
open my $fh, '<', $filename or die "$filename: $!";
my $csv = Text::CSV->new({ binary=>1, auto_diag=>2, eol=>$/ });
while ( my $row = $csv->getline($fh) ) {
my ($author1, $author2, $interactions) = @$row;
$g->set_edge_attribute( $author1, $author2, # auto-creates edge
'interactions', $interactions );
}
$csv->eof or $csv->error_diag;
close $fh;
for my $e ($g->edges) {
my ($author1, $author2) = @$e;
next if $g->has_edge($author2, $author1);
my $interactions = $g->get_edge_attribute( $author1, $author2,
'interactions' );
$csv->print(select, [ $author1, $author2, $interactions ]);
}
Output:
personE,personF,10
personA,personD,43
personA,personC,23
Update: The above does not handle the case of duplicates in the input. Is that a concern for you, and if yes, what happens with the "interactions"? Are they supposed to be summed up? | [reply] [d/l] [select] |
|
|
Brilliant, thank you very much for this! Duplicates are not an issue for me this time, an earlier code summed them up and the values formed the interaction columns. Thanks again.
| [reply] |
Re: print lines which are not reverse duplicates
by tybalt89 (Monsignor) on Nov 25, 2018 at 16:40 UTC
|
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1226286
use strict;
use warnings;
my %keep;
while( <DATA> )
{
/(\w+),(\w+),/ or next;
$keep{$2, $1} and delete $keep{$2, $1} or $keep{$1, $2} = $_;
}
print sort values %keep;
__DATA__
personA,personB,10
personA,personC,23
personA,personD,43
personB,personA,190
personE,personF,10
Or, of course, the regex version :)
#!/usr/bin/perl
# https://perlmonks.org/?node_id=1226286
use strict;
use warnings;
local $_ = do { local $/, <DATA> };
1 while s/^(\w+),(\w+),.*\n((?:.*\n)*?)^\2,\1,.*\n/$3/gm;
print;
__DATA__
personA,personB,10
personA,personC,23
personA,personD,43
personB,personA,190
personE,personF,10
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
|
|
| [reply] |
Re: print lines which are not reverse duplicates
by hippo (Archbishop) on Nov 25, 2018 at 11:47 UTC
|
It should become pretty clear why the match fails if you add one diagnostic print to display both the search pattern and the string to be searched in.
use strict;
use warnings;
while(<DATA>)
{
my $comment_author1;
my $comment_author2;
my $interactions;
if( $_ =~ /(.*?),(.*?),(.*?)$/gm) {
$comment_author1 = $1;
$comment_author2 = $2;
$interactions = $3;
print "DEBUG: Looking for '$comment_author2,$comment_author1' in $
+_\n";
if( $_ =~ "$comment_author2,$comment_author1") {
print "no match\n";
}
else { print "$comment_author1,$comment_author2,$interactions\n";
}
}
}
__DATA__
personA,personB,10
personB,personA,190
personA,personC,23
personA,personD,43
personE,personF,10
Do you see now where your logic has gone wrong? I think if you had used consistent indentation it might have been clear from the outset:
use strict;
use warnings;
while (<DATA>) {
my $comment_author1;
my $comment_author2;
my $interactions;
if ($_ =~ /(.*?),(.*?),(.*?)$/gm) {
$comment_author1 = $1;
$comment_author2 = $2;
$interactions = $3;
print "DEBUG: Looking for '$comment_author2,$comment_author1'
+in $_\n";
if ($_ =~ "$comment_author2,$comment_author1") {
print "no match\n";
} else {
print "$comment_author1,$comment_author2,$interactions\n";
}
}
}
__DATA__
personA,personB,10
personB,personA,190
personA,personC,23
personA,personD,43
personE,personF,10
| [reply] [d/l] [select] |
|
|
It seems like we interpreted the question differently. :)
While you replied to the "code", I replied to the "data"
When looking at the OP's code ...
$_ =~ "$comment_author2,$comment_author1"
it seems like the OP wants to exclude lines of the form
personA,personA,10 , i.e. 1. and 2. person are equal
But in the description he asked to exclude the first two lines:
personA,personB,10
personB,personA,190
where the relation was reversed.
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
|
|
The code in my post is not intended to be a solution, 1nickt, as it is functionally no different from that in the OP. It's an example of debugging/diagnosing the problem and is intended to set Maire on the path. Your question here is precisely what I was hoping that Maire would ask.
| [reply] |
|
|
Ah, I knew that I was doing something really silly, but I couldn't wrap my head around what was going wrong. Thank you for the clarification!
| [reply] |
Re: print lines which are not reverse duplicates
by LanX (Saint) on Nov 25, 2018 at 10:23 UTC
|
Maybe try putting something like this into your loop
unless ($seen{$comment_author2}){
$seen{$comment_author1} = $comment_author2;
} else {
delete $seen{$comment_author2};
}
at the end the hash should only contain unique pairs.
Update
You can put any data into the hash, something like =[$comment_author1,$interactions]
would keep more informations if needed. | [reply] [d/l] [select] |
|
|
| [reply] |
Re: print lines which are not reverse duplicates
by 1nickt (Canon) on Nov 25, 2018 at 11:59 UTC
|
use strict; use warnings; use feature 'say';
my %keep;
for my $line ( <DATA> ) {
chomp $line;
my ( $P1, $P2, $I ) = split /,/, $line;
$keep{"$P1$P2"} = $line;
if ( $keep{"$P2$P1"} ) {
delete $keep{"$P2$P1"};
delete $keep{"$P1$P2"};
}
}
say for values %keep;
__DATA__
personG,personH,42
personA,personB,10
personB,personA,190
personA,personC,23
personA,personD,43
personE,personF,10
Output:
$ perl 1226286.pl
personG,personH,42
personE,personF,10
personA,personD,43
personA,personC,23
... add sorting etc. as you wish ...
Hope this helps!
(*Disclaimer: pre-coffee!)
The way forward always starts with a minimal test.
| [reply] [d/l] [select] |
|
|
| [reply] |