Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, I have hundreds of pathways that I want to compare. Some of the pathways have overlapping nodes. For each group of nodes, I want to obtain the node pairs from the shortest pathway that contains them.

pwy   nodes
A     a b c d e f
B     a b c

The shortest pathway that a, b and c appear in is pwy B, so I want to output a, b and c pairwise for B. Similarly, the shortest pathway that d, e and f appear in is A.

desired output

de A
ef A
ab B
ac B
bc B
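The per-node reading of the example above (each node belongs to the shortest pathway it occurs in) could be sketched roughly as follows. This is only a sketch under assumptions: the sub name shortest_pathway_per_node and the inline sample hash are illustrative, not part of the original script, and ties between equally sized pathways are resolved in favour of the pathway ID that sorts first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map every node to the smallest pathway containing it.
# Input: hash ref of pathway id => array ref of nodes.
sub shortest_pathway_per_node {
    my ($pathways) = @_;
    my %best;    # node => { id => pathway id, size => node count }
    for my $id (sort keys %$pathways) {
        my $size = @{ $pathways->{$id} };
        for my $node (@{ $pathways->{$id} }) {
            # Keep the first pathway seen unless a strictly smaller one turns up.
            if (!exists $best{$node} || $size < $best{$node}{size}) {
                $best{$node} = { id => $id, size => $size };
            }
        }
    }
    return { map { $_ => $best{$_}{id} } keys %best };
}

# Sample data from the question.
my %pathways = (
    A => [qw(a b c d e f)],
    B => [qw(a b c)],
);

my $node2pwy = shortest_pathway_per_node(\%pathways);

# One line per node: node, then the ID of its shortest pathway.
print "$_\t$node2pwy->{$_}\n" for sort keys %$node2pwy;
```

With the sample data, a, b and c map to B, and d, e and f map to A. Pairs could then be formed within each group of nodes that share a pathway.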

I have tried all sorts of crazy moves with hashes, but now I think that maybe I should read all of the nodes into a hash of pathways and then loop through, comparing all of the arrays. I know that right now I am just printing out the next-door neighbor nodes, but I also think there must be a better way to do this. Thanks.

example code

use strict;
use warnings;
use Data::Dumper;

my $in=$ARGV[0] || "pathways.col";
open (IN,$in) or die "cannot open $in\n";
my %HoCplx2ID;
my %HoPwyPair;
while (my $lines=<IN>){
    next if ($lines =~/^#/);
    next if ($lines =~/^UNIQUE-ID/);
    chomp $lines;
    my @cols=split(/\t/,$lines);
    my $cmplxID=$cols[0];
    #print $cmplxID."\n";
    my $cmplxNm=$cols[1];
    my @restCols=@cols[2..$#cols];
    my @cycIDs=grep(/^GCXG-/, @restCols);
    @cycIDs=grep($_ ne '',@cycIDs);
    print "cycIDs array\n";
    print Dumper(@cycIDs);
    my $pwySize=scalar(@cycIDs);
    push (@{$HoCplx2ID{$cmplxID}},@cycIDs);
    # record every adjacent pair together with the size of its pathway
    for (my $i=0; $i < ($pwySize-1); $i++){
        my $pair =join("\t",$cycIDs[$i],$cycIDs[$i+1]);
        $HoPwyPair{$pair}{$cmplxID}=$pwySize;
    }
}
close(IN);

########## print out pairwise with PA01 locusIDs ######
my $org=$ARGV[1]|| "PA01";
my $outfile="$in.$org.pairwise.nxtNeighb.tab";
#open (OUT,">",$outfile);

### step 1 for each pair find smallest pathway
my %HoSmPwy;
foreach my $pair (keys %HoPwyPair){
    $HoSmPwy{$pair}=100;
    foreach my $pwy (keys %{$HoPwyPair{$pair}}){
        if ($HoPwyPair{$pair}{$pwy} < $HoSmPwy{$pair}) {
            $HoSmPwy{$pair}=$HoPwyPair{$pair}{$pwy};
        }
    }
}
print "hash of smallest pathways\n";
#print Dumper(%HoSmPwy);

### step 2 for each pathway, look at each pair if that pwy size = smallest pathway , then print ##
print "output\n";
foreach my $pwy (keys(%HoCplx2ID)){
    my @units=@{$HoCplx2ID{$pwy}};
    my $pwySize=scalar(@units);
    for (my $i=0; $i < ($pwySize-1); $i++){
        my $pair =join("\t",$units[$i],$units[$i+1]);
        # numeric comparison against the smallest pathway size for this pair
        if ($pwySize == $HoSmPwy{$pair}) {
            print $pair."\n";
        }
    }
}

Re: array comparisons
by McA (Priest) on Oct 27, 2014 at 16:06 UTC

    Hi,

    Hopefully this helps you get your head around the problem:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my %neigbours;

    while (my $line = <DATA>) {
        chomp $line;
        # first field is the pathway ID, the rest are its nodes
        my ($id_pathway, @pathnodes) = split ' ', $line;
        my $path_length = @pathnodes;
        if($path_length < 2) {
            warn "Don't know whether paths with less than two nodes are allowed. Skipping.";
            next;
        }
        for (my $i = 0; $i < $path_length - 1; $i++) {
            # key is the pair of adjacent nodes, e.g. "ab"
            my $key = "$pathnodes[$i]$pathnodes[$i+1]";
            if(defined $neigbours{$key}) {
                # keep the shorter pathway; on a tie the later pathway wins
                if($neigbours{$key}->{'minpathlen'} >= $path_length) {
                    $neigbours{$key}->{'minpathlen'} = $path_length;
                    $neigbours{$key}->{'id_pathway'} = $id_pathway;
                }
            }
            else {
                $neigbours{$key}->{'minpathlen'} = $path_length;
                $neigbours{$key}->{'id_pathway'} = $id_pathway;
            }
        }
    }

    say Dumper(\%neigbours);

    __DATA__
    A a b c d e f
    B a b c
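To get from a lookup like the one above to the "pair, pathway" lines asked for in the question, something along these lines might work. It is a minimal sketch, not a definitive solution: the sub name shortest_pathway_per_pair is hypothetical, the key joins the two nodes with a tab (an assumption, so that multi-character node IDs stay unambiguous), and ties keep the pathway seen first. Only adjacent nodes are paired, so a non-adjacent pair such as a and c is not produced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each adjacent node pair, record the shortest
# pathway it occurs in. Ties keep the pathway seen first (an assumption).
sub shortest_pathway_per_pair {
    my (@lines) = @_;
    my %pairs;    # "x\ty" => { id => pathway id, len => node count }
    for my $line (@lines) {
        my ($id, @nodes) = split ' ', $line;
        next if @nodes < 2;    # no pairs to form
        for my $i (0 .. $#nodes - 1) {
            my $key = join "\t", @nodes[$i, $i + 1];
            if (!exists $pairs{$key} || @nodes < $pairs{$key}{len}) {
                $pairs{$key} = { id => $id, len => scalar @nodes };
            }
        }
    }
    return \%pairs;
}

# Sample data from the thread: pathway ID followed by its nodes.
my $pairs = shortest_pathway_per_pair('A a b c d e f', 'B a b c');

# One line per pair: node, node, pathway ID (tab-separated).
print "$_\t$pairs->{$_}{id}\n" for sort keys %$pairs;
```

For the sample data this assigns the pairs a-b and b-c to pathway B and the pairs c-d, d-e and e-f to pathway A.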

    Allow me an annotation: Please use meaningful variable names. Your code is hard to read.

    Regards
    McA