Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re: Determine group membership for partial sequences

by Anno (Deacon)
on Aug 23, 2007 at 22:09 UTC ( [id://634743]=note: print w/replies, xml ) Need Help??


in reply to Determine group membership for partial sequences

It appears that what you are after is a relation among the nodes such that two nodes are related if one overlays the other, directly or indirectly (or if the nodes are identical). Direct overlay is what is stated in the data. The indirect relations can be described as the transitive closure of the relation of immediate overlays. The desired chains would be the equivalence classes of this extended relationship.

Fortunately, there is a module (by Abigail) that implements transitive closure. It is the work-horse of the following solution:

#!/usr/local/bin/perl use strict; use warnings; $| = 1; use Vi::QuickFix; use Algorithm::Graphs::TransitiveClosure qw( floyd_warshall); my %graph; my %nodes; while (<DATA>) { my @next = split; @nodes{ @next} = (); while ( @next > 1 ) { # build refexive and symmetric relation my $top = shift @next; my $below = $next[ 0]; $graph{ $top}->{ $top} = 1; $graph{ $top}->{ $below} = 1; $graph{ $below}->{ $top} = 1; } } # build transitive closure floyd_warshall( \ %graph); # pull off chains my @chains; while ( keys %nodes ) { my $node = each %nodes; my @chain = sort { $a <=> $b } keys %{ $graph{ $node} }; push @chains, \ @chain; delete @nodes{ @chain}; } print "@$_\n" for sort { $a->[ 0] <=> $b->[ 0] } @chains; exit; __DATA__ 1 27 1 32 1 68 ...
The %graph hash is built much like %overlays in your code, except that both ascending and descending pairs are considered related. After the Floyd-Marshall algortithm has built the closure, the groups can be pulled off one by one.

The resulting output

1 5 6 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 +30 31 32 33 35 36 37 38 39 40 43 44 45 46 47 48 49 50 52 53 54 56 57 +58 61 65 66 68 72 73 74 75 80 82 84 87 88 89 90 91 92 93 94 95 96 97 +98 99 100 101 102 103 104 106 107 108 109 110 111 112 113 114 116 117 + 119 120 121 122 123 124 125 126 127 129 130 131 132 134 136 137 139 +140 141 145 146 147 148 149 150 151 152 153 154 155 158 159 160 161 1 +62 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 17 +9 180 181 182 183 184 185 186 187 188 189 191 192 193 194 195 196 197 + 198 199 200 201 777 888 999 4 42 51 76 85 86 133 666 34 63 59 71 60 64 67 62 81 83 115 69 78 105 118
shows indeed only one large group which contains the spurious one that appeared in your output.

Update: Come to think of it, what you're doing is determine the connected components of a graph. There are algorithms (and CPAN modules) that implement that directly. Using Floyd-Warshall in this way is possible, but probably less than optimal.

Anno

Replies are listed 'Best First'.
Re^2: Determine group membership for partial sequences
by GrandFather (Saint) on Aug 23, 2007 at 23:13 UTC

    Yay! Exactly the sort of clean up I was looking for. Thank you.

    This is a one off task so the fact that there may be more efficient ways of achieving the result than using Floyd-Warshall doesn't really matter.


    DWIM is Perl's answer to Gödel
      Describing the result as connected components of a graph may be a way to explain the process in intuitive terms.

      Anno

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://634743]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2024-04-20 06:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found