in reply to Building Networks of Matches

I didn't fully understand your suggestion for how you might go about it, but below is the simple approach I'd use as a starting point. The scan has two key features: a) it builds two data structures in parallel, one holding the known sets of equivalent records (by line index) and the other holding the elements that each set contains; b) whenever a new record matches one or more existing sets, the data structures for those sets are merged.

#!/usr/bin/perl -w
use strict;

my $data = read_input();
my $sets = scan($data);
for (@$sets) {
    printf "{ %s }\n", join ' ', sort { $a <=> $b } @$_;
}

# Read one record per line; each record is a list of
# whitespace-separated elements.
sub read_input {
    my @data;
    local $_;
    while (<DATA>) {
        push @data, [ grep defined, split /\s+/ ];
    }
    \@data;
}

sub scan {
    my $data = shift;
    # %results: set id => indices of the records in that set
    # %matches: set id => hash of every element seen in that set
    my(%matches, %results);
    for my $index (0 .. $#$data) {
        my @equal;
        my $these = $data->[$index];
        # Find every existing set that shares an element with this record.
        for my $key (keys %matches) {
            my $compare = $matches{$key};
            if (grep exists $compare->{$_}, @$these) {
                push @equal, $key;
            }
        }
        # Merge those sets with this record into a new set keyed by this index.
        $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
        $matches{$index} = {
            map(($_ => 1), @$these),
            map %{ delete $matches{$_} }, @equal
        };
    }
    [ values %results ];
}

__END__
a b c
d e f
b g h
i j k
l m f

If this isn't fast enough, my first thought to improve it would be to find some way of using bit vectors to represent the elements, so that matches can be checked with a bitwise-and of two strings. To do that, you'd need to find a way to translate elements into numbers that you can use as a bit offset.
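To make the bit-vector idea a little more concrete, here is a rough sketch rather than a tested replacement for the scan above. It assumes the elements can simply be numbered in order of first appearance; the names to_bit_vectors and share_element are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: give each distinct element a small integer id and
# return one bit string per record, with bit N set when the record
# contains the element whose id is N.
sub to_bit_vectors {
    my ($records) = @_;
    my (%id, @vectors);
    my $next_id = 0;
    for my $record (@$records) {
        my $bits = '';
        for my $element (@$record) {
            $id{$element} = $next_id++ unless exists $id{$element};
            vec($bits, $id{$element}, 1) = 1;
        }
        push @vectors, $bits;
    }
    return \@vectors;
}

# Hypothetical helper: two bit strings share an element when their
# bitwise-and contains at least one non-NUL byte.
sub share_element {
    my ($x, $y) = @_;
    return (($x & $y) =~ /[^\0]/) ? 1 : 0;
}

my $vectors = to_bit_vectors([ [qw(a b c)], [qw(d e f)], [qw(b g h)] ]);
print "records 0 and 2 match\n" if share_element($vectors->[0], $vectors->[2]);
```

Under the same assumptions, merging two sets would be a bitwise-or of their strings ($x | $y), which would take the place of the hash merge in scan().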

However, if there are lots of elements most of which appear only once, it may be better to do a prepass to get a list of repeated elements, and then consider only those repeats in the main loop.
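A prepass along those lines might look something like the following. It is only a sketch, and keep_repeats is a made-up name: it counts how many records each element appears in, then strips out the elements that appear in just one record, since those can never link two records together.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the records in which every
# element that occurs in only one record has been dropped.
sub keep_repeats {
    my ($records) = @_;
    my %count;
    for my $record (@$records) {
        my %seen;    # count each element at most once per record
        $count{$_}++ for grep { !$seen{$_}++ } @$record;
    }
    return [ map { [ grep { $count{$_} > 1 } @$_ ] } @$records ];
}

my $reduced = keep_repeats([
    [qw(a b c)], [qw(d e f)], [qw(b g h)], [qw(i j k)], [qw(l m f)],
]);
print join(' ', @$_), "\n" for @$reduced;
```

Records that end up empty are kept in place, so the indices reported by the main loop still refer to the original lines.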

Hope this helps,

Hugo

Re^2: Building Networks of Matches
by bowsie (Initiate) on Dec 23, 2004 at 14:59 UTC
    This is VERY fast and very good - thank you! As for the prepass, I can do that easily with a sort unique in unix. :)

    You're a genius! This may be one of the best Xmas gifts I get this year!

    Thanks!

    Bowsie