I didn't fully understand your suggestion for how you might go about it, but below is the simple approach I'd use as a starting point. The scan has two key features: a) it builds two data structures simultaneously, one holding the known sets of equivalent lines and the other holding the elements that each of those sets contains; b) when a new line shares an element with one or more existing sets, the data structures for those sets are merged.

#!/usr/bin/perl -w
use strict;

my $data = read_input();
my $sets = scan($data);
for (@$sets) {
  printf "{ %s }\n", join ' ', sort { $a <=> $b } @$_;
}

sub read_input {
  my @data;
  local $_;
  while (<DATA>) { 
    push @data, [ split ' ' ];  # split ' ' skips leading whitespace, so no empty fields
  } 
  \@data;
}

sub scan {
  my $data = shift;
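  # %matches: set id => hash of every element seen in that set
  # %results: set id => list of the line indices belonging to that set
  my(%matches, %results);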
  for my $index (0 .. $#$data) { 
    my @equal;
    my $these = $data->[$index];
    for my $key (keys %matches) {
      my $compare = $matches{$key};
      if (grep exists $compare->{$_}, @$these) {
        push @equal, $key;
      }
    }

    $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
    $matches{$index} = { 
      map(($_ => 1), @$these),
      map %{ delete $matches{$_} }, @equal
    };  
  } 
  [ values %results ];
}

__END__
a b c d e
f b g
h i j k l
m f

If this isn't fast enough, my first thought to improve it would be to find some way of using bit vectors to represent the elements, so that matches can be checked with a bitwise-and of two strings. To do that, you'd need to find a way to translate elements into numbers that you can use as a bit offset.
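As a rough sketch of the bit vector idea, and not something I've benchmarked: the helper names to_bits and overlaps and the %offset table below are made up for illustration. Each distinct element is given the next free integer as its bit offset, vec() sets that bit in a string, and a string bitwise-and tells you whether two sets share anything.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical lookup table: element => bit offset, assigned on first sight.
my %offset;

# Turn a list of elements into a bit string with one bit set per element.
sub to_bits {
  my $bits = '';
  for my $element (@_) {
    unless (exists $offset{$element}) {
      my $next = keys %offset;          # number of elements seen so far
      $offset{$element} = $next;
    }
    vec($bits, $offset{$element}, 1) = 1;
  }
  return $bits;
}

# Two bit strings share an element if their bitwise-and has any non-NUL byte.
sub overlaps {
  my($x, $y) = @_;
  return (($x & $y) =~ /[^\0]/) ? 1 : 0;
}

my $first  = to_bits(qw(a b c d e));
my $second = to_bits(qw(f b g));        # shares 'b' with $first
my $third  = to_bits(qw(h i j k l));    # shares nothing with $first
my $merged = $first | $second;          # merging two sets is a bitwise-or

printf "first/second overlap: %d\n", overlaps($first, $second);
printf "first/third overlap: %d\n", overlaps($first, $third);
printf "merged/(m f) overlap: %d\n", overlaps($merged, to_bits(qw(m f)));
```

In scan() this would mean storing a bit string in %matches instead of a hash reference, so that the inner grep becomes a single overlaps() call and the merge becomes a bitwise-or.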

However, if there are lots of elements most of which appear only once, it may be better to do a prepass to get a list of repeated elements, and then consider only those repeats in the main loop.
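A prepass along those lines might look something like this. It is only a sketch, and repeated_only is a name I've invented for it: count how many lines each element appears in, then strip every line down to the elements that appear in more than one line.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical prepass: return a copy of the data in which each line
# keeps only the elements that occur in more than one line.
sub repeated_only {
  my $data = shift;
  my %seen_in;                          # element => number of lines containing it
  for my $line (@$data) {
    my %in_line = map(($_ => 1), @$line);   # count each element once per line
    $seen_in{$_}++ for keys %in_line;
  }
  my @reduced;
  for my $line (@$data) {
    push @reduced, [ grep { $seen_in{$_} > 1 } @$line ];
  }
  return \@reduced;
}

my $data = [ [qw(a b c d e)], [qw(f b g)], [qw(h i j k l)], [qw(m f)] ];
my $reduced = repeated_only($data);
for my $index (0 .. $#$reduced) {
  printf "%d: %s\n", $index, join ' ', @{ $reduced->[$index] };
}
```

Lines keep their original positions, so the indices that scan() reports still refer to the original input; a line that ends up empty cannot match anything and simply stays in a set of its own.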

Hope this helps,

Hugo

In reply to Re: Building Networks of Matches by hv
in thread Building Networks of Matches by bowsie
