in reply to Re: Sub set where all are connected
in thread Sub set where all are connected

This is my test data:

@a_list = ( [1,2], [1,3], [1,4], [1,5], [2,3], [2,4], [3,4], [5,6], [5,7], [5,9], [6,9], [7,8], [8,9], );

... and here are my expected results:

1 2 3 1 2 4 1 3 4 2 3 4 5 6 9

Shamelessly copied from the the pod of Graph::Clique

I am a bit hesitant to pursue Graph::Clique due to the warning on large result sets. I expect cliques of hundreds of members, if not thousands. Some simplifications or "way to define the problem" may reduce computational difficulty many fold while in no way compromising the problem. Let me think about it. If any interest, I will get back - no guarantees on how good the solution will be!

Replies are listed 'Best First'.
Re^3: Sub set where all are connected
by tybalt89 (Monsignor) on Nov 23, 2019 at 17:27 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11109069 use warnings; use List::Util qw( uniq ); my $edges = <<END; [1,2], [1,3], [1,4], [1,5], [2,3], [2,4], [3,4], [5,6], [5,7], [5,9], [6,9], [7,8], [8,9], END $edges =~ s/(?<=\[)[\w,]+(?=\])/ join ',', sort split ',', $& /ge; # +fix order #print "$edges\n"; my %alldirect; my %seen; find( uniq sort $edges =~ /\w+/g ); # start with e +very node sub find { $seen{ "@_" }++ and return; if( my @out = "@_:$edges" =~ /\b(\w+)\b.+\b(\w+)\b.*:(?!.*?\[\1,\2\] +)/s ) { for my $node ( @out ) # pair of unconnected nodes, try without + each one { find( grep $_ ne $node, @_ ); } } else { $alldirect{ "@_" }++; # it is fully +connected } } my @seq = sort keys %alldirect; my %subset; # remove subsets of +supersets for my $sub ( @seq ) { for my $super ( @seq ) { if( length $sub < length $super and !$subset{$super} and "$sub\n$super" !~ /\b(\w+)\b.*\n(?!.*\b\1\b)/ ) # sub node +not found { $subset{$sub}++; last; } } } my @directlyconnected = grep !$subset{$_}, @seq; print "$_\n" for @directlyconnected;;

    Outputs :

    1 2 3 4 1 5 5 6 9 5 7 7 8 8 9

    Note: I think your expected output is wrong. 1 2 3 4 are all strongly connected and belong in the same subset.

    Quick explanation:
    Top down approach. Start with set of all nodes.
    Try to find unconnected pair of nodes, if so, try with two subsets, each with one of those nodes.
    If no unconnected pair, have a directly connected subset!
    Second half eliminates valid subsets of larger valid subsets.

      > Note: I think your expected output is wrong. 1 2 3 4 are all strongly connected and belong in the same subset.

      You are right, he c&p'ed the synopsis from Graph::Clique which shows all k-cliques for k=3 only.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      If you want bigger (and visualized) test data you can parse the SVG example inside the Clique_(graph_theory) WP page.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

        Looks OK

        1 2 3 4 10 13 9 10 8 9 11 12 13 9 13 14 14 15 14 21 15 16 17 15 17 19 17 18 19 18 19 20 18 19 21 19 21 22 20 23 21 22 23 23 4 4 5 5 6 7 5 7 8
        parse the SVG example

        It just reminded me of an exam at university. It read thus:

        Under your desk there is a red telephone.
        Pick it up and dial 0.
        Shout: "Товарищи, я хочу начать термоядерную войну."
        Shout once more: "Товарищи, я хочу начать термоядерную войну."
        Hang up and rush to the University's bunker.
        Study the environment and atmosphere and report on the next 24, 48 and 72 hours.
        

        Or something like this, :) bw, bliako

Re^3: Sub set where all are connected
by LanX (Saint) on Nov 23, 2019 at 16:00 UTC
    If you are really looking for cliques - i.e. complete sub-graphs - than my approach won't be of great help. (Though not useless in reducing complexity)

    > I expect cliques of hundreds of members, if not thousands.

    Wait ... How many nodes does your graph have?

    😳

    Probably you are better off investigating the complement graph.

    You could also sort all nodes by the number of their neighbours.

    An n-clique is only possible if there are at least n nodes with at least n-1 neighbors.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      The average number of nodes in a "problem set" is around 15. The maximum is around 100,000++. Currently my project has about 37,000 problem sets.

        Try the approach described here

        Re^3: Sub set where all are connected

        for the "average" case.

        The complexity depends on (is related to) the max size of a clique.

        So better be prepared to kill long calculations with a timeout.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re^3: Sub set where all are connected
by LanX (Saint) on Nov 25, 2019 at 17:47 UTC
    > I expect cliques of hundreds of members, if not thousands.

    the default approach seems to be the Bron–Kerbosch algorithm

    From the complexity estimates given there, a worst case of at least 3**(1000/3) = 7.609e+158 for cliques of size 1000 is to be expected. °

    I.o.W. you should already reserve a table at the The Restaurant at the End of the Universe to take a break in between.

    Another approach would be porting Perl to quantum computers and hoping that Google and IBM are producing more than vapor ware in your lifetime.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

    update Nov 29

    °) for the vertex-ordering version

    from Degeneracy_(graph_theory)#Relation_to_other_graph_parameters

    A k-degenerate graph has chromatic number at most k + 1; this is proved by a simple induction on the number of vertices which is exactly like the proof of the six-color theorem for planar graphs. Since chromatic number is an upper bound on the order of the maximum clique, the latter invariant is also at most degeneracy plus one.