in reply to Re: summarize similar strings
in thread summarize similar strings

Thanks for the quick response. This works when I have only one bunch to summarize. However, I have many such varied strings. My goal was:

=> If strings vary only in a "number", like p0/p1, m0/m1, b0/b1, c1_0/c1_1, etc., they should be summarized.
=> If they vary in any other character, then it's a new string, and it should be left alone (or summarized with the strings similar to it).

For example, if I add one more string:

a/b/c/p0/m0/d1/r_a_c1_0/q

the output should be:

a/b/c/p0/m0/b*/r_a_c1_*/q
a/b/c/p0/m0/d1/r_a_c1_0/q
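One way to read this rule, as a rough sketch only: two strings count as "similar" when they become identical once every run of digits is replaced by a placeholder. The helper name `skeleton` and the `#` placeholder are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of digits with a placeholder so that
# strings differing only in their numbers end up with the same key.
sub skeleton {
    my ($s) = @_;
    return $s =~ s/\d+/#/gr;    # /r returns the modified copy (Perl 5.14+)
}

my @paths = qw(
    a/b/c/p0/m0/b0/r_a_c1_0/q
    a/b/c/p0/m0/b1/r_a_c1_42/q
    a/b/c/p0/m0/d1/r_a_c1_0/q
);

# Bucket the strings by skeleton; each bucket is one candidate summary line.
my %groups;
push @{ $groups{ skeleton($_) } }, $_ for @paths;

for my $key ( sort keys %groups ) {
    print "$key\n";
    print "    $_\n" for @{ $groups{$key} };
}
```

Under this reading the two b0/b1 strings share a bucket, while the d1 string gets a bucket of its own because it differs in a letter, not a number.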

Re^3: summarize similar strings
by tybalt89 (Monsignor) on Dec 28, 2019 at 08:17 UTC

    Try this:

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11110672
    use warnings;
    use Algorithm::Diff qw(traverse_sequences);

    my %groups;
    /\S/ and push @{ $groups{ tr/0-9\n//dr } }, $_ while <DATA>;

    for ( values %groups )
      {
      my $summary = shift @$_;
      for ( @$_ )
        {
        my @from = split //, $summary;
        my @to = split //;
        $summary = '';
        traverse_sequences( \@from, \@to,
          {
          MATCH => sub {$summary .= $from[shift()]},
          DISCARD_A => sub {$summary .= '*'},
          DISCARD_B => sub {$summary .= '*'},
          } );
        $summary =~ tr/*//s;
        }
      print $summary;
      }

    __DATA__
    a/b/c/p0/m0/b0/r_a_c1_0/q
    a/b/c/p0/m0/b0/r_a_c1_1/q
    a/b/c/p0/m0/b0/r_a_c1_2/q
    some/short/name_2/q
    a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q
    some/short/name_7/q
    a/b/c/p0/m0/d1/r_a_c1_0/q
    a/b/c/p0/m0/d1/r_a_c1_999/q
    a/b/c/p0/m0/b1/r_a_c1_0/q
    a/b/c/p0/m0/b1/r_a_c1_42/q
    a/b/c/p0/m0/b1/r_a_c1_2/q

    Outputs:

    a/b/c/p0/m0/b*/r_a_c1_*/q
    some/short/name_*/q
    a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q
    a/b/c/p0/m0/d1/r_a_c1_*/q
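For comparison, here is a core-only sketch of the merge step that does not use Algorithm::Diff. It is a different technique from the code above: a positional merge over tokens, where each string is split into digit runs and the text between them. It assumes the strings passed in have already been grouped so that they differ only in their numbers, and it dies if that assumption does not hold. The name `merge_group` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge strings that are assumed
# to differ only in their digit runs. A digit run that differs between
# members becomes '*'; everything else is kept as-is.
sub merge_group {
    my @strings = @_;

    # The capturing group makes split return the digit runs as tokens too.
    my @summary = split /(\d+)/, shift @strings;

    for my $s (@strings) {
        my @tokens = split /(\d+)/, $s;
        die "skeleton mismatch: $s\n" unless @tokens == @summary;

        for my $i ( 0 .. $#tokens ) {
            next if $summary[$i] eq $tokens[$i];

            # Only a numeric slot (or one already wildcarded) may differ.
            die "non-numeric difference: $s\n"
                unless $tokens[$i] =~ /^\d+$/
                && $summary[$i] =~ /^(?:\d+|\*)$/;
            $summary[$i] = '*';
        }
    }
    return join '', @summary;
}

print merge_group(
    qw(
        a/b/c/p0/m0/b0/r_a_c1_0/q
        a/b/c/p0/m0/b0/r_a_c1_1/q
        a/b/c/p0/m0/b1/r_a_c1_42/q
    )
), "\n";    # prints a/b/c/p0/m0/b*/r_a_c1_*/q
```

Because the comparison works on whole digit runs, 0 versus 999 collapses to a single `*` without any squeezing step, and a non-digit character is never wildcarded. The trade-off is that it is stricter than the diff-based version: strings whose digits sit in different places are rejected instead of merged.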
      Thanks a ton!!! This seems to work like crazy for me. I'll need some time to digest it and fully understand what's going on.
Re^3: summarize similar strings
by tybalt89 (Monsignor) on Dec 28, 2019 at 07:33 UTC

    more comprehensive test case required...