abhay180 has asked for the wisdom of the Perl Monks concerning the following question:

I have many strings in this format:
a/b/c/p0/m0/b0/r_a_c1_0/q a/b/c/p0/m0/b0/r_a_c1_1/q a/b/c/p0/m0/b0/r_a_c1_2/q a/b/c/p0/m0/b1/r_a_c1_0/q a/b/c/p0/m0/b1/r_a_c1_1/q a/b/c/p0/m0/b1/r_a_c1_2/q so on... I want to summarize this pattern as : a/b/c/p0/m0/b*/r_a_c1_*/q
The length of string is not fixed . There could be any number of "/" partitions. What is the efficient way of doing this.

Replies are listed 'Best First'.
Re: summarize similar strings
by tybalt89 (Monsignor) on Dec 28, 2019 at 00:41 UTC
    #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11110672 use warnings; my $summary = <DATA>; while( <DATA> ) { /./ or next; my $xor = "$_" ^ "$summary"; substr $summary, $-[0], 1, '*' while $xor =~ /[^\0]/g; } print $summary; __DATA__ a/b/c/p0/m0/b0/r_a_c1_0/q a/b/c/p0/m0/b0/r_a_c1_1/q a/b/c/p0/m0/b0/r_a_c1_2/q a/b/c/p0/m0/b1/r_a_c1_0/q a/b/c/p0/m0/b1/r_a_c1_1/q a/b/c/p0/m0/b1/r_a_c1_2/q

    Outputs:

    a/b/c/p0/m0/b*/r_a_c1_*/q

    Although you probably need some more extensive test cases :)

      Thanks for quick response. This works when i have only one bunch to su +mmarize. However i have many such varied strings. My goal was: => if strings only vary in a "number" like p0/p1 m0/m1 b0/b1 c1_0/c1 +_1 e.t.c they should be summarized. => if they vary in any other character, then its a new string and it + should be left alone (or) summarized with similar ones like for example, if i add one more string: a/b/c/p0/m0/d1/r_a_c1_0/q The o/p should be: a/b/c/p0/m0/b*/r_a_c1_*/q a/b/c/p0/m0/d1/r_a_c1_0/q

        Try this:

        #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11110672 use warnings; use Algorithm::Diff qw(traverse_sequences); my %groups; /\S/ and push @{ $groups{ tr/0-9\n//dr } }, $_ while <DATA>; for ( values %groups ) { my $summary = shift @$_; for ( @$_ ) { my @from = split //, $summary; my @to = split //; $summary = ''; traverse_sequences( \@from, \@to, { MATCH => sub {$summary .= $from[shift()]}, DISCARD_A => sub {$summary .= '*'}, DISCARD_B => sub {$summary .= '*'}, } ); $summary =~ tr/*//s; } print $summary; } __DATA__ a/b/c/p0/m0/b0/r_a_c1_0/q a/b/c/p0/m0/b0/r_a_c1_1/q a/b/c/p0/m0/b0/r_a_c1_2/q some/short/name_2/q a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q some/short/name_7/q a/b/c/p0/m0/d1/r_a_c1_0/q a/b/c/p0/m0/d1/r_a_c1_999/q a/b/c/p0/m0/b1/r_a_c1_0/q a/b/c/p0/m0/b1/r_a_c1_42/q a/b/c/p0/m0/b1/r_a_c1_2/q

        Outputs:

        a/b/c/p0/m0/b*/r_a_c1_*/q some/short/name_*/q a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q a/b/c/p0/m0/d1/r_a_c1_*/q

        more comprehensive test case required...

Re: summarize similar strings
by harangzsolt33 (Deacon) on Dec 28, 2019 at 07:24 UTC
    I love problems like this. My solution might not be the most efficient one, but this would be my first attempt:

    #!/usr/bin/perl -w use strict; use warnings; my ($Pattern, $c1, $c2) = ''; while( <DATA> ) { /./ or next; chomp; for (my $i = 0; $i < length($_); $i++) { ($c1 = vec($_, $i, 8)) > 32 or next; $c2 = vec($Pattern, $i, 8); $c2 = ($c2 < 33) ? $c1 : $c2; vec($Pattern, $i, 8) = ($c1 == $c2) ? $c1 : 42; } print "\n INPUT = |$_|\nPattern = |$Pattern|\n"; } print "\n\nFINAL RESULT:\n\n $Pattern\n"; __DATA__ /b/c/ a/b/cdx0/m2/b4/r_a_c1_4/w a/b/c/p0/m0/b0/r_a_c1_0/q a/b/c/p0/m0/b0/r_a_c1_1/q a/b/c/p0/m0/b0/r_a_c1_2/q a/b/c/p0/m0/b1/r_a_c1_0/q a/b/c/p0/m0/b1/r_a_c1_1/q a/b/c/p0/m0/b1/r_a_c1_2/q a/b/c/p0/m0/b1/r_a_c1_0/q/g/w/t /q w/b/c/p0/m3/b2/r_a_c d/e0/m2/b
      Thanks a lot. As i explained in later post i needed to summarize only when two strings diff'd in a number like b0/b1 to be summarized as b*; if c0 , then it's a different string. Anyways, the solution posted by tybalt89 seems to be perfect for my problem. Thanks again.
      my ($Pattern, $c1, $c2) = '';
      FYI the above code only initializes $Pattern to an empty string. To init all 3 vars:
      my ($Pattern, $c1, $c2) = ('') x 3;
        Oh, okay. Well, I only wanted to initialize the first variable there. And I was suspecting that that's what my code does. I wasn't sure though. Thank you for confirming that. I never knew I could initialize all three by doing ('') x 3. Thanks! I always learn something new!