in reply to Re: Identifying Overlapping Area in a Set of Strings
in thread Identifying Overlapping Area in a Set of Strings

Hi rnahi,
Thanks so much for your answers. Your solutions provide the correct result but also more.
my $fseq1= 'CCCCGCGC'; my @nsub1= ('CCCCG', 'CCCGC', 'CGCGC'); #produces $result1 = [ [ 0, 'C----' ], [ 1, '----C' ], [ 1, 'CC---' ], # but this is extra [ 2, '---GC' ] ];
And this
my $fseq2= 'CCGCGCTC'; my @nsub2= ( 'CCGCG', 'CGCGC', 'GCGCT', 'CGCTC' ); #produces $result2 = [ [ 0,'C----' ], [ 1,'----C' ], [ 1,'C----' ], # extra [ 2,'----T' ], [ 2,'G----' ], # extra [ 3,'----C' ], ];


How can I modify your code such that it simply gives:


# So here we can observe that every element of the # array is only produced 'once' # E.g (0,1,2) versus (0,1,1,2) $result1 = [ [ 0, 'C----' ], [ 1, '----C' ], [ 2, '---GC' ] ]; # E.g (0,1,2,3) versus (0,1,1,2,2,3) $result2 = [ [ 0,'C----' ], [ 1,'----C' ], [ 2,'----T' ], [ 3,'----C' ], ];
Regards,
Edward

Replies are listed 'Best First'.
Re^3: Identifying Overlapping Area in a Set of Strings
by rnahi (Curate) on Jul 29, 2005 at 17:35 UTC
    How can I modify your code such that it simply gives: ...

    Quite simple:

    my @results; my %seen; for (1 .. $#nsub) { my $current = $nsub[$_]; my $previous = $nsub[$_ -1]; if ( "$previous#$current" =~ /(\w+)#\1/ ) { my $found = $1; printf "%d -> %s (%s) %d -> %s \n", $_ -1, $previous, $found, $_, $current; $current =~ s/^$found/"-" x length($found)/e; $previous =~ s/$found$/"-" x length($found)/e; push @results, [ $_ -1, $previous] unless $seen{$_ -1}++; push @results, [ $_, $current] unless $seen{$_}++; } else { printf "%d -> no overlap\n", $_ } } print Data::Dumper->Dump([ \@results], ['result']);
      Thanks so much again rnahi.
      I hope you don't mind looking at my other instances. I'm really sorry, I didn't mentioned it before because I thought it may appear too complex and too discouraging to read.
      Suppose I have this:

      I would like to produce this:

      Basically 'skipping' the asterisk(*) but yet still keep its position in array in place.

      Update: I've finally succeeded in improving your code such that it can take care those situations. It is not entirely neat and 'super-naive' but it does the job. I think I can't use "grep" function in this case because I still need to keep '*' in its position.

      My sincere thanks, for providing an excellent starting point to me.
      Here is the final code:

      Please kindly advice. Really hope to hear from you again.
      Regards,
      Edward