Re^2: Identifying Overlapping Area in a Set of Strings

Hi rnahi,
Thanks so much for your answers. Your solution produces the correct results, but it also produces some extra entries.

my $fseq1= 'CCCCGCGC';
my @nsub1= ('CCCCG', 
             'CCCGC', 
               'CGCGC');

#produces

$result1 = [
            [ 0, 'C----' ],
            [ 1, '----C' ],
            [ 1, 'CC---' ],   # but this is extra
            [ 2, '---GC' ]
          ];

And this

my $fseq2=   'CCGCGCTC';
my @nsub2= ( 'CCGCG',
              'CGCGC',
               'GCGCT',
                'CGCTC' );

#produces

$result2 = [
            [ 0,'C----' ],
            [ 1,'----C' ],
            [ 1,'C----' ], # extra
            [ 2,'----T' ],
            [ 2,'G----' ], # extra
            [ 3,'----C' ],
           ];

How can I modify your code such that it simply gives:

# Here every array index is reported only once,
# e.g. (0,1,2) rather than (0,1,1,2)

$result1 = [
            [ 0, 'C----' ],
            [ 1, '----C' ],
            [ 2, '---GC' ]
          ];

# e.g. (0,1,2,3) rather than (0,1,1,2,2,3)
$result2 = [
            [ 0,'C----' ],
            [ 1,'----C' ],
            [ 2,'----T' ],
            [ 3,'----C' ],
           ];

Regards,
Edward

Re^3: Identifying Overlapping Area in a Set of Strings by rnahi (Curate) on Jul 29, 2005 at 17:35 UTC
How can I modify your code such that it simply gives: ...

Quite simple:

my @results;
my %seen;
for (1 .. $#nsub) {
    my $current  = $nsub[$_];
    my $previous = $nsub[$_ -1];
    if ( "$previous#$current" =~ /(\w+)#\1/ ) {
        my $found = $1;
        printf "%d -> %s (%s) %d -> %s \n",
            $_ -1, $previous, $found, $_, $current;
        $current  =~ s/^$found/"-" x length($found)/e;
        $previous =~ s/$found$/"-" x length($found)/e;
        push @results, [ $_ -1, $previous] unless $seen{$_ -1}++;
        push @results, [ $_, $current] unless $seen{$_}++;
    }
    else {
        printf "%d -> no overlap\n", $_
    }
}
print Data::Dumper->Dump([ \@results], ['result']);
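For anyone who wants to try this on both of Edward's examples, here is a minimal self-contained sketch of the same idea. The sub name `overlaps` is made up for illustration and is not part of the original code; the `\Q...\E` quoting is an added precaution, not something the snippet above needs for plain letters. The `%seen` hash is what keeps each index from being pushed twice.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name not from the thread) around the loop above.
# Returns an array ref of [ index, masked_string ] pairs, one per index.
sub overlaps {
    my @nsub = @_;
    my ( @results, %seen );
    for my $i ( 1 .. $#nsub ) {
        my ( $previous, $current ) = @nsub[ $i - 1, $i ];

        # Longest suffix of $previous that is also a prefix of $current.
        next unless "$previous#$current" =~ /(\w+)#\1/;
        my $found  = $1;
        my $dashes = '-' x length $found;

        $previous =~ s/\Q$found\E$/$dashes/;    # mask the shared tail
        $current  =~ s/^\Q$found\E/$dashes/;    # mask the shared head

        # Record each index only the first time it is seen.
        push @results, [ $i - 1, $previous ] unless $seen{ $i - 1 }++;
        push @results, [ $i,     $current  ] unless $seen{$i}++;
    }
    return \@results;
}

for my $set ( [qw(CCCCG CCCGC CGCGC)], [qw(CCGCG CGCGC GCGCT CGCTC)] ) {
    print join( ' ', map { "[$_->[0], $_->[1]]" } @{ overlaps(@$set) } ), "\n";
}
# prints:
# [0, C----] [1, ----C] [2, ---GC]
# [0, C----] [1, ----C] [2, ----T] [3, ----C]
```

Because the previous element is pushed before the current one, index 1 keeps the form it had when it was first seen as the "current" string ('----C'), which is the form Edward asked for.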
Re^4: Identifying Overlapping Area in a Set of Strings by monkfan (Curate) on Jul 30, 2005 at 04:01 UTC
Thanks so much again, rnahi.
~~I hope you don't mind looking at my other instances. I'm really sorry I didn't mention them before; I thought they might appear too complex and too discouraging to read.~~

Suppose I have this:

Read more... (669 Bytes)

I would like to produce this:

Read more... (769 Bytes)

Basically, 'skipping' the asterisk (*) while still keeping its position in the array.

Update: I've finally succeeded in improving your code so that it can take care of those situations. It is not entirely neat, and it is 'super-naive', but it does the job. I don't think I can use the "grep" function in this case, because I still need to keep '*' in its position. My sincere thanks for providing an excellent starting point. Here is the final code:

Read more... (1387 Bytes)

~~Please kindly advise. I really hope to hear from you again.~~

Regards,
Edward
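Since the input, the desired output and the final code are all collapsed in this thread, the following is only a guess at what "skipping the asterisk while keeping its position" could look like. It assumes that a '*' element is a placeholder to be ignored, that each real string is compared with the nearest earlier real string, and that the reported indices are those of the original array. The sub name `overlaps_skipping` and the sample data are hypothetical, and this is not Edward's code.

```perl
use strict;
use warnings;

# Hypothetical sketch: the '*' semantics are an assumption, see the note above.
sub overlaps_skipping {
    my @nsub = @_;
    my ( @results, %seen );
    my $prev_i;    # index of the last non-'*' element seen so far

    for my $i ( 0 .. $#nsub ) {
        next if $nsub[$i] eq '*';    # ignore the placeholder, indices unchanged

        if ( defined $prev_i ) {
            my ( $previous, $current ) = @nsub[ $prev_i, $i ];
            if ( "$previous#$current" =~ /(\w+)#\1/ ) {
                my $found  = $1;
                my $dashes = '-' x length $found;
                $previous =~ s/\Q$found\E$/$dashes/;
                $current  =~ s/^\Q$found\E/$dashes/;
                push @results, [ $prev_i, $previous ] unless $seen{$prev_i}++;
                push @results, [ $i,      $current  ] unless $seen{$i}++;
            }
        }
        $prev_i = $i;
    }
    return \@results;
}

# Made-up input: Edward's first example with a '*' inserted at index 1.
my $res = overlaps_skipping(qw(CCCCG * CCCGC CGCGC));
print join( ' ', map { "[$_->[0], $_->[1]]" } @$res ), "\n";
# prints: [0, C----] [2, ----C] [3, ---GC]
```

Looping over indices rather than over a filtered copy of the list is what preserves the original positions; filtering the strings themselves with grep would renumber them.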

