Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: Identifying Overlapping Area in a Set of Strings

by rnahi (Curate)
on Jul 29, 2005 at 09:27 UTC ( [id://479316]=note: print w/replies, xml ) Need Help??


in reply to Identifying Overlapping Area in a Set of Strings

Perhaps not the best efficient way, but it does what you ask:

use warnings; use Data::Dumper; my $fseq = 'CCCCGCGC'; my @nsub = ('CCCCG', 'CCCGC', 'CGCGC'); my @results; for (1 .. $#nsub) { my $current = $nsub[$_]; my $previous = $nsub[$_ -1]; # the "#" is used as a separator. # put a different character (or sequence of characters) # if you believe that it could also be in your strings if ( "$previous#$current" =~ /(\w+)#\1/ ) { my $found = $1; printf "%d -> %s (%s) %d -> %s \n", $_ -1, $previous, $found, $_, $current; $current =~ s/^$found/"-" x length($found)/e; $previous =~ s/$found$/"-" x length($found)/e; push @results, [ $_ -1, $previous]; push @results, [ $_, $current]; } else { printf "%d -> no overlap\n", $_ } } print Data::Dumper->Dump([ \@results], ['result']);

Result: (adjusted)

0 -> CCCCG (CCCG) 1 -> CCCGC 1 -> CCCGC (CGC) 2 -> CGCGC $result = [ [ 0, 'C----' ], [ 1, '----C' ], [ 1, 'CC---' ], [ 2, '---GC' ] ];

Replies are listed 'Best First'.
Re^2: Identifying Overlapping Area in a Set of Strings
by monkfan (Curate) on Jul 29, 2005 at 14:43 UTC
    Hi rnahi,
    Thanks so much for your answers. Your solutions provide the correct result but also more.
    my $fseq1= 'CCCCGCGC'; my @nsub1= ('CCCCG', 'CCCGC', 'CGCGC'); #produces $result1 = [ [ 0, 'C----' ], [ 1, '----C' ], [ 1, 'CC---' ], # but this is extra [ 2, '---GC' ] ];
    And this

    How can I modify your code such that it simply gives:
    Regards,
    Edward
      How can I modify your code such that it simply gives: ...

      Quite simple:

      my @results; my %seen; for (1 .. $#nsub) { my $current = $nsub[$_]; my $previous = $nsub[$_ -1]; if ( "$previous#$current" =~ /(\w+)#\1/ ) { my $found = $1; printf "%d -> %s (%s) %d -> %s \n", $_ -1, $previous, $found, $_, $current; $current =~ s/^$found/"-" x length($found)/e; $previous =~ s/$found$/"-" x length($found)/e; push @results, [ $_ -1, $previous] unless $seen{$_ -1}++; push @results, [ $_, $current] unless $seen{$_}++; } else { printf "%d -> no overlap\n", $_ } } print Data::Dumper->Dump([ \@results], ['result']);
        Thanks so much again rnahi.
        I hope you don't mind looking at my other instances. I'm really sorry, I didn't mentioned it before because I thought it may appear too complex and too discouraging to read.
        Suppose I have this:

        I would like to produce this:

        Basically 'skipping' the asterisk(*) but yet still keep its position in array in place.

        Update: I've finally succeeded in improving your code such that it can take care those situations. It is not entirely neat and 'super-naive' but it does the job. I think I can't use "grep" function in this case because I still need to keep '*' in its position.

        My sincere thanks, for providing an excellent starting point to me.
        Here is the final code:

        Please kindly advice. Really hope to hear from you again.
        Regards,
        Edward

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://479316]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2024-04-25 18:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found