Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^2: Identifying Overlapping Area in a Set of Strings

by monkfan (Curate)
on Jul 29, 2005 at 14:43 UTC ( [id://479424]=note: print w/replies, xml ) Need Help??


in reply to Re: Identifying Overlapping Area in a Set of Strings
in thread Identifying Overlapping Area in a Set of Strings

Hi rnahi,
Thanks so much for your answers. Your solutions provide the correct result but also more.
my $fseq1= 'CCCCGCGC'; my @nsub1= ('CCCCG', 'CCCGC', 'CGCGC'); #produces $result1 = [ [ 0, 'C----' ], [ 1, '----C' ], [ 1, 'CC---' ], # but this is extra [ 2, '---GC' ] ];
And this
my $fseq2= 'CCGCGCTC'; my @nsub2= ( 'CCGCG', 'CGCGC', 'GCGCT', 'CGCTC' ); #produces $result2 = [ [ 0,'C----' ], [ 1,'----C' ], [ 1,'C----' ], # extra [ 2,'----T' ], [ 2,'G----' ], # extra [ 3,'----C' ], ];


How can I modify your code such that it simply gives:


# So here we can observe that every element of the # array is only produced 'once' # E.g (0,1,2) versus (0,1,1,2) $result1 = [ [ 0, 'C----' ], [ 1, '----C' ], [ 2, '---GC' ] ]; # E.g (0,1,2,3) versus (0,1,1,2,2,3) $result2 = [ [ 0,'C----' ], [ 1,'----C' ], [ 2,'----T' ], [ 3,'----C' ], ];
Regards,
Edward

Replies are listed 'Best First'.
Re^3: Identifying Overlapping Area in a Set of Strings
by rnahi (Curate) on Jul 29, 2005 at 17:35 UTC
    How can I modify your code such that it simply gives: ...

    Quite simple:

    my @results; my %seen; for (1 .. $#nsub) { my $current = $nsub[$_]; my $previous = $nsub[$_ -1]; if ( "$previous#$current" =~ /(\w+)#\1/ ) { my $found = $1; printf "%d -> %s (%s) %d -> %s \n", $_ -1, $previous, $found, $_, $current; $current =~ s/^$found/"-" x length($found)/e; $previous =~ s/$found$/"-" x length($found)/e; push @results, [ $_ -1, $previous] unless $seen{$_ -1}++; push @results, [ $_, $current] unless $seen{$_}++; } else { printf "%d -> no overlap\n", $_ } } print Data::Dumper->Dump([ \@results], ['result']);
      Thanks so much again rnahi.
      I hope you don't mind looking at my other instances. I'm really sorry, I didn't mentioned it before because I thought it may appear too complex and too discouraging to read.
      Suppose I have this:

      I would like to produce this:

      Basically 'skipping' the asterisk(*) but yet still keep its position in array in place.

      Update: I've finally succeeded in improving your code such that it can take care those situations. It is not entirely neat and 'super-naive' but it does the job. I think I can't use "grep" function in this case because I still need to keep '*' in its position.

      My sincere thanks, for providing an excellent starting point to me.
      Here is the final code:

      Please kindly advice. Really hope to hear from you again.
      Regards,
      Edward

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://479424]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (7)
As of 2024-04-23 10:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found