in reply to Re: Finding Neighbours of a String
in thread Finding Neighbours of a String

Dear Aristotle,

Sorry for coming back to you again. How can I extend/modify your invaluable code above so that it can handle an ambiguous string like this:
$str = '[TCG]TTCG[AT]';
The idea is exactly the same as in my OP, only this time the characters inside brackets [] are also considered as mismatch possibilities.

Please kindly keep your original answer.

Regards,
Edward

Re^3: Finding Neighbours of a String
by Aristotle (Chancellor) on Mar 01, 2006 at 11:57 UTC

    Ah, thanks for your clarification. That’s easy: change the split line to

    my @base = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;

    which will parse the string into units of either a single letter or a bracketed sequence, and change the grep line to

    [ grep { $base[ $i ] !~ $_ } qw( A T C G ) ];

    so that the letter in question will be thrown out if it matches anywhere in a bracketed sequence.
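
    As a rough, self-contained sketch of those two changes, the parse and the filter can be tried in isolation. The helper names tokenize and replacements are made up for illustration and are not part of the original code:

```perl
use strict;
use warnings;

# Split a pattern into units: either a bracketed group or a single letter.
sub tokenize {
    my ($str) = @_;
    return $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
}

# Letters that count as a change for one unit: every base that does
# not occur anywhere inside the unit (same test as the grep line above).
sub replacements {
    my ($unit) = @_;
    return grep { $unit !~ $_ } qw( A T C G );
}

my @base = tokenize('[TCG]TTCG[AT]');
print join( ' ', @base ), "\n";                      # [TCG] T T C G [AT]
print join( ' ', replacements( $base[0] ) ), "\n";   # A
print join( ' ', replacements( $base[1] ) ), "\n";   # A C G
```

    Note that the units keep their brackets, so any position that is not changed is copied into the output with its brackets intact.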

    That’s all, you’re done.

    Makeshifts last the longest.

      Dear Aristotle,

      Sorry, there is a slight glitch here. I was working on your last modified code below. It works 99% fine except when the given string is in bracketed format.
      #!/usr/bin/perl -w
      use strict;
      use Data::Dumper;
      use Carp;
      use Algorithm::Combinatorics qw( combinations );
      use Set::CrossProduct;

      my $str1 = '[TA]TTCGG';
      my $e    = 2;
      find_nb( $str1, $e );

      sub find_nb {
          my ( $str, $d ) = @_;

          my @base = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
          #my @base = split //, $str;

          for my $exact_distance ( 1 .. $d ) {
              my $change_idx_iter = combinations( [ 0 .. $#base ], $exact_distance );

              while ( my $change_idx = $change_idx_iter->next ) {
                  my @base_combo = map {
                      my $i = $_;
                      [ grep { $base[$i] !~ $_ } qw( A T C G ) ];
                      #[ grep { $base[$i] ne $_ } qw( A T C G ) ];
                  } @$change_idx;

                  push @base_combo, [0] if $exact_distance == 1;

                  my $bases_iter = Set::CrossProduct->new( \@base_combo );
                  my @neighbour  = @base;

                  while ( my $new_bases = $bases_iter->get ) {
                      @neighbour[@$change_idx] = @$new_bases;
                      #$_ = "[$_]" for @neighbour[@$change_idx];
                      my $str = join( "", @neighbour );
                      print "$str\n";
                  }
              }
          }
          return;
      }
      Why doesn't my modification above produce the expected output? The output should always be without brackets. Currently one of the entries appears like this: [TA]TTTTG. Instead, this kind of string needs to be represented as separate strings:
      TTTTTG
      ATTTTG
      Is there anything I can do to fix it? I really hope to hear from you again, since your solution is very important to me.
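
      One possible way to get bracket-free output, sketched with core Perl only: expand each bracketed unit into its letters and build every combination. The name expand_brackets is hypothetical and the code is not taken from any post in this thread; it assumes brackets are never nested.

```perl
use strict;
use warnings;

# Expand a pattern such as '[TA]TTTTG' into every plain string it stands for.
# Assumption: brackets are well-formed and never nested.
sub expand_brackets {
    my ($str) = @_;
    my @units = $str =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;

    my @expanded = ('');
    for my $unit (@units) {
        ( my $letters = $unit ) =~ tr/[]//d;    # '[TA]' becomes 'TA'
        my @choices = split //, $letters;

        # Append every allowed letter to every prefix built so far.
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for @choices;
        }
        @expanded = @next;
    }
    return @expanded;
}

print "$_\n" for expand_brackets('[TA]TTTTG');    # TTTTTG, then ATTTTG
```

      Calling this on each string before printing would remove the brackets, though the same plain string may then be produced more than once and might need de-duplicating.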

      Here is my brute-force code that generates the result above.

      Regards,
      Edward
Re^3: Finding Neighbours of a String
by Aristotle (Chancellor) on Mar 01, 2006 at 11:35 UTC

    What exactly do you mean by “also considered as mismatch possibilities?”

    Makeshifts last the longest.

      This is one example:
      # Both strings and candidate are always
      # the same length
      $str = '[TCG]TTCG[AT]';
      $candidate1 = ' T TTCG G'; # I manually aligned this

      # The number of mismatches between those strings would be: 1
      # Namely, only the last position gives a mismatch;
      # the first position is considered a match.
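
      The counting rule in this example could be sketched as follows. The function name count_mismatches is hypothetical, and the candidate is assumed to be passed without the alignment spaces, one letter per position:

```perl
use strict;
use warnings;

# Count positions where the candidate letter is not allowed by the pattern.
# Assumption: the candidate is a plain string with exactly one letter
# for each unit (single letter or bracketed group) of the pattern.
sub count_mismatches {
    my ( $pattern, $candidate ) = @_;
    my @units   = $pattern =~ /\G ( \[ [^][]+ \] | [^][] ) /xg;
    my @letters = split //, $candidate;
    die "pattern and candidate differ in length\n" unless @units == @letters;

    # A position mismatches when the letter occurs nowhere in its unit.
    my $count = grep { index( $units[$_], $letters[$_] ) < 0 } 0 .. $#units;
    return $count;
}

print count_mismatches( '[TCG]TTCG[AT]', 'TTTCGG' ), "\n";    # 1
```

      Here the leading T is accepted by [TCG], while the trailing G is not among [AT], which gives the single mismatch.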

      Regards,
      Edward