I don't have much time for this now, but check out Tk Multicolor text-search with regex. You should be able to use
the idea. I got the text search code tag tricks from the Tk widget demo.
| [reply] |
Actually close to a much simpler example here, but not quite what I'm looking for:
http://www.perlmonks.org/?node=169190
Here's a quick example of the problem I have at this point. Let's say I want to search for 'gca'. There are 13 occurrences in my unformatted string. When I format the string however, the sequence gets broken up, spread across groups and lines, and I can't find them anymore. Here is the formatted string:
1 atggcgacga aggccgtgtg cgtgctgaag ggcgacggcc cagtgcaggg catcatcaat
61 ttcgagcaga aggaaagtaa tggaccagtg aaggtgtggg gaagcattaa aggactgact
121 gaaggcctgc atggattcca tgttcatgag tttggagata atacagcagg ctgtaccagt
181 gcaggtcctc actttaatcc tctatccaga aaacacggtg ggccaaagga tgaagagagg
241 catgttggag acttgggcaa tgtgactgct gacaaagatg gtgtggccga tgtgtctatt
301 gaagattctg tgatctcact ctcaggagac cattgcatca ttggccgcac actggtggtc
361 catgaaaaag cagatgactt gggcaaaggt ggaaatgaag aaagtacaaa gacaggaaac
421 gctggaagtc gtttggcttg tggtgtaatt gggatcgccc aataaacatt cccttggatg
481 tagtctgagg cccct
The triplet's I would find if I just searched the unformatted string are at (with problem strings in the formatted string bolded):
- Base pair 55 (Row 1, Group 5, 5th letter in)
- Base pair 60 (Row 1, Group 5, 10th letter in)
- Base pair 66 (Row 2, Group 1, 6th letter in)
- Base pair 104 (Row 2, Group 5, 4th letter in)
- Base pair 129 (Row 3, Group 1, 9th letter in)
- Base pair 166 (Row 3, Group 5, 6th letter in)
- Base pair 181 (Row 4, Group 1, 1st letter in)
- Base pair 240 (Row 4, Group 6, 10th letter in) ** BIG PROBLEM
- Base pair 257 (Row 5, Group 2, 7th letter in)
- Base pair 335 (Row 6, Group 4, 5th letter in)
- Base pair 347 (Row 6, Group 5, 7th letter in)
- Base pair 370 (Row 7, Group 1, 10th letter in)
- Base pair 383 (Row 7, Group 3, 3rd letter in)
The problem is that the search can go across group and across lines. A pure regex across the newlines won't work, because I'll end up highlighting the numbers. I suppose that I *could* go back in after tagging, find the numbers, and retag them back, and that might be an option.
I tried setting some code to uppercase, then marking the uppercase letters, and then switching the letters back to lowercase. It cleared the highlight tags when I did that, so that option is out.
I might try just building a widget where the position numbers are outside the Text widget, but it seems like a lot of work.
Here's some test code that's a little easier to swallow than yours if anyone wants to play around with it.
use Tk;
$sequence=" 1 atggcgacga aggccgtgtg cgtgctgaag ggcgacggcc cagtg
+caggg catcatcaat
61 ttcgagcaga aggaaagtaa tggaccagtg aaggtgtggg gaagcattaa aggac
+tgact
121 gaaggcctgc atggattcca tgttcatgag tttggagata atacagcagg ctgta
+ccagt
181 gcaggtcctc actttaatcc tctatccaga aaacacggtg ggccaaagga tgaag
+agagg
241 catgttggag acttgggcaa tgtgactgct gacaaagatg gtgtggccga tgtgt
+ctatt
301 gaagattctg tgatctcact ctcaggagac cattgcatca ttggccgcac actgg
+tggtc
361 catgaaaaag cagatgactt gggcaaaggt ggaaatgaag aaagtacaaa gacag
+gaaac
421 gctggaagtc gtttggcttg tggtgtaatt gggatcgccc aataaacatt ccctt
+ggatg
481 tagtctgagg cccct";
# Main Window
my $mainWindow = MainWindow->new();
$mainWindow->title("Regex Problem Example"); #Sets Title
$mainWindow->minsize(qw(500 500));
$mainWindow->geometry('+500+200');
$mainWindow->optionAdd('*font'=>'Courier 10');
$ROText = $mainWindow->Scrolled('ROText',
-scrollbars=>'osoe');
$ROText->Insert($sequence);
$ROText->pack;
highlightText($ROText, "gca");
MainLoop();
sub highlightText {
my ($widget, $searchString) = @_;
# Create a tag to configure the text
$widget->tagConfigure('foundtag',
-foreground => "white",
-background => "red");
$widget->FindAll(-regex, -nocase, $searchString);
if ($widget->tagRanges('sel'))
{
my %startfinish = $widget->tagRanges('sel');
foreach(sort keys %startfinish)
{
$widget->tagAdd("foundtag", $_, $startfinish{$_});
}
$widget->tagRemove('sel', '1.0', 'end');
}
}
| [reply] [d/l] [select] |
5. Remove my old marks
So if I'm understanding correctly, the problem is that the
FindAndReplaceAll
method does not support backreferences/eval in the replacement expression (like Perl's s/// does) — so something like $text->FindAndReplaceAll(-regexp,-case,'([ACGT])','lc($1)') doesn't work.
Maybe you could instead retrieve the entire contents, do the lowercasing,
and then put the modified content back into the widget:
my $s = $text->Contents();
$s =~ tr/ACGT/acgt/;
$text->Contents($s);
| [reply] [d/l] [select] |
You're not understanding correctly, and that's probably my fault. If I retrieve the entire contents and change it there, I've messed the tags that I put in the string to mark where I want to highlight.
Here's a better set of sample code. I'm demonstrating my first idea if you set $idea=0. If you go into highlightText2 and uncomment where I've said to break, you can see why I'm not using the original idea, but it does appear better here then when I originally wrote the test code on slightly different text.
use Tk;
$idea = 0;
$rawSequence="atggcgacgaaggccgtgtgcgtgctgaagggcgacggcccagtgcagggcatc
+atcaatttcgagcagaaggaaagtaatggaccagtgaaggtgtggggaagcattaaaggactgactgaa
+ggcctgcatggattccatgttcatgagtttggagataatacagcaggctgtaccagtgcaggtcctcac
+tttaatcctctatccagaaaacacggtgggccaaaggatgaagagaggcatgttggagacttgggcaat
+gtgactgctgacaaagatggtgtggccgatgtgtctattgaagattctgtgatctcactctcaggagac
+cattgcatcattggccgcacactggtggtccatgaaaaagcagatgacttgggcaaaggtggaaatgaa
+gaaagtacaaagacaggaaacgctggaagtcgtttggcttgtggtgtaattgggatcgcccaataaaca
+ttcccttggatgtagtctgaggcccct";
my $mainWindow = MainWindow->new();
if( $idea)
{
# Main Window
$mainWindow->title("Regex Problem Example 1");
$mainWindow->minsize(qw(500 500)); #Sets Minimum Size - note lack
+ of comma
$mainWindow->geometry('+500+200'); #Sets Position - note lack of
+comma
$mainWindow->optionAdd('*font'=>'Courier 10');
$ROText = $mainWindow->Scrolled('ROText',
-scrollbars=>'osoe');
$formattedSequence = formatSequence($rawSequence);
$ROText->Insert($formattedSequence);
$ROText->pack;
highlightText($ROText, "gca");
}
else
{
# Main Window
$mainWindow->title("Regex Problem Example 2"); #Sets Title
$mainWindow->minsize(qw(500 500)); #Sets Minimum Size - note lack
+ of comma
$mainWindow->geometry('+500+200'); #Sets Position - note lack of
+comma
$mainWindow->optionAdd('*font'=>'Courier 10');
$ROText = $mainWindow->Scrolled('ROText',
-scrollbars=>'osoe');
$formattedSequence = searchAndFormatSequence($rawSequence);
$ROText->Insert($formattedSequence);
$ROText->pack;
#This works, but I can't make Uppercase->Lowercase
#without removing the tags.
highlightText2($ROText, "[A-Z]+");
}
MainLoop();
sub highlightText {
my ($widget, $searchString) = @_;
#alter search string - HOW?
# Create a tag to configure the text
$widget->tagConfigure('foundtag',
-foreground => "white",
-background => "red");
$widget->FindAll(-regex, -nocase, $searchString);
if ($widget->tagRanges('sel'))
{
my %startfinish = $widget->tagRanges('sel');
foreach(sort keys %startfinish)
{
$widget->tagAdd("foundtag", $_, $startfinish{$_});
}
$widget->tagRemove('sel', '1.0', 'end');
}
}
sub highlightText2 {
my ($widget, $searchString) = @_;
#alter search string - HOW?
# Create a tag to configure the text
$widget->tagConfigure('foundtag',
-foreground => "white",
-background => "red");
$widget->FindAll(-regex, -case, $searchString);
if ($widget->tagRanges('sel'))
{
my %startfinish = $widget->tagRanges('sel');
foreach(sort keys %startfinish)
{
$widget->tagAdd("foundtag", $_, $startfinish{$_});
}
$widget->tagRemove('sel', '1.0', 'end');
}
#UNCOMMENT TO BREAK! Doesn't like a regex in the replace string
#$widget->FindAndReplaceAll(-regex, -case, "G", "g");
#$widget->FindAndReplaceAll(-regex, -case, "C", "c");
#$widget->FindAndReplaceAll(-regex, -case, "A", "a");
}
sub searchAndFormatSequence
{
#I search the sequence and change any gca in the sequence to GCA t
+hen
#format the sequence, the end result of which is
return " 1 atggcgacga aggccgtgtg cgtgctgaag ggcgacggcc cagt
+GCAggG CAtcatcaat
61 ttcgaGCAga aggaaagtaa tggaccagtg aaggtgtggg gaaGCAttaa aggac
+tgact
121 gaaggcctGC Atggattcca tgttcatgag tttggagata atacaGCAgg ctgta
+ccagt
181 GCAggtcctc actttaatcc tctatccaga aaacacggtg ggccaaagga tgaag
+agagG
241 CAtgttggag acttggGCAa tgtgactgct gacaaagatg gtgtggccga tgtgt
+ctatt
301 gaagattctg tgatctcact ctcaggagac cattGCAtca ttggccGCAc actgg
+tggtc
361 catgaaaaaG CAgatgactt ggGCAaaggt ggaaatgaag aaagtacaaa gacag
+gaaac
421 gctggaagtc gtttggcttg tggtgtaatt gggatcgccc aataaacatt ccctt
+ggatg
481 tagtctgagg cccct";
}
sub formatSequence
{
#I do some really neat formatting here the end result of which is.
+..
return " 1 atggcgacga aggccgtgtg cgtgctgaag ggcgacggcc cagt
+gcaggg catcatcaat
61 ttcgagcaga aggaaagtaa tggaccagtg aaggtgtggg gaagcattaa aggac
+tgact
121 gaaggcctgc atggattcca tgttcatgag tttggagata atacagcagg ctgta
+ccagt
181 gcaggtcctc actttaatcc tctatccaga aaacacggtg ggccaaagga tgaag
+agagg
241 catgttggag acttgggcaa tgtgactgct gacaaagatg gtgtggccga tgtgt
+ctatt
301 gaagattctg tgatctcact ctcaggagac cattgcatca ttggccgcac actgg
+tggtc
361 catgaaaaag cagatgactt gggcaaaggt ggaaatgaag aaagtacaaa gacag
+gaaac
421 gctggaagtc gtttggcttg tggtgtaatt gggatcgccc aataaacatt ccctt
+ggatg
481 tagtctgagg cccct";
}
| [reply] [d/l] |
my %startfinish = $widget->tagRanges('sel');
# undo uppercase
my $s = $widget->Contents();
$s =~ tr/A-Z/a-z/;
$widget->Contents($s);
foreach(sort keys %startfinish)
{
$widget->tagAdd("foundtag", $_, $startfinish{$_});
}
That would at least highlight all (lowercased) subsequences. But I'm still not entirely sure what the ultimate objective is... Is it just
the highlighting? | [reply] [d/l] [select] |