jlnelson19 has asked for the wisdom of the Perl Monks concerning the following question:

I have a Text widget ($text) that contains text that I've read in from a file. It looks something like this:

Shapes { Shape=Circle { name=joe diameter=3 color=blue } Shape=Circle { name=steve diameter=5 color=red } Other Shapes... }
I need to find all the circles, grab their names, and put them into an array (to then be read into a tree widget, but that's another topic). Once I've read this info in, the user will need to be able to duplicate any of the circles (they will do this by right-clicking on a circle in the tree). So in addition to all the circle names, I need to store the indices of the first and last characters of each circle's block of text as it appears in the text widget. In the example above I need to end up with an array that looks like this:

circles[0] = joe 3.2 8.3 circles[1] = steve 9.2 14.3

(where the numbers are in line.char format)

Then I can use get() and insert() to duplicate the circles in the text widget.

So in summary, what's the best way to search a text widget and extract information in this way?

Thanks so much for any help!

Replies are listed 'Best First'.
Re: Extracting information from a Text widget
by shigetsu (Hermit) on Jul 03, 2007 at 00:47 UTC

    I have a Text widget ($text) that contains text that I've read in from a file. It looks something like this:
    ...
    I need to find all the circles, grab their names, and put them into an array (to then be read into a tree widget, but that's another topic).

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper qw(Dumper); { my $data = do { local $/; <DATA> }; $data =~ s/^ Shapes \s+? \{ (.*) \} \s+ $/$1/sx; my @chunks; push @chunks, [ $1, $2, $3 ] while $data =~ / Shape=Circle \s+ \{ \s+ name=(\w+) \s+ diameter=(\d+) \s+ color=(\w+) \s+ \} /gsx; print Dumper \@chunks; } __DATA__ Shapes { Shape=Circle { name=joe diameter=3 color=blue } Shape=Circle { name=steve diameter=5 color=red } }

    outputs

    $VAR1 = [ [ 'joe', '3', 'blue' ], [ 'steve', '5', 'red' ] ];

      Thanks shigetsu, this works. I just need to figure out how to also extract the line.char info from where the matches were found in the text window.
      How could this be modified to account for other text? For example:
      Shapes { Shape=Circle { name=joe diameter=3 color=blue { random stuff { further depth } } } }
      I think this just comes down to creating the right regex, but nothing I've tried is able to ignore the rest of the text. I still just want the circle and it's name. thx

        This will parse the format you've indicated. It's pretty flexible about whitespace and accepts comments in most of the obvious places.

        It produces the HoAoHs you see at the bottom of the page which gives you access to everything you need. As requested above, it parses, but ignores nested blocks below the first level.

        It should deal with any number of Shape types and any number of key/value attribute pairs. Both key and value must not contain spaces.

        #! perl -slw use strict; use re qw[ eval ]; use Data::Dump; my %shapes; my( $type, $key, $value ); my $reComment = qr[ \# .*? \n \s* ]x; my $reWS = qr[ \s* | $reComment ]smx; my $reNestedBlock; $reNestedBlock = qr[ \{ (?> [^{}]+ | (??{ $reNestedBlock }) )+ \} $reWS ]smx; my $reAttribs = qr[ $reComment | ( (?> \w+ ) ) (?{ $key = $^N }) \s* = \s* ( \S+ ) (?{ $value = $^N }) (?{ $shapes{ $type }[-1]{ $key } = $value }) $reWS ]smx; my $reShape = qr[ $reComment | Shape \s* = \s* ( (?> \w+ ) ) (?{ push @{ $shapes{ $type = $^N } }, {} }) $reWS+ \{ $reWS (?> $reAttribs+ ) $reNestedBlock? \} $reWS ]smx; my $reShapes = qr[ Shapes $reWS \{ $reWS $reShape+ \} ]smx; do{ local $/; <DATA> } =~ $reShapes or die "Failed to match!"; print dump \%shapes; __DATA__ Shapes { Shape=Circle # A comment { name= joe diameter=3 color=blue # A comment } ## Another one Shape=Triangle{ name=Equilateral sides=3,3,3 color=green } Shape=Triangle{ name = Right sides = 3,4,5 color = green } Shape=Circle { name=steve diameter=5 color=red } Shape=Square { name=steve2 size=5 color=black } Shape = Square { name=fred2 size=3 color=yellow { random stuff { {{{{{{{{{{{{{{{fred}}}}}}}}}}}}}}} } } } Shape=Circle { # A comment name = fred # a comment diameter = 3 color = blue { random stuff { further depth } } } }

        Output:

        C:\test>624564 { Circle => [ { color => "blue", diameter => 3, name => "joe" }, { color => "red", diameter => 5, name => "steve" }, { color => "blue", diameter => 3, name => "fred" }, ], Square => [ { color => "black", name => "steve2", size => 5 }, { color => "yellow", name => "fred2", size => 3 }, ], Triangle => [ { color => "green", name => "Equilateral", sides => "3 +,3,3" }, { color => "green", name => "Right", sides => "3,4,5" +}, ], }

        Two caveats:

        1. When it fails, it doesn't tell you why or where.

          That's why I added the comment facility, it makes it resonably quick to locate errors.

        2. I'm not offering to maintain or extend it for you.

          It's not horribly complicated, and with a little study you should be able to work out how to extend it if you need to.

          I don't mind answering questions to help you, but do not expect to be able to ask for feature XYZ and just get code back. Ask for help in implemeting XYZ, post your attempt, and you'll probably get it.

        Bear those in mind before you decide to use the above code.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting information from a Text widget
by aquarium (Curate) on Jul 03, 2007 at 05:17 UTC
    sorry for not giving a direct answer...as i think the design is a kludge. Why would you store line.char references to text, when you could store these shapes in a hash and refer to any attribute directly. i.e. populate the hash as you're reading the Text widget.
    if you really must (why) refer to text positionally, then perl's tk module has good facilities for this, as well as for generating graphics/widgets.
    the hardest line to type correctly is: stty erase ^H
      Well, I need to be able to add shapes into the text widget, say if the user right-clicks on the red circle (in the tree widget), and requests to duplicate the red circle 3 times. Could the hash design accomodate this?

      The reason I was thinking of using the line.char references was so that I could insert the new circles in the correct place in the text (i.e. after the original red circle).

        use a suitable linked list module, or there might be something similar under "sequence" in CPAN. this will help keep the design logic separate from implementation.
        as per your clarification, you're interested in being able to follow sequences. An artificial contrivance such as text box co-ordinates will surely slow you down in implementation, not to mention horrible maintenance. the basic idea is of sequences or linked lists or such. if you use text box co-ordinates, they'll need to be re-computed for shapes following an insert.
        all the best in your project
        the hardest line to type correctly is: stty erase ^H
        ... I was thinking of using the line.char references was so that I could insert the new circles in the correct place in the text ...

        Rather than fighting to translate character offsets to line/char offsets in order to insert the additions, why not query the entire text, perform substitutions on that and then replace (delete/insert) the entire text?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting information from a Text widget
by zentara (Cardinal) on Jul 03, 2007 at 13:24 UTC
    Without writing it for you, this script shows the general idea on using the search method of the Text widget.
    #!/usr/bin/perl use warnings; use strict; use Tk; use Tk::LabEntry; my $mw = MainWindow->new( -bg => 'black' ); $mw->geometry('100x30+100+15'); my $file_name = shift || $0; my $file; open (FH,"< $file_name"); read( FH, $file, -s FH ); close FH; my $search_string = ''; my $kind = 'exact'; my $stringframe = $mw->Frame; my $ss = $stringframe->LabEntry( -label => 'Search string:', -width => 40, -labelPack => [qw/-side left -anchor w/], -textvariable => \$search_string )->pack(qw/-side left/); my $ss_button = $stringframe->Button( -text => 'Highlight' ); $ss_button->pack(qw/-side left -pady 5 -padx 10/); my $text = $mw->Scrolled(qw/Text -setgrid true -scrollbars e/); $text->tagConfigure( 'search', -foreground => 'red',-background => ' +white' ); $text->insert('0.0', $file); $text->mark(qw/set insert 0.0/); my $subframe = $mw->Frame; my $exact = $subframe->Radiobutton( -text => 'Exact match', -variable => \$kind, -value => 'exact' ); my $regexp = $subframe->Radiobutton( -text => 'Regular expression', -variable => \$kind, -value => 'regexp' ); $exact->pack( qw/-side left/, -fill => 'x' ); $regexp->pack( qw/-side right/, -fill => 'x' ); $stringframe->pack(qw/-side top -fill x/); $subframe->pack(qw/-side top -fill x/); $text->pack(qw/-expand yes -fill both/); my $command = sub { &search_text($text,\$search_string,'search',$kind) + }; $ss_button->configure( -command => $command ); $ss->bind( '<Return>' => $command ); MainLoop; ###################################################################### +#3 sub search_text { # The utility procedure below searches for all instances of a give +n # string in a text widget and applies a given tag to each instance + found. # Arguments: # # w - The window in which to search. Must be a text widget. # string - Reference to the string to search for. The search i +s done # using exact matching only; no special characters. # tag - Tag to apply to each instance of a matching string. my ( $w, $string, $tag, $kind ) = @_; #print "@_\n"; return unless ref($string) && length($$string); $w->tagRemove( $tag, qw/0.0 end/ ); my ( $current, $length ) = ( '1.0', 0 ); while (1) { $current = $w->search( -count => \$length, "-$kind", $$string, $current +, 'end' ); last if not $current; warn "Posn=$current count=$length\n", $w->tagAdd( $tag, $current, "$current + $length char" ); $current = $w->index("$current + $length char"); } } # end search_text

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum