Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, what's your advice on saving a large amount of text (50+ MB) from a scrolled Text widget to a file? I tried $text->get('1.0','end'), which works fine with reasonably sized data, but with huge files I run out of memory rather quickly! Is there a trick to save the data line by line from a scrolled text widget, rather than slurping it all into a single variable? Any ideas how to get around this? Thank you all.

Re: Tk Scrolled Text and Large Files
by graff (Chancellor) on Oct 16, 2005 at 14:07 UTC
    I'm scratching my head wondering how you got 50+M of data into the text widget in the first place, which also leads me to wonder whether part of the problem might be how you are loading data into the widget, as well as how you are getting data out of it for storing to a file. (Any chance you might actually be putting three copies of the data in memory at the same time?) In case that could be part of the problem, you might find some interesting stuff on this thread: Displaying/buffering huge text files. (Updated link to point to top of thread, rather than my own reply to it.)

    Short of that, there's no reason to create a duplicate of 50+M of data all at once. Treat it like a big input file and loop over the content as you move it to the output file:

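    open( my $out, '>', 'some_file.txt' ) or die "Cannot open some_file.txt: $!";
    my $startline = 1;
    my $linespan  = 100;
    while (1) {
        my $startid = sprintf( "%.1f", $startline );    # Tk index "line.char"
        # The end index needs an explicit base; a bare "+ N lines" is not a valid index.
        my $txt = $text_widget->get( $startid, "$startid + $linespan lines" );
        last unless length($txt);                       # empty once past the end
        print {$out} $txt;
        $startline += $linespan;
    }
    close $out;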
    Just like in normal file I/O, Tk::Text->get returns an empty string when the requested range falls entirely beyond the end of the current text content.
      I tried hugepad. It looks like a neat utility. Unfortunately, it does not work with my kind of data, which comes as one single continuous line. I might have to tweak it so that it breaks my text on a different separator (e.g. local $/ = chr(29)). Here is an example of the data I use:

      Reading the data in 100-line spans seems like a good approach. I had to tweak this line, though, to get the script to work properly:

          my $txt = $text_widget->get( $startid, "$startid + $linespan lines" );

      Thank you guys for leading me in the right direction.

      Edit: g0n - readmore tags round data
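A minimal plain-Perl sketch of the record-separator idea mentioned above. Note that $/ only controls how Perl's readline (<$fh>) splits input; it has no effect on how a Tk Text widget wraps lines. The sample data, the in-memory filehandle, and the assumption that records are delimited by chr(29) are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: records are separated by chr(29), as suggested in the post.
# The sample data is made up for illustration.
my $sep  = chr(29);
my $data = join $sep, 'rec one', 'rec two', 'rec three';

# Read from an in-memory filehandle so the sketch is self-contained;
# a real program would open the source file instead.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = $sep;              # readline now splits on chr(29), not "\n"
    while ( my $rec = <$in> ) {
        chomp $rec;               # chomp removes the current $/ (chr(29))
        push @records, $rec;
    }
}
close $in;

print "$_\n" for @records;
```

Each record could then be inserted into the widget followed by "\n", so that line-oriented code such as the chunked save loop has real lines to work with.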

      Hmmm! I tried changing the global record separator, but Tk still breaks the lines in its own way within the text widget. Is this hard-coded?
        Considering the rather strange and messy data you seem to be dealing with, I can't imagine what sort of line breaking would be "appropriate". Tk::Text lets you control how to handle lines that are too wide for the display window (the -wrap option: 'none', 'word' or 'char'): no wrapping at all (the user must scroll the display left and right to see all the text), wrapping at word boundaries (leaving a jagged edge on the side that isn't "justified"), or wrapping by character (which might be best for you, since your data has very long stretches with no apparent "word boundary", i.e. no spaces or punctuation). In any case, "the global record separator" (whatever you are referring to by that) has nothing to do with line breaking in the Text widget.

        BTW, PLEASE PUT <code> and </code> AROUND YOUR DATA SAMPLES AND CODE FROM NOW ON. Using any other sort of "tt" or "pre" style markup in your post will tend to really screw up the node display for all of us trying to read your posts.

Re: Tk Scrolled Text and Large Files
by zentara (Cardinal) on Oct 16, 2005 at 13:03 UTC
    Check out hugepad for an example of how to split a huge file for efficient loading into a Tk text widget.

    I'm not really a human, but I play one on earth. flash japh
Re: Tk Scrolled Text and Large Files
by zentara (Cardinal) on Oct 17, 2005 at 12:22 UTC
    I'm not sure what your problem really is, but from what I can guess, you should split your data before you load it into the text box. The sample data you show displays OK, and you could use an approach similar to hugepad to display it.

    Before even trying to display the data, I would split it and load it into a hash. Then work out a scheme for the text widget to display just a designated number of hash elements, like hugepad does. Then, when you want to save the whole file, just loop through all the hash keys and save (concatenate) the values.
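A rough sketch of that scheme in plain Perl, assuming (hypothetically) that the data is split on chr(29) and that each "page" shown in the widget is a fixed number of records. page() and save_all() are hypothetical helper names. Because Perl hash order is not defined, the keys are sorted numerically before saving, and the records are written one at a time rather than joined into one big string first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: four records separated by chr(29).
my $sep  = chr(29);
my $data = join $sep, map { "record $_" } 1 .. 4;

# Split once and store each record under its record number.
my %chunks;
my $i = 0;
$chunks{ $i++ } = $_ for split /\Q$sep\E/, $data;

# Return the records for one "page", i.e. what the widget would display.
sub page {
    my ( $chunks, $page, $per_page ) = @_;
    my $first = $page * $per_page;
    my $last  = $first + $per_page - 1;
    return map { $chunks->{$_} } grep { exists $chunks->{$_} } $first .. $last;
}

# Write every record back out, in order, one record at a time.
sub save_all {
    my ( $chunks, $fh, $sep ) = @_;
    my @keys = sort { $a <=> $b } keys %$chunks;   # hash order is not defined
    for my $n ( 0 .. $#keys ) {
        print {$fh} $sep if $n;                    # separator between records only
        print {$fh} $chunks->{ $keys[$n] };
    }
}

my @shown = page( \%chunks, 1, 2 );    # second page, two records per page
print "showing: @shown\n";

my $saved = '';
open my $out, '>', \$saved or die "Cannot open in-memory handle: $!";
save_all( \%chunks, $out, $sep );
close $out;
```

This still keeps one copy of the data in memory (in the hash); what it avoids is the extra copy that get('1.0','end') creates. A plain array indexed by record number would work just as well as a hash here.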

    And don't overlook the power that tags can add to your program. Here is a simple example:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Tk;

    my $mw = tkinit;
    my $t  = $mw->Scrolled( 'Text', -scrollbars => 'osoe' )->pack;

    # Store a payload on each per-line tag (-data).
    for (1 .. 100) {
        $t->tagConfigure( 'data' . $_, -data => $_ x 20, );
    }
    # Each line carries a shared tag plus its own per-line tag.
    for (1 .. 100) {
        $t->insert( 'end', 'Line' . "$_\n", [ 'datarider', 'data' . $_ ] );
    }

    $t->tagBind( 'datarider', '<Enter>', sub { getdata($t) } );
    $t->tagBind( 'datarider', '<Leave>', sub { getdata($t) } );
    $t->bind( '<Motion>', sub { getdata($t) } );

    MainLoop;

    # Print the tags under the mouse pointer and the data stored on each.
    sub getdata {
        my ($text_widget) = @_;
        my $x = $text_widget->pointerx - $text_widget->rootx;
        my $y = $text_widget->pointery - $text_widget->rooty;
        #print "$x $y\n";
        my $txt_index = $text_widget->index( '@' . $x . ',' . $y );
        #warn $txt_index;
        my ( $line, $char ) = ( $txt_index =~ /^(.+?)\.(.+?)$/ );
        my @tags = $text_widget->tagNames($txt_index);
        print "@tags\n";
        foreach my $tag (@tags) {
            print $text_widget->tagCget( $tag, 'data' ), "\n";
        }
    }

      My apologies, my data sample did not display properly even though I used the <code></code> tags!

      The program I am working on converts data from different formats. I am not using a back-end database or data structures to capture this data. Essentially it's a scheme where raw data is converted and displayed dynamically in a scrollable text widget. Sometimes, the source data tends to be huge and takes a long time to load or convert.

      I have implemented a limit to display only a certain number of bytes when files are loaded in the text widget. The problem is when the user performs the conversion to save the modified data. I guess I have to change the logic to capture this information in some kind of data structure as the Illustrious Zentara suggested for saving at a later point. I just was not sure if there is a more efficient approach to achieving this in Perl/Tk.
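One possible alternative, sketched under assumptions: since the original source file is still on disk, the save step could re-read it record by record, convert each record, and write it straight to the output, while keeping only a small preview for the widget. convert_record() and convert_stream() are hypothetical names, convert_record() is a placeholder for the real conversion, the chr(29) separator comes from earlier in the thread, and writing one converted record per output line is an assumption. The preview limit is measured in characters, which equals bytes only for single-byte data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical placeholder for the program's real per-record conversion.
sub convert_record {
    my ($rec) = @_;
    return uc $rec;
}

# Read records from $in (split on $sep), convert each one, write it to $out,
# and return at most $max_preview characters of converted text for display.
sub convert_stream {
    my ( $in, $out, $sep, $max_preview ) = @_;
    my $preview = '';
    local $/ = $sep;                                   # readline splits on $sep
    while ( my $rec = <$in> ) {
        chomp $rec;                                    # strip the trailing separator
        my $converted = convert_record($rec) . "\n";   # assumed output format
        print {$out} $converted;
        if ( length($preview) < $max_preview ) {
            $preview .= substr( $converted, 0, $max_preview - length($preview) );
        }
    }
    return $preview;
}

# Usage with in-memory filehandles; a real program would open files instead.
my $sep    = chr(29);
my $src    = join $sep, 'alpha', 'beta', 'gamma';
my $result = '';
open my $in,  '<', \$src    or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
my $preview = convert_stream( $in, $out, $sep, 8 );
close $in;
close $out;
print "preview: $preview\n";
```

Only one record and the preview are held in memory at a time, so the size of the saved output no longer depends on what the widget can hold.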

      I really appreciate your help.