solri1 has asked for the wisdom of the Perl Monks concerning the following question:
    This is text 0.
    <SECTION[1]>
    This is text 1.
    </SECTION>
    This is text 0.
    <SECTION[2]>
    This is text 2.
    </SECTION>

We specify which section tags to show and which to delete. So, if I called this function on the above text with only section 1 set to show, I'd get:

    This is text 0.
    This is text 1.
    This is text 0.

Here's my implementation/test program:

    #!/usr/bin/perl -d:DProf
    use Carp;
    use Text::Balanced;

    our $extract_section_tag = Text::Balanced::gen_extract_tagged(
        '<SECTION\[([^\]]+)\]>\n?',    # opening tag
        '</SECTION>\n?',               # closing tag
        '[\S\s]*?(?=<SECTION\[)'       # prefix: everything up to the next opening tag
    );

    # Build a large test page: filler, two sections, more filler.
    my $page;
    $page .= "This is some filler.\n" x 20000;
    $page .= "<SECTION[test1]>\n";
    $page .= "This is some filler.\n" x 20000;
    $page .= "</SECTION>\n";
    $page .= "This is some filler.\n" x 20000;
    $page .= "<SECTION[test2]>\n";
    $page .= "This is some filler.\n" x 20000;
    $page .= "</SECTION>\n";
    $page .= "This is some filler.\n" x 20000;

    print "Calling process_section. Length of page=[" . length($page) ."]\n";
    my $newpage = process_section($page, {test1 => 1});
    print "Done. Length of newpage=[" . length($newpage) ."]\n";

    sub process_section {
        my ($page, $hashref) = @_;
        my ($tag_section, $post, $pre, $tag_open, $content, $tag_close, @info);
        my $return = '';
        while ( @info = $extract_section_tag->($page) ) {
            ($tag_section, $post, $pre, $tag_open, $content, $tag_close) = @info;
            # No (further) section found: warn if a stray opening tag remains, then stop.
            if (! (defined $tag_section && length $tag_section) ) {
                if ($post =~ m/<SECTION\[/) {
                    my $excerpt = substr($post, 0, 100);
                    print STDERR "\n";
                    Carp::carp("Warning: Unbalanced SECTION tags. Fix the template! Error near: $excerpt\n");
                }
                last;
            }
            my $show = 0;
            if ($tag_open =~ m/<SECTION\[(.*?)\]>/) {
                $show = 1 if (exists $hashref->{$1});
            }
            # Recurse into nested sections only when the enclosing one is shown.
            if ($show && $content =~ m/<SECTION/) {
                $content = process_section($content, $hashref);
            }
            $return .= $pre . ($show ? $content : '');
            $page = $post;
        }
        $return .= $post;
        return $return;
    }

The above code takes over 2 seconds to run on my system, and it gets even worse as the page it's processing gets larger. How can I improve the speed of this? I'm using Text::Balanced v2.0.0, which fixes the issue of a $& variable slowing down all regexps.
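For comparison, the show/delete behaviour described in the question can also be sketched without Text::Balanced, as a single pass over the tags with a visibility stack. This is only an illustrative sketch, not the poster's code and not benchmarked against it: the name `filter_sections` is hypothetical, the tag patterns are copied from the program in the question, and dying on unbalanced tags (instead of warning with Carp) is an assumption made to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the content of sections named in %$show,
# drop the content of all other sections, and strip the tags themselves.
sub filter_sections {
    my ($page, $show) = @_;
    my $out     = '';
    my @stack;          # saved visibility of each enclosing section
    my $visible = 1;    # text outside any section is always kept

    # split with a capturing group returns text and tags alternately.
    for my $piece (split m{(<SECTION\[[^\]]+\]>\n?|</SECTION>\n?)}, $page) {
        if ($piece =~ m{\A<SECTION\[([^\]]+)\]>}) {
            # A section is visible only if its parent is and its name is listed.
            push @stack, $visible;
            $visible = $visible && exists $show->{$1};
        }
        elsif ($piece =~ m{\A</SECTION>}) {
            die "Unbalanced </SECTION> tag\n" unless @stack;
            $visible = pop @stack;
        }
        elsif ($visible) {
            $out .= $piece;
        }
    }
    die "Unclosed <SECTION> tag\n" if @stack;
    return $out;
}

# Usage with the small sample from the question, showing only section 1.
my $sample = "This is text 0.\n<SECTION[1]>\nThis is text 1.\n</SECTION>\n"
           . "This is text 0.\n<SECTION[2]>\nThis is text 2.\n</SECTION>\n";
print filter_sections($sample, { 1 => 1 });
```

Because the input is split once and each piece is visited once, nested sections are handled by the stack rather than by recursion and re-extraction. Whether that is actually faster on a given page would need to be measured.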
Re: Am I using Text::Balanced correctly? Speed issues.
by renodino (Curate) on Aug 31, 2007 at 20:16 UTC