in reply to Fast enough yet?
You'll always have scaling problems when you write code like:
    $aXML =~ s@</@{endtag}@gs;
    $aXML =~ s@/@{bs}@gs;
    $aXML =~ s@{endtag}@</@gs;
    $aXML =~ s@:@{colon}@gs;
    $aXML =~ s@;@{semicolon}@gs;
    ...
    $aXML =~ s@:;@\)\]@gs;
    $aXML =~ s@;([^:;]+?):@\[$1\(@gs;
    $aXML =~ s@`@>@gs;
    $aXML =~ s@{bs}@/@gs;
    $aXML =~ s@{colon}@:@gs;
    $aXML =~ s@{semicolon}@;@gs;
    ...
    while ($aXML =~ m@<([^<>]+?)>(.*?)</>@gs) { ... }
    ...
    while ($aXML =~ m@\[([^\[\]]*?)\]@gs) { ... }
By my count, you have to scan the aXML at least thirteen times to process it once, and some of those regular expressions backtrack, so they'll scale badly as well. For short aXML documents (a few dozen lines) it may be fast enough, but you'll notice performance degrade dramatically once documents grow past a hundred lines.
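To make the cost of repeated scans concrete, here's a rough sketch of how the first five substitutions above could collapse into a single pass. The `%escape` table and the `escape_once` name are my own inventions for illustration, and I'm assuming the input never already contains the placeholder strings themselves:

```perl
use strict;
use warnings;

# Hypothetical lookup table; the placeholder names mirror the ones
# used in the substitutions quoted above.
my %escape = (
    '</' => '</',            # the closing-tag marker is left alone
    '/'  => '{bs}',
    ':'  => '{colon}',
    ';'  => '{semicolon}',
);

# One scan of the string instead of five.
sub escape_once {
    my ($text) = @_;
    # Alternation is tried left to right, so '</' wins over a bare '/'.
    $text =~ s{(</|[/:;])}{$escape{$1}}g;
    return $text;
}

print escape_once('<a>x/y 1:2;</>'), "\n";
```

That removes the `{endtag}` protect-and-restore dance entirely, because the alternation decides what each character means at the moment it sees it. It doesn't fix the later passes, though.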
With that said, this approach is more promising:
    my @chars = split //, $aXML;
    ...
    foreach my $char (@chars) { ... }
... because it scales linearly with the size of the document. Perl 5's not super fast at processing strings character-by-character, but if you can write a state machine and decide what kind of Perl data structure to build at every state change of the document, you're much better off in terms of performance. This is what a lexer and grammar do in compilers and custom languages. (You can even identify places where you don't yet have enough information to decide what to do, as in the case of your extension system. Encode that in your data structure, then decide during evaluation, once you know what you need to know.)
Higher-Order Perl and SICP both describe how to handle this.
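A minimal sketch of the state-machine idea, under some loud assumptions: I'm guessing from the patterns above that elements look like `<name>...</>`, and `parse_tree`, the node layout, and the error messages are all made up for this example rather than taken from the aXML code:

```perl
use strict;
use warnings;

# Hypothetical sketch: parse "<name>...</>" markup into a tree in one pass.
# Each node is { name => ..., children => [ text strings or nodes ] }.
sub parse_tree {
    my ($src) = @_;
    my $root  = { name => 'ROOT', children => [] };
    my @stack = ($root);    # open elements, innermost last
    my $state = 'text';     # either 'text' or 'tag'
    my $buf   = '';

    for my $char (split //, $src) {
        if ($state eq 'text') {
            if ($char eq '<') {
                # State change: flush pending text, start reading a tag.
                push @{ $stack[-1]{children} }, $buf if length $buf;
                ($buf, $state) = ('', 'tag');
            }
            else { $buf .= $char }
        }
        else {    # inside <...>
            if ($char eq '>') {
                if ($buf eq '/') {
                    # "</>" closes the innermost open element.
                    die "unbalanced </>\n" if @stack == 1;
                    pop @stack;
                }
                else {
                    my $node = { name => $buf, children => [] };
                    push @{ $stack[-1]{children} }, $node;
                    push @stack, $node;
                }
                ($buf, $state) = ('', 'text');
            }
            else { $buf .= $char }
        }
    }
    die "unterminated tag\n" if $state eq 'tag';
    die "unclosed element\n" if @stack > 1;
    push @{ $root->{children} }, $buf if length $buf;
    return $root;
}

my $tree = parse_tree('<a>hi <b>there</></>!');
print $tree->{children}[0]{children}[1]{name}, "\n";    # name of the inner element
```

Every character is looked at exactly once, and the stack always knows which element is open, so nesting comes for free. A node whose meaning can't be decided yet can simply sit in the tree until evaluation.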
Incidentally, this is why people often say "Don't use regular expressions to parse _____!" It's not that it's impossible to do, but that regular expressions really don't let you identify the state of individual items within a document in a way amenable to handling them correctly.
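Here's a small demonstration of that loss of state, using the tag-matching pattern quoted above against a nested sample document I made up:

```perl
use strict;
use warnings;

# Invented input: one element nested inside another.
my $doc = '<outer>a<inner>b</>c</>';

my @matches;
while ($doc =~ m@<([^<>]+?)>(.*?)</>@gs) {
    push @matches, [ $1, $2 ];
}

# Show which name got paired with which body.
printf "%s => %s\n", @$_ for @matches;
```

The pattern pairs `outer` with the first `</>` it finds, which actually belongs to `inner`, so the captured body is `a<inner>b` and the trailing `c` is dropped. The regex has no notion of "which element am I inside right now", and that is exactly the state a parser has to track.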