in reply to Pattern Matching, left-to-right
Here's another way to do it, with distinct patterns instead of a hash map:
    while( m/ \G .*? (?= \\[FSTRE]\\ ) /gx ) {
        my $pos = pos;
        s{ \G \\F\\ }{ "|" }egx
            or s{ \G \\S\\ }{ "^" }egx
            or s{ \G \\T\\ }{ "&" }egx
            or s{ \G \\R\\ }{ "~" }egx
            or s{ \G \\E\\ }{ "\\" }egx;
        pos() = $pos + 1;
    }
In this code, m//g does the actual work of finding the control sequences in the string. The trick is to anchor all patterns with \G, which makes sure each pattern starts off where the last successful pattern stopped matching. That way you go through the string left-to-right.
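To see the anchoring on its own, here is a minimal sketch that is not part of the solution above. The sample string and variable names are made up for illustration. It shows that a `\G`-anchored `m//g` in scalar context can only continue from `pos`, so the loop stops at the first character that does not fit instead of skipping ahead.

```perl
use strict;
use warnings;

my $str = 'ab12cd';    # hypothetical sample input
my @seen;

# Each scalar-context m//g match starts at pos($str); \G pins the pattern
# to exactly that offset, so the engine cannot skip over '12' to reach 'cd'.
while ( $str =~ m/ \G ( [a-z] ) /gx ) {
    push @seen, $1 . '@' . pos($str);
}

print join( ',', @seen ), "\n";    # prints "a@1,b@2"
```

Without the `\G`, the same loop would also pick up `c` and `d`.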
In the case of s///g, this means each substitution will usually replace exactly one occurrence, despite the /g. \G \\S\\ will only match multiple times if the control sequence appears several times back-to-back, as in the string \S\\S\\S\.
Unfortunately, if an s///g matches at least once, it also resets the end-of-last-successful-match position when it eventually fails. Therefore, the next match would normally start over at the beginning of the string, leading to recursive replacement problems: \E\S\ would get doubly translated to ^ instead of becoming just \S\ as per the spec. That explains the manual bookkeeping with pos, which queries and sets that position. With m//g, this is elegantly avoidable by use of the /c modifier, which means "do not reset the end-of-last-match position on failure".
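The effect of /c can be checked in isolation. This is only a small sketch with a made-up string, separate from the code above: the same `\G`-anchored loop is run once with /g and once with /gc, and `pos` is inspected after the loop's final, failing match.

```perl
use strict;
use warnings;

my $str = 'aab';    # hypothetical sample input

# Plain /g: the loop ends on a failed match, which resets pos to undef.
1 while $str =~ m/ \G a /gx;
my $after_plain = pos($str);

# With /gc: the failed match leaves pos where the last success ended.
pos($str) = 0;
1 while $str =~ m/ \G a /gcx;
my $after_c = pos($str);

print defined $after_plain ? $after_plain : 'undef', "\n";    # prints "undef"
print "$after_c\n";                                           # prints "2"
```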
All that said and done, for this problem, this solution is both much less efficient and harder to understand than the hash-map-based ones given by others. I post this merely as a trivial demonstration of \G, which really shines when the patterns you want to match in a coordinated fashion are very non-uniform, unlike the ones in this case. m//gc is how you build true parsers in Perl.
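For illustration, here is roughly what the m//gc style looks like when applied to the same five control sequences. Treat it as a sketch rather than a drop-in replacement: the function name `unescape` is made up, and building a separate output string instead of editing in place is my own choice here. Since nothing is substituted in the input, `pos` never needs manual repair.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the input once, left to right, with m//gc.
sub unescape {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;

    while ( pos($in) < length $in ) {
        if    ( $in =~ m/ \G \\F\\ /gcx ) { $out .= '|' }
        elsif ( $in =~ m/ \G \\S\\ /gcx ) { $out .= '^' }
        elsif ( $in =~ m/ \G \\T\\ /gcx ) { $out .= '&' }
        elsif ( $in =~ m/ \G \\R\\ /gcx ) { $out .= '~' }
        elsif ( $in =~ m/ \G \\E\\ /gcx ) { $out .= '\\' }
        # Fallback: copy a run of non-backslash text, or one stray character.
        elsif ( $in =~ m/ \G ( [^\\]+ | . ) /gcxs ) { $out .= $1 }
    }
    return $out;
}

print unescape('a\\F\\b'),  "\n";    # prints "a|b"
print unescape('\\E\\S\\'), "\n";    # prints "\S\"
```

Every branch is tried at the current position only, and a failed branch leaves `pos` untouched thanks to /c, so the next branch starts from the same spot. The output of an earlier translation is never rescanned, which is why \E\S\ comes out as \S\.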
Makeshifts last the longest.
Re^2: Pattern Matching, left-to-right
by ccn (Vicar) on Aug 22, 2004 at 08:52 UTC
by Aristotle (Chancellor) on Aug 22, 2004 at 09:07 UTC