MIME uses an idea that is kinda neat and could help here. You pick your placeholder, say "XXX", then search to see if it clashes (that is, whether it already appears in the text). If it does, look at the character right after that first occurrence, append any character other than that one to your placeholder, and resume the search from that point. Repeat until you reach the end of your text. You will have traversed the text only once, and when you are done you'll have a placeholder that does not appear anywhere in your text. Then you can append your sequence numbers (plus a non-digit terminator) to get your set of conflict-free placeholders.
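A minimal sketch of that single-pass search, in case it helps to see it spelled out. The sub name find_unused_placeholder and the choice of 'Y'/'Z' as the extension characters are made up for illustration; any character that differs from the one following the clash works.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a string that starts with $seed and
# does not occur anywhere in $text.
sub find_unused_placeholder {
    my ($text, $seed) = @_;
    my $ph  = $seed;
    my $pos = 0;    # invariant: $ph does not occur starting before $pos
    while ((my $at = index($text, $ph, $pos)) >= 0) {
        # Character right after this clash ('' if the clash ends the text).
        my $next = substr($text, $at + length($ph), 1);
        # Extend with any character other than $next, so the longer
        # candidate can no longer match at $at (or anywhere before it).
        $ph .= ($next eq 'Y') ? 'Z' : 'Y';
        $pos = $at + 1;
    }
    return $ph;
}

# Numbered placeholders: sequence number plus a non-digit terminator,
# so "<ph>1;" can never be confused with the start of "<ph>12;".
my $text = "aXXXb XXXYc";
my $ph   = find_unused_placeholder($text, 'XXX');
my @placeholders = map { "$ph$_;" } 1 .. 3;
print "$ph\n";    # XXXYY
```

Since $ph never appears in the text, no longer string that starts with $ph can appear either, which is why simply tacking the numbers on is safe.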

But you might have to worry about your manipulations creating a conflict with this placeholder.

Another route would be to "escape" any occurrences of your placeholder character, both in the original text and in any substitutions that get applied to the text. Then unescape them after you put the original blocks back in place of the placeholders. For example:

    $text =~ s/%/%%/g;
    # replace first block with "(%1%)"
    # replace second block with "(%2%)"
    # ...
    my %subs= (
        replaceThis => "withThis",
        # ...
    );
    for( @subs{ keys %subs } ) {
        s/%/%%/g;
    }
    $text =~ s/$_/$subs{$_}/g
        for keys %subs;
    # replace (%1%) with original first block
    # ...
    $text =~ s/%%/%/g;

Then you only have to worry about your manipulations accidentally changing a placeholder (which can often be easy to avoid in practice -- which it probably is in your case since you didn't appear worried about it).
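A self-contained version of the escape approach might look like this. It is only a sketch: the <raw>...</raw> delimiters, the sub name mangle_except_raw, and the literal find/replace pass standing in for the real mangling routine are all assumptions made for illustration. It also assumes the search strings contain no '%' and cannot match across a "(%N%)" placeholder.

```perl
use strict;
use warnings;

# Hypothetical demo: apply literal find => replace pairs from %$subs
# everywhere except inside <raw>...</raw> blocks (made-up markup).
sub mangle_except_raw {
    my ($text, $subs) = @_;
    my %subs = %$subs;    # local copy so the caller's values stay unescaped

    # 1. Escape every literal '%', so every run of '%' has even length.
    $text =~ s/%/%%/g;

    # 2. Save the raw blocks (in escaped form) and leave "(%N%)" behind.
    #    A lone '%' right after '(' cannot occur in the escaped text.
    my @saved;
    $text =~ s{(<raw>.*?</raw>)}{push @saved, $1; '(%' . $#saved . '%)'}gse;

    # 3. Escape '%' in the replacement strings too.
    s/%/%%/g for values %subs;

    # 4. Stand-in for the real mangling routine.
    for my $from (keys %subs) {
        $text =~ s/\Q$from\E/$subs{$from}/g;
    }

    # 5. Restore the raw blocks, then undo the escaping everywhere.
    $text =~ s/\(%(\d+)%\)/$saved[$1]/g;
    $text =~ s/%%/%/g;
    return $text;
}

print mangle_except_raw("100% cat <raw>cat 5%</raw> cat", { cat => 'dog' }), "\n";
```

Because the blocks are saved after the first escaping step, the final s/%%/%/g restores their '%' characters along with everything else.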

> the whole text-mangling routine is big enough that I want to minimize the number of passes (i.e., I don't want to run it on the multiple "interleaved" chunks between the 'raw' bits)

Your concern there appears to be one of speed of execution. You might reconsider this concern (or at least measure it): running the long mangling process several times on short strings may well turn out to be not much slower than running it once on the much longer full string, since the total amount of text being scanned is about the same.
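One way to measure it is with the core Benchmark module, comparing one mangling call per chunk against a single call on the whole string. The mangle sub, the <raw>...</raw> markup, and the generated data below are made-up stand-ins; substitute your real routine and a representative document before drawing any conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the real (long) mangling routine.
sub mangle {
    my ($s) = @_;
    $s =~ s/\bcat\b/dog/g;
    $s =~ s/colou?r/hue/g;
    return $s;
}

# Made-up test data: prose interleaved with <raw>...</raw> blocks.
my $text = join '', map { "the cat saw a colour $_ <raw>cat $_</raw>\n" } 1 .. 2000;

# Approach A: split out the raw blocks and mangle each prose chunk separately.
sub per_chunk {
    my @parts = split m{(<raw>.*?</raw>)}s, $text;    # capture keeps the raw blocks
    return join '', map { /^<raw>/ ? $_ : mangle($_) } @parts;
}

# Approach B: a single call on the whole string. Raw-block protection is
# omitted here, so this only measures the cost of the mangling itself.
sub whole { return mangle($text) }

# Run each approach for at least 1 CPU second and print a comparison table.
cmpthese(-1, { per_chunk => \&per_chunk, whole => \&whole });
```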

It is certainly possible to just remove chunks from the string, note the resulting offsets to those spots, and keep running totals of how much these offsets were shifted by each substitution. But that is complex enough that it is quite easy to get it wrong, so I don't think I'd recommend that approach. And I can't think of any alternatives that are better than the above ones.

- tye        


In reply to Re: Anchors, bleh :( (escape) by tye
in thread Anchors, bleh :( by oko1